Abstract:
The growing importance of big data and the increasing complexity of the environment have led to wider adoption of machine learning algorithms, given their ability to deal efficiently with entangled situations. This study contributes to the body of work on the application of random forests and other machine learning algorithms. Specifically, it examines company failure and the probability of default. The major impact that a firm’s default has on businesses, markets, and societies underlines the importance of developing models that predict the probability of default. This research addresses the topic with two purposes: to build an accurate binary model that classifies companies as Defaulted or Non-Defaulted, and to identify the most important predictors in order to understand the links between the financial ratios considered and the companies’ status.
Random forests were chosen to analyze this topic because of their ability to handle large data sets and many diverse predictors. Building on a literature review of decision trees, random forests, company failure, and models that predict the probability of default, the analysis proceeds through several experiments that allow the model to be tuned appropriately and lead to the final model, which provides the highest accuracy. Through its cross-sectional analysis, this research confirms random forests’ strong stability and consistent performance. The final model performs well and identifies coverage of fixed assets, gross profit, net working capital, cost of debt, debt to equity ratio, leverage, solvency ratio, and return on assets as the most important default predictors.
Finally, the results and methods have been used together to extend the purpose of this research. To support further development of this study and of research on random forests and machine learning, R code that reproduces the computations carried out is provided. Importantly, the function designed for this purpose can be applied to any data set, so that other topics can be analyzed as well, and it presents the results visually through a Shiny app, making them easier to interpret.