Abstract:
In Machine Learning, some of the most accurate models are practically black boxes, difficult to interpret and analyze. Consequently, different strategies have been adopted to overcome these limitations, giving rise to a research area called Explainable Artificial Intelligence. In this area, the models most commonly regarded as black boxes are Deep Neural Networks and ensemble methods. In particular, even though a single decision tree is considered explainable, tree ensembles are regarded as black-box models due to the large number of trees they typically include.
Existing techniques for explaining ensembles of decision trees (for both classification and regression) are mostly based on methods that examine the relationships between features and outcomes, build an explanation via tree prototyping, or approximate the model with an explainable one. Even though these approaches can give the end user many meaningful insights into a model and its output, they do not produce a global model explanation by design and/or do not specify the type of interaction between features.
In this thesis, we take a step towards a new approach to the model explanation problem for ensembles of regression trees, based on discovering frequent patterns inside the forest. A frequent pattern analysis over synthetic datasets generated from basic algebraic functions has been performed to answer some initial questions: are there frequent patterns associated with a specific type of algebraic operation between features? If so, what happens when the model tries to learn a function composed of several basic operations? Multiple sub-problems have been addressed to answer these questions.