Abstract:
This thesis focuses on optimizing Treant, an existing algorithm
for the generation of robust decision trees.
Despite its good performance from a machine learning point of view,
the original code showed severe limitations when applied to large datasets.
The algorithm was originally written in Python,
a very good programming language for fast prototyping that,
like many other interpreted languages,
can perform poorly on heavy numerical workloads
unless it is supported by appropriate libraries.
The code has been translated to C++, a compiled language,
and parallelized with OpenMP,
along with further optimizations to memory management and
to the choice of third-party libraries.
A Python module has been generated from the C++ code in order to
expose the efficient C++ classes so that they can be used as native Python classes.
In this way, any Python user can exploit both the flexibility of Python and
the performance of C++.
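
As an illustration of the kind of loop-level parallelism that OpenMP offers, the sketch below scores candidate splits independently for each feature and then selects the best one. It is a minimal example under stated assumptions, not the Treant implementation: the names split_sse, BestSplit and best_split, the squared-error criterion, and the one-vector-per-feature data layout are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical criterion: sum of squared errors of the labels around their
// mean on each side of the split "x[i] <= threshold".
double split_sse(const std::vector<double>& x, const std::vector<double>& y,
                 double threshold) {
    assert(x.size() == y.size());
    double sum_l = 0.0, sq_l = 0.0, sum_r = 0.0, sq_r = 0.0;
    std::size_t n_l = 0, n_r = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] <= threshold) { sum_l += y[i]; sq_l += y[i] * y[i]; ++n_l; }
        else                   { sum_r += y[i]; sq_r += y[i] * y[i]; ++n_r; }
    }
    double sse = 0.0;
    if (n_l > 0) sse += sq_l - sum_l * sum_l / static_cast<double>(n_l);
    if (n_r > 0) sse += sq_r - sum_r * sum_r / static_cast<double>(n_r);
    return sse;
}

// Hypothetical result type: best feature index, threshold and its score.
struct BestSplit {
    int feature;
    double threshold;
    double sse;
};

// columns[f] holds the values of feature f for every sample.
BestSplit best_split(const std::vector<std::vector<double>>& columns,
                     const std::vector<double>& y) {
    const int n_features = static_cast<int>(columns.size());
    const BestSplit none{-1, 0.0, std::numeric_limits<double>::infinity()};
    std::vector<BestSplit> per_feature(n_features, none);

    // Each iteration writes only to its own slot, so no locking is needed.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int f = 0; f < n_features; ++f) {
        for (double t : columns[f]) {
            const double sse = split_sse(columns[f], y, t);
            if (sse < per_feature[f].sse) per_feature[f] = BestSplit{f, t, sse};
        }
    }

    // Serial reduction over the per-feature winners.
    BestSplit best = none;
    for (const BestSplit& c : per_feature) {
        if (c.sse < best.sse) best = c;
    }
    return best;
}
```

Storing one result per feature and reducing serially afterwards keeps the parallel region free of shared writes, at the cost of a small temporary vector. Compiling with an OpenMP flag such as -fopenmp (GCC, Clang) enables the parallel loop.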