Abstract:
This thesis investigates multinomial logistic regression in the presence of high-dimensional data. Multinomial logistic regression is widely used to model categorical data in a variety of fields, including the health, physical and social sciences. In this thesis we apply three different kinds of dimensionality reduction techniques to multinomial logistic regression, namely ridge regression, the lasso and principal components regression. These methods reduce the effective dimension of the design matrix used to build the multinomial logistic regression model: ridge regression shrinks the coefficients of the explanatory variables, the lasso selects those explanatory variables that most affect the response variable, and principal components regression replaces the explanatory variables with a smaller set of derived components. We carry out an extensive simulation study to compare and contrast the three reduction methods. Moreover, we illustrate the multinomial regression model on several case studies that highlight the benefits and limitations of the different approaches.