Abstract:
This project aims to verify if, with specific statistical learning techniques, we can build a model for the estimation of a car insurance policy.
The starting point is a dataset referring to hundreds of thousands of quotes stored in a web estimate calculator, provided to brokers and insurance agents. We want to see if this information can be used for finding a model that can replace fee calculation operations, usually based on very complicated formulas.
This proposal is a very important challenge which will support a management decision process. It involves using big data to provide more advanced services to customers, and increase competitiveness.
This pas was chosen because in the last few years the spread of the Internet and new technologies have revolutionised the ability to process large amounts of data. This revolution is also taking place in the insurance sector, where some kinds of questions like pricing, cross-selling, claim innovation and fraud prevention can take advantage from the large amount of collected information.
This, in fact, represents a challenge, because all these data have a hidden potential that can be used to offer to customer the so called 'next best product' at the right time. Working with big data means braiding technology with business, and companies which can understand before others this interaction surely can benefit from proposing new business solutions in advance. Therefore, we want to use big data to see if it is possible to predict the price of an insurance premium by using statistical techniques, without analysing the real algorithm used to build the cost of the insurance.
Although it may seem an easy task, this work hides many difficulties, due to the fact that it is a preliminary phase in which we not only explore the feasibility of an idea, but also get familiar with new methods of data analysis.
This is why the work is developed into two steps: on the one hand, we consider the purely theoretical and scientific aspects useful to provide the basic knowledge; on the other hand, we use programming for implementing complex statistical analysis on a large dataset.