Abstract:
Data Science has become increasingly important in recent years due to the growing volume of generated data. This discipline makes it possible to make sense of large amounts of data, commonly referred to as "Big Data", which are now essential to driving decision-making. Data is produced in all fields, including Tourism and Mobility, where it is used to track the movement of people, identify tourist attractions, and better understand customer behavior. This dissertation was born from a collaboration with Motion Analytica Srl, an Italian startup that focuses on data analytics in Mobility and Tourism. The goal is to identify niche tourist behaviors from raw data collected from the TripAdvisor platform. The work is organized in four parts. The first is an Exploratory Data Analysis (EDA) phase. The second covers refining and cleaning the data, followed by the application of TF-IDF (Term Frequency-Inverse Document Frequency), an approach used here to detect the non-naive behavior of tourists at Italian points of interest. This approach is applied to a large dataset of TripAdvisor reviews, kindly granted by Motion Analytica Srl. The third is the development of a data visualization for the refined dataset. The last part describes the ETL (Extract, Transform and Load) pipeline implemented for the business. The outcomes of this work include a working ETL pipeline and a dashboard based on refined data that allow for the analysis of tourist behavior in Italy and the identification of niche tourist behavior.