Abstract:
In the current era of Web 2.0 and social networks, data scientists and statisticians face both a new challenge and a new opportunity: the analysis of huge amounts of textual data. Text analysis is a relatively established field of research, but it is still relatively new to the social and economic sciences. Nevertheless, its importance for these disciplines is expected to grow in both the near and the distant future.
This work focuses on unstructured textual datasets and aims to illustrate how to perform an essential textual analysis on them. In particular, the analysis deals with data from social networks, specifically Twitter: the starting point is a large collection of tweets, in which each tweet constitutes a statistical unit. A case study serves as a practical example of different text mining and sentiment analysis techniques. Its topic is the European migrant crisis, one of the most debated issues today in Europe and abroad.
First, data were collected from Twitter using DMI-TCAT (Digital Methods Initiative Twitter Capture and Analysis Toolset) to track a list of Italian words related to migration over a period of two months (late October – late December 2018). Second, the most significant part of the collected data was selected so that various text mining techniques could be applied to it. The analysis comprises three parts: the first contains descriptive statistics; the second is dedicated to sentiment analysis; the third and last uses more complex instruments to explore semantic networks and identify sub-cultures.