Gestione della clientela tramite analisi di reti sociali: Telecomunicazioni e Twitter

dc.contributor.advisor	Orlando, Salvatore	it_IT
dc.contributor.author	Quintavalle, Bruno <1966>	it_IT
dc.date.accessioned	2014-02-05	it_IT
dc.date.accessioned	2014-03-29T10:46:10Z
dc.date.available	2015-04-07T13:58:30Z
dc.date.issued	2014-03-11	it_IT
dc.identifier.uri	http://hdl.handle.net/10579/4475
dc.description.abstract	The web is rich in interesting and profitable information, written by humans for other humans in different forms and formats. Blogs, forums and social networks are examples of environments where opinions about products, alerts about problems and sentiment about companies and competitors can be harvested for business or other purposes. The difficulty is that most of this information is intended to be read by humans, and there is very often too much of it for a single person to manage in a reasonable amount of time. This thesis describes a method whose main purpose is to classify Twitter messages into categories of interest, so that the human reader is presented with only a greatly reduced number of texts to read and analyze further. The categories of interest considered here relate to telecommunication companies, but the method can easily be adapted to any other theme. Automatically classified messages may also simply be counted over periods of time, to reveal significant events, or cross-tabulated by telephone company, to compare the companies with one another. The first step of the method is to extract features from the texts, using dictionaries and a primitive but effective form of stemming. Then an ensemble of three basic classifiers (a decision tree, a neural network and a naïve Bayesian classifier) is used to assign labels to the text. Because of the unavoidable ambiguity of text categories, the classifier has been made both hierarchical (able to deal with a taxonomy of labels) and multi-label (able to assign several labels to the same text). Twitter messages have a maximum length of 140 characters, which certainly simplifies their analysis. However, with a slight modification of the algorithm, the same set of neural networks trained for Twitter has been successfully used to label texts of any length.	it_IT
dc.language.iso	en	it_IT
dc.publisher	Università Ca' Foscari Venezia	it_IT
dc.rights	© Bruno Quintavalle, 2014	it_IT
dc.title	Gestione della clientela tramite analisi di reti sociali: Telecomunicazioni e Twitter	it_IT
dc.title.alternative		it_IT
dc.type	Master's Degree Thesis	it_IT
dc.degree.name	Scienze dell'informazione	it_IT
dc.degree.level	Laurea vecchio ordinamento (ante DM 509/99)	it_IT
dc.degree.grantor	Dipartimento di Scienze Ambientali, Informatica e Statistica	it_IT
dc.description.academicyear	2012/2013, sessione straordinaria	it_IT
dc.rights.accessrights	openAccess	it_IT
dc.thesis.matricno	761617	it_IT
dc.subject.miur	INF/01 INFORMATICA	it_IT
dc.description.note		it_IT
dc.degree.discipline		it_IT
dc.contributor.co-advisor		it_IT
dc.provenance.upload	Bruno Quintavalle (761617@stud.unive.it), 2014-02-05	it_IT
dc.provenance.plagiarycheck	Salvatore Orlando (orlando@unive.it), 2014-02-17	it_IT
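
The pipeline outlined in the abstract (dictionary-based feature extraction with crude stemming, an ensemble of three base classifiers, and hierarchical multi-label output) might look roughly like the sketch below. The record gives no implementation details, so everything here is an assumption made for illustration: the dictionary entries, the prefix-matching form of stemming, the majority-vote combination rule, the taxonomy, and all function names are hypothetical and are not taken from the thesis. The three base classifiers are represented only by their predicted label sets.

```python
import re
from collections import Counter

# Hypothetical dictionary: word stem -> feature name (illustrative only).
DICTIONARY = {
    "rete": "network",
    "segnal": "network",
    "bollett": "billing",
    "fattur": "billing",
    "tariff": "pricing",
}

# Hypothetical two-level taxonomy: child label -> parent label.
PARENT = {"billing": "complaint", "network": "complaint"}


def extract_features(text):
    """Count dictionary features, matching a token when it starts with a stem.

    Prefix matching stands in for the "primitive" stemming mentioned in the
    abstract; the actual technique is not described in this record.
    """
    tokens = re.findall(r"[a-zàèéìòù]+", text.lower())
    features = Counter()
    for token in tokens:
        for stem, feature in DICTIONARY.items():
            if token.startswith(stem):
                features[feature] += 1
    return features


def majority_vote(predictions, min_votes=2):
    """Multi-label ensemble: keep each label proposed by at least min_votes
    of the base classifiers (assumed combination rule)."""
    votes = Counter(label for labels in predictions for label in set(labels))
    return {label for label, count in votes.items() if count >= min_votes}


def with_ancestors(labels):
    """Hierarchical step: add every ancestor of each assigned label."""
    result = set(labels)
    for label in labels:
        while label in PARENT:
            label = PARENT[label]
            result.add(label)
    return result


# Label sets as they might come from a decision tree, a neural network and
# a naive Bayes classifier for one tweet (made-up values).
base_predictions = [{"billing", "network"}, {"billing"}, {"network", "pricing"}]
final_labels = with_ancestors(majority_vote(base_predictions))
```

In this sketch a label survives only if two of the three base classifiers agree on it, and the taxonomy step then adds the shared parent category. Counting the resulting labels per day or per telephone company would give the time series and cross-tabulations mentioned in the abstract.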