Enhancing web search user experience : from document retrieval to task recommendation

dc.contributor.advisor	Orlando, Salvatore	it_IT
dc.contributor.author	Tolomei, Gabriele <1980>	it_IT
dc.date.accessioned	2012-07-07T08:13:20Z	it_IT
dc.date.accessioned	2012-07-30T16:05:45Z
dc.date.available	2012-07-07T08:13:20Z	it_IT
dc.date.available	2012-07-30T16:05:45Z
dc.date.issued	2011-11-17	it_IT
dc.identifier.uri	http://hdl.handle.net/10579/1231	it_IT
dc.description.abstract	The World Wide Web is the largest data source ever built by humankind. As a result, the Web has increasingly become the “place” of reference for accessing any kind of information through search engines. Indeed, users tend to turn to search engines not only to browse Web pages but also to carry out actual activities (e.g., planning a holiday, obtaining a visa, organizing a party, etc.). This doctoral thesis describes and addresses two fundamental challenges aimed at improving the Web search experience offered by current search engines, namely the discovery and the recommendation of so-called “Web tasks”. Both challenges rely on a genuine understanding of users' search behavior, which can be achieved by applying query log mining techniques. Users' search processes are analyzed at a higher level of abstraction, that is, from a “task-by-task” rather than a “query-by-query” perspective. This makes it possible to build a model of search activities that adequately supports users' “life on the Web”.	it_IT
dc.description.abstract	The World Wide Web is the largest and most heterogeneous database that humans have ever built, making it the place of choice for people searching for any sort of information through Web search engines. Indeed, users increasingly rely on Web search engines to perform their daily tasks (e.g., "planning holidays", "obtaining a visa", "organizing a birthday party", etc.), instead of simply looking for Web pages. In this Ph.D. dissertation, we sketch and address two core research challenges that we claim next-generation Web search engines should tackle to enhance the user search experience, i.e., Web task discovery and Web task recommendation. Both challenges rely on an actual understanding of user search behaviors, which can be achieved by mining knowledge from query logs. The search processes of many users are analyzed at a higher level of abstraction, namely from a "task-by-task" instead of a "query-by-query" perspective, thereby producing a model of user search tasks, which in turn can be used to support people during their daily "Web lives".	it_IT
dc.format.medium	Printed thesis	it_IT
dc.language.iso	en	it_IT
dc.publisher	Università Ca' Foscari Venezia	it_IT
dc.rights	© Gabriele Tolomei, 2011	it_IT
dc.subject	Web search	it_IT
dc.subject	Web mining	it_IT
dc.subject	Query log mining	it_IT
dc.subject	Task recommendation	it_IT
dc.title	Enhancing web search user experience : from document retrieval to task recommendation	it_IT
dc.type	Doctoral Thesis	it_IT
dc.degree.name	Computer Science	it_IT
dc.degree.level	Doctorate (Ph.D.)	it_IT
dc.degree.grantor	Scuola di dottorato in Scienze e tecnologie (SDST)	it_IT
dc.description.academicyear	2009/2010	it_IT
dc.description.cycle	23	it_IT
dc.degree.coordinator	Salibra, Antonino	it_IT
dc.location.shelfmark	D001161	it_IT
dc.location	Venezia, Archivio Università Ca' Foscari, Tesi Dottorato	it_IT
dc.rights.accessrights	openAccess	it_IT
dc.thesis.matricno	955515	it_IT
dc.format.pagenumber	[10], VI, 148 p.	it_IT
dc.subject.miur	INF/01 INFORMATICA	it_IT
dc.description.tableofcontent	1 Introduction; 1.1 Contribution; 1.2 Organization; 2 Web Search Engines; 2.1 The Big Picture; 2.2 Crawling; 2.2.1 The Crawling Algorithm; 2.2.2 The Crawl Frontier; 2.2.3 Web Page Fetching; 2.2.4 Web Page Parsing; 2.2.5 Web Page Storing; 2.3 Indexing; 2.3.1 Text-based Indexing; 2.3.2 Link-based Indexing; 2.4 Query Processing; 2.4.1 Text-based Ranking; 2.4.2 Link-based Ranking; 3 Query Log Mining; 3.1 What is a Query Log?; 3.2 A Characterization of Web Search Queries; 3.3 Time Analysis of Query Logs; 3.4 Time-series Analysis of Query Logs; 3.5 Privacy Issues in Query Logs; 3.6 Applications of Query Log Mining; 3.6.1 Search Session Discovery; 3.6.2 Query Suggestion; 4 Search Task Discovery; 4.1 Introduction; 4.1.1 Contribution; 4.1.2 Organization; 4.2 Related Work; 4.3 Query Log Analysis; 4.3.1 Session Size Distribution; 4.3.2 Query Time-Gap Distribution; 4.4 Task Discovery Problem; 4.4.1 Theoretical Model; 4.5 Ground-truth: Definition and Analysis; 4.6 Task-based Query Similarity; 4.6.1 Time-based Approach; 4.6.2 Unsupervised Approach; 4.6.3 Supervised Approach; 4.7 Task Discovery Methods; 4.7.1 TimeSplitting-t; 4.7.2 QueryClustering-m; 4.8 Experiments; 4.8.1 Validity Measures; 4.8.2 Evaluation; 4.9 Summary; 5 Search Task Recommendation; 5.1 Introduction; 5.1.1 Contribution; 5.1.2 Organization; 5.2 Related Work; 5.3 Anatomy of a Task Recommender System; 5.4 Task Synthesis; 5.4.1 Basic Task Representation; 5.4.2 Task Document Clustering; 5.5 Task Modeling; 5.5.1 Random-based (baseline); 5.5.2 Sequence-based; 5.5.3 Association-Rule based; 5.6 Experiments; 5.6.1 Experimental Setup; 5.6.2 Evaluating Recommendation Precision; 5.6.3 User Study; 5.6.4 Anecdotal Evidences; 5.7 Summary; Conclusions; Bibliography	it_IT
dc.identifier.bibliographiccitation	Tolomei, Gabriele. "Enhancing web search user experience : from document retrieval to task recommendation", Università Ca' Foscari Venezia, Tesi di Dottorato, XXIII Ciclo, 2011	it_IT