Abstract:
Every user leaves traces of their behaviour when surfing the Web. The usage data that users generate is stored in the logs of web applications, and these logs can be mined to extract knowledge that improves the performance of online services. Search Engines (SEs) likewise store usage information in so-called query logs, which can be used in different ways to improve the SE user experience. In this thesis we focus on improving the performance of a SE, in particular its effectiveness and efficiency, through query log mining. We first propose a novel Query Recommender System. We show that it is possible to shorten a user's query session by relieving the SE of part of the queries that the user would otherwise submit to refine the initial search. This approach helps users find what they are searching for in less time, while also reducing the number of queries that the SE must process and thus the overall server load. We also discuss how to improve SE efficiency by optimizing the use of its computational resources. The knowledge extracted from a query log is used to dynamically adjust the query processing method by adapting the pruning strategy to the SE load. In particular, query logs make it possible to build a regression model that predicts the response time of any query under the different pruning strategies that can be applied during query processing. The prediction is used to ensure a minimum quality of service when the system is heavily loaded, by trying to process the enqueued queries within a given deadline. Our study also addresses the effectiveness of query results by comparing their quality when dynamic pruning is adopted to reduce query processing times. Finally, we study how response times and results vary when, in the presence of high loads, processing is either interrupted after a fixed time threshold elapses or dropped completely.
We also introduce a novel query-dropping strategy based on the same query performance predictors discussed above.
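The load-aware selection of a pruning strategy summarized above can be illustrated with a minimal sketch. It is not the implementation developed in the thesis: the strategy names, the two query features (total posting-list length and number of query terms), the linear form of the predictor, and all coefficients are hypothetical and chosen only for illustration. The sketch assumes that each strategy has a response-time predictor fitted offline on a query log, and that strategies are ordered from most effective (slowest) to most aggressive (fastest).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Strategy:
    """A pruning strategy with a hypothetical linear response-time predictor."""
    name: str
    intercept_ms: float
    weights_ms: Tuple[float, ...]  # milliseconds per unit of each query feature

    def predict_ms(self, features: Sequence[float]) -> float:
        # Linear regression prediction: intercept + sum of weight * feature.
        return self.intercept_ms + sum(
            w * x for w, x in zip(self.weights_ms, features)
        )


# Illustrative coefficients only; in practice they would be fitted on a query log.
# Ordered from most effective (slowest) to most aggressive (fastest).
STRATEGIES = [
    Strategy("exhaustive", 5.0, (0.020, 2.0)),
    Strategy("safe_pruning", 4.0, (0.008, 1.5)),
    Strategy("aggressive_pruning", 3.0, (0.002, 1.0)),
]


def choose_strategy(features: Sequence[float], budget_ms: float) -> Optional[Strategy]:
    """Return the most effective strategy predicted to finish within the budget.

    Returns None when no strategy is predicted to meet the deadline, which a
    caller could treat as a signal to drop the query.
    """
    for strategy in STRATEGIES:
        if strategy.predict_ms(features) <= budget_ms:
            return strategy
    return None


if __name__ == "__main__":
    # Hypothetical query: 10,000 postings in total, 3 query terms.
    query_features = (10_000, 3)
    for budget in (250.0, 100.0, 30.0, 10.0):
        chosen = choose_strategy(query_features, budget)
        print(budget, chosen.name if chosen else "drop")
```

Under these assumed coefficients, a generous time budget selects exhaustive processing, tighter budgets fall back to progressively more aggressive pruning, and a budget that no strategy is predicted to meet results in the query being dropped.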