Abstract:
The huge amount of data generated every day by different interrelated entities calls for the development of new effective methods to store, retrieve and understand these massive network data. In this thesis, we tackle this challenge by introducing a graph summarization framework based on Szemerédi's Regularity Lemma (RL). This lemma provides a principled way to summarize a large graph by separating its main structural patterns from the noise that is common in any real-world network. Specifically, we first extend a heuristic version of the RL to improve its efficiency and its robustness against noise. We use the proposed algorithm to address graph-based clustering and image segmentation tasks. Along this path, we show how the RL can provide fresh insights into long-standing pattern recognition and machine learning problems. In the second part of the thesis, we introduce a new heuristic algorithm that improves summary quality in terms of both reconstruction error and noise filtering. The new heuristic is used to address the graph search problem, allowing us to speed up the search process and to reduce storage space. Finally, we study the connections among the RL, the stochastic block model and the minimum description length principle. In particular, we develop a graph decomposition algorithm based on a stochastic block model, in which the RL serves as a prototype of the structural information to be preserved, defining a new model space for graph data.