Abstract:
Graphs and networks are everywhere, from social networks to the World Wide Web. Over the last decade, massive graphs have become the focus of intense research activity in both industrial and academic research centers. These graphs pose new challenges for storage, geometric visual representation, and information retrieval. Retrieving information from them requires efficient techniques for compressing the graph.
Compressing data consists in changing its representation so that it requires fewer bits. Depending on whether this encoding process is reversible, the compression is either lossless or lossy.
In the first part of the thesis, we address this problem with a two-step clustering strategy; the solution takes advantage of the strong notion of edge-density regularity introduced by Endre Szemerédi.
In the second chapter, we address the problem of encoding a graph of order n into a graph of order k < n so as to minimize the reconstruction error. This encoding is characterized in terms of a particular factorization of the adjacency matrix of the original graph. The factorization is determined as the solution of a discrete optimization problem, which is relaxed, for convenience, into an equivalent continuous one. Our formulation does not require the full graph: it can also factorize the graph in the presence of partial information. We propose a multiplicative update rule for the optimization task, resembling the ones introduced for nonnegative matrix factorization, and we prove its convergence properties. Experiments are conducted to assess the effectiveness of the proposed approach.
Our main contributions are summarized as follows:
i) We link matrix factorization with graph compression by proposing a factorization that can be used to reduce the order of a graph and that can also be employed in the presence of incomplete observations. We show that the same technique can be used to compress a kernel, retaining a kernel as the reduced representation; moreover, we consider a general setting in which the observations of the original graph or kernel are incomplete;
ii) We cast the discrete problem of finding the best factorization as a continuous optimization problem, and we formally prove the equivalence between the discrete and continuous formulations;
iii) We provide a novel algorithm to approximately find the proposed factorization, which resembles the NMF algorithm in [57] (under ℓ2 divergence) and the Baum-Eagon dynamics [6]. Additionally, we formally prove convergence properties for our algorithm, and we believe that this theoretical contribution can be helpful for devising other factorization algorithms operating on the domain of stochastic matrices (rather than simply nonnegative matrices);
iv) Finally, we establish a relation between clustering and our graph compression model and show that existing clustering approaches in the literature can be regarded as particular, constrained variants of our matrix factorization.
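
As an illustration of the order-reducing encoding and its link to clustering, the following is a minimal sketch under stated assumptions, not the algorithm proposed in the thesis. It assumes a hard, many-to-one assignment of the n vertices to k groups, a squared reconstruction error, and unobserved entries marked with None; the names compress, reconstruct and squared_error are hypothetical. For a fixed assignment, the reduced graph is filled in with the mean observed weight between each pair of groups, which minimizes the squared error over the observed off-diagonal entries. The multiplicative update rule mentioned above is not reproduced here.

```python
from typing import List, Optional

# Assumption: None marks an unobserved entry of the adjacency matrix.
Matrix = List[List[Optional[float]]]


def compress(adj: Matrix, assign: List[int], k: int) -> List[List[float]]:
    """Reduced k x k graph: mean observed weight between each pair of groups."""
    n = len(adj)
    total = [[0.0] * k for _ in range(k)]
    count = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            w = adj[i][j]
            if i == j or w is None:  # skip the diagonal and unobserved entries
                continue
            total[assign[i]][assign[j]] += w
            count[assign[i]][assign[j]] += 1
    return [[total[a][b] / count[a][b] if count[a][b] else 0.0
             for b in range(k)] for a in range(k)]


def reconstruct(reduced: List[List[float]], assign: List[int]) -> List[List[float]]:
    """Expand the reduced graph back to order n (diagonal kept at zero)."""
    n = len(assign)
    return [[0.0 if i == j else reduced[assign[i]][assign[j]]
             for j in range(n)] for i in range(n)]


def squared_error(adj: Matrix, approx: List[List[float]]) -> float:
    """Squared reconstruction error, summed over observed entries only."""
    n = len(adj)
    return sum((adj[i][j] - approx[i][j]) ** 2
               for i in range(n) for j in range(n)
               if adj[i][j] is not None)


# Toy graph of order 4: edges 0-1, 1-2, 2-3; the pair (0, 3) is unobserved.
adj: Matrix = [
    [0.0, 1.0, 0.0, None],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [None, 0.0, 1.0, 0.0],
]
assign = [0, 0, 1, 1]  # hypothetical clustering into k = 2 groups

reduced = compress(adj, assign, k=2)
approx = reconstruct(reduced, assign)
err = squared_error(adj, approx)
print(reduced, err)
```

In this sketch the reduced graph of order 2 also supplies an estimate for the unobserved pair (0, 3), which is how a compressed representation can be computed from partial information.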