Abstract:
Cluster analysis (clustering) is the process of finding groups of objects such that objects in the same group are similar (related) to one another and dissimilar from objects in other groups. A fundamental problem in cluster analysis is deciding how many clusters are appropriate for describing a given system, since this number is a required input for many clustering algorithms. In this thesis, "On Determining the Number of Dominant-Set Clusters", we develop a new method for automatically estimating the number of clusters in unlabeled data sets. The method builds on the Motzkin-Straus theorem, which establishes a connection between the clique number ω(G) of a graph G and the global optimum of a certain quadratic function over the standard simplex. Through the notion of stability number, the same framework also shows that, in the unweighted case, a related quadratic optimization yields the stability number of the graph. Inspired by this result, we extend it to the weighted case in order to detect the number of maximal cliques (clusters). The resulting method has two steps. In the first step, we take a dissimilarity matrix as input and minimize the corresponding quadratic function with replicator dynamics, which gives the minimum number of clusters according to our definition of the stability number. In the second step, we check for clusters that the first step may have missed, building on the idea presented in the paper "Efficient Out-of-Sample Extension of Dominant-Set Clusters".
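As an illustration of the first step, the Python sketch below covers only the unweighted case; the weighted (dissimilarity-based) extension developed in the thesis is not reproduced here. For a graph G with 0/1 adjacency matrix A, applying the Motzkin-Straus theorem to the complement graph gives min over the simplex of x^T (A + I) x = 1/α(G), where α(G) is the stability number. The sketch maximizes the equivalent form x^T (Ā + I/2) x, where Ā is the complement adjacency matrix, using discrete-time replicator dynamics. The I/2 term is a standard regularization that makes local maxima correspond to maximal cliques of the complement, i.e. maximal independent sets of G. Because replicator dynamics only reaches a local optimum, the support size is a lower bound on α(G), with one vertex per cluster. The helper names, the random perturbation and the toy graph are hypothetical choices made for this example.

```python
import numpy as np


def replicator_dynamics(B, x, max_iter=10_000, tol=1e-12):
    """Discrete-time replicator dynamics: x_i <- x_i (Bx)_i / (x^T B x).
    For symmetric, non-negative B this climbs to a local maximum of x^T B x
    over the standard simplex."""
    for _ in range(max_iter):
        Bx = B @ x
        x_new = x * Bx / (x @ Bx)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x


def estimate_stability_number(A, seed=0, support_tol=1e-6):
    """Hypothetical helper: finds a maximal independent set of the unweighted
    graph with 0/1 adjacency matrix A (zero diagonal) via Motzkin-Straus."""
    n = A.shape[0]
    I = np.eye(n)
    A_bar = 1.0 - A - I              # complement graph: 1 for distinct non-adjacent vertices
    B = A_bar + 0.5 * I              # regularized: local maxima = maximal cliques of A_bar
    rng = np.random.default_rng(seed)
    x0 = 1.0 + 1e-3 * rng.random(n)  # slightly perturbed barycenter breaks symmetric saddles
    x0 /= x0.sum()
    x = replicator_dynamics(B, x0)
    support = np.flatnonzero(x > support_tol)  # one representative per cluster
    value = x @ (A + I) @ x                    # equals 1/k on an independent set of size k
    return support, value


# Toy graph (assumed): two 3-cliques {0, 1, 2} and {3, 4, 5} with no edges between them.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)

support, value = estimate_stability_number(A)
print(len(support), round(1.0 / value))  # 2 2
```

Both the support size and 1/value recover the two cliques, and the support contains exactly one vertex from each clique, which is the sense in which the stability number counts clusters.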
After determining the number of clusters (i.e., the cluster representatives), we check whether our approach finds the right number of clusters by propagating class labels to the unlabeled instances with graph transduction, a popular semi-supervised learning algorithm, and then evaluating the accuracy of the resulting clusters. To assess the performance of our approach, we performed several tests on computer-generated (toy) data sets and on real-world data sets taken from the UCI data repository. We also tested our approach on several social network data sets to further extend our work. The experiments on these data sets show promising results.
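The evaluation step can be illustrated with a small transduction sketch. Here we assume, hypothetically, that the representatives found in the first step are the only labeled points and that similarities come from a Gaussian kernel. The update is a replicator-style label propagation in the spirit of game-theoretic graph transduction; the exact variant used in the thesis may differ. Each point holds a probability vector over classes. Labeled points keep a fixed one-hot vector, and unlabeled points repeatedly reweight each class by the similarity-weighted support it receives from the other points.

```python
import numpy as np


def graph_transduction(W, labels, n_classes, max_iter=1000, tol=1e-10):
    """Hypothetical sketch. W: symmetric non-negative similarity matrix with zero
    diagonal; labels: class index for labeled points, -1 for unlabeled points."""
    n = W.shape[0]
    labeled = labels >= 0
    P = np.full((n, n_classes), 1.0 / n_classes)     # unlabeled points start uniform
    P[labeled] = np.eye(n_classes)[labels[labeled]]  # labeled points are one-hot
    for _ in range(max_iter):
        U = W @ P                                    # support each class gets at each point
        P_new = P * U
        s = P_new.sum(axis=1, keepdims=True)
        # Normalize rows; keep the old row if a point receives no support at all.
        P_new = np.where(s > 0, P_new / np.maximum(s, 1e-300), P)
        P_new[labeled] = P[labeled]                  # labeled points never change
        if np.abs(P_new - P).max() < tol:
            P = P_new
            break
        P = P_new
    return P.argmax(axis=1)


# Toy 1-D data (assumed): two well-separated groups, one representative labeled in each.
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
W = np.exp(-((X[:, None] - X[None, :]) ** 2) / 2.0)  # Gaussian kernel, sigma = 1
np.fill_diagonal(W, 0.0)
labels = np.array([0, -1, -1, 1, -1, -1])
truth = np.array([0, 0, 0, 1, 1, 1])

pred = graph_transduction(W, labels, n_classes=2)
acc = np.mean(pred == truth)
print(pred, acc)  # [0 0 0 1 1 1] 1.0
```

Accuracy is the fraction of points whose propagated label matches the ground truth. If the first step returned too few representatives, whole groups would inherit a wrong label and the accuracy would drop, which is what makes this a usable check on the estimated number of clusters.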