On the Approximation of Correlation Clustering and Consensus Clustering
نویسندگان
چکیده
The Correlation Clustering problem has been introduced recently [N. Bansal, A. Blum, S. Chawla, Correlation Clustering, in: Proc. 43rd Symp. Foundations of Computer Science, FOCS, 2002, pp. 238–247] as a model for clustering data when a binary relationship between data points is known. More precisely, for each pair of points we have two scores measuring the similarity and dissimilarity respectively, of the two points, and we would like to compute an optimal partition where the value of a partition is obtained by summing up the similarity scores of pairs involving points from the same cluster and the dissimilarity scores of pairs involving points from different clusters. A closely related problem is Consensus Clustering, where we are given a set of partitions and we would like to obtain a partition that best summarizes the input partitions. The latter problem is a restricted case of Correlation Clustering. In this paper we prove that Minimum Consensus Clustering is APX-hard even for three input partitions, answering an open question in the literature, while Maximum Consensus Clustering admits a PTAS. We exhibit a combinatorial and practical 5 -approximation algorithm based on a greedy technique for Maximum Consensus Clustering on three partitions. Moreover, we prove that a PTAS exists for Maximum Correlation Clustering when the maximum ratio between two scores is at most a constant. © 2007 Elsevier Inc. All rights reserved.
منابع مشابه
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملA new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble
An ensemble clustering has been considered as one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several bases clustering, and then, for their aggregation, a function is used to create a final cluster that is as similar as possible to all the cluster bundles. The inp...
متن کاملSignal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases
Nowadays, the clustering of proteins and enzymes in particular, are one of the most popular topics in bioinformatics. Increasing number of chitinase genes from different organisms and their sequences have beenidentified. So far, various mathematical algorithms for the clustering of chitinase genes have been used butmost of them seem to be confusing and sometimes insufficient. In the...
متن کاملA Polynomial Time Approximation Scheme for k-Consensus Clustering
This paper introduces a polynomial time approximation scheme for the metric Correlation Clustering problem, when the number of clusters returned is bounded (by k). Consensus Clustering is a fundamental aggregation problem, with considerable application, and it is analysed here as a metric variant of the Correlation Clustering problem. The PTAS exploits a connection between Correlation Clusterin...
متن کاملانتخاب اعضای ترکیب در خوشهبندی ترکیبی با استفاده از رأیگیری
Clustering is the process of division of a dataset into subsets that are called clusters, so that objects within a cluster are similar to each other and different from objects of the other clusters. So far, a lot of algorithms in different approaches have been created for the clustering. An effective choice (can combine) two or more of these algorithms for solving the clustering problem. Ensemb...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- J. Comput. Syst. Sci.
دوره 74 شماره
صفحات -
تاریخ انتشار 2008