instance clustering

CS 599: Structure and Dynamics of Networked

2005

Ashish Vaswani

So far, we have mostly talked about communities in the sense of discovering one, or a few, densely linked subgraphs. We departed from this interpretation at the end of last lecture, when we defined the notion of the modularity of a clustering. There, we are interested in the division of a graph into disjoint partitions (or clusters) of nodes, and the quality of this clustering. Clustering of da...
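The excerpt mentions the modularity of a clustering without restating its definition. The following is a minimal sketch, assuming the standard Newman-Girvan modularity for an undirected, unweighted graph, Q = Σ_c [e_c/m − (d_c/2m)²], where e_c is the number of edges inside cluster c, d_c is the total degree of its nodes, and m is the number of edges. The function name `modularity` and the toy graph are illustrative and do not come from the lecture notes.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of a partition (assumed definition).

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    cluster_of: dict mapping each node to its cluster label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal = defaultdict(int)    # e_c: edges with both endpoints in cluster c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {node: 0 for node in range(6)}

print(modularity(edges, two_clusters))  # 5/14, about 0.357
print(modularity(edges, one_cluster))   # 0.0
```

Placing every node in a single cluster gives Q = 0 under this definition, which is why the two-triangle split scores higher.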


Fair Clustering Through Fairlets

2017

Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

We study the question of fair clustering under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions—for in...
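One way to make "approximately equal representation in every cluster" concrete for two protected classes is a balance score. The sketch below assumes that the balance of a cluster is the smaller of the two class ratios and that the balance of a clustering is the minimum over its clusters. The function name `balance` and the class labels are illustrative assumptions, not taken from the abstract.

```python
from collections import Counter


def balance(clusters, class_a="red", class_b="blue"):
    """Balance of a clustering with two protected classes (assumed definition).

    clusters: non-empty list of clusters, each a list of class labels.
    Returns a value in [0, 1]; 1 means every cluster is perfectly balanced,
    0 means some cluster contains only one of the two classes.
    """
    if not clusters:
        raise ValueError("need at least one cluster")

    scores = []
    for members in clusters:
        counts = Counter(members)
        a, b = counts[class_a], counts[class_b]
        if a == 0 or b == 0:
            scores.append(0.0)  # a class is missing from this cluster
        else:
            scores.append(min(a / b, b / a))
    return min(scores)


mixed = [["red", "blue", "blue"], ["red", "red", "blue", "blue"]]
segregated = [["red", "red"], ["blue", "blue"]]

print(balance(mixed))       # 0.5
print(balance(segregated))  # 0.0
```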


Combining Data Clusterings with Instance Level Constraints

2009

João M. M. Duarte, Ana L. N. Fred, F. Jorge F. Duarte

Recent work has focused on incorporating a priori knowledge into the data clustering process, in the form of pairwise constraints, aiming to improve clustering quality and find appropriate clustering solutions to specific tasks or interests. In this work, we integrate must-link and cannot-link constraints into the cluster ensemble framework. Two algorithms for combining multiple data partit...
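As a small illustration of pairwise constraints, the sketch below counts how many must-link and cannot-link constraints a given partition violates. It is not the combination algorithm from the paper, whose details are cut off in this excerpt; the function name `count_violations` and the example data are hypothetical.

```python
def count_violations(labels, must_link, cannot_link):
    """Count pairwise constraints violated by a partition.

    labels: sequence where labels[i] is the cluster of instance i.
    must_link: pairs (i, j) that should share a cluster.
    cannot_link: pairs (i, j) that should be in different clusters.
    """
    violated = 0
    for i, j in must_link:
        if labels[i] != labels[j]:
            violated += 1  # must-link pair was split
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violated += 1  # cannot-link pair was merged
    return violated


must_link = [(0, 1)]
cannot_link = [(1, 2)]

print(count_violations([0, 0, 1], must_link, cannot_link))  # 0
print(count_violations([0, 0, 0], must_link, cannot_link))  # 1
print(count_violations([0, 1, 1], must_link, cannot_link))  # 2
```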


Algorithm Portfolios Based on Cost-Sensitive Hierarchical Clustering

2013

Yuri Malitsky, Ashish Sabharwal, Horst Samulowitz, Meinolf Sellmann

Different solution approaches for combinatorial problems often exhibit incomparable performance that depends on the concrete problem instance to be solved. Algorithm portfolios aim to combine the strengths of multiple algorithmic approaches by training a classifier that selects or schedules solvers dependent on the given instance. We devise a new classifier that selects solvers based on a cost-...


A Fuzzy Semisupervised Clustering Method: Application to the Classification of Scientific Publications

2014

Irene Diaz-Valenzuela, Maria J. Martín-Bautista, M. Amparo Vila

This paper introduces a new method of fuzzy semisupervised hierarchical clustering using fuzzy instance level constraints. It introduces the concepts of fuzzy must-link and fuzzy cannot-link constraints and uses them to find the optimum α-cut of a dendrogram. This method is used to approach the problem of classifying scientific publications in web digital libraries. It is tested on real data fro...


Clustering Word Pairs to Answer Analogy Questions

2006

Ergun Biçici, Deniz Yuret

We focus on answering word analogy questions by using clustering techniques. The increased performance in answering word similarity questions can have many possible applications, including question answering and information retrieval. We present an analysis of clustering algorithms’ performance on answering word similarity questions. This paper’s contributions can be summarized as: (i) casting ...


Information Theoretical Clustering via Semidefinite Programming

2011

Meihong Wang, Fei Sha

We propose techniques of convex optimization for information theoretical clustering. The clustering objective is to maximize the mutual information between data points and cluster assignments. We formulate this problem first as an instance of max k cut on weighted graphs. We then apply the technique of semidefinite programming (SDP) relaxation to obtain a convex SDP problem. We show how the sol...


Correlation Clustering with Noisy Partial Information

2015

Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan

In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs G. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value (1 + δ) opt-cost + O_δ(n log n) with high probability, where opt-cost is the value of the optimal solution (for every δ > 0). The second algorit...
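For readers unfamiliar with the objective behind opt-cost, the sketch below evaluates the cost of one clustering under the usual disagreement-minimization form of Correlation Clustering: a "+" edge whose endpoints are separated, or a "−" edge whose endpoints are merged, counts as one disagreement. This unweighted form is an assumption, since the excerpt does not state the objective, and the code only scores a clustering; it is not either of the paper's algorithms. The name `disagreements` and the toy instance are illustrative.

```python
def disagreements(signed_edges, label_of):
    """Number of disagreements of a clustering (assumed unweighted objective).

    signed_edges: dict mapping (u, v) to +1 (similar) or -1 (dissimilar).
    label_of: dict mapping each node to its cluster label.
    """
    cost = 0
    for (u, v), sign in signed_edges.items():
        same_cluster = label_of[u] == label_of[v]
        if sign > 0 and not same_cluster:
            cost += 1  # similar pair placed in different clusters
        elif sign < 0 and same_cluster:
            cost += 1  # dissimilar pair placed in the same cluster
    return cost


# A triangle with inconsistent signs: no clustering has zero cost.
signed_edges = {("a", "b"): +1, ("a", "c"): +1, ("b", "c"): -1}

print(disagreements(signed_edges, {"a": 0, "b": 0, "c": 1}))  # 1
print(disagreements(signed_edges, {"a": 0, "b": 0, "c": 0}))  # 1
print(disagreements(signed_edges, {"a": 0, "b": 1, "c": 2}))  # 2
```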


EM-DD: An Improved Multiple-Instance Learning Technique

2001

Qi Zhang, Sally A. Goldman

We present a new multiple-instance (MI) learning technique (EM-DD) that combines EM with the diverse density (DD) algorithm. EM-DD is a general-purpose MI algorithm that can be applied with boolean or real-value labels and makes real-value predictions. On the boolean Musk benchmarks, the EM-DD algorithm without any tuning significantly outperforms all previous algorithms. EM-DD is relatively ins...


Methods for Clustering Mass Spectrometry Data in Drug Development

2000

Huiru Zheng, Sarabjot Singh Anand, John G Hughes, Norman D Black

Isolation and purification of the active principle within natural compounds play an important role in drug development. MS (mass spectrometry) is used as a detector in HPLC (high performance liquid chromatography) systems to aid the determination of novel compound structures. Clustering techniques provide useful tools for intelligent data analysis within this context. In this paper, we analyse...
