supervised clustering

Semi supervised clustering for Text Clustering

2014

N. Saranya

ABSTRACT: Based on the Affinity Propagation (AP) clustering algorithm, I present in this paper a semisupervised text clustering algorithm called Seeds Affinity Propagation (SAP). There are two main contributions in my approach: 1) a similarity metric that captures the structural information of texts, and 2) a seed construction method to improve the semisupervised clustering process. To study the perform...


Clustering and Supervised Learning 3 3 . 3 Perceptron List Modeler

2010

Annaka Kalton Pat Langley Kiri Wagstaff Jungsoon Yoo

ABSTRACT Clustering algorithms have become increasingly important in handling and analyzing data. Considerable work has been done in devising effective but increasingly specific clustering algorithms. In contrast, we have developed a generalized framework that accommodates diverse clustering algorithms in a systematic way. This framework views clustering as a general process of iterative optimiza...


Active Semi-Supervision for Pairwise Constrained Clustering

2004

Sugato Basu Arindam Banerjee Raymond J. Mooney

Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clust...
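
As an illustration of the pairwise constraints mentioned in this abstract, the following is a minimal sketch that counts how many must-link and cannot-link constraints a given cluster assignment violates. It is not the framework or the active selection method of the paper; the function name `count_violations` and the toy data are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two examples

def count_violations(labels: Sequence[int],
                     must_link: List[Pair],
                     cannot_link: List[Pair]) -> int:
    """Count pairwise constraints violated by a cluster assignment.

    Hypothetical helper for illustration only.
    """
    violated = 0
    # A must-link pair is violated when its two examples land in different clusters.
    for i, j in must_link:
        if labels[i] != labels[j]:
            violated += 1
    # A cannot-link pair is violated when its two examples share a cluster.
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violated += 1
    return violated

# Toy data: four examples assigned to two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
violations = count_violations(labels, must_link, cannot_link)
print(violations)
```

A constrained clustering method would typically add a penalty based on such a count to its objective, so that assignments breaking fewer constraints are preferred.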


The Graduate School SEMI-SUPERVISED CLUSTERING FOR HIGH-DIMENSIONAL AND SPARSE FEATURES

2010

Dongwon Lee Carleen Maitland

Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some “weak” form of side information about the domain or data sets can be often availabl...


Discovery of feature-based hot spots using supervised clustering

Journal: Computers & Geosciences 2009

Wei Ding Tomasz F. Stepinski Rachana Parmar Dan Jiang Christoph F. Eick

Feature-based hot spots are localized regions where the attributes of objects attain high values. There is considerable interest in automatic identification of feature-based hot spots. This paper approaches the problem of finding feature-based hot spots from a data mining perspective, and describes a method that relies on supervised clustering to produce a list of hot spot regions. Supervised c...


Supervised Clustering and Fuzzy Decision Tree Induction for the Identification of Compact Classifiers

2004

Ferenc Peter Pach Janos Abonyi Peter Arva

Fuzzy decision tree induction algorithms require the fuzzy quantization of the input variables. This paper demonstrates that supervised fuzzy clustering combined with similarity-based rule-simplification algorithms is an effective tool to obtain the fuzzy quantization of the input variables, so the synergistic combination of supervised fuzzy clustering and fuzzy decision tree induction can be e...


Semi-supervised clustering with metric learning: An adaptive kernel method

Journal: Pattern Recognition 2010

Xuesong Yin Songcan Chen Enliang Hu Daoqiang Zhang

Most existing representative works in semi-supervised clustering do not sufficiently solve the violation problem of pairwise constraints. On the other hand, traditional kernel methods for semi-supervised clustering not only face the problem of manually tuning the kernel parameters due to the fact that no sufficient supervision is provided, but also lack a measure that achieves better effectiven...


An MBO scheme for clustering and semi-supervised clustering of signed networks

Journal: Communications in Mathematical Sciences 2021

We introduce a principled method for the signed clustering problem, where the goal is to partition a weighted undirected graph whose edge weights take both positive and negative values, such that edges within the same cluster are mostly positive, while edges spanning across clusters are mostly negative. Our method relies on a graph-based diffuse interface model formulation utilizing the Ginzburg-Landau functional, based on an adaptation...
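
To make the signed clustering goal in this abstract concrete, here is a small sketch that scores a partition by the total weight of edges that disagree with it: negative edges inside a cluster and positive edges between clusters. This is only a way to evaluate a partition, not the MBO scheme or the Ginzburg-Landau formulation of the paper; the function name `signed_disagreement` and the example graph are assumptions.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (node u, node v, signed weight)

def signed_disagreement(edges: List[Edge], labels: Sequence[int]) -> float:
    """Total absolute weight of edges that contradict the partition.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for u, v, w in edges:
        same_cluster = labels[u] == labels[v]
        if same_cluster and w < 0:
            total += -w   # negative edge inside a cluster
        elif not same_cluster and w > 0:
            total += w    # positive edge across clusters
    return total

# Toy signed graph on four nodes.
edges = [(0, 1, 1.0), (1, 2, -2.0), (2, 3, 1.5), (0, 3, -0.5), (0, 2, 0.5)]
good = signed_disagreement(edges, [0, 0, 1, 1])
bad = signed_disagreement(edges, [0, 1, 0, 1])
print(good, bad)
```

A lower score means the partition respects more of the edge signs, which matches the stated goal of mostly positive edges within clusters and mostly negative edges across them.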


Tri-training and Data Editing Based Semi-supervised Clustering Algorithm

2006

Chao Deng Maozu Guo

Semi-supervised clustering algorithms often utilize a seed set consisting of a small amount of labeled data to initialize cluster centroids, and hence improve the clustering performance over the whole data set. Both the scale and quality of the seed set directly restrict the performance of a semi-supervised clustering algorithm. In this paper, a new algorithm named DE-Tri-training semi-supervised K-means i...
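
The seeding idea described in this abstract can be sketched as follows: labeled seed points are averaged per class to give initial centroids, and every point is then assigned to its nearest centroid. This shows plain seeded initialization with a single assignment step only, not the DE-Tri-training procedure of the paper; the names `seed_centroids` and `assign` and the toy points are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def seed_centroids(points: Sequence[Point],
                   seed_labels: Dict[int, int]) -> Dict[int, List[float]]:
    """Average the labeled seed points of each class to get initial centroids."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for idx, label in seed_labels.items():
        groups[label].append(points[idx])
    centroids: Dict[int, List[float]] = {}
    for label, members in groups.items():
        dim = len(members[0])
        centroids[label] = [sum(p[d] for p in members) / len(members)
                            for d in range(dim)]
    return centroids

def assign(points: Sequence[Point],
           centroids: Dict[int, List[float]]) -> List[int]:
    """Assign each point to the label of its nearest centroid (squared Euclidean)."""
    def sqdist(p: Point, c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(centroids, key=lambda k: sqdist(p, centroids[k])) for p in points]

# Toy data: the first four points are labeled seeds, the last two are unlabeled.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0), (1.0, 1.0), (9.0, 11.0)]
seeds = {0: 0, 1: 0, 2: 1, 3: 1}  # point index -> class label
centroids = seed_centroids(points, seeds)
labels = assign(points, centroids)
print(centroids, labels)
```

A full K-means run would alternate the assignment step with centroid updates until the labels stop changing; the quality of the seeds determines where that iteration starts.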


Clustering Heterogeneous Data with Mutual Semi-supervision

2012

Artur Abdullin Olfa Nasraoui

We propose a new methodology for clustering data comprising multiple domains or parts, in such a way that the separate domains mutually supervise each other within a semi-supervised learning framework. Unlike existing uses of semi-supervised learning, our methodology does not assume the presence of labels from part of the data, but rather, each of the different domains of the data separately un...
