On Potts Model Clustering, Kernel K-means, and Density Estimation

Authors

  • Alejandro Murua
  • Larissa Stanberry
  • Werner Stuetzle
Abstract

Many clustering methods, such as K-means, kernel K-means, and MNCut clustering, follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; (iii) optimize this figure of merit over partitions. Potts model clustering, introduced by Blatt, Wiseman, and Domany (1996), represents an interesting variation on this recipe. Blatt et al. define a new figure of merit for partitions that is formally similar to the Hamiltonian of the Potts model for ferromagnetism, extensively studied in statistical physics. For each temperature T, the Hamiltonian defines a distribution assigning a probability to each possible configuration of the physical system or, in the language of clustering, to each partition. Instead of searching for a single partition optimizing the Hamiltonian, they sample a large number of partitions from this distribution for a range of temperatures. They propose a heuristic for choosing an appropriate temperature and, from the sample of partitions associated with this chosen temperature, derive what we call a consensus clustering: two observations are put in the same consensus cluster if they belong to the same cluster in the majority of the random partitions. In a sense, the consensus clustering is an "average" of plausible configurations, and we would expect it to be more stable (over different samples) than the configuration optimizing the Hamiltonian. The goal of this paper is to contribute to the understanding of Potts model clustering and to propose extensions and improvements: (1) We show that the Hamiltonian used in Potts model clustering is closely related to the kernel K-means and MNCut criteria. (2) We propose a modification of the Hamiltonian penalizing unequal cluster sizes and show that it can be interpreted as a weighted version of the kernel K-means criterion. (3) We introduce a new version of the Wolff algorithm to simulate configurations from the distribution defined by the penalized Hamiltonian, leading to penalized Potts model clustering. (4) We note a link between kernel-based clustering methods and non-parametric density estimation and exploit it to automatically determine locally adaptive kernel bandwidths. (5) We propose a new, simple rule for selecting a good temperature T.
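The abstract describes the Hamiltonian only verbally. In the standard form used in the super-paramagnetic clustering literature, with k(x_i, x_j) a similarity kernel and sigma_i the cluster label of observation i, it can be sketched as (notation assumed here, not quoted from the paper):

    H(\sigma) = \sum_{i<j} k(x_i, x_j) \, (1 - \delta_{\sigma_i, \sigma_j}), \qquad
    p_T(\sigma) \propto \exp\{-H(\sigma)/T\},

so that partitions separating similar observations incur a large penalty and receive low probability under the Gibbs distribution p_T at temperature T.

The consensus-clustering rule quoted above ("same cluster in the majority of the random partitions") can be sketched in a few lines of Python. The 0.5 threshold follows that majority rule; the union-find merging of qualifying pairs is an illustrative assumption, not necessarily the authors' exact procedure:

    import numpy as np

    def consensus_clusters(partitions, threshold=0.5):
        """partitions: list of equal-length integer label arrays, one per sampled partition."""
        P = np.asarray(partitions)              # shape (num_partitions, n)
        m, n = P.shape
        # Fraction of sampled partitions in which observations i and j share a cluster.
        co = np.zeros((n, n))
        for labels in P:
            co += labels[:, None] == labels[None, :]
        co /= m
        # Union-find: merge every pair that is co-clustered in a majority of partitions.
        parent = list(range(n))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]   # path halving
                i = parent[i]
            return i
        for i in range(n):
            for j in range(i + 1, n):
                if co[i, j] > threshold:
                    parent[find(i)] = find(j)
        return [find(i) for i in range(n)]      # one root label per observation

    # Example: five observations, three sampled partitions.
    parts = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 2, 2, 2]]
    print(consensus_clusters(parts))            # observations {0,1} and {2,3,4} merge

Note that the union-find closure makes the relation transitive, so two observations that rarely co-occur directly can still end up in one consensus cluster through intermediaries.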


Similar Articles

The Conditional-Potts Clustering Model

A Bayesian kernel-based clustering method is presented. The associated model arises as an embedding of the Potts density for label membership probabilities into an extended Bayesian model for joint data and label membership probabilities. The method may be seen as a principled extension of so-called super-paramagnetic clustering. The model depends on three parameters: the temperature, the k...

Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data

Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
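For reference, a common form of the k-nearest-neighbor kernel density estimator (sketched here for univariate, untruncated data; the left-truncation adjustment the snippet refers to is not shown) takes R_k(x) to be the distance from x to its k-th nearest sample point and sets

    \hat f_k(x) = \frac{1}{n \, R_k(x)} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{R_k(x)}\right).

The bandwidth R_k(x) shrinks where sample points are dense and widens where they are sparse, which is exactly the location-dependent bandwidth described above.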

Color Image Segmentation via Improved K-Means Algorithm

Data clustering techniques are often used to segment real-world images. Unsupervised image segmentation algorithms based on clustering suffer from random initialization. There is a need for an efficient and effective image segmentation algorithm that can be used in computer vision, object recognition, image recognition, or compression. To address these problems, the authors ...

Learning mixtures by simplifying kernel density estimators

Gaussian mixture models are a widespread tool for modeling diverse and complex probability density functions. They can be estimated by various means, often using Expectation-Maximization or Kernel Density Estimation. In addition to these well-known algorithms, new and promising stochastic modeling methods include Dirichlet Process mixtures and k-Maximum Likelihood Estimators. Most of the method...

Using the K-means Clustering Method as a Density Estimation Procedure (Working Paper, Alfred P. Sloan School of Management)

A random sample of size N is divided into k clusters that locally minimize the within-cluster sum of squares. This k-means clustering method can be used as a quick procedure for constructing variable-cell histograms that have no empty cell. A histogram estimate is proposed in this paper and is shown to be uniformly consistent in probability.
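The estimate can be sketched as a standard variable-cell histogram (an assumed form; the working paper's exact cell construction is not reproduced in this snippet): if the k-means clusters induce cells A_1, ..., A_k and N_j of the N sample points fall in cell A_j, then

    \hat f(x) = \frac{N_j}{N \, |A_j|} \quad \text{for } x \in A_j,

with |A_j| the length (or volume) of A_j. Since every cluster contains at least one point, N_j >= 1 for all j, which gives the "no empty cell" property noted above.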



Publication year: 2006