persistent k means

Lecture 10: k-means clustering

2012

Edo Liberty

The sets $S_j$ are the sets of points to which $\mu_j$ is the closest center. In each step of the algorithm the potential function is reduced. Let's examine that. First, if the set of centers $\mu_j$ is fixed, the best assignment is clearly the one which assigns each data point to its closest center. Also, assume that $\mu$ is the center of a set of points $S$. Then, if we move $\mu$ to $\frac{1}{|S|}\sum_{i \in S} x_i$ then we only r...
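As a rough illustration of the two observations in this excerpt, the sketch below runs one assignment step and one center-update step and reports the potential after each. The squared Euclidean potential, the function names, and the handling of empty clusters are assumptions made for this example and are not taken from the lecture notes.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def potential(points, centers, assign):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[j]) for p, j in zip(points, assign))


def assign_to_closest(points, centers):
    """With the centers fixed, assign each point to its closest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def recompute_centers(points, centers, assign):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, a in zip(points, assign) if a == j]
        if not members:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        dim = len(old)
        new_centers.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return new_centers


def lloyd_step(points, centers):
    """One assignment step followed by one center-update step."""
    assign = assign_to_closest(points, centers)
    after_assign = potential(points, centers, assign)
    centers = recompute_centers(points, centers, assign)
    after_update = potential(points, centers, assign)
    return centers, assign, after_assign, after_update
```

Calling `lloyd_step` repeatedly, feeding the returned centers back in, gives a sequence of potentials that never increases under these assumptions.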


Stability of k -Means Clustering

2007

Shai Ben-David Dávid Pál Hans Ulrich Simon

We consider the stability of k-means clustering problems. Clustering stability is a common heuristic used to determine the number of clusters in a wide variety of clustering applications. We continue the theoretical analysis of clustering stability by establishing a complete characterization of clustering stability in terms of the number of optimal solutions to the clustering optimization prob...


Turbocharging Mini-Batch K-Means

Journal: CoRR 2016

James Newling François Fleuret

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically re...


A Multi-agent Based Approach to Clustering: Harnessing the Power of Agents

2011

Santhana Chaimontree Katie Atkinson Frans Coenen

A framework for multi-agent based clustering is described whereby individual agents represent individual clusters. A particular feature of the framework is that, after an initial cluster configuration has been generated, the agents are able to negotiate with a view to improving on this initial clustering. The framework can be used in the context of a number of clustering paradigms, two are inve...


Lecture 9 & 10: Local Search Algorithm for k-median Problem (contd.), k-means Problem

2016

Kasturi Varadarajan Tanmay Inamdar

Let us define some notation which will help us analyze the algorithm. L := A solution (k-subset) returned by Local Search. Copt := An optimal solution for the k-median problem. We will eventually show that Cost(L) ≤ 5 · Cost(Copt). For any p ∈ P, C ⊆ P, NN(p, C) := c̄ ∈ C that minimizes d(p, ·). So d(p, NN(p, C)) = d(p, C) by definition. Also, for any C ⊆ P, c̄ ∈ C, Cluster(C, c̄) := {q ∈ P | NN(q, ...
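The notation in this excerpt can be made concrete with a small sketch. It assumes a Euclidean metric for d and takes Cost to be the usual k-median objective, the sum over all points of the distance to the nearest chosen center. The excerpt itself does not state either choice, and the function names are illustrative. The truncated Cluster definition is deliberately left out.

```python
import math


def d(p, q):
    """Distance between two points (assumption: Euclidean metric)."""
    return math.dist(p, q)


def nn(p, C):
    """NN(p, C): a center in C that minimizes d(p, .); ties go to the first one."""
    return min(C, key=lambda c: d(p, c))


def dist_to_set(p, C):
    """d(p, C), which equals d(p, NN(p, C)) by definition."""
    return d(p, nn(p, C))


def cost(P, C):
    """Assumed k-median cost: total distance from each point to its nearest center."""
    return sum(dist_to_set(p, C) for p in P)
```

With these helpers, checking a bound of the form Cost(L) ≤ 5 · Cost(Copt) on a small instance amounts to comparing `cost(P, L)` against `5 * cost(P, C_opt)`, where `L` and `C_opt` are supplied by the caller.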


Adapting K-Medians to Generate Normalized Cluster Centers

2006

Benjamin J. Anderson Deborah S. Gross David R. Musicant Anna M. Ritz Thomas G. Smith Leah E. Steinberg

Many applications of clustering require the use of normalized data, such as text or mass spectra mining. The spherical K-means algorithm [6], an adaptation of the traditional K-means algorithm, is highly useful for data of this kind because it produces normalized cluster centers. The K-medians clustering algorithm is also an important clustering tool because of its well-known resistance to outli...
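To show what "normalized cluster centers" can mean in practice, here is a minimal sketch of a center update in the style of spherical K-means: the members are scaled to unit length, averaged, and the average is scaled back to unit length. This is an illustration under those assumptions, not the adapted K-medians method the paper proposes, and the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / norm for x in v)


def spherical_center(members):
    """Unit-length center of a cluster: the normalized mean of normalized members.

    Raises ValueError if the members cancel out exactly (zero mean).
    """
    unit = [normalize(m) for m in members]
    dim = len(unit[0])
    mean = tuple(sum(u[i] for u in unit) / len(unit) for i in range(dim))
    return normalize(mean)
```

For example, `spherical_center([(2, 0), (0, 3)])` points along the diagonal between the two axes and has length one regardless of the original magnitudes.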


Using Pivots to Speed-Up k-Medoids Clustering

Journal: JIDM 2011

Adriano Arantes Paterlini Mario A. Nascimento Caetano Traina

Clustering is a key technique within the KDD process, with k-means, and the more general k-medoids, being well-known incremental partition-based clustering algorithms. A fundamental issue within this class of algorithms is to find an initial set of medians (or medoids) that improves the efficiency of the algorithms (e.g., accelerating its convergence to a solution), at the same time that it imp...


CPPP/UFMS at ImageCLEF 2014: Robot Vision Task

2014

Rodrigo de Carvalho Gomes Lucas Correia Ribas Amaury Antônio de Castro Junior Wesley Nunes Gonçalves

This paper describes the participation of the CPPP/UFMS group in the robot vision task. We have applied the spatial pyramid matching proposed by Lazebnik et al. This method extends bag-of-visual-words to spatial pyramids by concatenating histograms of local features found in increasingly fine sub-regions. To form the visual vocabulary, k-means clustering was applied to a random subset of images f...


Range-Clustering Queries

2017

Mikkel Abrahamsen Mark de Berg Kevin Buchin Mehran Mehr Ali D. Mehrabi

In a geometric k-clustering problem the goal is to partition a set of points in $\mathbb{R}^d$ into k subsets such that a certain cost function of the clustering is minimized. We present data structures for orthogonal range-clustering queries on a point set S: given a query box Q and an integer k ≥ 2, compute an optimal k-clustering for S ∩ Q. We obtain the following results. – We present a general method to...


Automated Music Success Prediction

2007

Joshua Teitelbaum Niyant Krishnamurthi Sébastien Beaudet

We investigate the uses and limitations of MFCC analysis for feature extraction from music files in the domain of genre recognition. Intra-genre and inter-genre classification are explored. We implement a method of genre classification based on MFCC extraction, K-means clustering, and KNN analysis. We demonstrate the efficacy of our method through testing, yielding a 99% accuracy rate.
