k means algorithm

Application of K-means learning algorithm to U.N survey data

2017

This paper reports the results of applying the K-means algorithm to U.N. survey data on people’s priorities, organized by country. The dataset includes 16 features for each country, with each feature corresponding to a different societal issue. For each feature, a country has a rating in the range [0, 1] that indicates how important that issue is to the country’s people; the...
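As a rough illustration of the setting described above, the following is a minimal sketch of standard (Lloyd-style) K-means in plain Python, run on synthetic data with 16 features in [0, 1]. The data, the function names (`kmeans`, `squared_distance`), the choice of k = 3, and the random initialization are assumptions made for this example; they are not taken from the paper, whose dataset and settings are not shown in the abstract.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd iteration: assign points to nearest center, then re-average."""
    rng = random.Random(seed)
    # Assumption: initialize with k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no label changed, so the centers would not move either
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Synthetic stand-in for the survey data: 30 "countries", 16 ratings in [0, 1].
data_rng = random.Random(1)
countries = [[data_rng.random() for _ in range(16)] for _ in range(30)]
centers, labels = kmeans(countries, k=3)
```

Because every rating lies in [0, 1], each center coordinate (a mean of ratings) also lies in [0, 1], which makes the centers directly readable as "average priority" profiles.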


Improved COA with Chaotic Initialization and Intelligent Migration for Data Clustering

Journal: Journal of Artificial Intelligence and Data Mining 2017

M. Lashkari, M. Moattar

K-means is a well-known clustering algorithm. Despite advantages such as high speed and ease of use, it suffers from the problem of local optima. Many studies have sought to overcome this problem. This paper presents a hybrid of the Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), called ECOA-K. The COA algorithm has advantages ...


An Algorithm for Online K-Means Clustering

2016

Edo Liberty, Ram Sriharsha, Maxim Sviridenko

This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v_1, ..., v_n one by one in an arbitrary order. For each vector v_t the algorithm outputs a cluster identifier before receiving v_{t+1}. Our online algorithm generates Õ(k) clusters whose k-means cost is Õ(W*), where W* is the optimal k-means cost using k cl...
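The online model described above (one label emitted per vector, before the next vector arrives) can be sketched as follows. This is not the authors' algorithm: it is a deliberately simple distance-threshold rule, and the class name `ThresholdOnlineClusterer` and the `radius` parameter are hypothetical names introduced only to illustrate the interaction protocol.

```python
import math


class ThresholdOnlineClusterer:
    """Illustrative online clusterer (an assumption, not the paper's method).

    Each incoming vector is labeled immediately: it joins the nearest
    existing center if that center is within `radius`, otherwise it
    opens a new cluster centered on itself.
    """

    def __init__(self, radius):
        self.radius = radius
        self.centers = []

    def assign(self, v):
        """Return a cluster identifier for v without seeing later vectors."""
        if self.centers:
            nearest = min(
                range(len(self.centers)),
                key=lambda i: math.dist(v, self.centers[i]),
            )
            if math.dist(v, self.centers[nearest]) <= self.radius:
                return nearest
        # No center is close enough: open a new cluster at v.
        self.centers.append(tuple(v))
        return len(self.centers) - 1


stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
clusterer = ThresholdOnlineClusterer(radius=1.0)
labels = [clusterer.assign(v) for v in stream]
print(labels)  # [0, 0, 1, 1, 0]
```

Since the number of clusters is not fixed in advance, a rule like this may open more than k clusters, which matches the flavor of the guarantee in the abstract (Õ(k) clusters rather than exactly k).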


K-Medoids For K-Means Seeding

2017

James Newling, François Fleuret

We run experiments showing that the algorithm clarans (Ng et al., 2005) finds better K-medoids solutions than the standard algorithm. This finding, along with the similarity between the standard K-medoids and K-means algorithms, suggests that clarans may be an effective K-means initializer. We show that this is the case, with clarans outperforming other popular seeding algorithms on 23/23 datasets w...


Notes on using Determinantal Point Processes for Clustering with Applications to Text Clustering

Journal: CoRR 2014

Apoorv Agarwal, Anna Choromanska, Krzysztof Choromanski

In this paper, we compare three initialization schemes for the KMEANS clustering algorithm: 1) random initialization (KMEANSRAND), 2) KMEANS++, and 3) KMEANSD++. Both KMEANSRAND and KMEANS++ have a major drawback: the value of k needs to be set by the user of the algorithms. Kang (2013) recently proposed a novel use of determinantal point processes for sampling the initial centroids for the KMEANS a...


A bad 2-dimensional instance for k-means++

Journal: CoRR 2013

Ragesh Jaiswal, Prachi Jain, Saumya Yadav

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from among the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean dist...
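The sampling procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes the usual definition of k-means++ (D² sampling), in which each point is weighted by its squared Euclidean distance to the nearest center chosen so far; the function names `kmeans_pp_seed` and `sq_dist` and the toy data are assumptions for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, seed=0):
    """Choose k initial centers by D^2 sampling (k-means++ seeding)."""
    rng = random.Random(seed)
    # First center: uniformly at random from the given points.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the newest center can shrink a point's nearest distance.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_seed(points, k=2)
```

A point that is already a center has weight zero and so cannot be drawn again while other points have positive weight, which is what pushes the seeds apart.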


Wasserstein k-means++ for Cloud Regime Histogram Clustering

2017

Matthew Staib, Stefanie Jegelka

Much work has sought to discern the different types of cloud regimes, typically via Euclidean k-means clustering of histograms. However, these methods ignore the underlying similarity structure of cloud types. Wasserstein k-means clustering is a promising candidate for utilizing this structure during clustering, but existing algorithms do not scale well and lack the quality guarantees of the Eu...


The k-means-u* algorithm: non-local jumps and greedy retries improve k-means++ clustering

Journal: CoRR 2017

Bernd Fritzke

We present a new clustering algorithm called k-means-u* which in many cases is able to significantly improve the clusterings found by k-means++, the current de facto standard for clustering in Euclidean spaces. First we introduce the k-means-u algorithm, which starts from a result of k-means++ and attempts to improve it with a sequence of non-local “jumps” alternating with runs of standard k-means....


A Tight Lower Bound Instance for k-means++ in Constant Dimension

2014

Anup Bhattacharya, Ragesh Jaiswal, Nir Ailon

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean distance o...


Extending K-Means Clustering Algorithm

2003

Philip Chan

The K-Means algorithm for clustering has the drawback of always maintaining K clusters. This leads to ineffective handling of noisy data and outliers. Noisy data is defined as having little similarity with the closest cluster’s centroid. In K-Means a noisy data item is placed in the most similar cluster, even though this similarity is low relative to the similarity of other data items in the same c...
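The drawback described above can be made concrete with a small sketch of an assignment step that declines to place a point whose nearest centroid is too far away. This is only one possible reading, not the paper's extension (the abstract is cut off before the method is stated): the function `assign_with_noise`, the fixed `max_distance` threshold, and the use of Euclidean distance as the (dis)similarity measure are all assumptions made for illustration.

```python
import math


def assign_with_noise(points, centroids, max_distance):
    """Label each point with its nearest centroid, or None if it looks noisy.

    A point counts as noise here (hypothetical rule) when even its nearest
    centroid is farther away than `max_distance`.
    """
    labels = []
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(p, centroids[i]),
        )
        is_close = math.dist(p, centroids[nearest]) <= max_distance
        labels.append(nearest if is_close else None)
    return labels


centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.0), (9.0, 10.0), (5.0, 5.0)]
noise_labels = assign_with_noise(points, centroids, max_distance=2.0)
print(noise_labels)  # [0, 1, None]
```

Standard K-Means would force the point (5.0, 5.0) into one of the two clusters and pull that cluster's centroid toward it; leaving it unlabeled keeps the centroids unaffected by it.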
