Analysis of k-means

Solving Clustering Problems Using Simulated Annealing Optimization

Thesis: Ministry of Science, Research and Technology - Shiraz University - Faculty of Science, 1388 SH

Zohreh Sadat Nazemi, Kourosh Ziarati, Mohammad Bagher Ahmadi, Abdolaziz Abdollahi

Clustering is a process in which a set of samples is divided into clusters such that the members of each cluster are as similar to one another as possible and different clusters are as different from one another as possible. Clustering is one of the common techniques of data mining and data analysis. In data clustering, reaching the optimal solution becomes harder as the size of the data grows, and as a result the time required to reach acceptable solutions...


Analysis of k-Means++ for Separable Data

2012

Ragesh Jaiswal, Nitin Garg

The k-means++ [5] seeding procedure is a simple sampling-based algorithm that is used to quickly find k centers, which may then be used to start Lloyd’s method. There has been some progress recently on understanding this sampling algorithm. Ostrovsky et al. [10] showed that if the data satisfies the separation condition that $\Delta_{k-1}(P) / \Delta_k(P) \geq c$ ($\Delta_i(P)$ is the optimal cost w.r.t. $i$ centers, $c > 1$...
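To make the sampling idea in this abstract concrete, the following is a minimal sketch of k-means++ style seeding, not code from the paper. It assumes points are tuples of numbers and squared Euclidean distance; the names `sq_dist` and `kmeans_pp_seed` are hypothetical. The first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by distance-squared sampling."""
    rng = rng or random.Random()
    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Refresh nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_pp_seed(pts, 2, random.Random(0)))
```

Because an already chosen point has weight zero, it is not drawn again while any other point is still at positive distance. The centers returned here would then be handed to Lloyd's method as its starting configuration.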


Nyström Method with Kernel K-means++ Samples as Landmarks

2017

Dino Oglic, Thomas Gärtner

We investigate, theoretically and empirically, the effectiveness of kernel K-means++ samples as landmarks in the Nyström method for low-rank approximation of kernel matrices. Previous empirical studies (Zhang et al., 2008; Kumar et al., 2012) observe that the landmarks obtained using (kernel) K-means clustering define a good low-rank approximation of kernel matrices. However, the existing work d...


k-means++ under Approximation Stability

Journal: Theor. Comput. Sci. 2013

Manu Agarwal, Ragesh Jaiswal, Arindam Pal

Lloyd’s algorithm, also known as the k-means algorithm, is one of the most popular algorithms for solving the k-means clustering problem in practice. However, it does not give any performance guarantees. This means that there are datasets on which this algorithm can behave very badly. One reason for poor performance on certain datasets is bad initialization. The following simple sampling ba...
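The sensitivity to initialization described in this abstract can be illustrated with a small sketch of Lloyd-style iterations. This is an illustrative assumption, not the authors' code: the names `lloyd` and `cost` are hypothetical, points are tuples of numbers, and an empty cluster simply keeps its previous center.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100):
    """Alternate assignment and mean-update steps until centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(c))
                ))
            else:
                new_centers.append(c)  # assumption: keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    good = lloyd(pts, [(0, 0), (10, 0)])
    bad = lloyd(pts, [(0, 0), (0, 2)])
    print(good, cost(pts, good))  # [(0.0, 1.0), (10.0, 1.0)] 4.0
    print(bad, cost(pts, bad))    # [(5.0, 0.0), (5.0, 2.0)] 100.0
```

On these four points, starting from one center per true group converges to cost 4.0, while starting from two centers in the same group stalls at cost 100.0. This is the kind of bad local optimum that a careful seeding step is meant to avoid.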


A Fast Approximation Scheme for Low-Dimensional k-Means

2018

Vincent Cohen-Addad

We consider the popular k-means problem in d-dimensional Euclidean space. Recently Friggstad, Rezapour, Salavatipour [FOCS’16] and Cohen-Addad, Klein, Mathieu [FOCS’16] showed that the standard local search algorithm yields a $(1+\varepsilon)$-approximation in time $(n \cdot k)^{1/\varepsilon^{O(d)}}$, giving the first polynomial-time approximation scheme for the problem in low-dimensional Euclidean space. While local search achiev...


Unsupervised Learning of Acoustic Events Using Dynamic Time Warping and Hierarchical K-Means++ Clustering

2011

Joerg Schmalenstroeer, Markus Bartek, Reinhold Häb-Umbach

In this paper we propose to jointly consider Segmental Dynamic Time Warping and distance clustering for the unsupervised learning of acoustic events. As a result, the computational complexity increases only linearly with the database size compared to a quadratic increase in a sequential setup, where all pairwise SDTW distances between segments are computed prior to clustering. Further, we discu...


Comparative Study of k-means and k-Means++ Clustering Algorithms on Crime Domain

Journal: JCS 2014

Bashar Aubaidan, Masnizah Mohd, Mohammed Albared

This study presents the results of an experimental study of two document clustering techniques, k-means and k-means++. In particular, we compare the two main approaches in crime document clustering. The drawback of k-means is that the user needs to define the centroid point. This becomes more critical when dealing with document clustering because each center point represented by a word ...


Unsupervised learning approach to automation of hammering test using topological information

2017

Jun Younes Louhi Kasahara, Hiromitsu Fujii, Atsushi Yamashita, Hajime Asama

In this paper we present an online unsupervised method based on clustering to find defects in concrete structures using hammering. First, the initial dataset of sound samples is roughly clustered using the k-means algorithm with the k-means++ seeding procedure in order to find the cluster best representative of the structure. Then the regular model for the hammering sound, the centroid of this ...


An initial seed selection algorithm for k-means clustering of georeferenced data to improve replicability of cluster assignments for mapping application

Journal: Appl. Soft Comput. 2012

Fouad Khan

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems though – such as w...


Classification of Cerebral Infarction Data Using K-Means and Kernel K-Means

Journal: Journal of Physics: Conference Series 2021
