k-means clustering

Review of "Introduction to clustering large and high-dimensional data" by J. Kogan

Journal: Computer Science Review 2008

Dieter Mitsche

Roughly speaking, clustering is a data analysis task that groups a set of items into categories so that items within one category are similar and items in different categories are dissimilar, where similarity and dissimilarity depend on the definition of distance between items. Although clustering has been known for many decades, it has recently gained much importance due to the exponential growt...
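
As a concrete instance of this definition for the k-means case this page collects, the following is a minimal sketch of standard (Lloyd-style) k-means in plain Python, using squared Euclidean distance as the measure of dissimilarity. The function names `sq_dist` and `kmeans`, the random choice of initial centres and the iteration cap are illustrative assumptions, not taken from the reviewed book.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style k-means: alternate assignment and centre-update steps.

    Assumption (illustrative): the initial centres are k data points
    drawn at random without replacement.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old centre
        if new_centers == centers:
            break  # converged: no centre moved
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2)
print(centers, labels)
```

On these toy points with k = 2, the first three points end up in one cluster and the last three in the other, whichever two points are drawn as initial centres.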

Gene Expression Analysis Using Fuzzy K-Means Clustering

2003

Chinatsu Arima, Taizo Hanai, Masahiro Okamoto

Recent advances in array technologies have made it possible to monitor huge amounts of gene expression data. Clustering methods such as hierarchical clustering, self-organizing maps (SOM) and k-means clustering have become important tools for analyzing such gene expression data. We have applied the Fuzzy adaptive resonance theory (Fuzzy ART) [5] to the gene clustering of DNA microarray data and the clus...

Efficient Approximation for Large-Scale Kernel Clustering Analysis

2014

Keng-Pei Lin, Yu-Chen Yang

Kernel k-means is useful for performing clustering on nonlinearly separable data. However, kernel k-means is hard to scale to large data because of its quadratic complexity. In this paper, we propose an approach which utilizes a low-dimensional feature approximation of the Gaussian kernel function to capitalize on a fast linear k-means solver for performing nonlinear kernel k-means. This approach takes a...
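
The abstract does not say which low-dimensional approximation is used. A minimal sketch, assuming random Fourier features (a well-known approximation of the Gaussian kernel, which may differ from the paper's): each point x is mapped to a D-dimensional vector z(x) such that z(x) · z(y) ≈ exp(-‖x - y‖² / (2σ²)), and the mapped vectors can then be clustered with any ordinary linear k-means solver. The names `make_rff` and `gaussian_kernel` and their parameters are hypothetical.

```python
import math
import random


def make_rff(dim, n_features, sigma, seed=0):
    """Return a random Fourier feature map z with z(x) . z(y) ~= exp(-||x - y||^2 / (2 sigma^2)).

    Assumed construction: frequencies w_i ~ N(0, I / sigma^2), phases b_i ~ Uniform[0, 2*pi].
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        # z(x)_i = sqrt(2 / D) * cos(w_i . x + b_i)
        return [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return z


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, for comparison with the approximation."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


z = make_rff(dim=2, n_features=4000, sigma=1.0)
x, y = (0.0, 0.0), (0.5, 0.5)
approx = sum(a * b for a, b in zip(z(x), z(y)))
print(approx, gaussian_kernel(x, y, 1.0))  # the two values should be close
```

Because D is fixed, running linear k-means on the vectors z(x) costs time linear in the number of points, instead of the quadratic cost of working with the full kernel matrix.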

Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information

2002

Sabine Schulte im Walde, Chris Brew

The paper describes the application of kMeans, a standard clustering technique, to the task of inducing semantic classes for German verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A ...

New methods for the initialisation of clusters

Journal: Pattern Recognition Letters 1996

Mohd Belal Al-Daoud, Stuart A. Roberts

One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique depend on the initialisation of cluster centres. In this article, two initialisation methods are developed. These methods are particularly suited to problems involving very large data sets. The methods have been applied to different data sets and good results are obtained.
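
The dependence on initialisation is easy to reproduce. In this sketch (plain Lloyd iterations with hypothetical helper names; it is not one of the two methods developed in the paper), the same four points converge to different partitions depending only on the starting centres.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=100):
    """Run Lloyd iterations from the given starting centres."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each centre to the mean of its points (empty clusters stay put).
        new = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else c)
        if new == centers:
            break
        centers = new
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
for start in ([(0.0, 0.0), (0.0, 1.0)], [(0.0, 0.0), (4.0, 0.0)]):
    print(start, "->", sse(points, lloyd(points, start)))
# expected: SSE 16.0 for the first start, 1.0 for the second
```

Starting from (0, 0) and (0, 1), the iterations settle on a top-row/bottom-row split, a local optimum with a sum of squared errors of 16; starting from (0, 0) and (4, 0), they find the left/right split with a sum of squared errors of 1.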

Generalization Bounds for K-Dimensional Coding Schemes in Hilbert Spaces

2008

Andreas Maurer, Massimiliano Pontil

We give a bound on the expected reconstruction error for a general coding method where data in a Hilbert space are represented by finite dimensional coding vectors. The result can be specialized to k-means clustering, nonnegative matrix factorization and the sparse coding techniques introduced by Olshausen and Field.
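
For the k-means special case, the quantities involved can be written as follows, in our own notation (the paper's may differ): x_1, ..., x_n are sample points in a Hilbert space H drawn from a distribution μ, and c_1, ..., c_K in H are the cluster centres. The last identity views k-means as a K-dimensional coding scheme whose codes are restricted to the canonical basis vectors e_1, ..., e_K of R^K, mapped into H by a linear map T. A generalization bound of the kind described controls how far the expected error R can exceed the empirical error for centres learned from the sample.

```latex
\hat{R}_n(c_1,\dots,c_K) = \frac{1}{n}\sum_{i=1}^{n}\min_{1\le j\le K}\lVert x_i - c_j\rVert^2,
\qquad
R(c_1,\dots,c_K) = \mathbb{E}_{x\sim\mu}\Big[\min_{1\le j\le K}\lVert x - c_j\rVert^2\Big],
\qquad
\min_{1\le j\le K}\lVert x - c_j\rVert^2 = \min_{y\in\{e_1,\dots,e_K\}}\lVert x - Ty\rVert^2 \quad (Te_j = c_j).
```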

Application of K-Means Clustering for Grouping the Spread of Dengue Hemorrhagic Fever (DBD) in Deli Serdang Regency

Journal: Terapan Informatika Nusantara 2022

DBD is a disease that spreads rapidly. Usually, if an area is affected by dengue fever, the disease is likely to spread to other people in the area. Because of the large number of sufferers, a great deal of data is collected, and these data need to be processed, for example by grouping sufferers with the aim of focusing vector control on areas that are vulnerable to DBD. These areas will be the main priority for socialization related to its handling. Data mining series pr...

A Comparison of Two Novel Algorithms for Clustering Web Documents

2003

Adam Schenker, Mark Last, Horst Bunke, Abraham Kandel

In this paper we investigate the clustering of web document collections using two variants of the popular k-means clustering algorithm. The first variant is the global k-means method, which computes “good” initial cluster centers deterministically rather than relying on random initialization. The second variant allows for the use of graphs as fundamental representations of data items instead of ...
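
The idea of choosing centres deterministically can be sketched as follows, assuming the usual incremental formulation of global k-means: start from the mean of all points, then add one centre at a time, trying every data point as the starting position of the new centre, running k-means from each candidate and keeping the result with the lowest error. This sketch handles Euclidean vectors only (it does not cover the paper's graph-based variant), and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, iters=100):
    """Run Lloyd iterations from the given starting centres."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else c)
        if new == centers:
            break
        centers = new
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def global_kmeans(points, k):
    """Deterministic incremental k-means (assumed global k-means formulation)."""
    # With one centre, the optimum is simply the mean of all points.
    centers = [tuple(sum(v) / len(points) for v in zip(*points))]
    for _ in range(1, k):
        best, best_err = None, float("inf")
        # Try every data point as the starting position of the new centre.
        for candidate in points:
            trial = lloyd(points, centers + [candidate])
            err = sse(points, trial)
            if err < best_err:
                best, best_err = trial, err
        centers = best
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(global_kmeans(points, 2))  # same result on every run: no randomness involved
```

The price of determinism is cost: each new centre requires one k-means run per data point, so faster approximations of the candidate search may be preferable on large collections.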

Review of Existing Methods for Finding Initial Clusters in K-means Algorithm

2013

Harmanpreet Singh, Kamaljit Kaur

Clustering is one of the data mining tasks; it can be used to group objects on the basis of their nearness to a central value. It has found many applications in business, image processing, medicine and other fields. K-means is one of the most widely used clustering methods because it is simple and efficient. The output of K-means depends on the chosen central values for cl...
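
One widely used seeding method for choosing these central values is k-means++ (named here only as an example; it is not claimed to be among the methods this particular review covers). In the minimal sketch below, the first centre is drawn uniformly at random and each further centre is sampled with probability proportional to its squared distance to the nearest centre already chosen. It assumes the data contain at least k distinct points; `kmeans_pp_init` is a hypothetical name.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """k-means++ style seeding (assumes at least k distinct points)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first centre: uniform at random
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen centre.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # Points already chosen have weight 0, so they cannot be picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_pp_init(points, 2))  # far-apart points are strongly favoured
```

The resulting centres would then be passed to ordinary k-means iterations in place of purely random starting centres.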

An Enhanced Spectral Clustering for Overlapping Data in Multiple Task Clustering

2016

R. Renukadevi, S. Meenakshi

Clustering is one of the most widely used approaches for exploratory data analysis in data mining. In large-scale data sources, multitask clustering is an important research topic for handling overlapping data and negative and non-negative values when clustering multiple tasks; it is used to improve learning of the relationships among related tasks and to share information across tasks. Recent...
