k-means clustering algorithm

Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Journal: Knowl.-Based Syst. 2014

Min Li Shaobo Deng Lei Wang Shengzhong Feng Jianping Fan

Several clustering analysis techniques for categorical data exist to divide similar objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of thi...


Comparative Study of k-means and k-Means++ Clustering Algorithms on Crime Domain

Journal: JCS 2014

Bashar Aubaidan Masnizah Mohd Mohammed Albared

This study presents the results of an experimental comparison of two document clustering techniques, k-means and k-means++, applied to crime document clustering. The drawback of k-means is that the user needs to define the centroid point. This becomes more critical when dealing with document clustering because each center point represented by a word ...
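The difference between the two initialization schemes can be illustrated with a short sketch. What follows is the standard k-means++ seeding rule (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen), not code from the paper; the function names `squared_distance` and `kmeans_pp_seeds` are hypothetical, and points are assumed to be tuples of floats rather than document vectors.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seeds(points, k, rng=None):
    """Choose k initial centers using k-means++ (D^2-weighted) seeding.

    Illustrative sketch only; `rng` is an optional random.Random instance.
    """
    if rng is None:
        rng = random.Random(0)
    # The first center is drawn uniformly at random from the data.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

# Two well-separated groups of duplicate points: the second seed must come
# from the group that does not contain the first seed.
example_points = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (10.0, 10.0)]
example_seeds = kmeans_pp_seeds(example_points, 2)
```

Plain k-means would instead require the user to supply the initial centers or draw all of them uniformly, which is the drawback the abstract mentions.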


A Novel And Improved Technique For Clustering Uncertain Data

2015

Vandana Dubey

Clustering uncertain data is one of the essential tasks in data mining. Traditional algorithms for clustering uncertain data, such as K-Means, UK-Means, and density-based clustering, are limited to geometric distance-based similarity measures and cannot capture the difference between uncertain data in terms of their distributions. Such methods cannot handle uncertain objects t...


On the Use of PLDA i-vector Scoring for Clustering Short Segments

2016

Itay Salmun Irit Opher Itshak Lapidot

This paper extends previous work that used the mean shift algorithm to perform speaker clustering on i-vectors generated from short speech segments. We examine the effectiveness of probabilistic linear discriminant analysis (PLDA) scoring as the metric of the mean shift clustering algorithm in the presence of different numbers of speakers. Our proposed method, combined with k-neares...


Data Clustering Using A New CGA (Chaotic-Generic Algorithm) Approach

Journal: Journal of Advances in Computer Research 2011

Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. Genetic algorithms have been widely applied to data clustering. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...


Cluster-Seeking James-Stein Estimators

Journal: IEEE Trans. Information Theory 2018

K. Pavan Srinath Ramji Venkataramanan

This paper considers the problem of estimating a high-dimensional vector of parameters θ ∈ ℝ^n from a noisy observation. The noise vector is i.i.d. Gaussian with known variance. For a squared-error loss function, the James-Stein (JS) estimator is known to dominate the simple maximum-likelihood (ML) estimator when the dimension n exceeds two. The JS-estimator shrinks the observed vector towards th...
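For orientation, the classical textbook James-Stein estimator can be sketched in a few lines. This is an assumption made for illustration: it is the standard form that shrinks the observation toward the origin, θ̂ = (1 − (n − 2)σ² / ‖y‖²) y, and not the cluster-seeking estimator the paper proposes. The function name `james_stein` and its parameters are hypothetical. The ML estimate in this Gaussian setting is simply the observation y itself.

```python
def james_stein(y, sigma2):
    """Classical James-Stein estimate of the mean vector from one observation.

    y      : list of floats, the noisy observation (dimension n >= 3)
    sigma2 : known noise variance
    Illustrative sketch of the textbook estimator that shrinks toward the origin.
    """
    n = len(y)
    if n < 3:
        raise ValueError("James-Stein shrinkage requires dimension n >= 3")
    norm2 = sum(v * v for v in y)
    if norm2 == 0:
        # Observation is exactly the origin; nothing to shrink.
        return list(y)
    # Shrinkage factor; it can be negative when the squared norm is small,
    # which is why a positive-part variant is often used in practice.
    shrink = 1.0 - (n - 2) * sigma2 / norm2
    return [shrink * v for v in y]

# With y = (3, 4, 0, 0) and unit variance: squared norm 25, n - 2 = 2,
# so every coordinate is multiplied by 1 - 2/25 = 0.92.
estimate = james_stein([3.0, 4.0, 0.0, 0.0], 1.0)
```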


An Hybrid Technique for Data Clustering Using Genetic Algorithm with Particle Swarm Optimization

2015

Dr. Karthikeyan

Data clustering is useful in several areas such as machine learning, data mining, wireless sensor networks and pattern recognition. The best-known clustering approach is K-means, which has been used successfully in numerous clustering problems, but this algorithm has limitations such as convergence to local optima and sensitivity to the initial points. Clustering is the procedure of grouping o...


An Efficient Approach towards K-Means Clustering Algorithm

2014

Pallavi Purohit Ritesh Joshi

K-Means clustering algorithms are widely used in practical applications. The original K-Means algorithm selects its initial centroids randomly, which produces unstable clusters because the assignment of objects to clusters depends on the initial cluster means, themselves chosen by random selection of objects. The number of times different selection of initial centroids will give number of differ...
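The dependence on the initial centroids is easy to reproduce on a toy data set. The sketch below is a plain Lloyd-style K-Means written for illustration, not the improved algorithm from the paper; the names `kmeans`, `assign`, and `sse` are hypothetical. Four points at the corners of a 4 × 1 rectangle converge to two different partitions depending only on which two points serve as initial centroids.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]

def kmeans(points, init_centers, max_iter=100):
    """Run Lloyd iterations from the given initial centers until they stop moving."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster received no points.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))

corners = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
# Start with one centroid on each short side: left/right split.
centers_a, labels_a = kmeans(corners, [(0.0, 0.0), (4.0, 0.0)])
# Start with both centroids on the same short side: bottom/top split.
centers_b, labels_b = kmeans(corners, [(0.0, 0.0), (0.0, 1.0)])
```

Both runs reach a fixed point, but the first groups the points into left and right pairs (sum of squared errors 1.0) while the second groups them into bottom and top pairs (sum of squared errors 16.0).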


Reducing the Time Requirement of k-Means Algorithm

2012

Victor Chukwudi Osamor Ezekiel Femi Adebiyi Jelilli Olarenwaju Oyelade Seydou Doumbia

Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which combine many data points with a large dimension d. In k-means clustering, we are given a set of n data points in d-dimensional space ℝ^d and an integer k. The problem is to determine a set of k points in ℝ^d, called centers, so as to minimize the mean squared dist...
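The cost that makes k-means expensive is visible in a direct implementation of the objective. The sketch below assumes the usual formulation, in which each point is charged the squared Euclidean distance to its nearest center; the helper names `mean_squared_distance` and `lloyd_step` are hypothetical and do not come from the paper. Evaluating the objective or performing one update touches all n points, all k centers, and all d coordinates, so each pass costs on the order of n·k·d operations.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points in d dimensions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_squared_distance(points, centers):
    """Average over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)

def lloyd_step(points, centers):
    """One update: assign each point to its nearest center, then recompute means."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        groups[nearest].append(p)
    new_centers = []
    for old, group in zip(centers, groups):
        if group:
            new_centers.append(tuple(sum(x) / len(group) for x in zip(*group)))
        else:
            # An empty cluster keeps its previous center.
            new_centers.append(old)
    return new_centers

data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
start = [(1.0, 0.0), (9.0, 0.0)]
cost_before = mean_squared_distance(data, start)    # (1 + 5 + 1 + 5) / 4 = 3.0
updated = lloyd_step(data, start)                   # [(0.0, 1.0), (10.0, 1.0)]
cost_after = mean_squared_distance(data, updated)   # (1 + 1 + 1 + 1) / 4 = 1.0
```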


K-mean Based Clustering and Context Quantization

2005

Mantao Xu Laurence S. Dooley Ales Leonardis

In this thesis, we study the problems of K-means clustering and context quantization. The main task of K-means clustering is to partition the training patterns into k distinct groups or clusters that minimize the mean-square-error (MSE) objective function. But the main difficulty of conventional K-means clustering is that its classification performance is highly susceptible to the initialized s...
