means cluster

On Clustering Time Series Using Euclidean Distance and Pearson Correlation

Journal: CoRR, 2016

Michael R. Berthold, Frank Höppner

For time series comparisons, it has often been observed that z-score normalized Euclidean distances far outperform the unnormalized variant. In this paper we show that a z-score normalized, squared Euclidean distance is, in fact, equal to a distance based on Pearson correlation. This has a profound impact on many distance-based classification or clustering methods. In addition to this theoretical...
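The equivalence stated in this abstract can be checked numerically. The sketch below assumes the standard form of the identity, d² = 2n(1 − r), which holds when the z-score uses the population standard deviation (dividing by n). The abstract does not give the paper's own notation or constant, so the helper names and the sample series `x` and `y` are illustrative assumptions.

```python
import math

def z_normalize(series):
    """Z-score normalize a series using the population std (divide by n)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / std for v in series]

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Two made-up series, used only for illustration.
x = [1.0, 3.0, 2.0, 5.0, 4.0]
y = [2.0, 2.5, 4.0, 4.5, 7.0]

lhs = squared_euclidean(z_normalize(x), z_normalize(y))
rhs = 2 * len(x) * (1 - pearson(x, y))
print(lhs, rhs)  # the two values agree up to floating-point error
```

With the sample standard deviation (dividing by n − 1) the constant would be 2(n − 1) instead of 2n; the monotone relationship to 1 − r is unchanged.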


A data placement strategy in scientific cloud workflows

Journal: Future Generation Comp. Syst., 2010

Dong Yuan, Yun Yang, Xiao Liu, Jinjun Chen

In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. To effectively store these data, a data manager must intelligently select data centres in which these data will reside. This is, however, not the case for data which must have a fixed location. When one task needs several datasets located in different data centres, the movement of lar...


Flow Detection Based on Traffic Video Image Processing

Journal: Journal of Multimedia, 2013

Peng Shen

In traffic video image processing, the background image obtained from background modeling with the traditional k-means clustering algorithm contains a lot of noise. An improved k-means clustering algorithm is therefore proposed and applied to vehicle flow detection in traffic video images. By analyzing the vehicle detection method and comparing the flow detection algorithm, the...


A Novel Approach to Clustering of Proteins

2011

Appa Rao Vijay Kumar

Data analysis plays an indispensable role in understanding various phenomena. Clustering algorithms are an important class of tools for data analysis. K-means cluster analysis is considered for clustering protein variates across three species using SPSS 16.0. In this paper we describe an approach to k-means cluster analysis which grouped the sample data of the three species under study into four apriori...


Improved Color Barycenter Model for Road-Sign Detection

2013

Qieshi Zhang, Sei-ichiro Kamata

This paper proposes an improved color barycenter model (CBM) for road sign detection. The previous version of CBM can find the colors of a road sign (RS), but its accuracy is not high enough for magenta and blue region segmentation. The improved CBM extends the barycenter distribution to cylindrical coordinates and takes the number of colors at every point into account. Then the K-means clusterin...


Outlier Detection using Improved Genetic K-means

Journal: CoRR, 2011

M. H. Marghny, Ahmed I. Taloba

The outlier detection problem in some cases is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, which are often regarded as noise that should be removed in order to make more reliable clustering. In this article, we present an algorithm that provides outlier detection and data clustering simul...


Uniform Deviation Bounds for Unbounded Loss Functions like k-Means

Journal: CoRR, 2017

Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause

Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. As such, they are a critical component to empirical risk minimization. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. In our main application, this allows us to ob...


Learning Mixtures of Gaussians using the k-means Algorithm

Journal: CoRR, 2009

Kamalika Chaudhuri, Sanjoy Dasgupta, Andrea Vattani

One of the most popular algorithms for clustering in Euclidean space is the k-means algorithm; k-means is difficult to analyze mathematically, and few theoretical guarantees are known about it, particularly when the data is well-clustered. In this paper, we attempt to fill this gap in the literature by analyzing the behavior of k-means on well-clustered data. In particular, we study the case wh...
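For readers unfamiliar with the algorithm this abstract refers to, here is a minimal sketch of the standard k-means (Lloyd) iteration on a small, well-separated toy dataset. It is not the paper's analysis. The function name `kmeans`, the sample points, and the hand-picked initial centers are assumptions made for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's iteration: assign points to the nearest center, then recompute means.

    `points` and `centers` are sequences of equal-length coordinate tuples.
    Returns the final centers and the clusters from the last assignment step.
    """
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(len(c)))
                )
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters

# Two well-separated groups of made-up 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
print(clusters)
```

On data this cleanly separated the iteration converges after the first update; the outcome on harder data depends heavily on the initial centers.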


A stratified traffic sampling methodology for seeing the big picture

Journal: Computer Networks, 2008

Stenio F. L. Fernandes, Carlos Alberto Kamienski, Judith Kelner, Dênio Mariz, Djamel Fawzi Hadj Sadok

This work explores the use of statistical techniques, namely stratified sampling and cluster analysis, as powerful tools for deriving traffic properties at the flow level. Our results show that the adequate selection of samples leads to significant improvements allowing further important statistical analysis. Although stratified sampling is a well-known technique, the way we classify the data p...


An Impossibility Theorem for Clustering

2002

Jon M. Kleinberg

Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: f...
