k means cluster

Search results for: k means cluster

Number of results: 880962

Uniform Deviation Bounds for Unbounded Loss Functions like k-Means

Journal: CoRR 2017

Olivier Bachem Mario Lucic S. Hamed Hassani Andreas Krause

Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. As such, they are a critical component to empirical risk minimization. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. In our main application, this allows us to ob...

Learning Mixtures of Gaussians using the k-means Algorithm

Journal: CoRR 2009

Kamalika Chaudhuri Sanjoy Dasgupta Andrea Vattani

One of the most popular algorithms for clustering in Euclidean space is the k-means algorithm; k-means is difficult to analyze mathematically, and few theoretical guarantees are known about it, particularly when the data is well-clustered. In this paper, we attempt to fill this gap in the literature by analyzing the behavior of k-means on well-clustered data. In particular, we study the case wh...

A stratified traffic sampling methodology for seeing the big picture

Journal: Computer Networks 2008

Stenio F. L. Fernandes Carlos Alberto Kamienski Judith Kelner Dênio Mariz Djamel Fawzi Hadj Sadok

This work explores the use of statistical techniques, namely stratified sampling and cluster analysis, as powerful tools for deriving traffic properties at the flow level. Our results show that the adequate selection of samples leads to significant improvements allowing further important statistical analysis. Although stratified sampling is a well-known technique, the way we classify the data p...

An Impossibility Theorem for Clustering

2002

Jon M. Kleinberg

Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: f...

Exploring acoustic differences between Cantonese (tonal) and English (non-tonal) spoken expressions of emotions

2015

Chee Seng Chong Jeesun Kim Chris Davis

It has been claimed that tone language speakers use fewer F0-related cues in the production of verbal expressions of emotions, because F0 is used in the production of lexical tones. This study investigated this claim by examining how F0 and various other acoustic parameters are used in the production of verbal emotion expressions in Cantonese (tone language) compared to English (non-tone...

Web Event Topic Analysis by Topic Feature Clustering and Extended LDA Model

Journal: JSW 2014

Yuanzi Xu Qingzhong Li Zhongmin Yan Wei Wang

To analyze the topics of a large number of web events, we propose an event topic analysis approach based on topic feature clustering and an extended LDA (latent Dirichlet allocation) model. The extended LDA model is dimension LDA (DLDA), which integrates the topic probability of the LDA model. We represent an event as a multi-dimensional vector and use the DLDA model to select topic feature words in events. We aggregate...

Processing Short Time Series with Data Mining Methods

Journal: J. Riga Technical University 2011

Arnis Kirshners Arkady Borisov

This article examines several data mining approaches that perform short time series analysis. The basis of the methods is formed by clustering algorithms with or without modifications. The proposed methods implement short time series analysis when the numbers of the observations are not equal and the historical information is short. The inspected approaches are offered for solving complex tasks...

Author Diarization Using Cluster-Distance Approach

2016

Abdul Sittar Hafiz Rizwan Iqbal Rao Muhammad Adeel Nawab

Author diarization is a new task introduced in PAN’16: identifying the portion(s) of text within a document written by multiple authors. This paper presents our proposed approach to the author diarization task. Various types of stylistic features, including lexical features, are used to uniquely identify an author. Furthermore, to find anomalous text within a single document, the ClustDist method is used. ...

An efficient k-means-type algorithm for clustering datasets with incomplete records

Journal: CoRR 2018

Andrew Lithio Ranjan Maitra

The k-means algorithm is the most popular nonparametric clustering method in use, but cannot generally be applied to data sets with missing observations. The usual practice with such data sets is to either impute the values under an assumption of a missing-at-random mechanism or to ignore the incomplete records, and then to use the desired clustering method. We develop an efficient version of t...

Partitioning-Clustering Techniques Applied to the Electricity Price Time Series

2007

Francisco Martínez-Álvarez Alicia Troncoso Lora José Cristóbal Riquelme Santos Jesús Riquelme Santos

Clustering is used to generate groupings of data from a large dataset, with the intention of representing the behavior of a system as accurately as possible. In this sense, clustering is applied in this work to extract useful information from the electricity price time series. To be precise, two clustering techniques, K-means and Expectation Maximization, have been utilized for the analysis of ...
