Search results for: k means cluster

Number of results: 880962

Journal: CoRR 2017
Olivier Bachem Mario Lucic S. Hamed Hassani Andreas Krause

Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly for all models in a learning problem. As such, they are a critical component to empirical risk minimization. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are unbounded. In our main application, this allows us to ob...

Journal: CoRR 2009
Kamalika Chaudhuri Sanjoy Dasgupta Andrea Vattani

One of the most popular algorithms for clustering in Euclidean space is the k-means algorithm; k-means is difficult to analyze mathematically, and few theoretical guarantees are known about it, particularly when the data is well-clustered. In this paper, we attempt to fill this gap in the literature by analyzing the behavior of k-means on well-clustered data. In particular, we study the case wh...

Journal: Computer Networks 2008
Stenio F. L. Fernandes Carlos Alberto Kamienski Judith Kelner Dênio Mariz Djamel Fawzi Hadj Sadok

This work explores the use of statistical techniques, namely stratified sampling and cluster analysis, as powerful tools for deriving traffic properties at the flow level. Our results show that the adequate selection of samples leads to significant improvements allowing further important statistical analysis. Although stratified sampling is a well-known technique, the way we classify the data p...

2002
Jon M. Kleinberg

Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: f...

2015
Chee Seng Chong Jeesun Kim Chris Davis

It has been claimed that tone language speakers use fewer F0-related cues in the production of verbal expressions of emotion, because F0 is used in the production of lexical tones. This study investigated this claim by examining how F0 and various other acoustic parameters are used in the production of verbal emotion expressions in Cantonese (tone language) compared to English (non-tone...

Journal: JSW 2014
Yuanzi Xu Qingzhong Li Zhongmin Yan Wei Wang

To analyze the topics of a large number of web events, we propose an event topic analysis approach based on topic feature clustering and an extended LDA (latent Dirichlet allocation) model. The extended LDA model is dimension LDA (DLDA), which integrates the topic probability of the LDA model. We represent an event as a multi-dimensional vector and use the DLDA model to select topic feature words in events. We aggregate...

Journal: J. Riga Technical University 2011
Arnis Kirshners Arkady Borisov

This article examines several data mining approaches that perform short time series analysis. The basis of the methods is formed by clustering algorithms with or without modifications. The proposed methods implement short time series analysis when the numbers of the observations are not equal and the historical information is short. The inspected approaches are offered for solving complex tasks...

2016
Abdul Sittar Hafiz Rizwan Iqbal Rao Muhammad Adeel Nawab

Author Diarization is a new task introduced in PAN’16, to identify portion(s) of text within a document written by multiple authors. This paper presents our proposed approach for the author diarization task. Various types of stylistic features, including lexical features, are used to uniquely identify an author. Furthermore, to find anomalous text within a single document, the ClustDist method is used. ...

Journal: CoRR 2018
Andrew Lithio Ranjan Maitra

The k-means algorithm is the most popular nonparametric clustering method in use, but cannot generally be applied to data sets with missing observations. The usual practice with such data sets is to either impute the values under an assumption of a missing-at-random mechanism or to ignore the incomplete records, and then to use the desired clustering method. We develop an efficient version of t...

2007
Francisco Martínez-Álvarez Alicia Troncoso Lora José Cristóbal Riquelme Santos Jesús Riquelme Santos

Clustering is used to generate groupings of data from a large dataset, with the intention of representing the behavior of a system as accurately as possible. In this sense, clustering is applied in this work to extract useful information from the electricity price time series. To be precise, two clustering techniques, K-means and Expectation Maximization, have been utilized for the analysis of ...
