Search results for: high dimensional clustering

Number of results: 2463052

2011
Jilles Vreeken Arthur Zimek

While subspace clustering emerged as an application of pattern mining and some of its early advances have probably been inspired by developments in pattern mining, over the years both fields progressed rather independently. In this paper, we identify a number of recent developments in pattern mining that are likely to be applicable to alleviate or solve current problems in subspace clustering and...

2012
Zoraida Callejas Carrión David Griol Klaus-Peter Engelbrecht

The assessment of user simulators in terms of their similarity with real users implies processing and interpreting large dialogue corpora, for which many interaction parameters can be considered. In this setting, the high dimensionality of the data makes it difficult to compare the dialogues as it is not always appropriate to consider all features equally in order to carry out meaningful interp...

2015
Chao Qu Huan Xu

This paper considers the subspace clustering problem where the data contains irrelevant or corrupted features. We propose a method termed “robust Dantzig selector” which can successfully identify the clustering structure even with the presence of irrelevant features. The idea is simple yet powerful: we replace the inner product by its robust counterpart, which is insensitive to the irrelevant f...

Journal: CoRR 2017
Shiva Alemzadeh Tommy Hielscher Uli Niemann Lena Cibulski Till Ittermann Henry Völzke Myra Spiliopoulou Bernhard Preim

Epidemiology aims at identifying subpopulations of cohort participants that share common characteristics (e.g. alcohol consumption) to explain risk factors of diseases in cohort study data. These data contain information about the participants’ health status gathered from questionnaires, medical examinations, and image acquisition. Due to the growing volume and heterogeneity of epidemiological ...

2002
Kurt Stockinger

Bitmap indices have gained wide acceptance in data warehouse applications and are an efficient access method for querying large amounts of read-only data. The main trend in bitmap index research focuses on typical business applications based on discrete attribute values. However, scientific data that is mostly characterised by non-discrete attributes cannot be queried efficiently by currently s...

2007
Peter M. Bentler

Based on the univariate t-statistic from an invariant representation of multivariate data, we propose a new quantile-quantile (Q-Q) plot to detect non-multinormality in high-dimensional data analysis. Acceptance regions for the Q-Q plot are provided by the theory of quantile processes. Using the acceptance regions, we perform a Monte Carlo study on the power of the Q-Q plot. It turns out that ...

Journal: Computers & Graphics 2013
Jie Zhang Junjie Cao Xiuping Liu Jun Wang Jian Liu Xiquan Shi

In this paper, we present a robust normal estimation algorithm based on the low-rank subspace clustering technique. The main idea is based on the observation that compared with the points around sharp features, it is relatively easier to obtain accurate normals for the points within smooth regions. The points around sharp features and smooth regions are identified by covariance analysis of thei...

Journal: Pattern Recognition 2017
Lunke Fei Yong Xu Xiaozhao Fang Jian Yang

The graph based Semi-supervised Subspace Learning (SSL) methods treat both labeled and unlabeled data as nodes in a graph, and then instantiate edges among these nodes by weighting the affinity between the corresponding pairs of samples. Constructing a good graph to discover the intrinsic structures of the data is critical for these SSL tasks such as subspace clustering and classification. The ...

Journal: Computational Statistics & Data Analysis 2014
Charles Bouveyron Camille Brunet

Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are nowadays more and more frequent and, unfortunately, classical model-based clustering techniques show a disappointing behavior in high-dimensional spaces. This is mainly due to the fact that model-based clustering methods are dramatically over-param...

2016
Matthew Chalk Olivier Marre Gasper Tkacik

In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximizes information about a ‘relevance’ variable, Y, while constraining the information encoded about the original data, X. Unfortunately, however, the IB method is computationally demanding when data are high-d...
