high dimensional clustering

When Pattern Met Subspace Cluster: A Relationship Story

2011

Jilles Vreeken, Arthur Zimek

While subspace clustering emerged as an application of pattern mining and some of its early advances have probably been inspired by developments in pattern mining, over the years both fields progressed rather independently. In this paper, we identify a number of recent developments in pattern mining that are likely to be applicable to alleviate or solve current problems in subspace clustering and...

Assessment of user simulators for spoken dialogue systems by means of subspace multidimensional clustering

2012

Zoraida Callejas Carrión, David Griol, Klaus-Peter Engelbrecht

The assessment of user simulators in terms of their similarity with real users implies processing and interpreting large dialogue corpora, for which many interaction parameters can be considered. In this setting, the high dimensionality of the data makes it difficult to compare the dialogues as it is not always appropriate to consider all features equally in order to carry out meaningful interp...

Subspace Clustering with Irrelevant Features via Robust Dantzig Selector

2015

Chao Qu, Huan Xu

This paper considers the subspace clustering problem where the data contains irrelevant or corrupted features. We propose a method termed “robust Dantzig selector” which can successfully identify the clustering structure even with the presence of irrelevant features. The idea is simple yet powerful: we replace the inner product by its robust counterpart, which is insensitive to the irrelevant f...

Visual Subpopulation Discovery and Validation in Cohort Study Data

Journal: CoRR 2017

Shiva Alemzadeh, Tommy Hielscher, Uli Niemann, Lena Cibulski, Till Ittermann, Henry Völzke, Myra Spiliopoulou, Bernhard Preim

Epidemiology aims at identifying subpopulations of cohort participants that share common characteristics (e.g. alcohol consumption) to explain risk factors of diseases in cohort study data. These data contain information about the participants’ health status gathered from questionnaires, medical examinations, and image acquisition. Due to the growing volume and heterogeneity of epidemiological ...

Bitmap Indices for Speeding Up High-Dimensional Data Analysis

2002

Kurt Stockinger

Bitmap indices have gained wide acceptance in data warehouse applications and are an efficient access method for querying large amounts of read-only data. The main trend in bitmap index research focuses on typical business applications based on discrete attribute values. However, scientific data that is mostly characterised by non-discrete attributes cannot be queried efficiently by currently s...
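As a minimal illustration of the discrete-attribute case the abstract mentions, the sketch below builds an equality-encoded bitmap index and answers AND/OR queries with bitwise operations. It is not the method proposed in the paper: the function names `build_bitmap_index` and `rows_of`, the sample columns, and the use of Python integers as bit vectors are all assumptions made for this example.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] equals it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Decode a bitmap into the sorted list of row numbers whose bit is set."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical read-only table with two discrete columns.
region = ["EU", "US", "EU", "ASIA", "US", "EU"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_index = build_bitmap_index(region)
status_index = build_bitmap_index(status)

# region = 'EU' AND status = 'open' becomes a bitwise AND of two bitmaps.
eu_open = rows_of(region_index["EU"] & status_index["open"])

# region = 'US' OR region = 'ASIA' becomes a bitwise OR.
us_or_asia = rows_of(region_index["US"] | region_index["ASIA"])
```

One bitmap per distinct value works well when an attribute has few distinct values; a continuous attribute would need one bitmap per observed value unless values are first grouped into ranges, which is one way to read the limitation the abstract points to.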

A T-distribution Plot to Detect Non-multinormality

2007

Peter M. Bentler

Based on the univariate t-statistic from an invariant representation of multivariate data, we propose a new quantile-quantile (Q-Q) plot to detect non-multinormality in high-dimensional data analysis. Acceptance regions for the Q-Q plot are provided by the theory of quantile processes. Using the acceptance regions, we perform a Monte Carlo study on the power of the Q-Q plot. It turns out that ...

Point cloud normal estimation via low-rank subspace clustering

Journal: Computers & Graphics 2013

Jie Zhang, Junjie Cao, Xiuping Liu, Jun Wang, Jian Liu, Xiquan Shi

In this paper, we present a robust normal estimation algorithm based on the low-rank subspace clustering technique. The main idea is based on the observation that compared with the points around sharp features, it is relatively easier to obtain accurate normals for the points within smooth regions. The points around sharp features and smooth regions are identified by covariance analysis of thei...

Low rank representation with adaptive distance penalty for semi-supervised subspace classification

Journal: Pattern Recognition 2017

Lunke Fei, Yong Xu, Xiaozhao Fang, Jian Yang

The graph based Semi-supervised Subspace Learning (SSL) methods treat both labeled and unlabeled data as nodes in a graph, and then instantiate edges among these nodes by weighting the affinity between the corresponding pairs of samples. Constructing a good graph to discover the intrinsic structures of the data is critical for these SSL tasks such as subspace clustering and classification. The ...

Model-based clustering of high-dimensional data: A review

Journal: Computational Statistics & Data Analysis 2014

Charles Bouveyron, Camille Brunet

Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are nowadays more and more frequent and, unfortunately, classical model-based clustering techniques show a disappointing behavior in high-dimensional spaces. This is mainly due to the fact that model-based clustering methods are dramatically over-param...

Relevant sparse codes with variational information bottleneck

2016

Matthew Chalk, Olivier Marre, Gasper Tkacik

In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximizes information about a ‘relevance’ variable, Y, while constraining the information encoded about the original data, X. Unfortunately, however, the IB method is computationally demanding when data are high-d...
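The trade-off described in this abstract is often written as a single objective. As a sketch in the standard IB notation, with R denoting the code and γ a non-negative trade-off weight (both symbols are assumptions here, since the abstract names only X and Y):

```latex
\max_{p(r \mid x)} \; I(R; Y) \;-\; \gamma \, I(R; X), \qquad \gamma \ge 0
```

A larger γ penalizes information retained about X more strongly and so yields a more compressed code; γ = 0 removes the constraint entirely.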
