Search results for: high dimensional clustering

Number of results: 2463052

Journal: CoRR 2016
Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai

We consider the task of learning the parameters of a single component of a mixture model, for the case when we are given side information about that component; we call this the “search problem” in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributi...

2012
T. Vijayakumar

Subspace clustering is an emerging task that aims at detecting clusters entrenched in subspaces. Recent approaches fail to reduce results to the relevant subspace clusters. Their results are typically highly redundant, and they overlook a critical problem, “the density divergence problem,” in discovering the clusters, where they utilize an absolute density value as the density th...

2005
Fionn Murtagh

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. Asking how the extent or degree of ultrametricity can be quantified leads us to a discussion of varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, in time series signals, and in other areas. We conc...
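A triangle-based check is one common way to make the "degree of ultrametricity" concrete (a minimal sketch of the general idea, not necessarily the exact coefficient used in this work; the function name `ultrametricity_fraction` and the tolerance are assumptions here): in an ultrametric space, every triple of points forms an isosceles triangle whose two largest pairwise distances coincide.

```python
import itertools
import numpy as np

def ultrametricity_fraction(X, tol=0.05):
    """Fraction of point triples whose two largest pairwise
    distances nearly coincide (the ultrametric condition)."""
    X = np.asarray(X, dtype=float)
    hits = total = 0
    for i, j, k in itertools.combinations(range(len(X)), 3):
        d = sorted((np.linalg.norm(X[i] - X[j]),
                    np.linalg.norm(X[j] - X[k]),
                    np.linalg.norm(X[i] - X[k])))
        total += 1
        hits += (d[2] - d[1]) <= tol * d[2]
    return hits / total if total else 0.0

# An equilateral triangle is perfectly ultrametric; three collinear,
# equally spaced points are not.
print(ultrametricity_fraction([[0, 0], [1, 0], [0.5, 3**0.5 / 2]]))  # 1.0
print(ultrametricity_fraction([[0, 0], [1, 0], [2, 0]]))             # 0.0
```

Averaging this fraction over sampled triples gives a single score between 0 (no ultrametric structure) and 1 (fully ultrametric).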

1998
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan

Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these require...
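The grid-density idea underlying CLIQUE can be illustrated by its first, one-dimensional pass (a minimal sketch, not the full algorithm; the function name `dense_units_1d` and the defaults for the grid resolution `xi` and density threshold `tau` are assumptions here):

```python
import numpy as np

def dense_units_1d(X, xi=4, tau=3):
    """First pass of a CLIQUE-style scan: partition each dimension
    into xi equal-width units and keep the 'dense' units containing
    at least tau points."""
    X = np.asarray(X, dtype=float)
    dense = {}
    for d in range(X.shape[1]):
        lo, hi = X[:, d].min(), X[:, d].max()
        bins = ((X[:, d] - lo) / (hi - lo + 1e-12) * xi).astype(int)
        bins = np.minimum(bins, xi - 1)  # clamp the maximum into the last unit
        counts = np.bincount(bins, minlength=xi)
        dense[d] = [u for u in range(xi) if counts[u] >= tau]
    return dense

X = np.array([[0.1, 5.0], [0.2, 5.1], [0.15, 9.0], [0.9, 5.05], [0.12, 5.2]])
print(dense_units_1d(X, xi=4, tau=3))  # {0: [0], 1: [0]}
```

In the full algorithm these one-dimensional dense units are then joined bottom-up, Apriori-style, into candidate dense units of higher-dimensional subspaces.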

Journal: JCS 2014
M. Ravichandran, A. Shanmugam

Clustering is recognized as a significant technique for analysing data, and concerted effort has been devoted to it for decades in different domains, comprising pattern recognition, statistical analysis, and data mining. Subspace clustering groups cluster objects from all subspaces of a dataset. When clustering objects involving higher dimensions, the accuracy and effectivene...

Journal: Informatica (Slovenia) 2014
Nenad Tomasev

Machine learning in intrinsically high-dimensional data is known to be challenging and this is usually referred to as the curse of dimensionality. Designing machine learning methods that perform well in many dimensions is critical, since high-dimensional data arises often in practical applications and typical examples include textual, image and multimedia feature representations, as well as time...
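The distance-concentration effect at the heart of the curse of dimensionality is easy to demonstrate empirically (an illustrative sketch, not taken from the article; `relative_contrast` is a name chosen here):

```python
import numpy as np

def relative_contrast(dim, n=200, seed=0):
    """Relative contrast (dmax - dmin) / dmin between a random query
    and n i.i.d. Gaussian points; it shrinks as dim grows, so nearest
    and farthest neighbours become nearly indistinguishable."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    q = rng.standard_normal(dim)
    d = np.linalg.norm(X - q, axis=1)
    return (d.max() - d.min()) / d.min()

for dim in (2, 20, 2000):
    print(dim, round(relative_contrast(dim), 3))
```

With a fixed seed, the contrast drops by more than an order of magnitude between 2 and 2000 dimensions, which is one reason distance-based clustering degrades in many dimensions.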

Journal: CoRR 2010
Rahmat Widia Sembiring, Jasni Mohamad Zain, Abdullah Embong

Problem statement: Clustering has a number of techniques that have been developed in statistics, pattern recognition, data mining, and other fields. Subspace clustering enumerates clusters of objects in all subspaces of a dataset. It tends to produce many overlapping clusters. Approach: Subspace clustering and projected clustering are research areas for clustering in high dimensional spaces. I...

Journal: CoRR 2015
Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin

Many computer vision algorithms employ subspace models to represent data. Low-rank representation (LRR) has been successfully applied in subspace clustering, in which data are clustered according to their subspace structures. The possibility of extending LRR to the Grassmann manifold is explored in this paper. Rather than directly embedding the Grassmann manifold into a symmetric matrix space, an e...

Journal: IEEE Trans. Geoscience and Remote Sensing 2001
Qiong Jackson, David A. Landgrebe

In this paper, we propose a self-learning and self-improving adaptive classifier to mitigate the problem of small training sample size, which can severely affect the recognition accuracy of classifiers when the dimensionality of the multispectral data is high. This proposed adaptive classifier utilizes classified samples (referred to as semilabeled samples) in addition to original training samples i...

2011
Yaoliang Yu, Dale Schuurmans

When data is sampled from an unknown subspace, principal component analysis (PCA) provides an effective way to estimate the subspace and hence reduce the dimension of the data. At the heart of PCA is the Eckart-Young-Mirsky theorem, which characterizes the best rank k approximation of a matrix. In this paper, we prove a generalization of the Eckart-Young-Mirsky theorem under all unitarily invari...
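For the Frobenius norm, the classical Eckart-Young-Mirsky statement can be checked numerically with a truncated SVD (a sketch of the classical theorem only, not of the paper's generalization to all unitarily invariant norms):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation via truncated SVD, as characterized
    by the (classical) Eckart-Young-Mirsky theorem."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
A2 = best_rank_k(A, 2)

# The Frobenius-norm error equals the root-sum-square of the
# discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A2, "fro")
print(np.isclose(err, np.sqrt((s[2:] ** 2).sum())))  # True
```

The same truncated SVD is what PCA computes after centering the data, which is why the theorem sits "at the heart of PCA" as the abstract says.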

[Chart: number of search results per year]