Search results for: high dimensional data
Number of results: 4272118
Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that N_k = O(rd) columns per subspace are necessary for SCMD, and N_k = O(min{d^{log d}, d^{r+1}}) are sufficient. We close this gap, showing that N_k = O(rd) is also sufficient. To do this we derive deterministic sampling co...
Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction tec...
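The snippet above describes discovering clusters through per-cluster feature weights. A minimal Python sketch of that general idea follows; it is not the authors' algorithm, and the exponential weighting scheme and bandwidth h are assumptions made only for illustration:

import numpy as np

def locally_weighted_kmeans(X, k, h=1.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)  # random initial centroids
    weights = np.full((k, d), 1.0 / d)                                # uniform feature weights

    for _ in range(n_iter):
        # Assign each point to the cluster minimizing its weighted squared distance.
        dist = np.stack([((X - centers[j]) ** 2 * weights[j]).sum(axis=1) for j in range(k)])
        labels = dist.argmin(axis=0)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            centers[j] = members.mean(axis=0)
            # Per-dimension dispersion inside the cluster; small dispersion -> large weight.
            spread = ((members - centers[j]) ** 2).mean(axis=0)
            w = np.exp(-spread / h)
            weights[j] = w / w.sum()
    return labels, centers, weights

Dimensions along which a cluster is tight receive large weights, so each cluster effectively lives in its own weighted subspace, which is the effect the snippet describes.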
Dimensionality reduction methods are very common in the field of high dimensional data analysis, where classical analysis methods are inadequate. Typically, algorithms for dimensionality reduction are computationally expensive, which makes applying them to data warehouses impractical, all the more so when data accumulates continuously. In this paper, an out-of-sam...
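A toy sketch of the out-of-sample idea the snippet alludes to: fit an embedding once on a manageable sample, then map newly accumulated batches into the same low-dimensional space without re-running the full algorithm. PCA stands in here for whatever reduction method is meant; the sizes are arbitrary:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
sample = rng.normal(size=(1000, 200))        # initial sample from the warehouse
pca = PCA(n_components=10).fit(sample)       # expensive step, done once

for _ in range(5):                           # data keeps arriving in batches
    batch = rng.normal(size=(500, 200))
    embedded = pca.transform(batch)          # cheap out-of-sample projection
    print(embedded.shape)                    # (500, 10)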
In this paper, we present a novel low rank representation (LRR) algorithm for data lying on the manifold of square root densities. Unlike traditional LRR methods which rely on the assumption that the data points are vectors in the Euclidean space, our new algorithm is designed to incorporate the intrinsic geometric structure and geodesic distance of the manifold. Experiments on several computer...
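For context, the classical Euclidean-space LRR problem that such manifold variants build on is usually written as

    min_{Z,E}  ||Z||_*  +  λ ||E||_{2,1}    subject to    X = X Z + E,

where ||Z||_* is the nuclear norm encouraging a low-rank self-representation and the l_{2,1} term absorbs sample-specific corruptions; the method in the snippet adapts this self-expression step so that it respects the geometry of the manifold of square root densities.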
Emerging high-dimensional data mining applications need to find interesting clusters embedded in arbitrarily aligned subspaces of lower dimensionality. It is difficult to cluster high-dimensional data objects when they are sparse and skewed. Updates are quite common in dynamic databases and they are usually processed in batch mode. In very large dynamic databases, it is necessary to perform ...
Sparse Subspace Clustering (SSC) has been used extensively for subspace identification tasks due to its theoretical guarantees and relative ease of implementation. However, SSC has quadratic computation and memory requirements with respect to the number of input data points. This burden has prohibited SSC's use for all but the smallest datasets. To overcome this we propose a n...
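For readers unfamiliar with SSC, a minimal sketch of its sparse self-expression step follows (this is the basic method, not the scalable variant the snippet goes on to propose; the lasso penalty alpha is an arbitrary choice):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, n_clusters, alpha=0.01):
    # X is d x n, one data point per column.
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(X, i, axis=1)              # exclude the point itself
        lasso = Lasso(alpha=alpha, max_iter=5000).fit(others, X[:, i])
        C[np.arange(n) != i, i] = lasso.coef_         # sparse self-expression coefficients
    A = np.abs(C) + np.abs(C).T                       # symmetric affinity matrix
    return SpectralClustering(n_clusters, affinity='precomputed').fit_predict(A)

Each of the n regressions involves all other n-1 points and the coefficient matrix C is n x n, which is exactly the quadratic computation and memory cost the snippet refers to.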
We consider the problem of clustering incomplete data drawn from a union of subspaces. Classical subspace clustering methods are not applicable to this problem because the data are incomplete, while classical low-rank matrix completion methods may not be applicable because data in multiple subspaces may not be low rank. This paper proposes and evaluates two new approaches for subspace clusterin...
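To make the low-rank caveat concrete: data drawn from k subspaces of dimension r each can form a matrix of rank up to min(kr, d). With, say, d = 100, k = 10 and r = 10 (illustrative numbers, not from the paper), the union can be full rank, so standard low-rank completion has nothing to exploit even though every individual subspace is only 10-dimensional.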
In order to establish consolidated standards in novel data mining areas, newly proposed algorithms need to be evaluated thoroughly. Many publications compare a new proposition – if at all – with one or two competitors or even with a so called “naïve” ad hoc solution. For the prolific field of subspace clustering, we propose a software framework implementing many prominent algorithms and, thus, ...
In this paper, through a series of specific examples, we illustrate some characteristics encountered in analyzing high dimensional multispectral data. The increased importance of the second order statistics in analyzing high dimensional data is illustrated, as is the shortcoming of classifiers such as the minimum distance classifier which rely on first order variations alone. We also illustrate...
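A small illustration of the point about second-order statistics: two classes with identical means but different covariances are invisible to a minimum distance (nearest-mean) classifier, while a quadratic classifier that models the covariance separates them easily. The dimensions and scales below are arbitrary choices for the demonstration, not taken from the paper:

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
d, n = 30, 2000
X0 = rng.normal(scale=1.0, size=(n, d))      # class 0: zero mean, unit variance
X1 = rng.normal(scale=3.0, size=(n, d))      # class 1: same mean, larger variance
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

print(NearestCentroid().fit(X, y).score(X, y))                 # near chance level
print(QuadraticDiscriminantAnalysis().fit(X, y).score(X, y))   # close to 1.0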
We present a novel approach to report approximate as well as exact k-closest pairs for sets of high dimensional points, under the L_t-metric, t = 1, ..., ∞. The proposed algorithms are efficient and simple to implement. They all use multiple shifted copies of the data points sorted according to their position along a space filling curve, such as the Peano curve, in a way that allows us to make p...
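A minimal sketch of the shifted space-filling-curve idea: points are quantized, mapped to a one-dimensional key by bit interleaving (a Z-order curve is used here for simplicity; the paper works with curves such as the Peano curve) and sorted, and candidate close pairs are only sought among neighbours in the sorted order, over several randomly shifted copies of the data. The parameters bits, window and the number of shifts are assumptions, not values from the paper:

import heapq
import numpy as np

def z_key(coords):
    # Interleave the bits of the integer coordinates into one Morton (Z-order) key.
    key, d = 0, len(coords)
    for bit in range(32):
        for axis, c in enumerate(coords):
            key |= ((int(c) >> bit) & 1) << (bit * d + axis)
    return key

def approx_k_closest_pairs(X, k, shifts=4, bits=16, window=8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, span = X.min(axis=0), np.ptp(X, axis=0) + 1e-12
    best = {}                                   # (i, j) -> squared distance
    for _ in range(shifts):
        shift = rng.random(d)                   # each copy of the data gets a random shift
        q = np.floor(((X - lo) / span + shift) * (2 ** (bits - 1))).astype(np.int64)
        order = sorted(range(n), key=lambda i: z_key(q[i]))
        for pos in range(n):                    # compare only neighbours in curve order
            for nxt in range(pos + 1, min(pos + 1 + window, n)):
                i, j = sorted((order[pos], order[nxt]))
                best[(i, j)] = np.sum((X[i] - X[j]) ** 2)
    return heapq.nsmallest(k, best.items(), key=lambda kv: kv[1])

With window = n this degenerates to the exact quadratic scan; smaller windows trade accuracy for speed, and additional shifted copies reduce the chance that a truly close pair ends up far apart along the curve.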