Search results for: high dimensional data
Number of results: 4272118
Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that N_k = O(rd) columns per subspace are necessary for SCMD, and N_k = O(min{d^{log d}, d^{r+1}}) are sufficient. We close this gap, showing that N_k = O(rd) is also sufficient. To do this we derive deterministic sampling co...
Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction tec...
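The snippet above describes discovering clusters through per-cluster feature weights. A minimal Python sketch of that general idea follows; it is not the authors' algorithm, and the exponential weighting scheme and bandwidth h are assumptions made only for illustration:

import numpy as np

def locally_weighted_kmeans(X, k, h=1.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)  # random initial centroids
    weights = np.full((k, d), 1.0 / d)                                # uniform feature weights

    for _ in range(n_iter):
        # Assign each point to the cluster minimizing its weighted squared distance.
        dist = np.stack([((X - centers[j]) ** 2 * weights[j]).sum(axis=1) for j in range(k)])
        labels = dist.argmin(axis=0)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            centers[j] = members.mean(axis=0)
            # Per-dimension dispersion inside the cluster; small dispersion -> large weight.
            spread = ((members - centers[j]) ** 2).mean(axis=0)
            w = np.exp(-spread / h)
            weights[j] = w / w.sum()
    return labels, centers, weights

Dimensions along which a cluster is tight receive large weights, so each cluster effectively lives in its own weighted subspace, which is the effect the snippet describes.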
Dimensionality reduction methods are very common in the field of high dimensional data analysis, where classical analysis methods are inadequate. Typically, algorithms for dimensionality reduction are computationally expensive, which makes applying them to data warehouses impractical, all the more so when data accumulates continuously. In this paper, an out-of-sam...
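A toy sketch of the out-of-sample idea the snippet alludes to: fit an embedding once on a manageable sample, then map newly accumulated batches into the same low-dimensional space without re-running the full algorithm. PCA stands in here for whatever reduction method is meant; the sizes are arbitrary:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
sample = rng.normal(size=(1000, 200))        # initial sample from the warehouse
pca = PCA(n_components=10).fit(sample)       # expensive step, done once

for _ in range(5):                           # data keeps arriving in batches
    batch = rng.normal(size=(500, 200))
    embedded = pca.transform(batch)          # cheap out-of-sample projection
    print(embedded.shape)                    # (500, 10)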
In this paper, we present a novel low rank representation (LRR) algorithm for data lying on the manifold of square root densities. Unlike traditional LRR methods which rely on the assumption that the data points are vectors in the Euclidean space, our new algorithm is designed to incorporate the intrinsic geometric structure and geodesic distance of the manifold. Experiments on several computer...
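For context, the classical Euclidean-space LRR problem that such manifold variants build on is usually written as

    min_{Z,E}  ||Z||_*  +  λ ||E||_{2,1}    subject to    X = X Z + E,

where ||Z||_* is the nuclear norm encouraging a low-rank self-representation and the l_{2,1} term absorbs sample-specific corruptions; the method in the snippet adapts this self-expression step so that it respects the geometry of the manifold of square root densities.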
Emerging high-dimensional data mining applications need to find interesting clusters embedded in arbitrarily aligned subspaces of lower dimensionality. It is difficult to cluster high-dimensional data objects when they are sparse and skewed. Updates are quite common in dynamic databases and they are usually processed in batch mode. In very large dynamic databases, it is necessary to perform ...
Sparse Subspace Clustering (SSC) has been used extensively for subspace identification tasks due to its theoretical guarantees and relative ease of implementation. However, SSC has quadratic computation and memory requirements with respect to the number of input data points. This burden has prohibited SSC's use for all but the smallest datasets. To overcome this we propose a n...
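For readers unfamiliar with SSC, a minimal sketch of its sparse self-expression step follows (this is the basic method, not the scalable variant the snippet goes on to propose; the lasso penalty alpha is an arbitrary choice):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, n_clusters, alpha=0.01):
    # X is d x n, one data point per column.
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(X, i, axis=1)              # exclude the point itself
        lasso = Lasso(alpha=alpha, max_iter=5000).fit(others, X[:, i])
        C[np.arange(n) != i, i] = lasso.coef_         # sparse self-expression coefficients
    A = np.abs(C) + np.abs(C).T                       # symmetric affinity matrix
    return SpectralClustering(n_clusters, affinity='precomputed').fit_predict(A)

Each of the n regressions involves all other n-1 points and the coefficient matrix C is n x n, which is exactly the quadratic computation and memory cost the snippet refers to.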
We consider the problem of clustering incomplete data drawn from a union of subspaces. Classical subspace clustering methods are not applicable to this problem because the data are incomplete, while classical low-rank matrix completion methods may not be applicable because data in multiple subspaces may not be low rank. This paper proposes and evaluates two new approaches for subspace clusterin...
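To make the low-rank caveat concrete: data drawn from k subspaces of dimension r each can form a matrix of rank up to min(kr, d). With, say, d = 100, k = 10 and r = 10 (illustrative numbers, not from the paper), the union can be full rank, so standard low-rank completion has nothing to exploit even though every individual subspace is only 10-dimensional.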
In order to establish consolidated standards in novel data mining areas, newly proposed algorithms need to be evaluated thoroughly. Many publications compare a new proposition – if at all – with one or two competitors or even with a so called “naïve” ad hoc solution. For the prolific field of subspace clustering, we propose a software framework implementing many prominent algorithms and, thus, ...
In this paper, through a series of specific examples, we illustrate some characteristics encountered in analyzing high dimensional multispectral data. The increased importance of the second order statistics in analyzing high dimensional data is illustrated, as is the shortcoming of classifiers such as the minimum distance classifier which rely on first order variations alone. We also illustrate...
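A small illustration of the point about second-order statistics: two classes with identical means but different covariances are invisible to a minimum distance (nearest-mean) classifier, while a quadratic classifier that models the covariance separates them easily. The dimensions and scales below are arbitrary choices for the demonstration, not taken from the paper:

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
d, n = 30, 2000
X0 = rng.normal(scale=1.0, size=(n, d))      # class 0: zero mean, unit variance
X1 = rng.normal(scale=3.0, size=(n, d))      # class 1: same mean, larger variance
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

print(NearestCentroid().fit(X, y).score(X, y))                 # near chance level
print(QuadraticDiscriminantAnalysis().fit(X, y).score(X, y))   # close to 1.0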
We present a novel approach to report approximate as well as exact k-closest pairs for sets of high dimensional points, under the L_t-metric, t = 1, ..., ∞. The proposed algorithms are efficient and simple to implement. They all use multiple shifted copies of the data points sorted according to their position along a space filling curve, such as the Peano curve, in a way that allows us to make p...
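A minimal sketch of the shifted space-filling-curve idea: points are quantized, mapped to a one-dimensional key by bit interleaving (a Z-order curve is used here for simplicity; the paper works with curves such as the Peano curve) and sorted, and candidate close pairs are only sought among neighbours in the sorted order, over several randomly shifted copies of the data. The parameters bits, window and the number of shifts are assumptions, not values from the paper:

import heapq
import numpy as np

def z_key(coords):
    # Interleave the bits of the integer coordinates into one Morton (Z-order) key.
    key, d = 0, len(coords)
    for bit in range(32):
        for axis, c in enumerate(coords):
            key |= ((int(c) >> bit) & 1) << (bit * d + axis)
    return key

def approx_k_closest_pairs(X, k, shifts=4, bits=16, window=8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, span = X.min(axis=0), np.ptp(X, axis=0) + 1e-12
    best = {}                                   # (i, j) -> squared distance
    for _ in range(shifts):
        shift = rng.random(d)                   # each copy of the data gets a random shift
        q = np.floor(((X - lo) / span + shift) * (2 ** (bits - 1))).astype(np.int64)
        order = sorted(range(n), key=lambda i: z_key(q[i]))
        for pos in range(n):                    # compare only neighbours in curve order
            for nxt in range(pos + 1, min(pos + 1 + window, n)):
                i, j = sorted((order[pos], order[nxt]))
                best[(i, j)] = np.sum((X[i] - X[j]) ** 2)
    return heapq.nsmallest(k, best.items(), key=lambda kv: kv[1])

With window = n this degenerates to the exact quadratic scan; smaller windows trade accuracy for speed, and additional shifted copies reduce the chance that a truly close pair ends up far apart along the curve.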