Hiding outliers in high-dimensional data spaces
Related resources
SubRank: Ranking Local Outliers in Projections of High-Dimensional Spaces
Outlier mining has become an increasingly urgent issue in the KDD process, since it may be the case that finding exceptional events is more interesting than searching for common patterns. These outliers are most relevant to be found for instance in fraud detection processes. Unfortunately, existing approaches do not take into account that increasing dimensionality leads to a novel understanding...
Detecting Projected Outliers in High-Dimensional Data Streams
In this paper, we study the problem of projected outlier detection in high dimensional data streams and propose a new technique, called Stream Projected Outlier deTector (SPOT), to identify outliers embedded in subspaces. Sparse Subspace Template (SST), a set of subspaces obtained by unsupervised and/or supervised learning processes, is constructed in SPOT to detect projected outliers effective...
On high dimensional data spaces
Data mining applications usually encounter high dimensional data spaces. Most of these dimensions contain ‘uninteresting’ data, which would not only be of little value in terms of discovery of any rules or patterns, but have been shown to mislead some classification algorithms. Since the computational effort increases very significantly (usually exponentially) in the presence of a large number...
Similarity Search in High-Dimensional Data Spaces
This paper summarizes analytical and experimental results for the nearest neighbor similarity search problem in high-dimensional vector spaces using some kind of space- or data-partitioning scheme. Under the assumptions of uniformity and independence of data, we are able to formally show and to demonstrate that conventional approaches to the nearest neighbor problem degenerate if the dimensional...
Sparse PCA for High-Dimensional Data With Outliers
A new sparse PCA algorithm is presented which is robust against outliers. The approach is based on the ROBPCA algorithm which generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately find...
Journal
Journal title: International Journal of Data Science and Analytics
Year: 2017
ISSN: 2364-415X,2364-4168
DOI: 10.1007/s41060-017-0068-8