Hiding outliers in high-dimensional data spaces

Authors
Abstract


Similar resources

SubRank: Ranking Local Outliers in Projections of High-Dimensional Spaces

Outlier mining has become an increasingly urgent issue in the KDD process, since finding exceptional events may be more interesting than searching for common patterns. Finding such outliers is particularly relevant, for instance, in fraud detection. Unfortunately, existing approaches do not take into account that increasing dimensionality leads to a novel understanding...

Detecting Projected Outliers in High-Dimensional Data Streams

In this paper, we study the problem of projected outlier detection in high-dimensional data streams and propose a new technique, called Stream Projected Outlier deTector (SPOT), to identify outliers embedded in subspaces. Sparse Subspace Template (SST), a set of subspaces obtained by unsupervised and/or supervised learning processes, is constructed in SPOT to detect projected outliers effective...

On high dimensional data spaces

Data mining applications usually encounter high-dimensional data spaces. Most of these dimensions contain ‘uninteresting’ data, which not only are of little value for discovering rules or patterns but have also been shown to mislead some classification algorithms. Since the computational effort increases very significantly (usually exponentially) in the presence of a large number...

Similarity Search in High-Dimensional Data Spaces

This paper summarizes analytical and experimental results for the nearest neighbor similarity search problem in high-dimensional vector spaces using some kind of space- or data-partitioning scheme. Under the assumptions of uniformity and independence of data, we are able to formally show and to demonstrate that conventional approaches to the nearest neighbor problem degenerate if the dimensional...

Sparse PCA for High-Dimensional Data With Outliers

A new sparse PCA algorithm is presented which is robust against outliers. The approach is based on the ROBPCA algorithm which generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately find...

Journal

Journal title: International Journal of Data Science and Analytics

Year: 2017

ISSN: 2364-415X, 2364-4168

DOI: 10.1007/s41060-017-0068-8