نتایج جستجو برای: high dimensional clustering
تعداد نتایج: 2463052 فیلتر نتایج به سال:
While subspace clustering emerged as an application of pattern mining and some of its early advances have probably been inspired by developments in pattern mining, over the years both elds progressed rather independently. In this paper, we identify a number of recent developments in pattern mining that are likely to be applicable to alleviate or solve current problems in subspace clustering and...
The assessment of user simulators in terms of their similarity with real users implies processing and interpreting large dialogue corpora, for which many interaction parameters can be considered. In this setting, the high dimensionality of the data makes it difficult to compare the dialogues as it is not always appropriate to consider all features equally in order to carry out meaningful interp...
This paper considers the subspace clustering problem where the data contains irrelevant or corrupted features. We propose a method termed “robust Dantzig selector” which can successfully identify the clustering structure even with the presence of irrelevant features. The idea is simple yet powerful: we replace the inner product by its robust counterpart, which is insensitive to the irrelevant f...
Epidemiology aims at identifying subpopulations of cohort participants that share common characteristics (e.g. alcohol consumption) to explain risk factors of diseases in cohort study data. These data contain information about the participants’ health status gathered from questionnaires, medical examinations, and image acquisition. Due to the growing volume and heterogeneity of epidemiological ...
Bitmap indices have gained wide acceptance in data warehouse applications and are an efficient access method for querying large amounts of read-only data. The main trend in bitmap index research focuses on typical business applications based on discrete attribute values. However, scientific data that is mostly characterised by non-discrete attributes cannot be queried efficiently by currently s...
Based on the univariate t-statistic from an invariant representation of multi-variate data, we propose a new quantile-quantile (Q-Q) plot to detect non-multinormality in high-dimensional data analysis. Acceptance regions for the Q-Q plot are provided by the theory of quantile processes. Using the acceptance regions, we perform a Monte Carlo study on the power of the Q-Q plot. It turns out that ...
In this paper, we present a robust normal estimation algorithm based on the low-rank subspace clustering technique. The main idea is based on the observation that compared with the points around sharp features, it is relatively easier to obtain accurate normals for the points within smooth regions. The points around sharp features and smooth regions are identified by covariance analysis of thei...
The graph based Semi-supervised Subspace Learning (SSL) methods treat both labeled and unlabeled data as nodes in a graph, and then instantiate edges among these nodes by weighting the affinity between the corresponding pairs of samples. Constructing a good graph to discover the intrinsic structures of the data is critical for these SSL tasks such as subspace clustering and classification. The ...
Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are nowadays more and more frequent and, unfortunately, classical model-based clustering techniques show a disappointing behavior in high-dimensional spaces. This is mainly due to the fact that model-based clustering methods are dramatically over-param...
In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximizes information about a ‘relevance’ variable, Y , while constraining the information encoded about the original data, X . Unfortunately however, the IB method is computationally demanding when data are high-d...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید