A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
نویسندگان
چکیده
Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data and the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views and number of features in each view.
منابع مشابه
Variational Inference for Nonparametric Multiple Clustering
Most clustering algorithms produce a single clustering solution. Similarly, feature selection for clustering tries to find one feature subset where one interesting clustering solution resides. However, a single data set may be multi-faceted and can be grouped and interpreted in many different ways, especially for high dimensional data, where feature selection is typically needed. Moreover, diff...
متن کاملMultiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which sc...
متن کاملGender-based Differences in Associations between Attitude and Self-esteem with Smoking Behavior among Adolescents: A Secondary Analysis Applying Bayesian Nonparametric Functional Latent Variable Model
Background: Different patterns of gender-based relationships between attitude toward smoking and self-esteem with smoking behavior have reported. However, such associations may be much more complex than a simply supposed linear relationship. We aimed to propose a method of providing hand details on the total and gender-based scenarios of the relationships between attitude toward smoking and sel...
متن کاملEfficient Bayesian Methods for Clustering
One of the most important goals of unsupervised learning is to discover meaningful clusters in data. Clustering algorithms strive to discover groups, or clusters, of data points which belong together because they are in some way similar. The research presented in this thesis focuses on using Bayesian statistical techniques to cluster data. We take a model-based Bayesian approach to defining a c...
متن کاملDetecting Overlapping Communities in Social Networks using Deep Learning
In network analysis, a community is typically considered of as a group of nodes with a great density of edges among themselves and a low density of edges relative to other network parts. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012