Bayesian Cluster Analysis: Some Extensions to Non-standard Situations
Author: Jessica Franzén
Abstract
The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean vector and covariance matrix. The method produces posterior distributions of all cluster parameters and proportions, as well as cluster membership probabilities for every object.

We extend this method in several directions to some common but non-standard situations. The first extension covers the case of a few deviant observations that do not belong to any of the normal clusters. An extra component (cluster) is created for them, which has a larger variance or a different distribution, e.g. a uniform distribution over the whole range of the data. The second extension is the clustering of longitudinal data. All units are clustered separately at each time point, and the movements between time points are modeled by Markov transition matrices. As a result, the clustering at one time point is affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response. We impute the missing values iteratively in an extra step of the Gibbs sampler estimation algorithm.

Bayesian inference for mixture models has many advantages over the classical approach; however, it is not without computational difficulties. A software package for Bayesian inference of mixture models, written in Matlab, is introduced. The programs in the package handle the basic case of clustering data assumed to arise from mixtures of multivariate normal distributions, as well as the non-standard situations described above.

Keywords: Cluster analysis, Clustering, Classification, Mixture model, Gaussian, Bayesian inference, MCMC, Gibbs sampler, Deviant group, Longitudinal, Missing data, Multiple imputation

© Jessica Franzén
ISBN 978-91-7155-645-5
Printed in Sweden by US-AB, Stockholm 2008
Distributor: Department of Statistics, Stockholm University
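The abstract combines three ideas in one sampler: Gibbs sampling for a finite normal mixture, an extra deviant component that is uniform over the data range, and imputation of missing values as an extra sampler step. The sketch below illustrates how these fit together. It is not the author's Matlab package and makes several assumptions of its own: the data are one-dimensional; missing values are coded as `None`; the priors are semi-conjugate (normal on each mean, inverse-gamma on each variance, symmetric Dirichlet on the weights) with arbitrarily chosen hyperparameters; and label switching is handled by ordering the cluster means after every sweep, which may differ from the thesis. The function name `gibbs_mixture` and all parameter names are hypothetical, and the longitudinal (Markov transition) extension is not covered.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def gibbs_mixture(data, k, n_iter=600, burn_in=100, seed=0):
    """Hypothetical sketch: Gibbs sampler for a univariate mixture of k normal
    clusters plus one uniform 'deviant' cluster over the observed range.
    None entries are missing values, re-imputed in an extra step per sweep.
    Returns posterior means of the cluster means and, for every object, the
    posterior probability of belonging to the deviant cluster."""
    rng = random.Random(seed)
    obs = sorted(v for v in data if v is not None)
    lo, hi = obs[0], obs[-1]
    log_unif = -math.log(hi - lo)  # log density of Uniform(lo, hi)
    mean_all = sum(obs) / len(obs)
    var_all = sum((v - mean_all) ** 2 for v in obs) / len(obs)
    # Assumed priors: N(m0, s0) on each mean, InvGamma(a0, b0) on each
    # variance, symmetric Dirichlet(alpha) on the k + 1 weights.
    m0, s0, a0, b0, alpha = mean_all, (hi - lo) ** 2, 2.0, 0.1 * var_all, 1.0
    # Start: means at evenly spaced data quantiles, deliberately narrow variances.
    mu = [obs[int((j + 0.5) * len(obs) / k)] for j in range(k)]
    var = [var_all / (2 * k) ** 2] * k
    w = [1.0 / (k + 1)] * (k + 1)  # index k is the deviant cluster
    x = [v if v is not None else mean_all for v in data]
    mu_sum, dev_count = [0.0] * k, [0] * len(x)
    for it in range(n_iter):
        # 1. Allocation: draw each label from its full conditional (log scale).
        z = []
        for xi in x:
            lp = [math.log(w[j]) + log_normal_pdf(xi, mu[j], var[j]) for j in range(k)]
            lp.append((math.log(w[k]) + log_unif) if lo <= xi <= hi else -math.inf)
            top = max(lp)
            probs = [math.exp(v - top) for v in lp]
            z.append(rng.choices(range(k + 1), weights=probs)[0])
        # 2. Imputation: redraw each missing value from its current cluster.
        for i, v in enumerate(data):
            if v is None:
                j = z[i]
                x[i] = rng.uniform(lo, hi) if j == k else rng.gauss(mu[j], math.sqrt(var[j]))
        # 3. Weights: Dirichlet(alpha + counts), drawn as normalised gammas.
        g = [rng.gammavariate(alpha + z.count(j), 1.0) for j in range(k + 1)]
        total = sum(g)
        w = [gj / total for gj in g]
        # 4. Mean, then variance, of each normal cluster (semi-conjugate updates).
        for j in range(k):
            xs = [xi for xi, zi in zip(x, z) if zi == j]
            prec = 1.0 / s0 + len(xs) / var[j]
            mu[j] = rng.gauss((m0 / s0 + sum(xs) / var[j]) / prec, math.sqrt(1.0 / prec))
            ss = sum((xi - mu[j]) ** 2 for xi in xs)
            # InvGamma(a, b) draw: 1 / Gamma(shape=a, scale=1/b).
            var[j] = 1.0 / rng.gammavariate(a0 + len(xs) / 2, 1.0 / (b0 + ss / 2))
        # Label switching: relabel normal clusters by increasing mean.
        order = sorted(range(k), key=lambda j: mu[j])
        mu, var = [mu[j] for j in order], [var[j] for j in order]
        w = [w[j] for j in order] + [w[k]]
        if it >= burn_in:
            mu_sum = [s + m for s, m in zip(mu_sum, mu)]
            dev_count = [c + (zi == k) for c, zi in zip(dev_count, z)]
    kept = n_iter - burn_in
    return [s / kept for s in mu_sum], [c / kept for c in dev_count]
```

For example, calling `gibbs_mixture(data, k=2)` on two well-separated groups plus one far-away value should give cluster means near the group centres and a deviant-membership probability close to 1 for the far-away value. A uniform deviant component is used here because it keeps the full conditionals simple; the abstract also mentions a normal component with a larger variance as an alternative.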
Publication date: 2008