Bayesian non-parametric hidden Markov models with applications in genomics

نویسنده

  • G. O. Roberts
چکیده

We propose a flexible non-parametric specification of the emission distribution in hidden Markov models and we introduce a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model given by a mixture of Dirichlet processes.The computational framework is based on auxiliary variable representations of the Dirichlet process and consists of a forward–backward Gibbs sampling algorithm of similar complexity to that used in the analysis of parametric hidden Markov models. The algorithm involves analytic marginalizations of latent variables to improve the mixing, facilitated by exchangeability properties of the Dirichlet process that we uncover in the paper. A by-product of this work is an efficient Gibbs sampler for learning Dirichlet process hierarchical models.We test the Monte Carlo algorithm proposed against a wide variety of alternatives and find significant advantages.We also investigate by simulations the sensitivity of the proposed model to prior specification and data-generating mechanisms. We apply our methodology to the analysis of genomic copy number variation. Analysing various real data sets we find significantly more accurate inference compared with state of the art hidden Markov models which use finite mixture emission distributions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Probability , Statistics , and Computational Science Niko

In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur ...

متن کامل

Bayesian nonparametric hidden semi-Markov models

There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDPHMM) as a natural Bayesian nonparametric extension of the ubiquitous Hidden Markov Model for learning from sequential and time-series data. However, in many settings the HDP-HMM’s strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can exten...

متن کامل

A Non-Parametric Bayesian Method for Inferring Hidden Causes

We present a non-parametric Bayesian approach to structure learning with hidden causes. Previous Bayesian treatments of this problem define a prior over the number of hidden causes and use algorithms such as reversible jump Markov chain Monte Carlo to move between solutions. In contrast, we assume that the number of hidden causes is unbounded, but only a finite number influence observable varia...

متن کامل

Unsupervised Modeling of Patient-Level Disease Dynamics

To provide insight into patient-level disease dynamics from data collected at irregular time intervals, this work extends applications of semi-parametric clustering for temporal mining. In the semi-parametric clustering framework, Markovian models provide useful parametric assumptions for modeling temporal dynamics, and a non-parametric method is used to cluster the temporal abstractions instea...

متن کامل

Iclr 2017 B Ioacoustic Segmentation by Hierarchical D Irichlet Process Hidden M Arkov Model

Understanding the communication between different animals by analysing their acoustic signals is an important topic in bioacoustics. It can be a powerful tool for the preservation of ecological diversity. We investigate probabilistic models to analyse signals issued from real-world bioacoustic sound scenes. We study a Bayesian non-parametric sequential models based on Hierarchical Dirichlet Pro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010