A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data
Authors
Abstract
High-dimensional data of a discrete and skewed nature are commonly encountered in high-throughput sequencing studies. Analyzing the network itself, or the interplay between genes, in this type of data continues to present many challenges. Because data visualization techniques become cumbersome in higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on the biological functions and pathways of gene products. A mixture of multivariate Poisson-log normal (MPLN) distributions is proposed for clustering high-throughput transcriptome sequencing data. The MPLN model can fit a wide range of correlation and overdispersion situations, and is well suited to modeling multivariate count data from RNA sequencing studies. Parameter estimation is carried out via a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection.
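The generative structure behind an MPLN mixture can be illustrated with a short simulation. The sketch below is not the authors' implementation: it assumes the standard two-layer form (a latent multivariate normal vector whose exponential gives the Poisson rates), omits any library-size normalization, and does not attempt the MCMC-EM fitting step. The function name `sample_mpln_mixture` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_mpln_mixture(n, weights, means, covs, rng):
    """Draw n count vectors from a mixture of multivariate
    Poisson-log normal components (illustrative sketch only)."""
    weights = np.asarray(weights, dtype=float)
    # Pick a mixture component for every observation.
    labels = rng.choice(len(weights), size=n, p=weights)
    d = len(means[0])
    counts = np.empty((n, d), dtype=np.int64)
    for g in range(len(weights)):
        idx = np.flatnonzero(labels == g)
        # Latent layer: theta ~ N_d(mu_g, Sigma_g).
        theta = rng.multivariate_normal(means[g], covs[g], size=idx.size)
        # Observed layer: Y_j | theta_j ~ Poisson(exp(theta_j)),
        # independently across the d dimensions given theta.
        counts[idx] = rng.poisson(np.exp(theta))
    return counts, labels

# Hypothetical parameters: two components, three dimensions.
rng = np.random.default_rng(0)
weights = [0.6, 0.4]
means = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 3.0, 1.0])]
covs = [
    np.array([[0.3, 0.1, 0.0],
              [0.1, 0.3, 0.1],
              [0.0, 0.1, 0.3]]),
    0.2 * np.eye(3),
]

counts, labels = sample_mpln_mixture(5000, weights, means, covs, rng)

# Within a component, the sample variance exceeds the sample mean,
# which is the overdispersion a plain Poisson model cannot capture.
comp0 = counts[labels == 0]
print("component 0 mean:    ", comp0.mean(axis=0))
print("component 0 variance:", comp0.var(axis=0))
```

Because the latent normal layer carries a full covariance matrix, the counts inherit both correlation across dimensions and extra-Poisson variance, which is the flexibility the abstract refers to.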
Similar resources
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models
Motivation: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. Results: In this work, we focus on the question of clustering DGE profiles ...
Gene expression: Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models
Motivation: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. Results: In this work, we focus on the question of clustering DGE profiles ...
Bayesian paradigm for analysing count data in longitudinal studies using Poisson-generalized log-gamma model
In analyzing longitudinal data with count responses, a normal distribution is usually assumed for the random effects. However, in some applications the random effects may not be normally distributed, and misspecifying this distribution may reduce the efficiency of the estimators. In this paper, a generalized log-gamma distribution is used for the random effects which includes th...
Tuning the Multivariate Poisson Mixture Model for Clustering Supermarket Shoppers
This paper describes a multivariate Poisson mixture model for clustering supermarket shoppers based on their purchase frequency in a set of product categories. The multivariate nature of the model accounts for cross-selling effects that may exist between the purchases made in different product categories. However, because of computational difficulties, most multivariate approaches limit the cov...
Journal title:
Volume, issue:
Pages: -
Publication date: 2017