A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data
Authors
Abstract
A main application of mRNA sequencing (mRNAseq) is determining lists of differentially expressed genes (DEGs) between two or more conditions. Several software packages exist to produce DEGs from mRNAseq data, but they typically yield different DEGs, sometimes markedly so. The underlying probability model used to describe mRNAseq data is central to deriving DEGs, and, not surprisingly, most packages use different models and assumptions to analyze mRNAseq data. Here, we propose a mechanistic justification for modeling mRNAseq as a binomial process, with data from technical replicates given by a binomial distribution and data from biological replicates well described by a beta-binomial distribution. We demonstrate good agreement of this model with two large datasets. We show that an emergent feature of the beta-binomial distribution, given parameter regimes typical for mRNAseq experiments, is the well-known quadratic polynomial scaling of the variance with the mean. The so-called dispersion parameter controls this scaling, and our analysis suggests that the dispersion parameter is a continually decreasing function of the mean, in contrast to current approaches, which impose an asymptotic value on the dispersion parameter at moderate mean read counts. We show how this leads current approaches to overestimate the variance for moderately to highly expressed genes, which inflates false negative rates. Describing mRNAseq data with a beta-binomial distribution may thus be preferred, since its parameters relate directly to the mechanistic underpinnings of the technique, and it may improve the consistency of DEG analysis across software packages, particularly for moderately to highly expressed genes.
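The quadratic mean-variance scaling mentioned in the abstract can be illustrated with the textbook moments of the beta-binomial distribution. The sketch below is not the authors' code: the function names, the parameter values, and the symbol `phi` are assumptions chosen for illustration. It compares the exact beta-binomial variance with the approximation mu + phi * mu^2, where phi = 1 / (p * (alpha + beta + 1)), which holds when the per-read success probability p is small and the read depth n is large.

```python
def beta_binomial_moments(n, alpha, beta):
    """Exact mean and variance of a beta-binomial(n, alpha, beta) count."""
    s = alpha + beta
    p = alpha / s                      # mean success probability
    mean = n * p
    var = n * p * (1.0 - p) * (s + n) / (s + 1.0)
    return mean, var


def quadratic_approximation(n, alpha, beta):
    """Approximate variance mu + phi * mu**2, valid for p << 1 and n >> 1.

    phi is an illustrative dispersion-like quantity, not a value
    taken from the paper.
    """
    s = alpha + beta
    p = alpha / s
    mu = n * p
    phi = 1.0 / (p * (s + 1.0))
    return mu + phi * mu ** 2


if __name__ == "__main__":
    # Hypothetical values: 10 million reads, a gene drawing about 1 in 10,000.
    n, alpha, beta = 10_000_000, 10.0, 99_990.0
    mean, exact_var = beta_binomial_moments(n, alpha, beta)
    approx_var = quadratic_approximation(n, alpha, beta)
    print(f"mean = {mean:.1f}")
    print(f"exact variance = {exact_var:.1f}")
    print(f"quadratic approximation = {approx_var:.1f}")
```

In this sketch the two variances differ only by the factor (1 - p)(s + n) / (s + n + 1), with s = alpha + beta, so the approximation tightens as p shrinks and n grows.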
Similar resources
An Alternative to the Beta-Binomial Distribution with Application in Developmental Toxicology
The beta-binomial distribution results when the probability of success per trial in the binomial distribution varies across successive trials and the mixing distribution is from the beta family. In experiments with binary outcomes, observations often exhibit extra-binomial variation and occur in clusters. In such experiments the beta-binomial distribution can generally ...
Beta-Binomial and Ordinal Joint Model with Random Effects for Analyzing Mixed Longitudinal Responses
The analysis of discrete mixed responses is an important statistical issue in various sciences. Ordinal and overdispersed binomial variables are discrete. Overdispersed binomial data are a sum of correlated Bernoulli experiments with equal success probabilities. In this paper, a joint model with random effects is proposed for analyzing mixed overdispersed binomial and ordinal longitudinal respo...
Customer Relationship Termination Problem for Beta-Geometric/Beta-Binomial Model of Customer Behavior
We deal with the relationship termination problem in the context of individual-level customer relationship management (CRM) and use a Markov decision process to determine the most appropriate occasion for termination of the relationship with a seemingly unprofitable customer. As a particular case, the beta-geometric/beta-binomial model is considered as the basis to define customer beha...
BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data
We perform differential expression analysis of high-throughput sequencing count data under a Bayesian nonparametric framework, removing sophisticated ad-hoc pre-processing steps commonly required in existing algorithms. We propose to use the gamma (beta) negative binomial process, which takes into account different sequencing depths using sample-specific negative binomial probability (dispersio...
Industrial Engineering and Computer Sciences Division (G2I) ON PERFORMANCE OF BINOMIAL AND BETA-BINOMIAL MODELS OF LEAD-TIME DEMAND FORECASTING FOR MULTIPLE SLOW-MOVING ITEMS WITH SHORT REQUESTS HISTORY
The paper deals with lead-time demand forecasting for inventory management of multiple slow-moving items when the available demand history is very short. Two stochastic models of demand are compared: (i) the first based on the “population-averaged” binomial distribution of requests (the traditional approach); and (ii) the second based on the beta-binomial probability distributio...
Journal title:
Volume 11, issue
Pages -
Publication date: 2016