Bayesian inference on quasi-sparse count data

Authors

  • Jyotishka Datta
  • David B. Dunson
Abstract

There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to quasi-sparse settings. We develop a new class of continuous local-global shrinkage priors tailored to quasi-sparse counts. Theoretical properties are assessed, including flexible posterior concentration and stronger control of false discoveries in multiple testing. Simulation studies demonstrate excellent small-sample properties relative to competing methods. We use the method to detect rare mutational hotspots in exome sequencing data and to identify North American cities most impacted by terrorism.
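To make the notion of quasi-sparsity concrete, the following is a minimal sketch that simulates counts with an overabundance of zeros and small nonzero values plus a few large signals. The two-group Poisson mixture, the rates, and all function names (`sample_poisson`, `simulate_quasi_sparse`, `summarize`) are illustrative assumptions, not the authors' model or their shrinkage prior.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_quasi_sparse(n=1000, signal_fraction=0.05, noise_rate=0.3,
                          signal_rate=20.0, seed=0):
    """Simulate n counts: most come from a small 'noise' rate (assumed 0.3),
    a few from a large 'signal' rate (assumed 20). All values are hypothetical."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        is_signal = rng.random() < signal_fraction
        rate = signal_rate if is_signal else noise_rate
        counts.append(sample_poisson(rate, rng))
    return counts


def summarize(counts):
    """Return the fractions of zeros, small nonzero counts (1 or 2), and larger counts."""
    n = len(counts)
    return {
        "zero": sum(c == 0 for c in counts) / n,
        "small_nonzero": sum(1 <= c <= 2 for c in counts) / n,
        "large": sum(c > 2 for c in counts) / n,
    }


if __name__ == "__main__":
    print(summarize(simulate_quasi_sparse()))
```

In this sketch the small nonzero counts come from the noise group rather than from true signals. That is the situation the abstract describes as hard for zero-inflated models, which only account for the excess of exact zeros.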

Similar resources

Bayesian functional principal components analysis for binary and count data

Recently, van der Linde (2008) proposed a variational algorithm to obtain approximate Bayesian inference in functional principal components analysis (FPCA), where the functions were observed with Gaussian noise. Generalized FPCA under different noise models with sparse longitudinal data was developed by Hall, Müller and Yao (2008), but no Bayesian approach is available yet. It is demonstrated t...

Estimation of confidence intervals for Multinomial proportions of sparse contingency tables using Bayesian methods

The multinomial distribution, widely used in applications with discrete data, has seen a variety of competing interval estimates, from frequentist to Bayesian methods, which remain of interest in the case of zero counts or sparse contingency tables. The methods commonly recommended in both approaches are considered based on the influence of zero counts, polarizing cell counts, and aberrations. The infe...

Inference in generalized additive mixed models by using smoothing splines

Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate no...

Bayesian Inference for Spatial Beta Generalized Linear Mixed Models

In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on (0, 1) interval that covers symm...

Asynchronous Distributed Estimation of Topic Models for Document Analysis

Given the prevalence of large data sets and the availability of inexpensive parallel computing hardware, there is significant motivation to explore distributed implementations of statistical learning algorithms. In this paper, we present a distributed learning framework for Latent Dirichlet Allocation (LDA), a well-known Bayesian latent variable model for sparse matrices of count data. In the p...

Journal title:

Volume 103, Issue

Pages -

Publication date: 2016