Using MCMC chain outputs to efficiently estimate Bayes factors
Authors
R.D. Morey et al.
Abstract
One of the most important methodological problems in psychological research is assessing the reasonableness of null models, which typically constrain a parameter to a specific value such as zero. Bayes factor has been recently advocated in the statistical and psychological literature as a principled means of measuring the evidence in data for various models, including those where parameters are set to specific values. Yet, it is rarely adopted in substantive research, perhaps because of the difficulties in computation. Fortunately, for this problem, the Savage–Dickey density ratio (Dickey & Lientz, 1970) provides a conceptually simple approach to computing Bayes factor. Here, we review methods for computing the Savage–Dickey density ratio, and highlight an improved method, originally suggested by Gelfand and Smith (1990) and advocated by Chib (1995), that outperforms those currently discussed in the psychological literature. The improved method is based on conditional quantities, which may be integrated byMarkov chain Monte Carlo sampling to estimate Bayes factors. These conditional quantities efficiently utilize all the information in the MCMC chains, leading to accurate estimation of Bayes factors. We demonstrate the method by computing Bayes factors in one-sample and one-way designs, and show how it may be implemented in WinBUGS. © 2011 Elsevier Inc. All rights reserved. Frequently, researchers in psychological science must decide which of possibly several theoretical viewpoints is supported by data. For the past century, frequentist statistical methods, such as null hypothesis significance tests and inference by confidence intervals, have been popular in the psychological literature. Although there have been strong arguments for the use of Bayesian methods in psychology for over 50 years (eg, Edwards, Lindman, & Savage, 1963), Bayesian analysis has not been nearly as popular. In fact, there are seemingly more papers in psychology touting the benefits of Bayesian analysis than actually using these analyses to draw conclusions. One historical reason for this lack of popularity is that obtaining Bayesian quantities often requires significant computational resources. To quantify uncertainty about a statistical parameter, Bayesian methods marginalize over the uncertainty in all other parameters. Marginalizing over all other parameters requires integration over many dimensions, which is often impossible to do analytically. The rise of approximate methods such as Markov Chain Monte Carlo (MCMC; Gelfand & Smith, 1990; Geman & Geman, 1984), and the widespread availability of fast microcomputers has made integration considerably easier, ∗ Corresponding author. E-mail address: [email protected] (R.D. Morey). spurring the creation of general tools to perform MCMC analysis (eg, WinBUGS; Lunn, Thomas, Best, & Spiegelhalter, 2000). These tools allow model builders to easily obtain approximate samples frommarginal posterior distributions formany useful and relevant models in psychological science (Lee, 2011). Although the problem of parameter estimation has been largely solved by advances inMCMCmethods,model selection in Bayesian contexts remains computationally complicated. We advocate the use of Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995), which we formally define in the next section. The Bayes factor is a Bayesian statistic that quantifies the relative amount of evidence provided by the data for two competing models, and is the ratio of two normalizing constants. 
MCMC methods used for parameter estimation often make use of the fact that it is possible to sample from distributions without knowledge of their normalizing constants. For this reason, MCMC methods designed for parameter estimation, which do not compute normalizing constants, are often not sufficient for model selection. A number of methods have been proposed to tackle the problem of computing Bayes factors (Meng & Wong, 1996; Raftery, Satagopan, Newton, & Krivitsky, 2007; Verdinelli & Wasserman, 1995), but many of these solutions are difficult to apply, require tailoring to specific problems, or can be unstable in some circumstances. A bright spot among these approaches is computation by the Savage–Dickey density ratio (Dickey, 1971; Dickey & Lientz, 1970). Wagenmakers and colleagues (Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010; Wetzels, Raaijmakers, Jakab, & Wagenmakers, 2009) have shown that in some situations the Savage–Dickey ratio does a reasonable job of estimating the Bayes factor using easily obtained samples from MCMC chains. The fact that the method is implemented with MCMC sampling is highly attractive; one drawback is that the method as presented by Wagenmakers and colleagues is effectively limited to designs with a single effect parameter, such as a t test or a regression with a single covariate. It cannot be easily extended, for example, to model selection in factorial designs (ANOVA) in which there are multiple effect parameters.

In this article, we discuss an alternative method for computing the Savage–Dickey density ratio: conditional marginal density estimation (CMDE; Chen, 1994; Chib, 1995; Gelfand & Smith, 1990). With CMDE estimation of Savage–Dickey density ratios, Bayes factors may be accurately and efficiently estimated from MCMC chains for many designs, including those with multiple effect parameters, such as ANOVA and multiple regression designs.

The outline of this paper is as follows: first, we discuss Bayesian model selection via the Bayes factor, and show how Bayes factors may be computed using the Savage–Dickey density ratio. We then introduce methods of estimating the Savage–Dickey density ratio, including the CMDE method. The CMDE method is benchmarked against two alternative methods in a one-sample t test design, in which highly accurate estimates of the Bayes factor are known for suitable default priors (Rouder, Speckman, Sun, Morey, & Iverson, 2009). Thereafter, we demonstrate how the CMDE method is applied straightforwardly to models with more than one effect parameter, and benchmark CMDE in a one-way ANOVA design.
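Before turning to the formal development, it may help to state the two key quantities concretely (the notation here is standard and ours, not quoted from the paper). For a point null model \(\mathcal{M}_0\) that fixes an effect parameter \(\theta = \theta_0\) within an encompassing model \(\mathcal{M}_1\), and under the usual assumption that the two models share the same prior on the nuisance parameters \(\boldsymbol{\varphi}\), the Savage–Dickey density ratio expresses the Bayes factor in favor of the null as a ratio of posterior to prior density at \(\theta_0\), both computed under \(\mathcal{M}_1\):

\[
\mathrm{BF}_{01} = \frac{p(\theta = \theta_0 \mid \mathbf{y}, \mathcal{M}_1)}{p(\theta = \theta_0 \mid \mathcal{M}_1)} .
\]

The denominator is typically available in closed form from the prior; the numerator is a marginal posterior density evaluated at a point, and it is here that estimation methods differ. The CMDE approach rests on the identity

\[
p(\theta_0 \mid \mathbf{y}) = \int p(\theta_0 \mid \boldsymbol{\varphi}, \mathbf{y})\, p(\boldsymbol{\varphi} \mid \mathbf{y})\, d\boldsymbol{\varphi},
\]

so that averaging the full conditional density of \(\theta\), evaluated at \(\theta_0\), across the MCMC draws of the remaining parameters yields an estimate of the numerator. The following is a minimal sketch of this logic in Python for a toy one-sample normal model with a conjugate Gibbs sampler; the data, priors, and variable names are our own illustrative assumptions, not the paper's default priors or its WinBUGS implementation.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2011)

    # Toy data (hypothetical): one-sample design with a small true effect
    y = rng.normal(loc=0.3, scale=1.0, size=40)
    n = len(y)

    # Illustrative priors (assumptions): mu ~ Normal(0, s0sq), sigma2 ~ Inverse-Gamma(a, b)
    s0sq, a, b = 1.0, 2.0, 2.0

    iters, burn = 20000, 2000
    mu, sig2 = 0.0, 1.0
    cond_dens = []  # conditional density of mu at 0, one value per retained draw

    for t in range(iters):
        # Gibbs update: mu | sigma2, y is Normal (conjugate)
        prec = n / sig2 + 1.0 / s0sq
        mean = (y.sum() / sig2) / prec
        mu = rng.normal(mean, np.sqrt(1.0 / prec))

        # Gibbs update: sigma2 | mu, y is Inverse-Gamma (conjugate),
        # sampled as the reciprocal of a Gamma draw with scale = 1/rate
        shape = a + n / 2.0
        rate = b + 0.5 * np.sum((y - mu) ** 2)
        sig2 = 1.0 / rng.gamma(shape, 1.0 / rate)

        if t >= burn:
            # CMDE step: evaluate p(mu = 0 | sigma2, y) in closed form at the null value
            prec = n / sig2 + 1.0 / s0sq
            mean = (y.sum() / sig2) / prec
            cond_dens.append(norm.pdf(0.0, loc=mean, scale=np.sqrt(1.0 / prec)))

    post_at_0 = np.mean(cond_dens)                            # CMDE estimate of p(mu = 0 | y)
    prior_at_0 = norm.pdf(0.0, loc=0.0, scale=np.sqrt(s0sq))  # prior density of mu at 0
    print("Savage-Dickey BF01 via CMDE:", post_at_0 / prior_at_0)

Because each retained draw contributes an exact closed-form density value at \(\theta_0\), rather than a kernel-smoothed approximation built from the sampled \(\theta\) values, the average makes more efficient use of the chain; this is the intuition behind the efficiency comparisons benchmarked later in the paper.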
Similar resources
Models for estimating Bayes factors with applications to phylogeny and tests of monophyly.
Bayes factors comparing two or more competing hypotheses are often estimated by constructing a Markov chain Monte Carlo (MCMC) sampler to explore the joint space of the hypotheses. To obtain efficient Bayes factor estimates, Carlin and Chib (1995, Journal of the Royal Statistical Society, Series B, 57, 473-484) suggest adjusting the prior odds of the competing hypotheses so that the posterior odd...
Computational tools for comparing asymmetric GARCH models via Bayes factors
In this paper we use Markov chain Monte Carlo (MCMC) methods in order to estimate and compare GARCH models from a Bayesian perspective. We allow for possibly heavy tailed and asymmetric distributions in the error term. We use a general method proposed in the literature to introduce skewness into a continuous unimodal and symmetric distribution. For each model we compute an approximation to the ...
A computational framework for empirical Bayes inference
In empirical Bayes inference one is typically interested in sampling from the posterior distribution of a parameter with a hyper-parameter set to its maximum likelihood estimate. This is often problematic particularly when the likelihood function of the hyper-parameter is not available in closed form and the posterior distribution is intractable. Previous works have dealt with this problem usin...
Estimating Bayes factors via thermodynamic integration and population MCMC
A Bayesian approach to model comparison based on the integrated or marginal likelihood is considered, and applications to linear regression models and nonlinear ordinary differential equation (ODE) models are used as the setting in which to elucidate and further develop existing statistical methodology. The focus is on two methods of marginal likelihood estimation. First, a statistical failure ...
A General MCMC Method for Bayesian Inference in Logic-Based Probabilistic Modeling
We propose a general MCMC method for Bayesian inference in logic-based probabilistic modeling. It covers a broad class of generative models including Bayesian networks and PCFGs. The idea is to generalize an MCMC method for PCFGs to the one for a Turing-complete probabilistic modeling language PRISM in the context of statistical abduction where parse trees are replaced with explanations. We des...
Journal: Journal of Mathematical Psychology
Volume: 55
Pages: 368–378
Publication date: 2011