Computationally efficient permutation-based confidence interval estimation for tail-area FDR
نویسندگان
چکیده
Challenges of satisfying parametric assumptions in genomic settings with thousands or millions of tests have led investigators to combine powerful False Discovery Rate (FDR) approaches with computationally expensive but exact permutation testing. We describe a computationally efficient permutation-based approach that includes a tractable estimator of the proportion of true null hypotheses, the variance of the log of tail-area FDR, and a confidence interval (CI) estimator, which accounts for the number of permutations conducted and dependencies between tests. The CI estimator applies a binomial distribution and an overdispersion parameter to counts of positive tests. The approach is general with regards to the distribution of the test statistic, it performs favorably in comparison to other approaches, and reliable FDR estimates are demonstrated with as few as 10 permutations. An application of this approach to relate sleep patterns to gene expression patterns in mouse hypothalamus yielded a set of 11 transcripts associated with 24 h REM sleep [FDR = 0.15 (0.08, 0.26)]. Two of the corresponding genes, Sfrp1 and Sfrp4, are involved in wnt signaling and several others, Irf7, Ifit1, Iigp2, and Ifih1, have links to interferon signaling. These genes would have been overlooked had a typical a priori FDR threshold such as 0.05 or 0.1 been applied. The CI provides the flexibility for choosing a significance threshold based on tolerance for false discoveries and precision of the FDR estimate. That is, it frees the investigator to use a more data-driven approach to define significance, such as the minimum estimated FDR, an option that is especially useful for weak effects, often observed in studies of complex diseases.
منابع مشابه
Estimation and Control of the False Discovery Rate of Bayesian Network Skeleton Identification
An important problem in learning Bayesian networks is assessing confidence on the learnt structure. Prior work in constraint-based algorithms focuses on estimating or controlling the False Discovery Rate (FDR) when identifying the skeleton (set of edges without regard of direction) of a network. We present a unified approach to estimation and control of the FDR of Bayesian network skeleton iden...
متن کاملOn correcting the overestimation of the permutation-based false discovery rate estimator
MOTIVATION Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR), which is defined as the expected percentage of the number of false positive genes among the claimed significant genes. As a consequence, the accuracy of the FDR estimators will be important for correctly controlling FDR. Xie et al. found that ...
متن کاملfdrtool: a versatile R package for estimating local and tail area-based false discovery rates
UNLABELLED False discovery rate (FDR) methodologies are essential in the study of high-dimensional genomic and proteomic data. The R package 'fdrtool' facilitates such analyses by offering a comprehensive set of procedures for FDR estimation. Its distinctive features include: (i) many different types of test statistics are allowed as input data, such as P-values, z-scores, correlations and t-sc...
متن کاملInterval estimation for the difference between independent proportions: comparison of eleven methods.
Several existing unconditional methods for setting confidence intervals for the difference between binomial proportions are evaluated. Computationally simpler methods are prone to a variety of aberrations and poor coverage properties. The closely interrelated methods of Mee and Miettinen and Nurminen perform well but require a computer program. Two new approaches which also avoid aberrations ar...
متن کاملA note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data
MOTIVATION False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 4 شماره
صفحات -
تاریخ انتشار 2013