Estimating p-values in small microarray experiments
Authors
Abstract
MOTIVATION: Microarray data typically have few observations per gene, which can result in low power for statistical tests. Test statistics that borrow information across all of the genes can improve power, but these statistics have non-standard distributions, so their significance must be assessed by permutation analysis. When sample sizes are small, the number of distinct permutations can be severely limited, and pooling the permutation-derived test statistics across all genes has been proposed. However, the null distribution of the test statistics under permutation is not the same for equally and differentially expressed genes. This can have a negative impact on both p-value estimation and the power of information-borrowing statistics. RESULTS: We investigate permutation-based methods for estimating p-values. One method, which pools permutation statistics from a selected subset of the data, is shown to have the correct type I error rate and to provide accurate estimates of the false discovery rate (FDR). We provide guidelines for selecting an appropriate subset. We also demonstrate that information-borrowing statistics have substantially greater power than the t-test in small experiments.
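The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of pooling permutation statistics from a subset of genes. It assumes a SAM-style statistic with a fudge constant s0 (set here to the median gene-wise standard error) as the information-borrowing statistic, and it assumes the "selected subset" is the genes whose observed |statistic| falls at or below a chosen quantile. That subset rule is a hypothetical stand-in, not the authors' guidelines, and the names `group_stats`, `pooled_subset_pvalues` and `keep_fraction` are illustrative. It also shows why pooling is needed: with three arrays per group there are only C(6,3) = 20 relabellings, giving at most 10 distinct values of |statistic| per gene.

```python
import itertools
import numpy as np


def group_stats(data, idx_x, s0):
    """Per-gene SAM-style statistic: mean difference / (standard error + s0).

    data is a genes x arrays matrix; idx_x lists the columns assigned to
    group X and the remaining columns form group Y. Returns (stats, SEs).
    """
    mask = np.zeros(data.shape[1], dtype=bool)
    mask[list(idx_x)] = True
    x, y = data[:, mask], data[:, ~mask]
    nx, ny = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)
    # Pooled within-group sum of squares for each gene.
    ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) + (
        (y - y.mean(axis=1, keepdims=True)) ** 2
    ).sum(axis=1)
    se = np.sqrt(ss / (nx + ny - 2) * (1.0 / nx + 1.0 / ny))
    # s0 is estimated from all genes, which is where information is borrowed.
    return diff / (se + s0), se


def pooled_subset_pvalues(data, n_x, keep_fraction=0.5):
    """Permutation p-values from a null distribution pooled over a gene subset.

    Assumes the first n_x columns are group X. The subset rule used here
    (observed |stat| at or below the keep_fraction quantile) is hypothetical.
    """
    n = data.shape[1]
    observed = tuple(range(n_x))
    # Fix s0 once from the observed labelling (median SE: an assumed heuristic).
    _, se_obs = group_stats(data, observed, 0.0)
    s0 = float(np.median(se_obs))
    t_obs, _ = group_stats(data, observed, s0)
    abs_obs = np.abs(t_obs)
    # Genes with small observed statistics are treated as likely equally expressed.
    subset = abs_obs <= np.quantile(abs_obs, keep_fraction)
    # Enumerate every relabelling of the arrays; small designs have very few.
    pool = [
        np.abs(group_stats(data, idx, s0)[0][subset])
        for idx in itertools.combinations(range(n), n_x)
    ]
    null = np.sort(np.concatenate(pool))
    # Number of pooled null values >= each observed |statistic|.
    exceed = null.size - np.searchsorted(null, abs_obs, side="left")
    return (exceed + 1) / (null.size + 1), t_obs


# Usage on simulated data: 1000 genes, 3 + 3 arrays, first 50 genes shifted.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 6))
data[:50, :3] += 3.0
pvals, stats = pooled_subset_pvalues(data, n_x=3)
print("median p, shifted genes:", np.median(pvals[:50]))
print("median p, other genes:  ", np.median(pvals[50:]))
```

The enumeration includes the observed labelling and its mirror image; for genes in the subset those two relabellings reproduce the small statistics that got them selected, which is one of the details a real implementation would need to handle more carefully.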
Similar resources
Comparison of Small Area Estimation Methods for Estimating Unemployment Rate
Extended Abstract. In recent years, the need for small area estimation has greatly increased for large surveys, particularly household surveys at the Statistical Centre of Iran (SCI), because of costs and respondent burden. The lack of suitable auxiliary variables between two decennial housing and population censuses is a challenge for SCI in using these methods. In general, the...
190-31: Estimate the False Discovery Rate Using SAS®
This paper gives an exposition of recently developed methods of estimating the false discovery rate (FDR) under multiple comparisons and discusses their implementation using SAS. For example, microarray experiments typically involve tests of significance for hundreds or thousands of genes. For biologists confronted by this problem of multiplicity, the FDR is an appealing quantification of error...
A Comparison of Two Classes of Methods for Estimating False Discovery Rates in Microarray Studies
The goal of many microarray studies is to identify genes that are differentially expressed between two classes or populations. Many data analysts choose to estimate the false discovery rate (FDR) associated with the list of genes declared differentially expressed. Estimating an FDR largely reduces to estimating π₁, the proportion of differentially expressed genes among all analyzed genes. Esti...
Missing Value Estimation In DNA Microarray – A Fuzzy Approach
DNA microarray technology, which is used in molecular biology, allows the observation of expression levels of thousands of genes under a variety of conditions. The analysis of microarray data has been successfully applied in a number of studies over a broad range of biological disciplines. Unfortunately, various microarray experiments generate data sets containing missing va...
Iterative bicluster-based least square framework for estimation of missing values in microarray gene expression data
DNA microarray experiments inevitably generate gene expression data with missing values. An important and necessary pre-processing step is thus to impute these missing values. Existing imputation methods exploit gene correlation across all experimental conditions to estimate the missing values. However, related genes coexpress in subsets of experimental conditions only. In this paper, we prop...
Journal title:
- Bioinformatics
Volume 23, Issue 1
Pages: -
Publication date: 2007