Nonparametric Assessment of Contamination in Multivariate Data Using Generalized Quantile Sets and FDR

Authors

Clayton Scott

Eric Kolaczyk

Abstract

Large, multivariate datasets from high-throughput instrumentation have become ubiquitous in the sciences. Frequently, it is of interest to characterize the measurements in these datasets by the extent to which they represent ‘nominal’ versus ‘contaminated’ instances. However, the nature of even the nominal patterns in the data is often unknown and potentially quite complex, making their explicit parametric modeling a daunting task. In this paper, we introduce a nonparametric method for the simultaneous annotation of multivariate data (called MN-SCAnn), by which one may produce an annotated ranking of the observations, indicating the relative extent to which each may or may not be considered nominal, while making minimal assumptions about the nature of the nominal distribution. In our framework each observation is linked to a corresponding generalized quantile set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test, which in turn is accompanied by a certain false discovery rate. The combination of generalized quantile set methods with false discovery rate principles, in the context of contaminated data, is new, and estimation of the key underlying quantities requires that a number of issues be addressed. We illustrate MN-SCAnn through examples in two contexts: the pre-processing of cell-based assays in bioinformatics, and the detection of anomalous traffic patterns in Internet measurement studies.

∗Department of Electrical Engineering and Computer Science, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48105. Email: cscott-at-eecs-dot-umich-dot-edu

†Department of Mathematics and Statistics, Boston University, 111 Cummington Street, Boston, MA 02215. Email: kolaczyk-at-math-dot-bu-dot-edu
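As a rough illustration of how a ranking of observations can be annotated with false discovery rates, the sketch below applies the standard Benjamini-Hochberg adjustment to a list of per-observation p-values. This is not the MN-SCAnn estimator: the abstract does not specify how the underlying quantities are estimated, so the p-values are simply assumed to be given, and the function names `bh_q_values` and `annotated_ranking` are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order.

    Assumption: p_values[i] is a p-value for the hypothesis that
    observation i is nominal. How it is obtained is outside this sketch.
    """
    m = len(p_values)
    # Indices of the observations, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def annotated_ranking(p_values):
    """Return (index, p-value, q-value) triples, least nominal first."""
    q = bh_q_values(p_values)
    triples = [(i, p_values[i], q[i]) for i in range(len(p_values))]
    return sorted(triples, key=lambda t: t[1])


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    for index, p, q in annotated_ranking([0.01, 0.04, 0.03, 0.5]):
        print(f"observation {index}: p = {p:.3f}, q = {q:.3f}")
```

Each q-value can be read as the smallest false discovery rate level at which that observation, and every observation ranked above it, would be flagged as contaminated by the Benjamini-Hochberg procedure.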

Related articles

Model-based approaches to nonparametric Bayesian quantile regression

In several regression applications, a different structural relationship might be anticipated for the higher or lower responses than the average responses. In such cases, quantile regression analysis can uncover important features that would likely be overlooked by mean regression. We develop two distinct Bayesian approaches to fully nonparametric model-based quantile regression. The first appro...

Nonparametric Assessment of Contamination in Multivariate Data Using Minimum Volume Sets and FDR

Large, multivariate datasets from high-throughput instrumentation have become ubiquitous throughout the sciences. Frequently, it is of great interest to characterize the measurements in these datasets by the extent to which they represent ‘nominal’ versus ‘contaminated’ instances. However, often the nature of even the nominal patterns in the data are unknown and potentially quite complex, makin...

Bayesian Nonparametric Modeling in Quantile Regression

We propose Bayesian nonparametric methodology for quantile regression modeling. In particular, we develop Dirichlet process mixture models for the error distribution in an additive quantile regression formulation. The proposed nonparametric prior probability models allow the data to drive the shape of the error density and thus provide more reliable predictive inference than models based on par...

Nonparametric multivariate conditional distribution and quantile regression

In nonparametric multivariate regression analysis, one usually seeks methods to reduce the dimensionality of the regression function to bypass the difficulty caused by the curse of dimensionality. We study nonparametric estimation of multivariate conditional distribution and quantile regression via local univariate quadratic estimation of partial derivatives of bivariate copulas. Without restri...

A Frisch-Newton Algorithm for Sparse Quantile Regression

Recent experience has shown that interior-point methods using a log barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data where the parametric dimension of the ...

Publication date: 2007
