B-tests: Low Variance Kernel Two-Sample Tests

نویسندگان

  • Wojciech Zaremba
  • Arthur Gretton
  • Matthew Blaschko
چکیده

A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

B-test: A Non-parametric, Low Variance Kernel Two-sample Test

We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent. The test has a hyperparameter that allows one to control the tradeoff between sample complexity and computational time. Our family of tests, which we denote as B-tests, is both computationally and statistically efficient, combining favorable properties of previously ...

متن کامل

Exponentially Consistent Kernel Two-Sample Tests

Given two sets of independent samples from unknown distributions P and Q, a two-sample test decides whether to reject the null hypothesis that P = Q. Recent attention has focused on kernel two-sample tests as the test statistics are easy to compute, converge fast, and have low bias with their finite sample estimates. However, there still lacks an exact characterization on the asymptotic perform...

متن کامل

Nonparametric Inferences for the Hazard Function with Right Truncation

Incompleteness is a major feature of time-to-event data. As one type of incompleteness, truncation refers to the unobservability of the time-to-event variable because it is smaller (or greater) than the truncation variable. A truncated sample always involves left and right truncation. Left truncation has been studied extensively while right truncation has not received the same level of attentio...

متن کامل

Fast Two-Sample Testing with Analytic Representations of Probability Measures

We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Ana...

متن کامل

Trend tests for case-control studies of genetic markers: power, sample size and robustness.

The Cochran-Armitage trend test is commonly used as a genotype-based test for candidate gene association. Corresponding to each underlying genetic model there is a particular set of scores assigned to the genotypes that maximizes its power. When the variance of the test statistic is known, the formulas for approximate power and associated sample size are readily obtained. In practice, however, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013