Scalable Approximate Bayesian Inference for Outlier Detection under Informative Sampling

Author

  • Terrance D. Savitsky
Abstract

Government surveys of business establishments receive a large volume of submissions where a small subset contains errors. Analysts need a fast-computing algorithm to flag this subset due to a short time window between collection and reporting. We offer a computationally scalable optimization method based on non-parametric mixtures of hierarchical Dirichlet processes that allows discovery of multiple industry-indexed local partitions linked to a set of global cluster centers. Outliers are nominated as those clusters containing few observations. We extend an existing approach with a new “merge” step that reduces sensitivity to hyperparameter settings. Survey data are typically acquired under an informative sampling design where the probability of inclusion depends on the surveyed response such that the distribution for the observed sample is different from the population. We extend the derivation of a penalized objective function to use a pseudo-posterior that incorporates sampling weights that “undo” the informative design. We provide a simulation study to demonstrate that our approach produces unbiased estimation for the outlying cluster under informative sampling. The method is applied to outlier nomination for the Current Employment Statistics survey conducted by the Bureau of Labor Statistics. ©2016 Terrance D. Savitsky.
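As a rough illustration of the idea in the abstract (a penalized clustering objective that uses sampling weights, with small clusters nominated as outliers), the following is a minimal sketch and not the paper's algorithm. It assumes a single-level, DP-means-style hard clustering that minimizes the weighted squared distance to cluster centers plus a penalty `lam` per cluster; the hierarchical (industry-indexed) structure and the “merge” step are not reproduced. The function names `weighted_dp_means` and `nominate_outliers`, the parameters `lam` and `max_size`, the normalization of weights to sum to the sample size, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def weighted_dp_means(X, w, lam, n_iter=20):
    """DP-means-style hard clustering with sampling weights (illustrative only).

    Approximately minimizes sum_i w_i * ||x_i - mu_{z_i}||^2 + lam * K,
    where K is the number of clusters. Weights are rescaled to sum to n,
    which is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w * len(w) / w.sum()

    # Start from one cluster located at the weighted mean of the data.
    centers = [np.average(X, axis=0, weights=w)]
    labels = np.zeros(len(X), dtype=int)

    for _ in range(n_iter):
        # Assignment step: open a new cluster when the weighted squared
        # distance to the nearest center exceeds the penalty lam.
        for i, x in enumerate(X):
            d2 = [float(np.sum((x - c) ** 2)) for c in centers]
            k = int(np.argmin(d2))
            if w[i] * d2[k] > lam:
                centers.append(x.copy())
                k = len(centers) - 1
            labels[i] = k

        # Update step: drop empty clusters and recompute weighted means.
        used = np.unique(labels)
        centers = [
            np.average(X[labels == k], axis=0, weights=w[labels == k])
            for k in used
        ]
        labels = np.searchsorted(used, labels)

    return np.array(centers), labels


def nominate_outliers(labels, max_size=2):
    """Flag observations that fall in clusters with at most max_size members."""
    counts = np.bincount(labels)
    return counts[labels] <= max_size


# Toy data: two tight groups plus one distant point (all values hypothetical).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
group_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
far_point = np.array([[50.0, 50.0]])
X = np.vstack([group_a, group_b, far_point])
w = np.ones(len(X))  # equal weights; real weights would be inverse inclusion probabilities

centers, labels = weighted_dp_means(X, w, lam=10.0)
flagged = nominate_outliers(labels, max_size=2)
print("clusters found:", len(centers))
print("flagged indices:", np.flatnonzero(flagged))
```

In this sketch a larger sampling weight makes an observation count for more in both the cluster means and the decision to open a new cluster, which is one simple way to mimic the role the abstract assigns to the weights. The choice of `lam` controls how readily new clusters are opened, so results are sensitive to it.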

Similar resources

A Bayesian Approach for Detecting Outliers in ARMA Time Series

The presence of outliers in time series can seriously affect the model specification and parameter estimation. To avoid these adverse effects, it is essential to detect these outliers and remove them from time series. By the Bayesian statistical theory, this article proposes a method for simultaneously detecting the additive outlier (AO) and innovative outlier (IO) in an autoregressive moving-a...

An Approximate Bayesian Long Short-Term Memory Algorithm for Outlier Detection

Long Short-Term Memory networks trained with gradient descent and back-propagation have received great success in various applications. However, point estimation of the weights of the networks is prone to over-fitting problems and lacks important uncertainty information associated with the estimation. However, exact Bayesian neural network methods are intractable and non-applicable for real-wor...

Posterior Predictive Outlier Detection Using Sample Reweighting

In a Bayesian model, we define an outlier as an observation which is "surprising" relative to its predictive distribution, under the model, given the remainder of the data. Hence "outlyingness" can be measured by the posterior predictive p-value of any interesting scalar summary of the (possibly multivariate) observation. For this calculation, we exclude the case of interest from the data, analo...

Efficient variational Bayesian neural network ensembles for outlier detection

In this work we perform outlier detection using ensembles of neural networks obtained by variational approximation of the posterior in a Bayesian neural network setting. The variational parameters are obtained by sampling from the true posterior by gradient descent. We show our outlier detection results are comparable to those obtained using other efficient ensembling methods.

Journal title:
  • Journal of Machine Learning Research

Volume 17

Publication date: 2016