Dropout distillation

Authors

  • Samuel Rota Bulò
  • Lorenzo Porzi
  • Peter Kontschieder
Abstract

Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows us to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists of scaling the layers undergoing dropout randomization. This simple rule, called "standard dropout", is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined "dropout distillation", that allows us to train a predictor to better approximate the intractable, but preferable, averaging process, while keeping its computational cost under control. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.
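The gap the abstract describes can be made concrete. Below is a minimal NumPy sketch (hypothetical, not the authors' code): it compares the "standard dropout" rule, which simply scales activations by the keep probability at test time, against a Monte Carlo estimate of the true average over random dropout masks. Because the network is nonlinear, the two generally differ, and that difference is what dropout distillation aims to reduce.

```python
# Sketch contrasting standard dropout's scaling rule with the intractable
# ensemble average it approximates. All names here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                      # keep probability
x = rng.normal(size=(1, 8))  # a single input
W = rng.normal(size=(8, 4))  # weights of one dense layer

def layer(h, mask):
    # Apply a dropout mask to the input, then a linear map and a ReLU.
    return np.maximum((h * mask) @ W, 0.0)

# "Standard dropout": no sampling at test time; simply scale the
# activations by the keep probability p.
standard = layer(x, p * np.ones_like(x))

# Monte Carlo estimate of the true ensemble average over random
# Bernoulli(p) masks -- the "preferable" but intractable prediction.
mc = np.mean(
    [layer(x, rng.binomial(1, p, size=x.shape)) for _ in range(10000)],
    axis=0,
)

# Because ReLU is nonlinear, scaling does not commute with averaging,
# so the two predictions differ; this is the approximation error that
# dropout distillation trains a predictor to shrink.
print(np.abs(standard - mc).max())
```

With a linear network (no ReLU) the two outputs would coincide exactly; the discrepancy appears only once a nonlinearity separates the mask average from the scaled input.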


Related articles

Active Bias: Training a More Accurate Neural Network by Emphasizing High Variance Samples

Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of minibatch SGD, and the proximity of the correct class probability to the deci...


Analysis of Deep Neural Networks with Extended Data Jacobian Matrix

Deep neural networks have achieved great success on a variety of machine learning tasks. There are many fundamental and open questions yet to be answered, however. We introduce the Extended Data Jacobian Matrix (EDJM) as an architecture-independent tool to analyze neural networks at the manifold of interest. The spectrum of the EDJM is found to be highly correlated with the complexity of the le...


A Non-Random Dropout Model for Analyzing Longitudinal Skew-Normal Response

In this paper, a multivariate skew-normal distribution is employed for analyzing an outcome-based dropout model for repeated measurements with non-random dropout in skew regression data sets. A probit regression is considered as the conditional probability of an observation to be missing given outcomes. A simulation study of using the proposed methodology and comparing it with a semi-parame...


A Comparative Review of Selection Models in Longitudinal Continuous Response Data with Dropout

Missing values occur in studies of various disciplines such as social sciences, medicine, and economics. The missing mechanism in these studies should be investigated more carefully. In this article, some models, proposed in the literature on longitudinal data with dropout are reviewed and compared. In an applied example it is shown that the selection model of Hausman and Wise (1979, Econometri...


Building Robust Deep Neural Networks for Road Sign Detection

Deep neural networks are built with generalization beyond the training set in mind, using techniques such as regularization, early stopping, and dropout. However, measures to make them more resilient to adversarial examples are rarely taken. As deep neural networks become more prevalent in mission-critical and real-time systems, miscreants start to attack them by intentionally making deep neural ne...




Publication date: 2016