Deep Model Compression: Distilling Knowledge from Noisy Teachers
Authors
Abstract
The remarkable successes of deep learning models across various applications have resulted in the design of deeper networks that can solve complex problems. However, the increasing depth of such models also increases their storage and runtime complexity, which restricts their deployment on mobile and portable devices with limited storage and battery capacity. While many methods have been proposed for deep model compression in recent years, almost all of them have focused on reducing storage complexity. In this work, we extend the teacher-student framework for deep model compression, since it also has the potential to address runtime and training-time complexity. We propose a simple methodology that includes a noise-based regularizer while training the student from the teacher, which provides a healthy improvement in the performance of the student network. Our experiments on the CIFAR-10, SVHN and MNIST datasets show promising improvements, with the best performance on the CIFAR-10 dataset. We also conduct a comprehensive empirical evaluation of the proposed method under related settings on the CIFAR-10 dataset to show the promise of the proposed approach.
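As a rough illustration of this idea, the sketch below shows one way a noise-based regularizer could be combined with logit-matching distillation: the teacher's logits are perturbed with Gaussian noise for a random fraction of mini-batches before the student is trained to regress them. The loss choice and the names noise_prob and sigma are illustrative assumptions, not the paper's released implementation.

# Hedged sketch: logit-matching distillation with a noise-based regularizer.
# The teacher's logits are perturbed with Gaussian noise on a random subset of
# mini-batches; noise_prob and sigma are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def noisy_teacher_distillation_step(student, teacher, x, optimizer,
                                    noise_prob=0.5, sigma=0.2):
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)                      # teacher predictions
        if torch.rand(1).item() < noise_prob:      # perturb only sometimes
            t_logits = t_logits + sigma * torch.randn_like(t_logits)

    s_logits = student(x)
    loss = F.mse_loss(s_logits, t_logits)          # regress (possibly noisy) teacher logits
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Here student and teacher are assumed to be torch.nn.Module instances producing logits, and x is a mini-batch of inputs.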
Similar resources
Distilling Model Knowledge
Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like to replace such cumbersome models with simpler models that perform equally well. In this thesis, we study knowledge distillation, the idea of extracting the...
Deep Learning, Dark Knowledge, and Dark Matter
Particle colliders are the primary experimental instruments of high-energy physics. By creating conditions that have not occurred naturally since the Big Bang, collider experiments aim to probe the most fundamental properties of matter and the universe. These costly experiments generate very large amounts of noisy data, creating important challenges and opportunities for machine learning. In th...
A Study on Expert Primary School Teachers’ Deep Insight and a Model to Develop that in Student Teachers
The present research has endeavored to study deep insight experiences of expert teachers of primary schools as an effort to provide a model to be expanded for student teachers to use. Taking a qualitative approach, the research used narrative inquiry. The statistical population consisted of primary school teachers. Snowball sampling was used, and data collection and analysis were carried out by...
Distilling Knowledge from an Ensemble of Models for Punctuation Prediction
This paper proposes an approach to distill knowledge from an ensemble of models to a single deep neural network (DNN) student model for punctuation prediction. This approach makes the DNN student model mimic the behavior of the ensemble. The ensemble consists of three single models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN st...
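For context, the snippet below sketches the kind of KL-based ensemble distillation described in that abstract: the ensemble members' softened probabilities are averaged and the student is trained to match the average. The temperature T and all names are assumptions for illustration, not taken from that paper.

# Hedged sketch: distilling an ensemble into a single student with KL divergence.
# Ensemble probabilities are averaged; the student matches the average distribution.
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, ensemble_logits_list, T=2.0):
    # Average the temperature-softened probabilities of the ensemble members.
    teacher_probs = torch.stack(
        [F.softmax(l / T, dim=1) for l in ensemble_logits_list]
    ).mean(dim=0)
    # KL(teacher || student) on softened distributions, scaled by T^2 as is common.
    log_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * (T * T)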
Distilling the Knowledge in a Neural Network
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions [3]. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large n...
Journal: CoRR
Volume: abs/1610.09650
Pages: -
Publication year: 2016