No bad local minima: Data independent training error guarantees for multilayer neural networks

نویسندگان

  • Daniel Soudry
  • Yair Carmon
چکیده

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for a MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing on the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exponentially Vanishing Sub-optimal Local Minima in Multilayer Neural Networks

Background: Statistical mechanics results (Dauphin et al. (2014); Choromanska et al. (2015)) suggest that local minima with high error are exponentially rare in high dimensions. However, to prove low error guarantees for Multilayer Neural Networks (MNNs), previous works so far required either a heavily modified MNN model or training method, strong assumptions on the labels (e.g., “near” linear ...

متن کامل

Supervised Training Using Global Search Methods

Supervised learning in neural networks based on the popular backpropagation method can be often trapped in a local minimum of the error function. The class of backpropagation-type training algorithms includes local minimization methods that have no mechanism that allows them to escape the influence of a local minimum. The existence of local minima is due to the fact that the error function is t...

متن کامل

Cystoscopy Image Classication Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attractedthe attention of many researchers. However, no smart activity has been provided in the eld ofmedical image processing for diagnosis of bladder cancer through cystoscopy images despite the highprevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...

متن کامل

Fast training of multilayer perceptrons

Training a multilayer perceptron by an error backpropagation algorithm is slow and uncertain. This paper describes a new approach which is much faster and certain than error backpropagation. The proposed approach is based on combined iterative and direct solution methods. In this approach, we use an inverse transformation for linearization of nonlinear output activation functions, direct soluti...

متن کامل

Local minima in training of neural networks

Training of a neural network is often formulated as a task of finding a “good” minimum of an error surface the graph of the loss expressed as a function of its weights. Due to the growing popularity of deep learning, the classical problem of studying the error surfaces of neural networks is now in the focus of many researchers. This stems from a long standing question. Given that deep networks ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1605.08361  شماره 

صفحات  -

تاریخ انتشار 2016