Saddle-free Hessian-free Optimization
Abstract
Nonconvex optimization problems, such as those arising in training deep neural networks, suffer from a phenomenon called saddle point proliferation: the loss function contains a vast number of high-error saddle points. Second-order methods have been tremendously successful and widely adopted in the convex optimization community, yet their usefulness in deep learning remains limited. This is due to two problems: computational complexity, and the methods being driven towards high-error saddle points. We introduce a novel algorithm specifically designed to solve these two issues, providing a crucial first step toward bringing the well-known advantages of Newton's method to the nonconvex optimization community, especially in high-dimensional settings.
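The abstract does not spell out the algorithm, but the title suggests combining the saddle-free Newton idea (replacing the Hessian H with |H|, which takes the absolute values of H's eigenvalues, so saddle points repel rather than attract the iterate) with Hessian-free machinery. Below is a minimal dense sketch of that step on a toy problem; the function names, damping constant, and toy landscape are illustrative assumptions, and a genuinely Hessian-free version would approximate |H| in a small Krylov subspace built from Hessian-vector products instead of forming H.

```python
import numpy as np

def saddle_free_newton_step(grad, hess, damping=1e-3):
    """One saddle-free Newton step: replace the Hessian's eigenvalues
    with their absolute values so the step descends even at saddles.
    Dense illustration only; a Hessian-free variant would approximate
    |H| via Hessian-vector products in a small Krylov subspace."""
    eigvals, eigvecs = np.linalg.eigh(hess)      # H = V diag(l) V^T
    abs_eigvals = np.abs(eigvals) + damping      # |l| + damping for stability
    # step = -|H|^{-1} g, computed in the eigenbasis
    return -eigvecs @ ((eigvecs.T @ grad) / abs_eigvals)

# Toy saddle: f(x, y) = x^2 - y^2 with a saddle point at the origin
f_grad = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])
f_hess = lambda p: np.diag([2.0, -2.0])

p = np.array([0.5, 0.1])
for _ in range(5):
    p = p + saddle_free_newton_step(f_grad(p), f_hess(p))
print(p)  # the y-coordinate grows: the iterate escapes the saddle
```

Note the contrast with plain Newton, which would converge to the saddle at the origin here; flipping the negative eigenvalue turns that attraction into repulsion.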
Similar resources
A Free Line Search Steepest Descent Method for Solving Unconstrained Optimization Problems
In this paper, we solve unconstrained optimization problems using a free line search steepest descent method. First, we propose a double-parameter scaled quasi-Newton formula for calculating an approximation of the Hessian matrix. The approximation obtained from this formula is a positive definite matrix that satisfies the standard secant relation. We also show that the largest eigenvalue...
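The snippet does not give the paper's double-parameter scaled formula, so the sketch below uses the standard BFGS update as a generic stand-in to illustrate the secant relation it mentions, B_{k+1} s_k = y_k with s_k = x_{k+1} - x_k and y_k = grad f(x_{k+1}) - grad f(x_k); all names and test data are illustrative.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of a Hessian approximation B.
    Generic stand-in: not the paper's double-parameter scaled formula,
    but any member of this quasi-Newton family must satisfy the
    secant relation B_new @ s == y."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Check the secant relation on random curvature data
rng = np.random.default_rng(0)
B = np.eye(3)
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)  # keeps y @ s > 0 (curvature condition)
B_new = bfgs_update(B, s, y)
print(np.allclose(B_new @ s, y))      # True: secant relation holds
```

As long as y @ s > 0 and B is positive definite, the updated matrix stays positive definite, which is the property the abstract highlights.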
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stat...
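For reference, a minimal sketch of the standard AGD recurrence that such variants build on: the gradient is evaluated at a momentum look-ahead point rather than at the current iterate. The paper's modified AGD (with its saddle-escape machinery) is not reproduced here, and the step size, momentum value, and toy problem are illustrative assumptions.

```python
import numpy as np

def agd(grad, x0, lr=0.09, momentum=0.9, steps=200):
    """Nesterov-style accelerated gradient descent: take the gradient
    at a look-ahead point y instead of at the current iterate x."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        y = x + momentum * (x - x_prev)   # momentum extrapolation
        x_prev, x = x, y - lr * grad(y)   # gradient step at the look-ahead
    return x

# Ill-conditioned toy quadratic f(x) = 0.5 * (x1^2 + 10 * x2^2)
quad_grad = lambda x: np.array([1.0, 10.0]) * x
print(agd(quad_grad, np.array([5.0, 5.0])))  # near the minimizer (0, 0)
```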
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Large batch size training of neural networks has been shown to incur accuracy loss with current methods. The precise underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian-based study to analyze how the landscape of the loss fu...
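A common primitive in Hessian-based landscape studies is the Hessian-vector product, which gives access to curvature without ever forming the Hessian. The sketch below, which is a generic illustration and not the paper's methodology, estimates the top Hessian eigenvalue by power iteration on finite-difference Hessian-vector products; all names and the toy problem are assumptions.

```python
import numpy as np

def hvp(grad_fn, x, v, eps=1e-5):
    """Hessian-vector product by central finite differences on the
    gradient: H v ~ (g(x + eps v) - g(x - eps v)) / (2 eps)."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

def top_hessian_eigenvalue(grad_fn, x, iters=100, seed=0):
    """Power iteration on H using only Hessian-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Hv = hvp(grad_fn, x, v)
        v = Hv / np.linalg.norm(Hv)
    return v @ hvp(grad_fn, x, v)   # Rayleigh quotient at convergence

# Sanity check on a quadratic with known Hessian spectrum {1, 3, 7}
grad_fn = lambda x: np.array([1.0, 3.0, 7.0]) * x
print(top_hessian_eigenvalue(grad_fn, np.zeros(3)))  # ~7.0
```

In deep learning frameworks the finite-difference product would typically be replaced by an exact Hessian-vector product from automatic differentiation.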
A dimer-type saddle search algorithm with preconditioning and linesearch
The dimer method is a Hessian-free algorithm for computing saddle points. We augment the method with a linesearch mechanism for automatic step size selection as well as preconditioning capabilities. We prove local linear convergence. A series of numerical tests demonstrates significant performance gains.
An Efficient Dimer Method with Preconditioning and Linesearch
The dimer method is a Hessian-free algorithm for computing saddle points. We augment the method with a linesearch mechanism for automatic step size selection as well as preconditioning capabilities. We prove local linear convergence. A series of numerical tests demonstrates significant performance gains.
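A crude sketch of the basic dimer idea described in the two entries above: rotate the dimer orientation toward the lowest-curvature direction using only gradient evaluations, then translate along the force with its component along the dimer axis reversed, which walks the iterate uphill along the softest mode toward a saddle. The step sizes, rotation scheme, and toy landscape below are illustrative assumptions, not the augmented (preconditioned, linesearch) algorithm of the papers.

```python
import numpy as np

def dimer_step(grad_fn, x, v, dt=0.05, rot_lr=0.1, rot_iters=10, h=1e-4):
    """One crude dimer-method step (Hessian-free saddle search)."""
    for _ in range(rot_iters):
        # Finite-difference estimate of H v; no Hessian is assembled
        Hv = (grad_fn(x + h * v) - grad_fn(x - h * v)) / (2 * h)
        # Descend the Rayleigh quotient to align v with the softest mode
        r = Hv - (v @ Hv) * v
        v = v - rot_lr * r
        v /= np.linalg.norm(v)
    f = -grad_fn(x)
    f_trans = f - 2 * (f @ v) * v   # reverse the force along the dimer axis
    return x + dt * f_trans, v

# Toy landscape f(x, y) = x^2 - y^2: saddle point at the origin
grad_fn = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])
x, v = np.array([1.0, 0.8]), np.array([0.6, 0.8])
for _ in range(200):
    x, v = dimer_step(grad_fn, x, v)
print(x)  # converges toward the saddle at (0, 0)
```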
Publication date: 2015