gradient descent

Stochastic Proximal Gradient Descent with Acceleration Techniques

2014

Atsushi Nitanda

Proximal gradient descent (PGD) and stochastic proximal gradient descent (SPGD) are popular methods for solving regularized risk minimization problems in machine learning and statistics. In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting. This method incorporates two acceleration techniques: one is Nesterov’s acceleration method, and the othe...

متن کامل

Conjugate gradient methods based on secant conditions that generate descent search directions for unconstrained optimization

Journal: :J. Computational Applied Mathematics 2012

Yasushi Narushima Hiroshi Yabe

Conjugate gradient methods have been paid attention to, because they can be directly applied to large-scale unconstrained optimization problems. In order to incorporate second order information of the objective function into conjugate gradient methods, Dai and Liao (2001) proposed a conjugate gradient method based on the secant condition. However, their method does not necessarily generate a de...

متن کامل

FLAG: Fast Linearly-Coupled Adaptive Gradient Method

Journal: :CoRR 2016

Xiang Cheng Farbod Roosta-Khorasani Peter L. Bartlett Michael W. Mahoney

The celebrated Nesterov’s accelerated gradient method offers great speed-ups compared to the classical gradient descend method as it attains the optimal first-order oracle complexity for smooth convex optimization. On the other hand, the popular AdaGrad algorithm competes with mirror descent under the best regularizer by adaptively scaling the gradient. Recently, it has been shown that the acce...

متن کامل

Less Regret via Online Conditioning

Journal: :CoRR 2010

Matthew J. Streeter H. Brendan McMahan

We analyze and evaluate an online gradient descent algorithm with adaptive per-coordinate adjustment of learning rates. Our algorithm can be thought of as an online version of batch gradient descent with a diagonal preconditioner. This approach leads to regret bounds that are stronger than those of standard online gradient descent for general online convex optimization problems. Experimentally,...

متن کامل

Beyond Convexity: Stochastic Quasi-Convex Optimization

2015

Elad Hazan Kfir Y. Levy Shai Shalev-Shwartz

Stochastic convex optimization is a basic and well studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm, is an adaptation of Gradient Descent, which updates according to the direction of the gradients, rather than the gradients themselves. ...

متن کامل

Ant Colony Optimization and Stochastic Gradient Descent

Journal: :Artificial life 2002

Nicolas Meuleau Marco Dorigo

In this article, we study the relationship between the two techniques known as ant colony optimization (ACO) and stochastic gradient descent. More precisely, we show that some empirical ACO algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of ACO algorithms. We then use this i...

متن کامل

Multiple-Gradient Descent Algorithm (MGDA)

2009

Jean-Antoine Désidéri

In a previous report [3], a methodology for the numerical treatment of a two-objective optimization problem, possibly subject to equality constraints, was proposed. The method was devised to be adapted to cases where an initial design-point is known and such that one of the two disciplines, considered to be preponderant, or fragile, and said to be the primary discipline, achieves a local or glo...

متن کامل

Gradient Descent using Duality Structures

Journal: :CoRR 2017

Thomas Flynn

In most applications of gradient-based optimization to complex problems the choice of step size is based on trial-and-error and other heuristics. A case when it is easy to choose the step sizes is when the function has a Lipschitz continuous gradient. Many functions of interest do not appear at first sight to have this property, but often it can be established with the right choice of underlyin...

متن کامل

Learning ReLUs via Gradient Descent

2017

Mahdi Soltanolkotabi

In this paper we study the problem of learning Rectified Linear Units (ReLUs) which are functions of the form x ↦ max(0, ⟨w,x⟩) with w ∈ R denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations are fewer than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captu...

متن کامل

on descent spectral cg algorithm for training recurrent neural networks

2010

D. G. Sotiropoulos P. Pintelas I. E. Livieris

In this paper, we evaluate the performance of a new class of conjugate gradient methods for training recurrent neural networks which ensure the sufficient descent property. The presented methods preserve the advantages of classical conjugate gradient methods and simultaneously avoid the usually inefficient restarts. Simulation results are also presented using three different recurrent neural ne...

متن کامل