Search results for: stochastic gradient descent
Number of results: 258150
Momentum-based stochastic gradient methods such as heavy ball (HB) and Nesterov’s accelerated gradient descent (NAG) are widely used in practice for training deep networks and other supervised learning models, as they often provide significant improvements over stochastic gradient descent (SGD). Theoretically, these “fast gradient” methods have provable improvements over gradient descent...
Momentum-based stochastic gradient methods such as heavy ball (HB) and Nesterov’s accelerated gradient descent (NAG) are widely used in practice for training deep networks and other supervised learning models, as they often provide significant improvements over stochastic gradient descent (SGD). In general, “fast gradient” methods have provable improvements over gradient descent only for...
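As a concrete illustration of the two momentum schemes named in the entries above, the sketch below shows heavy-ball and Nesterov-style SGD updates on a toy least-squares problem; the data, batch size, step size, and momentum parameter are illustrative assumptions, not the setup of either paper.

```python
# Minimal sketch of SGD with heavy-ball (HB) and Nesterov (NAG) momentum on a
# synthetic least-squares objective; all problem sizes and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
b = A @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def stoch_grad(w, batch=10):
    idx = rng.integers(0, A.shape[0], size=batch)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / batch

def sgd_hb(steps=500, lr=0.01, beta=0.9):
    w, v = np.zeros(5), np.zeros(5)
    for _ in range(steps):
        v = beta * v - lr * stoch_grad(w)            # heavy ball: momentum on past updates
        w = w + v
    return w

def sgd_nag(steps=500, lr=0.01, beta=0.9):
    w, v = np.zeros(5), np.zeros(5)
    for _ in range(steps):
        v = beta * v - lr * stoch_grad(w + beta * v)  # Nesterov: gradient at a look-ahead point
        w = w + v
    return w
```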
This paper provides a tutorial introduction to the constant modulus (CM) criterion for blind fractionally-spaced equalizer (FSE) design via a (stochastic) gradient descent algorithm such as the Constant Modulus Algorithm. The topical divisions utilized in this tutorial can be used to help catalog the emerging literature on the CM criterion and on the behavior of (stochastic) gradient descent a...
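A minimal sketch of the Constant Modulus Algorithm as a stochastic gradient descent on the CM cost J(w) = E[(y^2 - R2)^2], assuming a real-valued binary source, a short hand-picked channel, and an illustrative step size; none of these choices come from the tutorial itself.

```python
# Minimal sketch of the Constant Modulus Algorithm (CMA) for blind equalization;
# channel, signal model, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_taps, R2, mu = 11, 1.0, 1e-3
symbols = rng.choice([-1.0, 1.0], size=5000)           # unit-modulus source
channel = np.array([1.0, 0.4, -0.2])
received = np.convolve(symbols, channel)[:symbols.size]

w = np.zeros(num_taps)
w[num_taps // 2] = 1.0                                  # center-spike initialization
for n in range(num_taps, received.size):
    x = received[n - num_taps:n][::-1]                  # regressor, most recent sample first
    y = w @ x                                           # equalizer output
    e = y * (y * y - R2)                                # CM error term
    w = w - mu * e * x                                  # stochastic gradient step on the CM cost
```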
We present a new method for regularized convex optimization and analyze it under both online and stochastic optimization settings. In addition to unifying previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, our method yields new analysis and algorithms. We also derive specific instantiations of our method for commonly use...
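One of the special cases named in this abstract, forward-backward splitting (a proximal gradient step), can be sketched for an l1-regularized least-squares objective as follows; the problem data, step size, and regularization weight are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of forward-backward splitting (proximal gradient) for
# l1-regularized least squares; data and constants are illustrative.
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_grad(A, b, lam=0.1, lr=0.01, steps=200):
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ w - b) / A.shape[0]            # forward step: gradient of the smooth part
        w = soft_threshold(w - lr * grad, lr * lam)      # backward step: prox of lam * ||w||_1
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
b = A[:, :3] @ np.array([1.0, -2.0, 3.0]) + 0.01 * rng.normal(size=100)
w_hat = prox_grad(A, b)   # sparse estimate of a 3-nonzero coefficient vector
```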
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This mig...
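A minimal sketch of the setting discussed here: SGD with O(1/t) step sizes on a strongly convex objective, returning the uniform average of the iterates and, for comparison, an average over only the second half of them. The l2-regularized logistic model, data, and horizon are illustrative assumptions.

```python
# Minimal sketch of SGD with 1/(lambda*t) step sizes and iterate averaging on a
# strongly convex (l2-regularized logistic) objective; data and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sign(X @ np.array([1.0, -1.0, 2.0, 0.0, 0.5]))
lam, T = 0.1, 5000

w = np.zeros(5)
w_avg = np.zeros(5)
suffix = np.zeros(5)
count = 0
for t in range(1, T + 1):
    i = rng.integers(0, X.shape[0])
    z = y[i] * (X[i] @ w)
    grad = lam * w - y[i] * X[i] / (1.0 + np.exp(z))   # stochastic gradient of logistic loss + l2
    w = w - grad / (lam * t)                           # O(1/t) step size
    w_avg += (w - w_avg) / t                           # uniform average of all iterates
    if t > T // 2:                                     # average over the last half only
        count += 1
        suffix += (w - suffix) / count
```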
Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required nontrivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate t...
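A minimal sketch of the non-smooth case this abstract points to: SGD using subgradients of the hinge loss for a linear SVM with l2 regularization. The synthetic data and the 1/(lambda*t) step-size schedule are illustrative assumptions.

```python
# Minimal sketch of subgradient SGD for a non-smooth objective (hinge loss + l2),
# i.e. a linear SVM; data and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=500))
lam = 0.1

w = np.zeros(4)
for t in range(1, 2001):
    i = rng.integers(0, X.shape[0])
    margin = y[i] * (X[i] @ w)
    if margin < 1:
        sub = lam * w - y[i] * X[i]   # subgradient of hinge loss + l2 at a violated margin
    else:
        sub = lam * w                 # hinge loss is flat here; only the regularizer contributes
    w = w - sub / (lam * t)           # 1/(lambda*t) step size
```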
Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast conv...
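A minimal sketch of the SVRG update described in this abstract: each outer epoch recomputes a full gradient at a snapshot point, and the inner stochastic steps correct each sampled gradient with that snapshot information. The least-squares data, epoch length, and step size are illustrative assumptions.

```python
# Minimal sketch of stochastic variance reduced gradient (SVRG) on a smooth,
# strongly convex least-squares problem; data and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
b = A @ rng.normal(size=5)

def grad_i(w, i):
    return (A[i] @ w - b[i]) * A[i]          # gradient of one sample's loss

def full_grad(w):
    return A.T @ (A @ w - b) / A.shape[0]    # full gradient over all samples

w = np.zeros(5)
for epoch in range(20):
    w_snap = w.copy()
    mu = full_grad(w_snap)                               # full gradient at the snapshot
    for _ in range(2 * A.shape[0]):
        i = rng.integers(0, A.shape[0])
        g = grad_i(w, i) - grad_i(w_snap, i) + mu        # variance-reduced gradient estimate
        w = w - 0.01 * g
```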
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and ad...
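A minimal sketch of the least-squares setting analyzed here: streaming (one-pass) SGD with a constant step size, followed by averaging of the later iterates; the data distribution, step size, and burn-in fraction are illustrative assumptions, not the paper's constants.

```python
# Minimal sketch of constant-step-size SGD with tail (iterate) averaging for
# streaming least squares; distribution and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 10000
w_star = rng.normal(size=d)

w = np.zeros(d)
tail_sum = np.zeros(d)
burn_in = T // 2
for t in range(T):
    x = rng.normal(size=d)                  # a fresh sample each step (streaming model)
    y = x @ w_star + 0.1 * rng.normal()
    w = w - 0.01 * (x @ w - y) * x          # constant-step-size SGD update
    if t >= burn_in:
        tail_sum += w                       # accumulate iterates after the burn-in phase
w_bar = tail_sum / (T - burn_in)            # averaged iterate returned by the procedure
```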
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. However, in the context of empirical risk minimization, it is often helpful to augment the training set by considering random perturbations of input examples. In this case, the objective is no longer a finite sum, and the main candidate for optimization is the stochas...
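A minimal sketch of the setting described here: when each input example is randomly perturbed at every step (a simple form of data augmentation), the objective becomes an expectation rather than a finite sum, and plain SGD on the perturbed samples is the natural baseline. The model, noise level, and step-size schedule are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of SGD where every sampled example is freshly perturbed, so the
# objective is an expectation, not a finite sum; data and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = X @ rng.normal(size=6)

w = np.zeros(6)
for t in range(1, 5001):
    i = rng.integers(0, X.shape[0])
    x_pert = X[i] + 0.1 * rng.normal(size=6)   # random perturbation of the input example
    grad = (x_pert @ w - y[i]) * x_pert        # stochastic gradient at the perturbed point
    w = w - 0.5 / (t + 10) * grad              # decreasing step size
```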
Training predictive models with stochastic gradient descent is widespread practice in machine learning. Recent advances improve on the basic technique in two ways: adaptive learning rates are widely used for deep learning, while acceleration techniques like stochastic average gradient and stochastic variance-reduced gradient descent can achieve a linear convergence rate. We investigate the utility of both types of...
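A minimal sketch of the first ingredient mentioned here, an AdaGrad-style adaptive learning rate for SGD (variance reduction in the style of SVRG is sketched after an earlier entry above); the least-squares data and base step size are illustrative assumptions.

```python
# Minimal sketch of an AdaGrad-style per-coordinate adaptive learning rate for SGD;
# data and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(400, 5))
b = A @ rng.normal(size=5)

w = np.zeros(5)
accum = np.zeros(5)                                # running sum of squared gradients
for _ in range(2000):
    i = rng.integers(0, A.shape[0])
    g = (A[i] @ w - b[i]) * A[i]                   # stochastic gradient of one sample
    accum += g * g
    w = w - 0.1 * g / (np.sqrt(accum) + 1e-8)      # per-coordinate adaptive step
```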