Search results for: stochastic gradient descent learning

Number of results: 840759

Journal: :Foundations and Trends in Machine Learning 2015
Sébastien Bubeck

This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by Nesterov’s seminal book and Nemirovski’s lecture n...

2012
Benjamin Recht

Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress at theoretically evaluating the difference in performance between sampling with- and without-replacement in such algorithms. Focusing on least mean squares...
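
To make the with- versus without-replacement distinction concrete, here is a minimal sketch (not the paper's analysis; the function names, step size, and data are illustrative assumptions) that runs SGD on a least-squares objective under both sampling schemes:

import numpy as np

def sgd_least_squares(A, b, epochs=20, lr=0.01, replacement=True, seed=0):
    """SGD on the average of squared residuals, with the chosen sampling scheme."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        # with replacement: i.i.d. index draws; without: one random permutation per epoch
        idx = rng.integers(0, n, size=n) if replacement else rng.permutation(n)
        for i in idx:
            grad = (A[i] @ x - b[i]) * A[i]   # gradient of the i-th squared residual
            x -= lr * grad
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)
for with_replacement in (True, False):
    x_hat = sgd_least_squares(A, b, replacement=with_replacement)
    print(with_replacement, np.linalg.norm(x_hat - x_true))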

2015
Behnam Neyshabur Ruslan Salakhutdinov Nathan Srebro

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-S...
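
The rescaling invariance motivating this geometry can be seen in a small sketch (an assumed toy example, not the authors' code, and not the Path-SGD update itself): scaling one layer of a ReLU network up and the next layer down by the same constant leaves the network's output unchanged while drastically changing the Euclidean quantities that plain SGD implicitly depends on.

import numpy as np

def relu_net(x, W1, W2):
    return np.maximum(W1 @ x, 0.0) @ W2   # one hidden ReLU layer, linear output

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=8)

c = 10.0
print(np.allclose(relu_net(x, W1, W2), relu_net(x, c * W1, W2 / c)))   # True: same function
print(np.linalg.norm(W1), np.linalg.norm(c * W1))                      # very different weight norms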

2002
Nicol N. Schraudolph Thore Graepel

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adopt ideas from conjugate gradient in the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. I...
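
As a hedged illustration of the kind of cheap curvature information referred to here (an assumption, not the paper's implementation, which uses exact fast Hessian-vector products), a Hessian-vector product can be approximated by a finite difference of two gradient evaluations, so probing curvature along a direction costs roughly two gradient computations:

import numpy as np

def grad_logistic(w, X, y):
    """Gradient of the average logistic loss for labels y in {-1, +1}."""
    margins = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)

def hessian_vector_product(w, v, X, y, eps=1e-5):
    """Approximate H(w) @ v as (grad(w + eps*v) - grad(w)) / eps."""
    return (grad_logistic(w + eps * v, X, y) - grad_logistic(w, X, y)) / eps

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(rng.normal(size=100))
w, v = rng.normal(size=5), rng.normal(size=5)
print(hessian_vector_product(w, v, X, y))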

Journal: :CoRR 2017
Xiangru Lian Wei Zhang Ce Zhang Ji Liu

Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technique for improving the efficiency of parallelism in distributed machine learning platforms and has been widely used in many popular machine learning software packages and solvers based on centrali...
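
For orientation only, below is a minimal synchronous sketch of decentralized parallel SGD over a ring of workers with gossip averaging; it assumes a local least-squares objective and omits the asynchronous machinery that this paper actually studies.

import numpy as np

def decentralized_sgd(local_data, steps=100, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_workers = len(local_data)
    d = local_data[0][0].shape[1]
    params = [np.zeros(d) for _ in range(n_workers)]
    # ring topology: each worker mixes equally with itself and its two neighbors
    W = np.zeros((n_workers, n_workers))
    for i in range(n_workers):
        W[i, i] = W[i, (i - 1) % n_workers] = W[i, (i + 1) % n_workers] = 1.0 / 3.0
    for _ in range(steps):
        grads = []
        for (A, b), x in zip(local_data, params):
            i = rng.integers(len(b))
            grads.append((A[i] @ x - b[i]) * A[i])   # local single-sample gradient
        # gossip averaging with neighbors, then the local gradient step; no parameter server
        params = [sum(W[k, j] * params[j] for j in range(n_workers)) - lr * grads[k]
                  for k in range(n_workers)]
    return params

rng = np.random.default_rng(1)
x_true = rng.normal(size=3)
local_data = [(A, A @ x_true + 0.01 * rng.normal(size=50))
              for A in (rng.normal(size=(50, 3)) for _ in range(4))]
print([np.round(p, 2) for p in decentralized_sgd(local_data)])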

2016
Hamed Karimi Julie Nutini Mark Schmidt

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main condition...
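
For reference, the condition in question can be stated as follows (a standard formulation of the Polyak-Łojasiewicz inequality and its consequence, with f^* the optimal value, \mu the PL constant, and L the Lipschitz constant of the gradient; these symbols are supplied here, not taken from the truncated abstract above):

\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu \bigl(f(x) - f^*\bigr)
  \qquad \text{for all } x ,
\]
% combined with the descent lemma for an L-smooth f and step size 1/L,
% gradient descent then satisfies the global linear rate
\[
  f(x_{k+1}) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f^*\bigr),
\]
% with no convexity assumption required.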

Journal: :CoRR 2018
Lam M. Nguyen Nam H. Nguyen Dzung T. Phan Jayant Kalagnanam Katya Scheinberg

In this paper, we consider a general stochastic optimization problem that often lies at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method achieves improved convergence rates (to a neighborhoo...
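
The qualitative behavior of a fixed step size, fast initial progress followed by convergence only to a neighborhood of the solution whose size typically grows with the step, can be seen in a small sketch (classical constant-step behavior on a noisy least-squares problem; this is not the paper's assumption or its rates):

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + rng.normal(size=n)                 # noisy linear targets

for step_size in (0.02, 0.002):
    x = np.zeros(d)
    for _ in range(20000):
        i = rng.integers(n)
        x -= step_size * (A[i] @ x - b[i]) * A[i]   # single-sample step, fixed step size
    # the iterate settles in a noise ball around the solution; the ball typically
    # shrinks as the step size is reduced
    print(step_size, np.linalg.norm(x - x_star))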

Journal: :CoRR 2018
Weijie Su Yuancheng Zhu

Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large. However, despite an ever-increasing volume of work on SGD, much less is known about the statistical inferential properties of SGD-based predictions. Taking a fully inferential viewpoint, this paper introduces a novel proc...

Recently, we have demonstrated a new and efficient method to simultaneously reconstruct two unknown interfering wavefronts. A three-dimensional interference pattern was analyzed and then Zernike polynomials and the stochastic parallel gradient descent algorithm were used to expand and calculate wavefronts. In this paper, as one of the applications of this method, the reflected wavefronts from t...
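
As a rough illustration of the stochastic parallel gradient descent (SPGD) update itself (the quality metric, the target coefficients, and all constants below are invented toy assumptions, not this work's optical setup), the algorithm perturbs all coefficients at once and nudges them in proportion to the measured change of the metric:

import numpy as np

def spgd(J, c0, gain=0.5, perturb=0.05, iters=2000, seed=0):
    """Gradient-free SPGD ascent of a scalar metric J over coefficients c."""
    rng = np.random.default_rng(seed)
    c = np.array(c0, dtype=float)
    for _ in range(iters):
        delta = perturb * rng.choice([-1.0, 1.0], size=c.shape)   # simultaneous +/- perturbation
        dJ = J(c + delta) - J(c - delta)                          # two-sided metric difference
        c += gain * dJ * delta                                    # update along the perturbation
    return c

# toy metric: largest when the coefficients match an assumed target wavefront expansion
target = np.array([0.3, -0.2, 0.1, 0.05])
metric = lambda c: -np.sum((c - target) ** 2)
print(np.round(spgd(metric, np.zeros(4)), 3))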

2017
Vinod Kumar Chauhan Kalpana Dahiya Anuj Sharma

Big Data problems in Machine Learning involve a large number of data points, a large number of features, or both, which makes training models difficult because of the high computational complexity of a single iteration of a learning algorithm. To solve such learning problems, Stochastic Approximation offers an optimization approach that makes the complexity of each iteration independent of the number of data poin...
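
The point about per-iteration complexity can be made concrete with a minimal mini-batch SGD sketch (illustrative only; the batch size, step size, and data are assumptions): each step touches only a small batch of rows, so its cost does not grow with the total number of data points.

import numpy as np

def minibatch_sgd_step(x, A, b, lr=0.1, batch_size=32, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(b), size=batch_size)    # sample a small batch of row indices
    residual = A[idx] @ x - b[idx]
    grad = A[idx].T @ residual / batch_size           # cost O(batch_size * d), independent of n
    return x - lr * grad

rng = np.random.default_rng(0)
n, d = 100_000, 20                                    # n is large; each iteration stays cheap
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true
x = np.zeros(d)
for _ in range(200):
    x = minibatch_sgd_step(x, A, b, rng=rng)
print(np.linalg.norm(x - x_true))                     # error shrinks although no step saw all n rows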

[Chart: number of search results per year]