stochastic gradient descent learning

نتایج جستجو برای: stochastic gradient descent learning

تعداد نتایج: 840759 فیلتر نتایج به سال:

Conditional Accelerated Lazy Stochastic Gradient Descent

2017

Guanghui Lan Sebastian Pokutta Yi Zhou Daniel Zink

In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with optimal number of calls to a stochastic first-order oracle and convergence rate O( 1 ε2 ) improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate O( 1 ε4 ).

متن کامل

AG-SGD: Angle-Based Stochastic Gradient Descent

Journal: :IEEE Access 2021

متن کامل

Semi-Stochastic Gradient Descent Methods

Journal: :CoRR 2017

Jakub Konecný Peter Richtárik

In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an ε-ac...

متن کامل

Asynchronous Accelerated Stochastic Gradient Descent

2016

Qi Meng Wei Chen Jingcheng Yu Taifeng Wang Zhiming Ma Tie-Yan Liu

Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. In order to accelerate the convergence of SGD, a few advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov’s acceleration method. Furthermore, in order to improve the training speed and/or leverage larger-scale training data...

متن کامل

Online Gradient Descent Learning Algorithms

Journal: :Foundations of Computational Mathematics 2007

متن کامل

Shampoo: Preconditioned Stochastic Tensor Optimization

Journal: :CoRR 2018

Vineet Gupta Tomer Koren Yoram Singer

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates ...

متن کامل

Chapter 2 LEARNING RATE ADAPTATION IN STOCHASTIC GRADIENT DESCENT

2001

V. P. Plagianakos

The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...

متن کامل

Online Learning with Adaptive Local Step Sizes

1999

Nicol N. Schraudolph

Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical indep...

متن کامل

WNGrad: Learn the Learning Rate in Gradient Descent

2018

Xiaoxia Wu Rachel Ward L'eon Bottou

Adjusting the learning rate schedule in stochastic gradient methods is an important unresolved problem which requires tuning in practice. If certain parameters of the loss function such as smoothness or strong convexity constants are known, theoretical learning rate schedules can be applied. However, in practice, such parameters are not known, and the loss function of interest is not convex in ...

متن کامل

Adaptativity of Stochastic Gradient Descent

2015

Aymeric Dieuleveut Francis Bach

We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS H, even if the optimal predictor (i.e., the conditional expectation) is not in H. In a stochastic approximation framework where the estimator ...

متن کامل

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید