stochastic gradient descent learning

Search results for: stochastic gradient descent learning

Number of results: 840759

Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

Journal: Neural Computation, 2003

Justin Werfel Xiaohui Xie H. Sebastian Seung

Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are sometimes used to overcome these difficulties. We analyze three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. Learning speed is defined as the rate of exponential decay in the learning curves....
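
As a rough illustration of the weight perturbation idea named in this abstract, the sketch below trains a linear perceptron online by adding Gaussian noise to the weights, measuring the change in squared error, and stepping against the noise in proportion to that change. The function names, the teacher-student setup, and the values of `lr` and `sigma` are assumptions chosen for the example; they are not taken from the paper.

```python
import random


def squared_error(w, x, target):
    """Half squared error of a linear perceptron on one example."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (y - target) ** 2


def weight_perturbation_step(w, x, target, rng, lr=0.01, sigma=0.01):
    """One online update: perturb all weights, observe the error change."""
    noise = [rng.gauss(0.0, sigma) for _ in w]
    base = squared_error(w, x, target)
    perturbed = squared_error([wi + ni for wi, ni in zip(w, noise)], x, target)
    # On average, (perturbed - base) * noise / sigma**2 points along the gradient.
    scale = lr * (perturbed - base) / sigma ** 2
    return [wi - scale * ni for wi, ni in zip(w, noise)]


def train(teacher, steps=5000, seed=0):
    """Fit a student perceptron to a hypothetical teacher weight vector."""
    rng = random.Random(seed)
    w = [0.0] * len(teacher)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in teacher]
        target = sum(ti * xi for ti, xi in zip(teacher, x))
        w = weight_perturbation_step(w, x, target, rng)
    return w
```

Calling `train([1.0, -2.0, 0.5])` should return weights close to the teacher vector. The update uses only two error evaluations per step and no explicit gradient, which is the practical appeal of perturbation methods; the price is a noisier update than direct gradient descent.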

ADADELTA: An Adaptive Learning Rate Method

Journal: CoRR, 2012

Matthew D. Zeiler

We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, var...
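
The per-dimension update described in this abstract can be sketched as follows. The code keeps running averages of squared gradients and squared steps for each coordinate and uses their ratio in place of a hand-tuned learning rate. The function name `adadelta_minimize`, the gradient callback, and the defaults `rho=0.95` and `eps=1e-6` are assumptions made for illustration, not a reference implementation.

```python
import math


def adadelta_minimize(grad, x0, steps=200, rho=0.95, eps=1e-6):
    """Run ADADELTA-style updates; grad(x) returns a list of partial derivatives."""
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)  # running average of squared gradients
    avg_sq_step = [0.0] * len(x)  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1.0 - rho) * gi * gi
            # Step size is a ratio of two RMS values, so no learning rate is needed.
            step = -math.sqrt(avg_sq_step[i] + eps) / math.sqrt(avg_sq_grad[i] + eps) * gi
            avg_sq_step[i] = rho * avg_sq_step[i] + (1.0 - rho) * step * step
            x[i] += step
    return x
```

For example, `adadelta_minimize(lambda x: [2 * x[0], 20 * x[1]], [1.0, -2.0])` takes steps that lower the quadratic x0^2 + 10*x1^2. Early steps are small (on the order of the square root of `eps`) and grow as the step average accumulates.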

Asynchronous Distributed Semi-Stochastic Gradient Optimization

2016

Ruiliang Zhang Shuai Zheng James T. Kwok

With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However, existing algorithms either suffer from slow convergence due to the inherent variance of stochastic gradients, or have a fast linear convergence rate but at t...

On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization

2017

Momentum based stochastic gradient methods such as heavy ball (HB) and Nesterov’s accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning models, as they often provide significant improvements over stochastic gradient descent (SGD). Theoretically, these “fast gradient” methods have provable improvements over gradient descent...
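
For readers unfamiliar with the two momentum schemes named in this abstract, the sketch below shows their standard textbook update rules side by side. Heavy ball evaluates the gradient at the current point, while the Nesterov variant evaluates it at a look-ahead point. The function name, the `nesterov` flag, and the default `lr` and `mu` values are assumptions for the example; the gradient callback may return an exact or a stochastic gradient.

```python
def momentum_descent(grad, w0, lr=0.05, mu=0.9, steps=300, nesterov=False):
    """Heavy ball (nesterov=False) or Nesterov accelerated gradient (nesterov=True)."""
    w = list(w0)
    v = [0.0] * len(w)  # velocity, one entry per parameter
    for _ in range(steps):
        if nesterov:
            # NAG: take the gradient at the look-ahead point w + mu * v.
            point = [wi + mu * vi for wi, vi in zip(w, v)]
        else:
            # HB: take the gradient at the current point.
            point = w
        g = grad(point)
        v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w
```

On a simple quadratic such as `grad = lambda w: [w[0], 10 * w[1]]`, both variants drive the parameters toward zero. The example says nothing about the stochastic setting that the paper analyzes.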

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

Journal: CoRR, 2018

Yi Zhou Yingbin Liang Huishuai Zhang

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of the SGD in expectation. In this paper, we establish various generalization error bounds...

Inefficiency of Stochastic Gradient Descent with Larger Mini-Batches (and More Learners)

2016

Onkar Bhardwaj Guojing Cong

Stochastic Gradient Descent (SGD) and its variants are the most important optimization algorithms used in large scale machine learning. A mini-batch version of stochastic gradient descent is often used in practice to take advantage of hardware parallelism. In this work, we analyze the effect of mini-batch size over SGD convergence for the case of general non-convex objective functions. Building on the...

Large Scale Learning to Rank

2009

D. Sculley

Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing an objective defined over O(n^2) possible pairs for data sets with n examples. In this paper, we remove this super-linear dependence on training set size by sampling pairs from an implicit pairwise expansion and applying efficient stochastic gradient descent learners for...
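
The idea of sampling pairs instead of enumerating them can be sketched as below. Each step draws two examples with different relevance labels, forms their feature difference, and applies one regularized hinge-loss SGD update to a linear scoring vector. The rejection-sampling scheme, the function names, and the hyperparameters `lr` and `reg` are assumptions for illustration and are not claimed to match the procedure in the paper.

```python
import random


def sample_pair(data, rng):
    """Draw (higher, lower) feature vectors from two examples with different labels."""
    while True:
        (xa, ya), (xb, yb) = rng.choice(data), rng.choice(data)
        if ya != yb:
            return (xa, xb) if ya > yb else (xb, xa)


def train_pairwise_sgd(data, steps=2000, lr=0.05, reg=1e-3, seed=0):
    """data: list of (feature_list, relevance_label). Returns a weight vector."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        hi, lo = sample_pair(data, rng)
        diff = [h - l for h, l in zip(hi, lo)]
        margin = sum(wi * di for wi, di in zip(w, diff))
        # L2 regularization shrinks the weights on every step.
        w = [(1.0 - lr * reg) * wi for wi in w]
        if margin < 1.0:
            # Hinge loss is active: push the higher-ranked item further ahead.
            w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def score(w, x):
    """Linear ranking score of one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Each step touches a single sampled pair, so the cost per update does not depend on the number of possible pairs. Sorting items by `score(w, x)` then gives the predicted ranking.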

Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms

Journal: Advances in Neural Information Processing Systems, 2015

Christopher De Sa Ce Zhang Kunle Olukotun Christopher Ré

Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specif...

Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms

Journal: CoRR, 2015

Yuchen Zhang Michael I. Jordan

Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems. Splash consists of a programming interface and an execution engine. Using the programming interface, the user develops sequential stochastic algorithms without ...
