Search results for: stochastic gradient descent learning

Number of results: 840759

Journal: CoRR 2017
Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar

We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4× faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been...

2012
David Zastrau, Stefan Edelkamp

We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving 1.66× to 6× speedups compared to a CPU-based implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prize winning team. Our main idea is to create a hash function of ...
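The kind of SGD training this abstract refers to can be sketched for a linear SVM with the hinge loss. This is a minimal CPU sketch using a Pegasos-style decaying step size, not the authors' GPU implementation; the function name and toy data are invented for illustration.

```python
import numpy as np

def sgd_linear_svm(X, y, lam=0.01, epochs=20, lr0=1.0, seed=0):
    """Plain SGD on the L2-regularized hinge loss (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = lr0 / (lam * t)  # Pegasos-style decaying step size
            margin = y[i] * (X[i] @ w)
            # Subgradient of lam/2*||w||^2 + max(0, 1 - y_i w.x_i)
            grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            w -= eta * grad
    return w

# Toy linearly separable data, labels in {-1, +1}
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
w = sgd_linear_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

On this well-separated toy set the learned hyperplane classifies nearly all points correctly.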

2015
Hengyue Pan, Hui Jiang

Stochastic gradient descent (SGD) has been regarded as a successful optimization algorithm in machine learning. In this paper, we propose a novel annealed gradient descent (AGD) method for non-convex optimization in deep learning. AGD optimizes a sequence of gradually improved smoother mosaic functions that approximate the original non-convex objective function according to an annealing schedul...

Journal: CoRR 2016
Richard S. Sutton, Vivek Veeriah

Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. ...

2015
Rong Ge, Furong Huang, Chi Jin, Yang Yuan

We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points. In this paper we identify a strict saddle property for non-convex problems that allows for efficient optimization. Using this property we show that from an arbit...
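The saddle-point concern can be illustrated on a toy strict-saddle function: f(x, y) = (x² − 1)² + y² has a saddle at the origin (negative curvature along x) and minima at (±1, 0). The sketch below runs gradient descent with injected noise standing in for SGD's gradient noise; the function, step size, and noise scale are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

def grad(w):
    """Gradient of f(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = w
    return np.array([4 * x * (x**2 - 1), 2 * y])

def noisy_gd(w0, lr=0.05, sigma=0.01, steps=2000, seed=0):
    """Gradient descent with isotropic noise; the noise pushes the
    iterate off the saddle's unstable manifold so it can escape."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * (grad(w) + sigma * rng.standard_normal(2))
    return w

# Start with x exactly at the saddle coordinate x = 0
w = noisy_gd([0.0, 0.5])
```

Exact gradient descent started at x = 0 would stay on the unstable manifold forever; the noisy iterate instead lands near one of the two minima (±1, 0).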

2013
Francis R. Bach, Eric Moulines

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for...
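A central tool in this setting is Polyak–Ruppert averaging of the SGD iterates. The sketch below applies it to least-squares regression, one of the paper's example problems; the constant step size and toy data are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def averaged_sgd(X, y, lr=0.1, seed=0):
    """SGD on the least-squares objective with Polyak-Ruppert
    iterate averaging (return the running mean of the iterates)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for t, i in enumerate(rng.permutation(n), start=1):
        g = (X[i] @ w - y[i]) * X[i]  # unbiased gradient estimate
        w -= lr * g
        w_bar += (w - w_bar) / t      # running average of iterates
    return w_bar

# Toy regression problem with a known ground truth
rng = np.random.default_rng(2)
w_true = np.array([1.0, -2.0])
X = rng.standard_normal((5000, 2))
y = X @ w_true + 0.1 * rng.standard_normal(5000)
w_bar = averaged_sgd(X, y)
```

The averaged iterate smooths out the noise of the last iterate and recovers the ground-truth weights closely after one pass.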

2016
Tiberiu Tesileanu, Bence Olveczky, Vijay Balasubramanian

Existing models of birdsong learning assume that brain area LMAN introduces variability into song for trial-and-error learning. Recent data suggest that LMAN also encodes a corrective bias driving short-term improvements in song. These later consolidate in area RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using a stochastic gradient descent ...

2012
Huasha Zhao, John F. Canny

Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient...

2009

In the setting of standard online learning, we are interested in sequential prediction problems where for i = 1, 2, . . .: 1. An unlabeled example x_i = (x_i^1, . . . , x_i^d) ∈ R^d arrives. 2. We make a prediction ŷ_i based on the current weights w_i = (w_i^1, . . . , w_i^d) ∈ R^d. 3. We observe y_i, let z_i = (x_i, y_i), and incur some known loss L(w_i, z_i) that is convex in the parameter w_i. 4. We update the weights acc...
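The four-step protocol above can be instantiated with online gradient descent on the squared loss; the abstract's actual update rule is truncated, so the loss choice and learning rate here are assumptions for illustration.

```python
import numpy as np

def online_gradient_descent(stream, d, lr=0.1):
    """One pass of the online protocol: predict, observe, incur loss,
    update. Uses the squared loss as a concrete convex L(w, z)."""
    w = np.zeros(d)
    losses = []
    for x, y in stream:                  # 1. example x_i arrives
        y_hat = w @ x                    # 2. predict with current w_i
        loss = 0.5 * (y_hat - y) ** 2    # 3. observe y_i, incur loss
        losses.append(loss)
        w -= lr * (y_hat - y) * x        # 4. gradient update
    return w, losses

# Toy noiseless stream generated from fixed ground-truth weights
rng = np.random.default_rng(3)
w_star = np.array([0.5, -1.5, 2.0])
stream = [(x, x @ w_star) for x in rng.standard_normal((500, 3))]
w, losses = online_gradient_descent(stream, d=3)
```

The per-round losses shrink over the stream as the weights approach the generator, which is the regret-minimization behavior the protocol is designed for.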

2013
Tianbao Yang

We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often perform better than stochastic gradient descent methods in optimizing regularized loss minimization problems. However, efforts to study them in a distributed framework are still lacking. We ...
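A single-machine SDCA sketch for ridge regression (squared loss with L2 regularization) shows the closed-form dual coordinate update these methods rely on; this is not the paper's distributed variant, and the toy data are invented for the example.

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=20, seed=0):
    """Serial stochastic dual coordinate ascent for
    (1/n) sum 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2,
    maintaining the primal-dual link w = (1/(lam*n)) sum alpha_i x_i."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Closed-form dual coordinate update for the squared loss
            delta = (y[i] - X[i] @ w - alpha[i]) / (1 + X[i] @ X[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w

# Toy noiseless data; with small lam the ridge solution is near w_true
rng = np.random.default_rng(4)
w_true = np.array([2.0, -1.0])
X = rng.standard_normal((200, 2))
y = X @ w_true
w = sdca_ridge(X, y, lam=0.01)
```

Each coordinate step maximizes the dual in one variable exactly, which is what yields the strong convergence guarantees the abstract mentions.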
