Search results for: stochastic gradient descent learning
Number of results: 840759
We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4× faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been...
We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving speedups of 1.66× to 6× over a CPU-based implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prize-winning team. Our main idea is to create a hash function of ...
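As a rough point of reference, here is a minimal CPU-side sketch of hinge-loss SGD for a linear SVM in the spirit of Bottou's SVMSGD; the GPU hashing scheme described in the abstract is not reproduced, and the function name, regularization constant, and toy data are illustrative assumptions.

```python
# Minimal sketch of hinge-loss SGD for a linear SVM (Pegasos-style step size).
# Not the paper's GPU implementation; all hyper-parameters are illustrative.
import numpy as np

def fit_svm_sgd(X, y, lam=0.01, epochs=5, seed=0):
    """Approximately minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * X @ w))."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w)
            w *= (1.0 - eta * lam)         # gradient of the L2 term
            if margin < 1:                 # subgradient of the hinge loss
                w += eta * y[i] * X[i]
    return w

# Toy usage: two linearly separable Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = fit_svm_sgd(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```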
Stochastic gradient descent (SGD) has been regarded as a successful optimization algorithm in machine learning. In this paper, we propose a novel annealed gradient descent (AGD) method for non-convex optimization in deep learning. AGD optimizes a sequence of gradually improved smoother mosaic functions that approximate the original non-convex objective function according to an annealing schedul...
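A hedged sketch of the annealing idea: run gradient descent on a sequence of progressively less-smoothed surrogates of a non-convex objective, tightening the smoothing at each stage. Gaussian smoothing stands in here for the paper's mosaic functions, and the objective, schedule, and step size are illustrative assumptions, not taken from the paper.

```python
# Annealing over smoothed surrogates of a non-convex 1-D objective.
import numpy as np

def f(x):
    # A simple non-convex objective with many shallow local minima.
    return x**2 + 3.0 * np.sin(5.0 * x)

def smoothed_grad(x, sigma, n_samples=256, rng=None):
    # Monte-Carlo gradient of the Gaussian-smoothed objective E_e[f(x + sigma*e)].
    if rng is None:
        rng = np.random.default_rng(0)
    e = rng.standard_normal(n_samples)
    return np.mean((f(x + sigma * e) - f(x - sigma * e)) / (2.0 * sigma) * e)

def annealed_gd(x0, sigmas=(2.0, 1.0, 0.5, 0.1, 0.0), lr=0.01, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    for sigma in sigmas:                       # anneal from heavy to no smoothing
        for _ in range(steps):
            if sigma > 0:
                g = smoothed_grad(x, sigma, rng=rng)
            else:
                h = 1e-5                       # final stage: original objective
                g = (f(x + h) - f(x - h)) / (2 * h)
            x -= lr * g
    return x

x_star = annealed_gd(x0=4.0)
print("final x:", x_star, "f(x):", f(x_star))
```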
Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. ...
We analyze stochastic gradient descent for optimizing non-convex functions. In many cases the goal for non-convex functions is to find a reasonable local minimum, and the main concern is that gradient updates get trapped in saddle points. In this paper we identify a strict saddle property for non-convex problems that allows for efficient optimization. Using this property we show that from an arbit...
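A toy illustration of the point made above: isotropic gradient noise lets SGD move off a strict saddle that exact gradient descent would remain on. The objective, noise scale, and step size below are illustrative, not taken from the paper.

```python
# Escaping a strict saddle with noisy gradient steps.
import numpy as np

def grad(w):
    # f(w) = (w0^2 - 1)^2 + w1^2: minima at (+-1, 0), strict saddle at (0, 0).
    return np.array([4.0 * w[0] * (w[0]**2 - 1.0), 2.0 * w[1]])

def run(noise_std, steps=2000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(2)                       # start exactly at the saddle point
    for _ in range(steps):
        w -= lr * (grad(w) + noise_std * rng.standard_normal(2))
    return w

print("plain GD from the saddle:", run(noise_std=0.0))   # stays at (0, 0)
print("noisy SGD from the saddle:", run(noise_std=0.1))  # ends near (+-1, 0)
```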
We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for...
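A minimal sketch of averaged SGD of the kind analyzed in this setting: only unbiased gradient estimates are used, and the returned estimate is the running average of the iterates rather than the last iterate. The least-squares example, step size, and sample counts are illustrative choices, not the paper's algorithm.

```python
# Averaged (Polyak-Ruppert) SGD from unbiased stochastic gradients.
import numpy as np

def averaged_sgd(grad_sample, w0, n_steps, step, seed=0):
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    w_bar = np.zeros_like(w)
    for t in range(1, n_steps + 1):
        w -= step * grad_sample(w, rng)      # unbiased gradient estimate
        w_bar += (w - w_bar) / t             # running average of iterates
    return w_bar

# Example: streaming least-squares regression, one sample per step.
d = 5
w_true = np.arange(1.0, 6.0)

def grad_sample(w, rng):
    x = rng.standard_normal(d)
    y = x @ w_true + 0.1 * rng.standard_normal()
    return (x @ w - y) * x                   # gradient of 0.5 * (x @ w - y)^2

w_hat = averaged_sgd(grad_sample, w0=np.zeros(d), n_steps=20000, step=0.01)
print("error of averaged iterate:", np.linalg.norm(w_hat - w_true))
```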
Existing models of birdsong learning assume that brain area LMAN introduces variability into song for trial-and-error learning. Recent data suggest that LMAN also encodes a corrective bias driving short-term improvements in song. These later consolidate in area RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using a stochastic gradient descent ...
Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient...
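A hedged simulation of the butterfly-mixing idea: each worker takes a local SGD step and then averages its model with a single partner chosen by a butterfly pattern, so a full mix across 2^m workers is spread over m consecutive steps instead of requiring a full all-reduce every step. The toy quadratic loss and all parameter values are illustrative, not from the paper.

```python
# Simulated butterfly mixing across 2^m workers running local SGD.
import numpy as np

def butterfly_partner(worker, step, n_workers):
    m = int(np.log2(n_workers))
    return worker ^ (1 << (step % m))        # flip one address bit per step

def run(n_workers=8, n_steps=60, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 3.0])
    models = [rng.standard_normal(3) for _ in range(n_workers)]
    for t in range(n_steps):
        # Local SGD step on each worker's own noisy quadratic loss.
        for i in range(n_workers):
            g = (models[i] - w_true) + 0.1 * rng.standard_normal(3)
            models[i] = models[i] - lr * g
        # Butterfly mixing: one pairwise average instead of a full all-reduce.
        mixed = [None] * n_workers
        for i in range(n_workers):
            j = butterfly_partner(i, t, n_workers)
            mixed[i] = 0.5 * (models[i] + models[j])
        models = mixed
    return np.mean(models, axis=0)

print("consensus model:", run())             # close to [1, -2, 3]
```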
In the setting of standard online learning, we are interested in sequential prediction problems where for i = 1, 2, . . .: 1. An unlabeled example x_i = [x_i^1, . . . , x_i^d] ∈ R^d arrives. 2. We make a prediction ŷ_i based on the current weights w_i = [w_i^1, . . . , w_i^d] ∈ R^d. 3. We observe y_i, let z_i = (x_i, y_i), and incur some known loss L(w_i, z_i) convex in the parameter w_i. 4. We update the weights acc...
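The four-step protocol above can be instantiated with a linear predictor, squared loss, and an online-gradient-descent update; the loss, learning rate, and data stream in this sketch are illustrative assumptions rather than the setting studied in the abstract.

```python
# Online gradient descent following the four-step online-learning protocol.
import numpy as np

def online_gradient_descent(stream, d, eta=0.05):
    w = np.zeros(d)                           # initial weights w_1
    total_loss = 0.0
    for x, y in stream:                       # 1. an unlabeled example arrives
        y_hat = w @ x                         # 2. predict with current weights
        total_loss += 0.5 * (y_hat - y) ** 2  # 3. observe y, incur convex loss
        w -= eta * (y_hat - y) * x            # 4. update weights by a gradient step
    return w, total_loss

# Toy stream: noisy linear targets.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 0.5])
stream = ((x, x @ w_true + 0.05 * rng.standard_normal())
          for x in rng.standard_normal((2000, 3)))
w, cum_loss = online_gradient_descent(stream, d=3)
print("learned weights:", w, "cumulative loss:", cum_loss)
```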
We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often outperform stochastic gradient descent methods in optimizing regularized loss minimization problems, yet they have received little study in a distributed framework. We ...
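As background for the building block this work extends, here is a hedged single-machine sketch of stochastic dual coordinate ascent for an L2-regularized linear SVM with hinge loss, using the standard closed-form coordinate update; the distributed coordination proposed in the paper is not shown, and the toy data and regularization constant are illustrative.

```python
# Single-machine SDCA for an L2-regularized linear SVM with hinge loss.
import numpy as np

def sdca_svm(X, y, lam=0.01, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                    # dual variables, one per example
    w = np.zeros(d)                        # primal weights, kept in sync with alpha
    for _ in range(epochs):
        for i in rng.permutation(n):
            xi, yi = X[i], y[i]
            q = (xi @ xi) / (lam * n)
            # Closed-form maximization of the dual over coordinate alpha_i.
            delta = yi * max(0.0, min(1.0, (1.0 - yi * (xi @ w)) / q
                                      + alpha[i] * yi)) - alpha[i]
            alpha[i] += delta
            w += delta * xi / (lam * n)    # maintain w = (1/(lam*n)) * sum_j alpha_j x_j
    return w

# Toy usage: two separable blobs with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = sdca_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```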