Search results for: gradient descent

Number of results: 137892

Journal: CoRR 2016
Sebastian Ruder

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different vari...
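
As a point of reference for the variants such overviews discuss, here is a minimal sketch of two basic update rules (plain gradient descent and momentum). The toy quadratic objective and function names are illustrative assumptions, not taken from the article.

import numpy as np

def grad(w):
    # Toy objective f(w) = ||w||^2 / 2, so the gradient is simply w (illustrative assumption).
    return w

def sgd_step(w, g, lr=0.1):
    # Vanilla (stochastic) gradient descent: step against the gradient.
    return w - lr * g

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    # Momentum: accumulate an exponentially decaying average of past gradients.
    v = beta * v + lr * g
    return w - v, v

w_sgd = np.ones(3)
w_mom, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_mom, v = momentum_step(w_mom, grad(w_mom), v)
print(w_sgd, w_mom)  # both move toward the minimizer at the origin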

1999
Udo Seiffert

Backpropagation is the standard training procedure for Multiple Layer Perceptron networks. It is based on gradient descent to minimize the network error. However, using the gradient descent algorithm leads to problems with the convergence of training as well as to restrictions on the applicable transfer functions. This paper describes a complete substitution of the grad...
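
For context, a minimal NumPy sketch of the gradient descent weight update that backpropagation performs for a single-hidden-layer perceptron. The network size, sigmoid transfer function, squared-error loss, and learning rate are assumptions chosen for illustration, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))          # toy inputs (assumed)
y = rng.normal(size=(32, 1))          # toy targets (assumed)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    # Forward pass
    h = sigmoid(X @ W1)
    out = h @ W2
    err = (out - y) / len(X)          # gradient of the mean squared error w.r.t. out
    # Backward pass: the chain rule gives the gradient layer by layer
    dW2 = h.T @ err
    dW1 = X.T @ ((err @ W2.T) * h * (1 - h))
    # Gradient descent update
    W1 -= lr * dW1
    W2 -= lr * dW2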

2017
Chao Ma

Traditional learning algorithms based on gradient descent techniques, such as back-propagation (BP) and its variant Levenberg-Marquardt (LM), have been widely used in the training of multilayer feedforward neural networks. Gradient descent based algorithms often converge more slowly than desired during training, since such learning algorithms require many iterative learning steps, and...
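
As a rough illustration of the LM variant mentioned above, the Levenberg-Marquardt step blends gradient descent with a Gauss-Newton step. The sketch below, with a made-up linear residual function, shows the generic update rule only and is an assumption, not this paper's algorithm.

import numpy as np

def lm_step(w, residuals, jacobian, mu=1e-2):
    # Levenberg-Marquardt update: solve (J^T J + mu I) delta = -J^T r.
    # Large mu behaves like gradient descent; small mu approaches Gauss-Newton.
    r = residuals(w)
    J = jacobian(w)
    A = J.T @ J + mu * np.eye(len(w))
    delta = np.linalg.solve(A, -J.T @ r)
    return w + delta

# Toy least-squares problem (assumed): fit w so that A_toy @ w ~ b_toy
A_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b_toy = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, lambda w: A_toy @ w - b_toy, lambda w: A_toy)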

2013
Tianbao Yang

We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often perform better than stochastic gradient descent methods in optimizing regularized loss minimization problems. However, they have received little study in a distributed framework. We ...
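
For intuition about dual coordinate methods, here is a minimal single-machine sketch for an L2-regularized linear SVM, updating one dual variable at a time while keeping the primal vector in sync. This is the standard single-machine update and an illustrative assumption, not the distributed algorithm the paper proposes.

import numpy as np

def dual_coordinate_svm(X, y, C=1.0, epochs=10, seed=0):
    # Optimize the SVM dual one coordinate alpha_i at a time,
    # maintaining w = sum_i alpha_i * y_i * x_i.
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = y[i] * (X[i] @ w) - 1.0        # partial derivative of the dual objective
            q = X[i] @ X[i]
            if q > 0:
                new_alpha = np.clip(alpha[i] - g / q, 0.0, C)
                w += (new_alpha - alpha[i]) * y[i] * X[i]
                alpha[i] = new_alpha
    return w

# Tiny linearly separable toy data (assumed)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = dual_coordinate_svm(X, y)
print(np.sign(X @ w))  # should match y on this separable toy set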

Journal: CoRR 2014
H. Brendan McMahan

We present tools for the analysis of Follow-The-Regularized-Leader (FTRL), Dual Averaging, and Mirror Descent algorithms when the regularizer (equivalently, prox-function or learning rate schedule) is chosen adaptively based on the data. Adaptivity can be used to prove regret bounds that hold on every round, and also allows for data-dependent regret bounds as in AdaGrad-style algorithms (e.g., O...
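
As one concrete instance of a data-adaptive learning rate schedule, here is the standard diagonal AdaGrad-style update; this sketch is general background and not specific to the analysis tools in the paper.

import numpy as np

def adagrad_step(w, g, h, lr=0.5, eps=1e-8):
    # Accumulate squared gradients per coordinate and scale each step
    # by the inverse square root of that running sum (diagonal AdaGrad).
    h = h + g * g
    w = w - lr * g / (np.sqrt(h) + eps)
    return w, h

w = np.array([5.0, -3.0])
h = np.zeros_like(w)
for _ in range(500):
    g = w                 # gradient of the toy objective ||w||^2 / 2 (assumed)
    w, h = adagrad_step(w, g, h)
print(w)                  # moves toward the origin with automatically shrinking steps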

Journal: CoRR 2016
Yan Wang, Ge Ou

We propose a novel support vector regression approach called e-Distance Weighted Support Vector Regression (e-DWSVR). e-DWSVR specifically addresses two challenging issues in support vector regression: first, the processing of noisy data; second, how to deal with the situation where the distribution of boundary data differs from that of the overall data. The proposed e-DWSVR optimizes the mini...

Journal: CoRR 2015
Andrew J. R. Simpson

Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine; a new type of DNN that is capable of brain-like dynamic ‘on the fly’ learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we ...

Journal: CoRR 2015
Andrew J. R. Simpson

When training deep neural networks, it is typically assumed that the training examples are uniformly difficult to learn. Or, to restate, it is assumed that the training error will be uniformly distributed across the training examples. Based on these assumptions, each training example is used an equal number of times. However, this assumption may not be valid in many cases. “Oddball SGD” (novelt...
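
A hedged sketch of the general idea of using training examples a non-equal number of times, e.g. sampling each example with probability proportional to its current error. The exact novelty measure used by Oddball SGD is not given in this snippet, so this particular weighting scheme is an assumption for illustration only.

import numpy as np

def error_weighted_indices(per_example_errors, batch_size, rng):
    # Sample example indices with probability proportional to their current error,
    # so "odd" (hard) examples are revisited more often than easy ones.
    p = per_example_errors + 1e-12            # avoid zero probabilities
    p = p / p.sum()
    return rng.choice(len(p), size=batch_size, p=p)

rng = np.random.default_rng(0)
errors = np.array([0.05, 0.05, 0.9, 0.05])    # example 2 is currently hardest (toy values)
idx = error_weighted_indices(errors, batch_size=8, rng=rng)
print(idx)                                    # index 2 appears most frequently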

2014
Wenbin Cai, Ya Zhang, Siyuan Zhou, Wenquan Wang, Chris H. Q. Ding, Xiao Gu

Margin-based strategies and model change based strategies represent two important types of strategies for active learning. While margin-based strategies have been dominant for Support Vector Machines (SVMs), most methods are based on heuristics and lack solid theoretical support. In this paper, we propose an active learning strategy for SVMs based on Maximum Model Change (MMC). The model chan...
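
To make the notion of "model change" concrete, one common way to score a candidate is by the expected norm of the loss gradient it would contribute to the current model if labeled. The sketch below, for a linear model with a hinge-loss subgradient and a simple logistic label estimate, is an illustrative assumption and not the MMC criterion derived in the paper.

import numpy as np

def expected_model_change(x, w, labels=(-1.0, 1.0)):
    # Score a candidate by the expected norm of the gradient it would add
    # to a linear model, averaging over possible labels weighted by a
    # logistic probability estimate (all modeling choices assumed).
    f = w @ x
    p_pos = 1.0 / (1.0 + np.exp(-f))
    probs = {1.0: p_pos, -1.0: 1.0 - p_pos}
    score = 0.0
    for y in labels:
        grad = -y * x if y * f < 1.0 else np.zeros_like(x)   # hinge-loss subgradient
        score += probs[y] * np.linalg.norm(grad)
    return score

w = np.array([1.0, -0.5])
pool = [np.array([0.1, 0.1]), np.array([2.0, 2.0]), np.array([0.0, 3.0])]
best = max(range(len(pool)), key=lambda i: expected_model_change(pool[i], w))
print(best)   # index of the candidate expected to change the model most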

Journal: Electr. J. Comb. 2014
Jason P. Smith

The set of all permutations, ordered by pattern containment, is a poset. We give a formula for the Möbius function of intervals [1, π] in this poset, for any permutation π with at most one descent. We compute the Möbius function as a function of the number and positions of pairs of consecutive letters in π that are consecutive in value. As a result of this we show that the Möbius function is un...
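
For readers less familiar with the terminology, the Möbius function of a poset referred to here is the standard one, defined recursively as follows; this is general background, not the formula derived in the paper.

\mu(\sigma,\sigma) = 1, \qquad
\mu(\sigma,\pi) = -\sum_{\sigma \le \tau < \pi} \mu(\sigma,\tau) \quad \text{for } \sigma < \pi, \qquad
\mu(\sigma,\pi) = 0 \quad \text{if } \sigma \not\le \pi.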

Chart: number of search results per year
