Search results for: gradient descent

Number of results: 137892

Journal: CoRR 2017
Jakub Konecný, Peter Richtárik

In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an ε-ac...
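The epoch structure the abstract describes (one full gradient, then a geometrically distributed number of variance-reduced stochastic steps) can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation; the step size `h`, cap `max_inner`, and geometric parameter `p` are illustrative choices.

```python
import numpy as np

def s2gd(grad_i, full_grad, x0, n, epochs=5, h=0.02, max_inner=50, p=0.05):
    """Semi-stochastic gradient descent (S2GD) sketch.

    Per epoch: one full gradient at the anchor point y, then a random
    (geometrically distributed) number of variance-reduced stochastic steps.
    """
    x = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        y = x.copy()
        g = full_grad(y)                      # one full gradient per epoch
        t = min(rng.geometric(p), max_inner)  # geometric number of inner steps
        for _ in range(t):
            i = rng.integers(n)
            # variance-reduced stochastic gradient at x, anchored at y
            v = grad_i(x, i) - grad_i(y, i) + g
            x -= h * v
    return x
```

On a smooth convex problem such as least squares, `grad_i` is the gradient of a single loss term and `full_grad` averages all of them; the anchor gradient `g` keeps the variance of the inner updates small near `y`.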

1995
Michael Biehl

We study online gradient-descent learning in multilayer networks analytically and numerically. The training is based on randomly drawn inputs and their corresponding outputs as defined by a target rule. In the thermodynamic limit we derive deterministic differential equations for the order parameters of the problem which allow an exact calculation of the evolution of the generalization error. Fi...

2016
Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhiming Ma, Tie-Yan Liu

Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. In order to accelerate the convergence of SGD, a few advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov’s acceleration method. Furthermore, in order to improve the training speed and/or leverage larger-scale training data...
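Of the acceleration techniques listed, Nesterov's method is the simplest to show alongside plain SGD. The sketch below is generic, not the authors' algorithm; the learning rate `lr` and momentum `mu` are illustrative defaults.

```python
import numpy as np

def sgd_nesterov(grad_i, x0, n, steps=3000, lr=0.005, mu=0.9, seed=0):
    """SGD with Nesterov momentum: evaluate the stochastic gradient at a
    look-ahead point x + mu*v before taking the momentum step."""
    x, v = x0.copy(), np.zeros_like(x0)
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(n)
        g = grad_i(x + mu * v, i)   # gradient at the look-ahead point
        v = mu * v - lr * g
        x = x + v
    return x
```

The only change from plain momentum SGD is where the gradient is evaluated: at the extrapolated point rather than at the current iterate.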

Journal: Information processing in medical imaging : proceedings of the ... conference 2003
Zhong Tao, C. Carl Jaffe, Hemant D. Tagare

The presence of speckle in ultrasound images makes it hard to segment them using active contours. Speckle causes the energy function of the active contours to have many local minima, and the gradient descent procedure used for evolving the contour gets trapped in these minima. A new algorithm, called tunnelling descent, is proposed in this paper for evolving active contours. Tunnelling descent ...

Hamed Memarian fard

The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...

Journal: CoRR 2017
Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar

We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4× faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been...

1999
Llew Mason, Jonathan Baxter, Peter L. Bartlett, Marcus R. Frean

Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinati...
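The abstract algorithm alluded to views boosting as gradient descent in function space: at each round, fit a weak learner to the negative functional gradient of a margin cost and add it to the ensemble. A minimal sketch of that idea, where all concrete choices (decision stumps, the exponential cost, a fixed step `alpha`) are ours, not the paper's:

```python
import numpy as np

def boost_fgd(X, y, rounds=10, alpha=0.5):
    """Functional-gradient boosting sketch with decision stumps and the
    exponential cost C(m) = exp(-m) on the margin m = y*F(x)."""
    n = len(y)
    F = np.zeros(n)            # current ensemble output on the training set
    model = []
    for _ in range(rounds):
        w = np.exp(-y * F)     # -C'(margin): weight on each example
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    h = s * np.sign(X[:, j] - thr + 1e-12)
                    score = np.sum(w * y * h)   # alignment with -gradient
                    if score > best_score:
                        best, best_score = (j, thr, s), score
        j, thr, s = best
        F += alpha * s * np.sign(X[:, j] - thr + 1e-12)
        model.append(best)
    return model, F
```

With the exponential cost and a line search instead of the fixed step, this family of updates recovers AdaBoost-style weighting of hard examples.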

2007
Yiming Ying, Massimiliano Pontil

This paper considers the least-square online gradient descent algorithm in a reproducing kernel Hilbert space (RKHS) without an explicit regularization term. We present a novel capacity independent approach to derive error bounds and convergence results for this algorithm. The essential element in our analysis is the interplay between the generalization error and a weighted cumulative error whi...
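The unregularized online update in an RKHS has a simple kernel-expansion form: after observing (x_t, y_t), set f ← f − η_t (f(x_t) − y_t) K(x_t, ·), which just appends one coefficient per sample. A sketch under our own illustrative choices (an RBF kernel and a 1/√t step-size decay), not the paper's analysis:

```python
import numpy as np

def online_kernel_gd(stream, kernel, eta=0.5):
    """Least-square online gradient descent in an RKHS, no regularization:
    f is stored as a growing kernel expansion sum_t c_t K(x_t, .)."""
    pts, coef = [], []
    def f(x):
        return sum(c * kernel(p, x) for p, c in zip(pts, coef))
    for t, (x, y) in enumerate(stream, start=1):
        step = eta / np.sqrt(t)          # decaying step size eta_t
        pts.append(x)
        coef.append(-step * (f(x) - y))  # f <- f - step*(f(x)-y) K(x, .)
    return f
```

Each sample contributes exactly one term, so the predictor's cost grows linearly with the number of observations; capacity-independent bounds control the error of exactly this kind of expansion.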

2012
David Zastrau, Stefan Edelkamp

We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving 1.66× to 6× speedups compared to a CPU-based implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prize winning team. Our main idea is to create a hash function of ...
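The SVM half of the reference setup, SGD on the L2-regularized hinge loss, fits in a few lines on the CPU. This is a generic Pegasos-style sketch in the spirit of Bottou's SGD reference implementation, not the authors' GPU port; `lam` and the 1/(λt) schedule are standard illustrative choices.

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained by SGD on hinge loss + L2 regularization.
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)      # Pegasos-style 1/(lam*t) schedule
            w *= (1 - eta * lam)       # shrink from the L2 term
            if y[i] * (X[i] @ w) < 1:  # hinge-loss subgradient is active
                w += eta * y[i] * X[i]
    return w
```

Each update touches one example, which is exactly the access pattern that maps well onto many parallel GPU threads when examples are batched.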

2004
Kary Främling

Adaptive behaviour through machine learning is challenging in many real-world applications such as robotics. This is because learning has to be rapid enough to be performed in real time and to avoid damage to the robot. Models using linear function approximation are interesting in such tasks because they offer rapid learning and have small memory and processing requirements. Adalines are a simp...
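The Adaline the abstract introduces is a linear unit trained with the LMS (Widrow-Hoff, or "delta") rule, which is what makes it cheap enough for the real-time constraints described. A generic sketch, with `lr` and `epochs` as illustrative settings:

```python
import numpy as np

def adaline_lms(X, y, lr=0.05, epochs=100):
    """Adaline trained with the LMS / delta rule: the weight update is
    proportional to the error of the raw linear output."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - xi @ w      # linear output, before any thresholding
            w += lr * err * xi     # delta rule update
    return w
```

The per-sample cost is one dot product and one vector update, so memory and processing requirements stay small, as the abstract notes.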

Chart of the number of search results per year

Click on the chart to filter the results by publication year