Search results for: sgd

Number of results: 1169

2005
Kim L. Blackmore, Robert C. Williamson, Iven M. Y. Mareels

Stepwise Gradient Descent (SGD) algorithms for online optimization converge to local minima of the relevant cost function. In this paper a globally convergent modification of SGD is proposed, in which several solutions of SGD are run in parallel, together with online estimates of the cost function and its gradient. As each SGD estimate reaches a local minimum of the cost, the fitness of the mem...
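
A minimal sketch of the multi-start idea in the snippet above, written in Python: several gradient-descent runs are launched from different initializations and the candidate with the lowest cost is kept. The function names, the fixed step size, and the toy cost are illustrative assumptions; the paper's online cost and gradient estimates and its fitness-based selection are not reproduced here.

```python
import numpy as np

def multistart_gradient_descent(cost, grad, dim, n_starts=8, lr=0.02, steps=500, seed=0):
    # Run several descent instances from random initializations (here sequentially,
    # for clarity) and keep the candidate with the lowest cost.
    rng = np.random.default_rng(seed)
    best_w, best_c = None, np.inf
    for _ in range(n_starts):
        w = rng.normal(size=dim)
        for _ in range(steps):
            w = w - lr * grad(w)          # plain gradient step
        c = cost(w)
        if c < best_c:
            best_w, best_c = w, c
    return best_w, best_c

# toy multi-modal cost with a global minimum near w = -2.09 in each coordinate
cost = lambda w: float(np.sum(w**4 - 8.0 * w**2 + 3.0 * w))
grad = lambda w: 4.0 * w**3 - 16.0 * w + 3.0
print(multistart_gradient_descent(cost, grad, dim=2))
```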

2016
Panos Toulis, Dustin Tran, Edoardo M. Airoldi

Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed a...
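
As a rough illustration of the two ingredients the abstract points at (an implicit update and better statistical use of the data), the sketch below combines an implicit SGD step for least squares, which has a closed form, with Polyak-Ruppert iterate averaging. The learning-rate schedule and all names are assumptions; this is not the paper's procedure or code.

```python
import numpy as np

def averaged_implicit_sgd(X, y, lr0=0.5, alpha=0.6):
    # Implicit update for squared loss: theta_n = theta_{n-1} + g_n*(y_n - x_n@theta_n)*x_n,
    # i.e. the gradient is taken at the *new* iterate; for least squares this
    # fixed-point equation has the closed form used below.
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for i in range(n):
        x_i, y_i = X[i], y[i]
        gamma = lr0 / (i + 1) ** alpha                 # assumed decaying step-size schedule
        resid = y_i - x_i @ theta                      # residual at the old iterate
        theta = theta + (gamma / (1.0 + gamma * (x_i @ x_i))) * resid * x_i
        theta_bar += (theta - theta_bar) / (i + 1)     # Polyak-Ruppert running average
    return theta_bar

# tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
true_theta = np.arange(1.0, 6.0)
y = X @ true_theta + 0.1 * rng.normal(size=2000)
print(averaged_implicit_sgd(X, y))   # close to [1, 2, 3, 4, 5]
```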

Journal: CoRR 2017
Dong Yin, Ashwin Pananjady, Maximilian Lam, Dimitris S. Papailiopoulos, Kannan Ramchandran, Peter Bartlett

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. We introduce the no...
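
The similarity measure hinted at in the truncated sentence is often formalized as the ratio of the summed squared per-example gradient norms to the squared norm of the summed gradient; the sketch below computes that ratio as an illustration, and the exact definition and normalization used in the paper may differ.

```python
import numpy as np

def gradient_diversity(grads):
    # grads: array of shape (n_examples, n_params).
    # Ratio of sum_i ||g_i||^2 to ||sum_i g_i||^2: values near 1/n indicate
    # highly similar gradients, values near 1 indicate near-orthogonal ones.
    grads = np.asarray(grads)
    sum_sq_norms = np.sum(np.square(grads))
    norm_of_sum_sq = np.sum(np.square(grads.sum(axis=0)))
    return sum_sq_norms / norm_of_sum_sq

# identical gradients give 1/n; mutually orthogonal gradients give 1
print(gradient_diversity(np.ones((8, 4))))   # 0.125
print(gradient_diversity(np.eye(4)))         # 1.0
```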

Journal: CoRR 2017
Zhenguo Li, Fengwei Zhou, Fei Chen, Hang Li

Few-shot learning is challenging for learning algorithms that learn each task in isolation and from scratch. In contrast, meta-learning learns from many related tasks a meta-learner that can learn a new task more accurately and faster with fewer examples, where the choice of meta-learners is crucial. In this paper, we develop Meta-SGD, an SGD-like, easily trainable meta-learner that can initial...
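
Read at a high level, the adaptation step of such a meta-learner can be written as a single SGD-like update whose initialization and per-parameter step sizes are both meta-learned. The sketch below shows that step only, under that reading; the meta-training loop that learns theta and alpha is omitted and the names are illustrative.

```python
import numpy as np

def meta_sgd_adapt(theta, alpha, grad_support):
    # One SGD-like adaptation step: theta' = theta - alpha * grad, where alpha is
    # an elementwise (meta-learned) step size rather than a single scalar.
    return theta - alpha * grad_support

# illustrative call: per-parameter step sizes scale each gradient coordinate
theta = np.array([0.5, -1.0, 2.0])
alpha = np.array([0.10, 0.01, 0.50])       # meta-learned in the real method
grad = np.array([1.0, 1.0, 1.0])
print(meta_sgd_adapt(theta, alpha, grad))  # [0.4, -1.01, 1.5]
```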

Journal: Discrete Mathematics & Theoretical Computer Science 2008
Shu-Chiuan Chang, Lung-Chi Chen

We present the numbers of spanning forests on the Sierpinski gasket SG_d(n) at stage n with dimension d equal to two, three and four, and determine the asymptotic behaviors. The corresponding results on the generalized Sierpinski gasket SG_{d,b}(n) with d = 2 and b = 3, 4 are obtained. We also derive the upper bounds of the asymptotic growth constants for both SG_d and SG_{2,b}.

2016
Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD...
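
As a pointer to the geometry being exploited, the sketch below computes the squared path-norm of a two-layer ReLU network, the quantity that path-regularized methods build on, and checks its invariance to node-wise rescaling. The layer shapes and helper name are assumptions, and the actual path-SGD update is not reproduced.

```python
import numpy as np

def squared_path_norm(W1, W2):
    # Sum over all input->hidden->output paths of the product of squared weights.
    # For two layers this factorizes per hidden unit: (incoming mass) * (outgoing mass).
    incoming = np.sum(W1 ** 2, axis=0)    # W1 has shape (n_in, n_hidden)
    outgoing = np.sum(W2 ** 2, axis=1)    # W2 has shape (n_hidden, n_out)
    return float(np.sum(incoming * outgoing))

# rescaling invariance: scale a hidden unit's incoming weights by c and its
# outgoing weights by 1/c; the path-norm is unchanged
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
W1s, W2s = W1.copy(), W2.copy()
W1s[:, 0] *= 10.0
W2s[0, :] /= 10.0
print(np.isclose(squared_path_norm(W1, W2), squared_path_norm(W1s, W2s)))  # True
```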

Journal: CoRR 2017
Jian Zhang, Ioannis Mitliagkas, Christopher Ré

Hyperparameter tuning is a big cost of deep learning, and momentum is a key hyperparameter of SGD and its variants, yet adaptive methods (e.g., Adam) do not tune momentum. The YellowFin optimizer is based on the robustness properties of momentum: it auto-tunes momentum and learning rate in SGD and provides closed-loop momentum control for asynchronous training. In experiments on ResNet and LSTM models, YellowFin runs w...
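
For reference, the two quantities being auto-tuned are the learning rate and the momentum coefficient of a heavy-ball update; the sketch below shows that plain update only, not YellowFin's closed-loop tuning rule.

```python
def momentum_sgd_step(w, velocity, grad, lr, momentum):
    # Heavy-ball update: v <- momentum*v - lr*grad ; w <- w + v.
    # lr and momentum are the two knobs an auto-tuner adjusts on the fly.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# toy run on f(w) = w**2
w, v = 1.0, 0.0
for _ in range(200):
    w, v = momentum_sgd_step(w, v, 2.0 * w, lr=0.1, momentum=0.9)
print(w)   # decays toward the minimum at 0
```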

Journal: ACM Transactions on Knowledge Discovery From Data 2022

This paper studies how to schedule hyperparameters to improve generalization of both centralized single-machine stochastic gradient descent (SGD) and distributed asynchronous SGD (ASGD). SGD augmented with momentum variants (e.g., stochastic heavy ball (SHB) and Nesterov's accelerated gradient (NAG)) has been the default optimizer for many tasks in both centralized and distributed environments. However, these advanced variants, despite their empirical advantage over...
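
As a reference point for the momentum variants named above, the sketch below shows a Nesterov-style accelerated step together with a generic step-decay learning-rate schedule; the schedule and parameter names are illustrative assumptions, not the schedules analyzed in the paper.

```python
import numpy as np

def nag_step(w, velocity, grad_fn, lr, momentum):
    # Nesterov's accelerated gradient: evaluate the gradient at the look-ahead
    # point w + momentum*velocity, then apply a heavy-ball-style update.
    lookahead = w + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return w + velocity, velocity

def step_decay(lr0, epoch, drop=0.5, every=30):
    # Generic step-decay schedule: multiply the learning rate by `drop` every `every` epochs.
    return lr0 * (drop ** (epoch // every))

# toy run on f(w) = ||w||^2 with a scheduled learning rate
w, v = np.array([3.0, -2.0]), np.zeros(2)
for epoch in range(90):
    lr = step_decay(0.1, epoch)
    w, v = nag_step(w, v, lambda u: 2.0 * u, lr, momentum=0.9)
print(w)   # close to the minimum at the origin
```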

Journal: CoRR 2018
Yujing Ma, Florin Rusu, Martin Torres

There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in two flavors: as multi-threaded routines for multi-core CPUs and as highly parallel kernels executed on GPUs. Stochastic gradient descent (SGD) is the most popula...
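
To make the primitive concrete, here is a plain NumPy mini-batch SGD loop for logistic regression; it only illustrates the algorithm that such frameworks implement as multi-threaded CPU routines or GPU kernels, and the hyperparameters and data are arbitrary.

```python
import numpy as np

def minibatch_sgd_logreg(X, y, lr=0.1, batch_size=64, epochs=5, seed=0):
    # Plain mini-batch SGD for logistic regression: shuffle, slice into batches,
    # and apply one gradient step per batch.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[b] @ w))    # predicted probabilities
            grad = X[b].T @ (p - y[b]) / len(b)    # average logistic-loss gradient
            w -= lr * grad
    return w

# synthetic check: the learned coefficients approach true_w
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
print(minibatch_sgd_logreg(X, y, epochs=50))
```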

[Chart: number of search results per year]