Search results for: sgd

Number of results: 1169

2005
Kim L. Blackmore, Robert C. Williamson, Iven M. Y. Mareels

Stepwise Gradient Descent (SGD) algorithms for online optimization converge to local minima of the relevant cost function. In this paper a globally convergent modification of SGD is proposed, in which several solutions of SGD are run in parallel, together with online estimates of the cost function and its gradient. As each SGD estimate reaches a local minimum of the cost, the fitness of the mem...
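
A minimal sketch of the multi-start idea in the snippet above, written in Python: several gradient-descent runs are launched from different initializations and the candidate with the lowest cost is kept. The function names, the fixed step size, and the toy cost are illustrative assumptions; the paper's online cost and gradient estimates and its fitness-based selection are not reproduced here.

```python
import numpy as np

def multistart_gradient_descent(cost, grad, dim, n_starts=8, lr=0.02, steps=500, seed=0):
    # Run several descent instances from random initializations (here sequentially,
    # for clarity) and keep the candidate with the lowest cost.
    rng = np.random.default_rng(seed)
    best_w, best_c = None, np.inf
    for _ in range(n_starts):
        w = rng.normal(size=dim)
        for _ in range(steps):
            w = w - lr * grad(w)          # plain gradient step
        c = cost(w)
        if c < best_c:
            best_w, best_c = w, c
    return best_w, best_c

# toy multi-modal cost with a global minimum near w = -2.09 in each coordinate
cost = lambda w: float(np.sum(w**4 - 8.0 * w**2 + 3.0 * w))
grad = lambda w: 4.0 * w**3 - 16.0 * w + 3.0
print(multistart_gradient_descent(cost, grad, dim=2))
```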

2016
Panos Toulis, Dustin Tran, Edoardo M. Airoldi

Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed a...
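
As a rough illustration of the two ingredients the abstract points at (an implicit update and better statistical use of the data), the sketch below combines an implicit SGD step for least squares, which has a closed form, with Polyak-Ruppert iterate averaging. The learning-rate schedule and all names are assumptions; this is not the paper's procedure or code.

```python
import numpy as np

def averaged_implicit_sgd(X, y, lr0=0.5, alpha=0.6):
    # Implicit update for squared loss: theta_n = theta_{n-1} + g_n*(y_n - x_n@theta_n)*x_n,
    # i.e. the gradient is taken at the *new* iterate; for least squares this
    # fixed-point equation has the closed form used below.
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for i in range(n):
        x_i, y_i = X[i], y[i]
        gamma = lr0 / (i + 1) ** alpha                 # assumed decaying step-size schedule
        resid = y_i - x_i @ theta                      # residual at the old iterate
        theta = theta + (gamma / (1.0 + gamma * (x_i @ x_i))) * resid * x_i
        theta_bar += (theta - theta_bar) / (i + 1)     # Polyak-Ruppert running average
    return theta_bar

# tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
true_theta = np.arange(1.0, 6.0)
y = X @ true_theta + 0.1 * rng.normal(size=2000)
print(averaged_implicit_sgd(X, y))   # close to [1, 2, 3, 4, 5]
```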

Journal: CoRR 2017
Dong Yin, Ashwin Pananjady, Maximilian Lam, Dimitris S. Papailiopoulos, Kannan Ramchandran, Peter Bartlett

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. We introduce the no...
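
The similarity measure hinted at in the truncated sentence is often formalized as the ratio of the summed squared per-example gradient norms to the squared norm of the summed gradient; the sketch below computes that ratio as an illustration, and the exact definition and normalization used in the paper may differ.

```python
import numpy as np

def gradient_diversity(grads):
    # grads: array of shape (n_examples, n_params).
    # Ratio of sum_i ||g_i||^2 to ||sum_i g_i||^2: values near 1/n indicate
    # highly similar gradients, values near 1 indicate near-orthogonal ones.
    grads = np.asarray(grads)
    sum_sq_norms = np.sum(np.square(grads))
    norm_of_sum_sq = np.sum(np.square(grads.sum(axis=0)))
    return sum_sq_norms / norm_of_sum_sq

# identical gradients give 1/n; mutually orthogonal gradients give 1
print(gradient_diversity(np.ones((8, 4))))   # 0.125
print(gradient_diversity(np.eye(4)))         # 1.0
```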

Journal: CoRR 2017
Zhenguo Li, Fengwei Zhou, Fei Chen, Hang Li

Few-shot learning is challenging for learning algorithms that learn each task in isolation and from scratch. In contrast, meta-learning learns from many related tasks a meta-learner that can learn a new task more accurately and faster with fewer examples, where the choice of meta-learners is crucial. In this paper, we develop Meta-SGD, an SGD-like, easily trainable meta-learner that can initial...
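
Read at a high level, the adaptation step of such a meta-learner can be written as a single SGD-like update whose initialization and per-parameter step sizes are both meta-learned. The sketch below shows that step only, under that reading; the meta-training loop that learns theta and alpha is omitted and the names are illustrative.

```python
import numpy as np

def meta_sgd_adapt(theta, alpha, grad_support):
    # One SGD-like adaptation step: theta' = theta - alpha * grad, where alpha is
    # an elementwise (meta-learned) step size rather than a single scalar.
    return theta - alpha * grad_support

# illustrative call: per-parameter step sizes scale each gradient coordinate
theta = np.array([0.5, -1.0, 2.0])
alpha = np.array([0.10, 0.01, 0.50])       # meta-learned in the real method
grad = np.array([1.0, 1.0, 1.0])
print(meta_sgd_adapt(theta, alpha, grad))  # [0.4, -1.01, 1.5]
```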

Journal: Discrete Mathematics & Theoretical Computer Science 2008
Shu-Chiuan Chang, Lung-Chi Chen

We present the numbers of spanning forests on the Sierpinski gasket SG_d(n) at stage n with dimension d equal to two, three and four, and determine the asymptotic behaviors. The corresponding results on the generalized Sierpinski gasket SG_{d,b}(n) with d = 2 and b = 3, 4 are obtained. We also derive the upper bounds of the asymptotic growth constants for both SG_d and SG_{2,b}.

2016
Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD...
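
As a pointer to the geometry being exploited, the sketch below computes the squared path-norm of a two-layer ReLU network, the quantity that path-regularized methods build on, and checks its invariance to node-wise rescaling. The layer shapes and helper name are assumptions, and the actual path-SGD update is not reproduced.

```python
import numpy as np

def squared_path_norm(W1, W2):
    # Sum over all input->hidden->output paths of the product of squared weights.
    # For two layers this factorizes per hidden unit: (incoming mass) * (outgoing mass).
    incoming = np.sum(W1 ** 2, axis=0)    # W1 has shape (n_in, n_hidden)
    outgoing = np.sum(W2 ** 2, axis=1)    # W2 has shape (n_hidden, n_out)
    return float(np.sum(incoming * outgoing))

# rescaling invariance: scale a hidden unit's incoming weights by c and its
# outgoing weights by 1/c; the path-norm is unchanged
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
W1s, W2s = W1.copy(), W2.copy()
W1s[:, 0] *= 10.0
W2s[0, :] /= 10.0
print(np.isclose(squared_path_norm(W1, W2), squared_path_norm(W1s, W2s)))  # True
```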

Journal: CoRR 2017
Jian Zhang, Ioannis Mitliagkas, Christopher Ré

Hyperparameter tuning is a big cost of deep learning, and momentum is a key hyperparameter of SGD and its variants, yet adaptive methods (e.g., Adam) do not tune momentum. The YellowFin optimizer is based on the robustness properties of momentum: it auto-tunes momentum and learning rate in SGD and provides closed-loop momentum control for asynchronous training. In experiments on ResNet and LSTM models, YellowFin runs w...
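
For reference, the two quantities being auto-tuned are the learning rate and the momentum coefficient of a heavy-ball update; the sketch below shows that plain update only, not YellowFin's closed-loop tuning rule.

```python
def momentum_sgd_step(w, velocity, grad, lr, momentum):
    # Heavy-ball update: v <- momentum*v - lr*grad ; w <- w + v.
    # lr and momentum are the two knobs an auto-tuner adjusts on the fly.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# toy run on f(w) = w**2
w, v = 1.0, 0.0
for _ in range(200):
    w, v = momentum_sgd_step(w, v, 2.0 * w, lr=0.1, momentum=0.9)
print(w)   # decays toward the minimum at 0
```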

Journal: ACM Transactions on Knowledge Discovery From Data 2022

This paper studies how to schedule hyperparameters to improve generalization of both centralized single-machine stochastic gradient descent (SGD) and distributed asynchronous SGD (ASGD). SGD augmented with momentum variants (e.g., stochastic heavy ball (SHB) and Nesterov's accelerated gradient (NAG)) has been the default optimizer for many tasks in both centralized and distributed environments. However, these advanced variants, despite their empirical advantage over...
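
As a reference point for the momentum variants named above, the sketch below shows a Nesterov-style accelerated step together with a generic step-decay learning-rate schedule; the schedule and parameter names are illustrative assumptions, not the schedules analyzed in the paper.

```python
import numpy as np

def nag_step(w, velocity, grad_fn, lr, momentum):
    # Nesterov's accelerated gradient: evaluate the gradient at the look-ahead
    # point w + momentum*velocity, then apply a heavy-ball-style update.
    lookahead = w + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return w + velocity, velocity

def step_decay(lr0, epoch, drop=0.5, every=30):
    # Generic step-decay schedule: multiply the learning rate by `drop` every `every` epochs.
    return lr0 * (drop ** (epoch // every))

# toy run on f(w) = ||w||^2 with a scheduled learning rate
w, v = np.array([3.0, -2.0]), np.zeros(2)
for epoch in range(90):
    lr = step_decay(0.1, epoch)
    w, v = nag_step(w, v, lambda u: 2.0 * u, lr, momentum=0.9)
print(w)   # close to the minimum at the origin
```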

Journal: CoRR 2018
Yujing Ma, Florin Rusu, Martin Torres

There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in two flavors: as multi-threaded routines for multi-core CPUs and as highly parallel kernels executed on GPUs. Stochastic gradient descent (SGD) is the most popula...
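
To make the primitive concrete, here is a plain NumPy mini-batch SGD loop for logistic regression; it only illustrates the algorithm that such frameworks implement as multi-threaded CPU routines or GPU kernels, and the hyperparameters and data are arbitrary.

```python
import numpy as np

def minibatch_sgd_logreg(X, y, lr=0.1, batch_size=64, epochs=5, seed=0):
    # Plain mini-batch SGD for logistic regression: shuffle, slice into batches,
    # and apply one gradient step per batch.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[b] @ w))    # predicted probabilities
            grad = X[b].T @ (p - y[b]) / len(b)    # average logistic-loss gradient
            w -= lr * grad
    return w

# synthetic check: the learned coefficients approach true_w
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
print(minibatch_sgd_logreg(X, y, epochs=50))
```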

[Chart: number of search results per year]