EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD
نویسندگان
چکیده
We present a generic framework for trading off fidelity and cost in computing stochastic gradients when the costs of acquiring stochastic gradients of different quality are not known a priori. We consider a mini-batch oracle that distributes a limited query budget over a number of stochastic gradients and aggregates them to estimate the true gradient. Since the optimal mini-batch size depends on the unknown cost-fidelity function, we propose an algorithm, EE-Grad, that sequentially explores the performance of mini-batch oracles and exploits the accumulated knowledge to estimate the one achieving the best performance in terms of cost-efficiency. We provide performance guarantees for EE-Grad with respect to the optimal mini-batch oracle, and illustrate these results in the case of strongly convex objectives. We also provide a simple numerical example that corroborates our theoretical findings.
منابع مشابه
Hybrid Accelerated Optimization for Speech Recognition
Optimization procedure is crucial to achieve desirable performance for speech recognition based on deep neural networks (DNNs). Conventionally, DNNs are trained by using mini-batch stochastic gradient descent (SGD) which is stable but prone to be trapped into local optimum. A recent work based on Nesterov’s accelerated gradient descent (NAG) algorithm is developed by merging the current momentu...
متن کاملBalanced Mini-batch Sampling for SGD Using Determinantal Point Processes
We study a mini-batch diversification scheme for stochastic gradient descent (SGD). While classical SGD relies on uniformly sampling data points to form a mini-batch, we propose a non-uniform sampling scheme based on the Determinantal Point Process (DPP). The DPP relies on a similarity measure between data points and gives low probabilities to mini-batches which contain redundant data, and high...
متن کاملStochastic Learning on Imbalanced Data: Determinantal Point Processes for Mini-batch Diversification
We study a mini-batch diversification scheme for stochastic gradient descent (SGD). While classical SGD relies on uniformly sampling data points to form a mini-batch, we propose a non-uniform sampling scheme based on the Determinantal Point Process (DPP). The DPP relies on a similarity measure between data points and gives low probabilities to mini-batches which contain redundant data, and high...
متن کاملThe Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Stochastic Gradient Descent (SGD) with small mini-batch is a key component in modern large-scale machine learning. However, its efficiency has not been easy to analyze as most theoretical results require adaptive rates and show convergence rates far slower than that for gradient descent, making computational comparisons difficult. In this paper we aim to clarify the issue of fast SGD convergenc...
متن کاملGradient Diversity Empowers Distributed Learning
It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. We introduce the no...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1705.07070 شماره
صفحات -
تاریخ انتشار 2017