Search results for: stochastic gradient descent learning
Number of results: 840759
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a tremendously motivating and hard problem. Among machine learning models, stochastic gradient descent (SGD) is not only simple but also very effective. This study provides a detailed analysis of contemporary state-of-the-art deep learning applications, such as natural language processing (NLP), visual processing, voice and audio ...
This paper investigates the stochastic optimization problem, focusing on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function in neural network models, along with a novel computing strategy, coined weighted aggregating gradient descent (...
Stochastic convex optimization is a basic and well-studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients rather than the gradients themselves. ...
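The NGD update described above can be illustrated with a minimal sketch, assuming a user-supplied gradient oracle `grad` (a hypothetical name; it may return exact or noisy stochastic gradients) and a fixed step length `lr`. Each step moves exactly `lr` along the negative unit gradient, so the gradient's magnitude never affects the step size. This is an illustration of the normalized update rule, not the paper's algorithm or analysis.

```python
import math


def ngd(grad, x0, lr, steps):
    """Normalized gradient descent: x <- x - lr * g / ||g||.

    `grad` is a hypothetical oracle mapping a point (list of floats)
    to a gradient (list of floats); it may be stochastic.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            # Direction is undefined at a stationary point; stop early.
            break
        # Step of fixed length lr along the negative gradient direction.
        x = [xi - lr * gi / norm for xi, gi in zip(x, g)]
    return x


# Example: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
print(ngd(lambda x: list(x), [3.0, 4.0], lr=1.0, steps=1))  # approx [2.4, 3.2]
```

Because the step length is fixed, plain NGD with a constant `lr` will oscillate near the minimizer instead of converging exactly; in practice a decreasing step size would be paired with it.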
In batch learning, stability together with existence and uniqueness of the solution corresponds to well-posedness of Empirical Risk Minimization (ERM) methods; recently, it was proved that CVloo stability is necessary and sufficient for generalization and consistency of ERM ([9]). In this note, we introduce CVon stability, which plays a similar role in online learning. We show that stochastic g...
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This mig...
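The setup above (run SGD for T iterations on a strongly convex objective and return the average point) can be sketched as follows. The step size 1/(λt) for a λ-strongly convex objective is an assumed, commonly used choice, and `stoch_grad` is a hypothetical oracle; the snippet does not reproduce the "different algorithm" the abstract alludes to.

```python
import random


def sgd_averaged(stoch_grad, x0, lam, T, seed=0):
    """Scalar SGD with step size 1/(lam * t); returns (last iterate, average).

    `stoch_grad(x, rng)` is a hypothetical oracle returning an unbiased
    stochastic gradient at x; `lam` is the assumed strong-convexity constant.
    """
    rng = random.Random(seed)
    x = x0
    avg = 0.0
    for t in range(1, T + 1):
        g = stoch_grad(x, rng)
        x = x - g / (lam * t)  # step size eta_t = 1 / (lam * t)
        avg += (x - avg) / t   # running mean of x_1, ..., x_t
    return x, avg


# Example: F(x) = E[0.5 * (x - z)^2] with z ~ N(2, 1), so lam = 1 and the
# minimizer is x* = 2. The stochastic gradient at x is x - z.
last, avg = sgd_averaged(lambda x, rng: x - rng.gauss(2.0, 1.0), 0.0, 1.0, 20000)
print(last, avg)  # both close to 2.0
```

For this particular quadratic the 1/t step makes the last iterate equal to the running sample mean of the z draws, which is a convenient sanity check.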
Stochastic gradient descent remains popular in large-scale machine learning, on account of its very low computational cost and robustness to noise. However, gradient descent is only linearly efficient and not transformation invariant. Scaling by a local measure can substantially improve its performance. One natural choice of such a scale is the Hessian of the objective function: Were it availab...
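The benefit of scaling by a local curvature measure can be sketched on a toy ill-conditioned quadratic, assuming the exact diagonal of the Hessian is available (which, as the abstract implies, is usually not the case in practice). The function names `gd` and `diag_scaled_gd` are hypothetical, and the diagonal scaling here is a simplification, not the paper's method.

```python
def gd(grad, x0, lr, steps):
    """Plain gradient descent with a single global step size."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def diag_scaled_gd(grad, hess_diag, x0, lr, steps):
    """Gradient descent with each coordinate divided by its curvature.

    Assumes `hess_diag(x)` returns strictly positive diagonal Hessian entries.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        h = hess_diag(x)
        x = [xi - lr * gi / hi for xi, gi, hi in zip(x, g, h)]
    return x


# f(x) = 0.5 * (1 * x0^2 + 100 * x1^2): curvatures differ by a factor of 100.
grad = lambda x: [1.0 * x[0], 100.0 * x[1]]
hess = lambda x: [1.0, 100.0]

# Plain GD needs lr < 2/100 for stability, so the flat direction crawls.
print(gd(grad, [1.0, 1.0], lr=0.01, steps=100))  # x0 still about 0.37
# With exact diagonal scaling, one unit step solves this separable quadratic.
print(diag_scaled_gd(grad, hess, [1.0, 1.0], lr=1.0, steps=1))  # [0.0, 0.0]
```

On a separable quadratic the diagonal is the whole Hessian, so one scaled step is a Newton step; with correlated coordinates a diagonal scale only partially corrects the conditioning.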
Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points — indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Ne...
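The contrast between plain gradient steps and Nesterov acceleration from a poor starting point can be sketched as below. This is a deterministic toy (no gradient noise, so it does not capture the stochastic regime where Robbins–Monro SGD matters), and the names, the quadratic, and the hyperparameters `lr=0.01`, `momentum=0.9` are illustrative assumptions rather than the paper's setup.

```python
def gd(grad, x0, lr, steps):
    """Plain gradient steps (the noiseless analogue of Robbins-Monro SGD)."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def nesterov(grad, x0, lr, momentum, steps):
    """Nesterov accelerated gradient in velocity form.

    The gradient is evaluated at the look-ahead point x + momentum * v.
    """
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        look = [xi + momentum * vi for xi, vi in zip(x, v)]
        g = grad(look)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


# Ill-conditioned quadratic f(x) = 0.5 * (x0^2 + 100 * x1^2), poor start.
grad = lambda x: [x[0], 100.0 * x[1]]
start = [100.0, 100.0]

print(gd(grad, start, lr=0.01, steps=100))                      # x0 about 36.6
print(nesterov(grad, start, lr=0.01, momentum=0.9, steps=100))  # x0 below 1
```

Along the low-curvature direction plain steps contract by a factor of 0.99 per iteration, while the momentum recursion contracts by roughly sqrt(0.99 * 0.9), about 0.944, which is why the accelerated iterate is much closer to the optimum after the same number of steps.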
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...
The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the “Wild Track” of the first PAS...