Search results for: stochastic gradient descent

Number of results: 258,150

Journal: :CoRR 2016
Alexandre Salle, Aline Villavicencio, Marco Idiart

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Pointwise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties to errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word simila...
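Below is a minimal illustrative sketch of the general idea, not the authors' implementation: a toy PPMI matrix is factorized by SGD, with each pair's error weighted by a function of its co-occurrence count. The specific weighting form, the uniform pair sampling, and all hyperparameters here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                                   # toy vocabulary size and embedding dimension
counts = rng.poisson(2.0, (V, V)).astype(float)  # hypothetical co-occurrence counts
total = counts.sum()
p_w = counts.sum(axis=1) / total
p_c = counts.sum(axis=0) / total
joint = counts / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(joint / np.outer(p_w, p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)   # positive pointwise mutual information

W = 0.1 * rng.standard_normal((V, d))          # word vectors
C = 0.1 * rng.standard_normal((V, d))          # context vectors
lr = 0.05
for _ in range(20000):
    i, j = rng.integers(V), rng.integers(V)
    weight = 1.0 + np.log1p(counts[i, j])      # heavier penalty for frequent pairs (assumed form)
    err = W[i] @ C[j] - ppmi[i, j]
    wi = W[i].copy()
    W[i] -= lr * weight * err * C[j]
    C[j] -= lr * weight * err * wi
```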

2017
Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur

In this paper we describe a modification to Stochastic Gradient Descent (SGD) that improves generalization to unseen data. It consists of two steps for each minibatch: a backward step with a small negative learning rate, followed by a forward step with a larger learning rate. The idea was initially inspired by adversarial training, but we show that it can be viewed as a crude w...
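A minimal sketch of the two-step update described above, applied to a toy least-squares problem; the scale `alpha`, the learning rate, and the exact backward/forward step formula are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((256, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(256)

def minibatch_grad(w, xb, yb):
    # gradient of the mean squared error on one minibatch
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

w = np.zeros(5)
lr, alpha = 0.1, 0.3                         # forward learning rate and backward scale (assumed)
for epoch in range(20):
    for start in range(0, len(X), 32):
        xb, yb = X[start:start + 32], y[start:start + 32]
        g1 = minibatch_grad(w, xb, yb)
        w = w + alpha * lr * g1              # backward step: small negative learning rate
        g2 = minibatch_grad(w, xb, yb)       # gradient recomputed at the perturbed point
        w = w - (1.0 + alpha) * lr * g2      # forward step with a larger learning rate
```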

Journal: :CoRR 2011
Wei Xu

For large-scale learning problems, it is desirable to obtain the optimal model parameters by going through the data in only one pass. Polyak and Juditsky (1992) showed that asymptotically the test performance of the simple average of the parameters obtained by stochastic gradient descent (SGD) is as good as that of the parameters which minimize the empirical cost. However, to our knowled...
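The following sketch shows the kind of one-pass averaged SGD the paper studies, on a synthetic streaming least-squares problem; the step-size schedule and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
w_true = rng.standard_normal(d)

w = np.zeros(d)
w_bar = np.zeros(d)                 # running average of the SGD iterates
for t in range(1, 10001):           # a single pass over streamed examples
    x = rng.standard_normal(d)
    y = x @ w_true + 0.1 * rng.standard_normal()
    g = 2.0 * (x @ w - y) * x       # stochastic gradient of the squared error
    w -= 0.02 / np.sqrt(t) * g      # slowly decaying step size (assumed schedule)
    w_bar += (w - w_bar) / t        # the averaged iterate is the one-pass estimator
```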

2014
Mohammad Taha Bahadori, Yi Chang, Bo Long, Yan Liu

In this paper, we propose to study the problem of heterogeneous transfer ranking, a transfer learning problem with heterogeneous features, in order to utilize the rich, large-scale labeled data in popular languages to help the ranking task in less popular languages. We develop a large-margin algorithm, namely LM-HTR, to solve the problem by mapping the input features in both the source domain and...
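Purely as an illustration of a generic large-margin transfer-ranking setup (not the authors' LM-HTR algorithm), the sketch below maps heterogeneous source and target features into a shared space with linear maps and trains a pairwise hinge ranking loss by SGD; every name, dimension, and hyperparameter here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
d_src, d_tgt, k = 12, 7, 4                   # heterogeneous feature dimensions, shared space size
Xs = rng.standard_normal((500, d_src)); ys = rng.integers(0, 2, 500)   # many labeled source docs
Xt = rng.standard_normal((50, d_tgt));  yt = rng.integers(0, 2, 50)    # few labeled target docs

Ws = 0.1 * rng.standard_normal((d_src, k))   # maps source features into the shared space
Wt = 0.1 * rng.standard_normal((d_tgt, k))   # maps target features into the shared space
w = np.zeros(k)                              # ranking vector in the shared space
lr, margin = 0.01, 1.0

def hinge_step(Wmap, X, y):
    # sample one relevant/irrelevant pair and take an SGD step on the pairwise hinge loss
    global w
    pos = rng.choice(np.flatnonzero(y == 1))
    neg = rng.choice(np.flatnonzero(y == 0))
    zp, zn = X[pos] @ Wmap, X[neg] @ Wmap
    if margin - w @ (zp - zn) > 0:           # margin violated: relevant item not ranked high enough
        Wmap = Wmap + lr * np.outer(X[pos] - X[neg], w)
        w = w + lr * (zp - zn)
    return Wmap

for _ in range(3000):
    Ws = hinge_step(Ws, Xs, ys)              # mostly pairs from the source domain
    if rng.random() < 0.2:
        Wt = hinge_step(Wt, Xt, yt)          # occasional pairs from the target domain
```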

Journal: :CoRR 2013
Shenghuo Zhu

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
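A sketch of the weighting scheme mentioned above: plain SGD on a strongly convex quadratic whose iterates are averaged with weights proportional to t. The step-size cap and all constants are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
A = np.diag(np.linspace(1.0, 10.0, d))    # strongly convex quadratic, condition number ~10
b = rng.standard_normal(d)
mu, L = 1.0, 10.0                          # strong convexity and smoothness constants

w = np.zeros(d)
weighted_sum = np.zeros(d)
weight_total = 0.0
for t in range(1, 5001):
    g = A @ w - b + 0.1 * rng.standard_normal(d)   # noisy gradient of 0.5*w'Aw - b'w
    step = min(1.0 / L, 2.0 / (mu * (t + 1)))      # O(1/(mu*t)) step, capped for stability
    w -= step * g
    weighted_sum += t * w                          # averaging weight proportional to t
    weight_total += t
w_avg = weighted_sum / weight_total                # the t-weighted averaged iterate
```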

Journal: :CoRR 2017
Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis

We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a prac...
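The sketch below illustrates the basic recipe of fixed-step-size SGD with iterate averaging, forming several independent averages and a naive confidence interval for one coordinate; the proper scaling from the paper is not reproduced, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
theta_true = np.array([1.0, -0.5, 2.0])

def sgd_average(n_steps=2000, step=0.05):
    # fixed-step-size SGD on streaming least squares, returning the iterate average
    theta = np.zeros(d)
    avg = np.zeros(d)
    for t in range(1, n_steps + 1):
        x = rng.standard_normal(d)
        y = x @ theta_true + 0.1 * rng.standard_normal()
        theta -= step * 2.0 * (x @ theta - y) * x
        avg += (theta - avg) / t
    return avg

# several independent averaged runs give a crude sampling distribution
replicates = np.array([sgd_average() for _ in range(20)])
mean = replicates.mean(axis=0)
se = replicates.std(axis=0, ddof=1) / np.sqrt(len(replicates))
ci_first_coord = (mean[0] - 1.96 * se[0], mean[0] + 1.96 * se[0])
```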

2008
Thomas Gärtner

Training Non-linear Structured Prediction Models with Stochastic Gradient Descent. Thomas Gärtner and Shankar Vembu, Fraunhofer IAIS, Schloß Birlinghoven, 53754 Sankt Augustin, Germany.

Journal: :Math. Program. 2012
Guanghui Lan, Arkadi Nemirovski, Alexander Shapiro

The main goal of this paper is to develop accuracy estimates for stochastic programming problems by employing stochastic approximation (SA) type algorithms. To this end we show that while running a Mirror Descent Stochastic Approximation procedure one can compute, with a small additional effort, lower and upper statistical bounds for the optimal objective value. We demonstrate that for a certai...
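For orientation, here is a toy mirror descent stochastic approximation loop with the entropy prox-function on the probability simplex; the paper's lower and upper statistical bounds on the optimal objective value are not reproduced here, and the problem data are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
c = np.array([0.3, 0.1, 0.5, 0.2])        # minimize E[(c + noise)' x] over the simplex

x = np.full(n, 1.0 / n)                   # start at the center of the simplex
x_avg = np.zeros(n)
for t in range(1, 1001):
    g = c + 0.1 * rng.standard_normal(n)  # stochastic gradient
    step = 1.0 / np.sqrt(t)
    x = x * np.exp(-step * g)             # entropic (exponentiated-gradient) update
    x /= x.sum()                          # normalize back onto the simplex
    x_avg += (x - x_avg) / t              # averaged iterate returned by the SA procedure
```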

2011
Aditya Krishna Menon, Charles Elkan

We propose to solve the link prediction problem in graphs using a supervised matrix factorization approach. The model learns latent features from the topological structure of a (possibly directed) graph, and is shown to make better predictions than popular unsupervised scores. We show how these latent features may be combined with optional explicit features for nodes or edges, which yields bett...
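An illustrative sketch (not the authors' model) of latent-feature matrix factorization for link prediction, trained by SGD on a logistic loss over entries of a toy adjacency matrix; combining latent features with explicit node or edge features is omitted.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 30, 4
A = (rng.random((n, n)) < 0.1).astype(float)   # toy directed adjacency matrix

U = 0.1 * rng.standard_normal((n, k))          # latent features of source nodes
V = 0.1 * rng.standard_normal((n, k))          # latent features of target nodes
lr = 0.1
for _ in range(20000):
    i, j = rng.integers(n), rng.integers(n)
    s = 1.0 / (1.0 + np.exp(-U[i] @ V[j]))     # predicted link probability
    err = s - A[i, j]                          # gradient factor of the logistic loss
    ui = U[i].copy()
    U[i] -= lr * err * V[j]
    V[j] -= lr * err * ui

scores = U @ V.T                               # higher score = more likely link
```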

2005
Anatoli Juditsky, Alexander V. Nazin, Alexandre B. Tsybakov, Nicolas Vayatis

We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under an ℓ1 constraint. For this purpose, we propose a stochastic procedure, mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weig...
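A rough sketch of this style of aggregation: weights on a small dictionary of base functions are kept on the simplex (an ℓ1-type constraint) and updated by stochastic entropic mirror descent, with the iterates averaged recursively. The dictionary, the step sizes, and the plain running mean standing in for the paper's specific averaging weights are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 200, 6                              # samples and base functions
X = rng.standard_normal((n, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# toy dictionary of base functions evaluated on the data
H = np.column_stack([X[:, 0], X[:, 1], X[:, 2],
                     X[:, 0] ** 2, np.sin(X[:, 0]), np.cos(X[:, 1])])

lam = np.full(m, 1.0 / m)                  # aggregation weights on the simplex
lam_avg = np.zeros(m)
for t in range(1, 2001):
    i = rng.integers(n)                    # one random example: stochastic gradient
    resid = H[i] @ lam - y[i]
    g = 2.0 * resid * H[i]                 # gradient of the squared risk w.r.t. the weights
    step = 0.5 / np.sqrt(t)
    lam = lam * np.exp(-step * g)          # entropic mirror descent step (dual-space update)
    lam /= lam.sum()
    lam_avg += (lam - lam_avg) / t         # recursively averaged aggregate
```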
