Optimal Regularized Dual Averaging Methods for Stochastic Optimization
Authors
Abstract
This paper considers a wide spectrum of regularized stochastic optimization problems where both the loss function and the regularizer can be non-smooth. We develop a novel algorithm based on the regularized dual averaging (RDA) method that can simultaneously achieve the optimal convergence rates for both convex and strongly convex loss. In particular, for strongly convex loss, it achieves the optimal rate of O(1/N + 1/N^2) for N iterations, which improves the rate O(log N / N) of previous regularized dual averaging algorithms. In addition, our method constructs the final solution directly from the proximal mapping instead of averaging all previous iterates. For widely used sparsity-inducing regularizers (e.g., the l1-norm), it has the advantage of encouraging sparser solutions. We further develop a multi-stage extension using the proposed algorithm as a subroutine, which achieves the uniformly-optimal rate O(1/N + exp{-N}) for strongly convex loss.
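As a rough illustration of the dual-averaging-with-proximal-mapping idea described above, the sketch below implements a plain l1-regularized dual averaging stage (the simple RDA form, not the paper's accelerated ORDA update) together with a restart wrapper in the spirit of the multi-stage extension. The function names, the assumed `grad_oracle` callable, the step-size rule gamma*sqrt(t), and the doubling/halving stage schedule are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1; this is the step that produces exact zeros."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def rda_stage(grad_oracle, w0, n_iters, lam, gamma, rng):
    """One stage of l1-regularized dual averaging centered at w0 (simple RDA form).
    The returned point is the last proximal iterate itself, not an average of iterates."""
    w = w0.copy()
    g_bar = np.zeros_like(w0)
    for t in range(1, n_iters + 1):
        g = grad_oracle(w, rng)              # stochastic subgradient of the loss at w
        g_bar += (g - g_bar) / t             # running average of subgradients
        step = t / (gamma * np.sqrt(t))      # t / beta_t with beta_t = gamma * sqrt(t)
        w = soft_threshold(w0 - step * g_bar, step * lam)
    return w

def multistage_rda(grad_oracle, dim, n_stages=5, iters0=100, gamma0=10.0, lam=0.1, seed=0):
    """Restart wrapper in the spirit of the multi-stage extension: each stage is
    warm-started at the previous output with a larger budget and a smaller gamma.
    The schedule below is an assumption made purely for illustration."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    iters, gamma = iters0, gamma0
    for _ in range(n_stages):
        w = rda_stage(grad_oracle, w, iters, lam, gamma, rng)
        iters, gamma = iters * 2, gamma / 2.0
    return w
```

Here `grad_oracle` stands for any callable returning an unbiased stochastic subgradient of the loss at w, for example a minibatch gradient.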
Similar resources
Randomized Block Subgradient Methods for Convex Nonsmooth and Stochastic Optimization
Block coordinate descent methods and stochastic subgradient methods have been extensively studied in optimization and machine learning. By combining randomized block sampling with stochastic subgradient methods based on dual averaging ([22, 36]), we present stochastic block dual averaging (SBDA)—a novel class of block subgradient methods for convex nonsmooth and stochastic optimization. SBDA re...
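The snippet above describes combining randomized block sampling with dual-averaging subgradient steps. Below is a rough sketch of that block-sampling idea under simplifying assumptions; the per-block averaging and scaling, and the assumed `block_grad` callable, are illustrative choices rather than the SBDA update rule from the cited work.

```python
import numpy as np

def sbda_sketch(block_grad, blocks, dim, n_iters, lam=0.1, gamma=1.0, seed=0):
    """Rough sketch of block dual averaging: each iteration touches one randomly
    chosen coordinate block. block_grad(w, idx, rng) should return a stochastic
    subgradient restricted to the coordinates in idx."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    g_sum = np.zeros(dim)                      # accumulated subgradients, per coordinate
    n_seen = np.zeros(len(blocks), dtype=int)  # visits per block
    for _ in range(n_iters):
        b = int(rng.integers(len(blocks)))     # uniformly sampled block
        idx = blocks[b]
        g_sum[idx] += block_grad(w, idx, rng)
        n_seen[b] += 1
        k, beta = n_seen[b], gamma * np.sqrt(n_seen[b])
        g_bar = g_sum[idx] / k                 # dual average over this block's visits
        # proximal (soft-thresholding) step restricted to the sampled block
        w[idx] = -(k / beta) * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
    return w
```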
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the l1-norm for promoting sparsity. We develop extensions of Nesterov's dual averaging method that can exploit the regularization structure in an online setting...
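To make the l1-regularized online setting concrete, here is a small self-contained toy run of the soft-thresholded dual averaging update on a synthetic sparse logistic regression stream. The problem sizes and step-size constants are arbitrary illustrative choices, not values from the cited work.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
d, n = 100, 5000
w_true = np.zeros(d)
w_true[:5] = 1.0                              # only 5 informative features
lam, gamma = 0.05, 5.0
w, g_bar = np.zeros(d), np.zeros(d)
for t in range(1, n + 1):
    x = rng.standard_normal(d)
    p = 1.0 / (1.0 + np.exp(-x @ w_true))
    y = 1.0 if rng.random() < p else -1.0     # label in {-1, +1}
    g = -y * x / (1.0 + np.exp(y * (x @ w)))  # logistic-loss gradient at w
    g_bar += (g - g_bar) / t                  # dual (running) average of gradients
    w = -(t / (gamma * np.sqrt(t))) * soft_threshold(g_bar, lam)
print("nonzeros:", np.count_nonzero(w))       # typically far fewer than d
```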
Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems
We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning. We propose two new stochastic gradient methods that are based on stochastic dual averaging method with variance reduction. Our methods generate a sparser solution than the existing methods because we do not need to take the average of the history o...
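The following sketch combines an SVRG-style variance-reduced gradient estimate with a dual-averaging/soft-thresholding step for l1-regularized least squares. It is meant only to illustrate the variance-reduction idea; the update form and parameters are assumptions, not the exact method of the cited paper.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def vr_dual_averaging(X, y, lam=0.1, gamma=1.0, n_epochs=5, seed=0):
    """Illustrative variance-reduced dual averaging for l1-regularized least squares."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    g_sum = np.zeros(d)
    t = 0
    for _ in range(n_epochs):
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n       # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            gi = X[i] * (X[i] @ w - y[i])            # stochastic gradient at w
            gi_snap = X[i] * (X[i] @ w_snap - y[i])  # same sample at the snapshot
            v = gi - gi_snap + full_grad             # variance-reduced estimate
            t += 1
            g_sum += v
            beta_t = gamma * np.sqrt(t)
            # iterate comes straight from the proximal map, so exact zeros survive
            w = -(t / beta_t) * soft_threshold(g_sum / t, lam)
    return w
```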
Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing
The development of distributed training strategies for statistical prediction functions is important for applications of machine learning, generally, and the development of distributed structured prediction training strategies is important for natural language processing (NLP), in particular. With ever-growing data sets this is, first, because it is easier to increase computational capacity by...
Reweighted l1 Dual Averaging Approach for Sparse Stochastic Learning
Recent advances in stochastic optimization and regularized dual averaging approaches have revealed substantial interest in a simple and scalable stochastic method tailored to more specific needs. Among the latest, one can find sparse signal recovery and l0-based sparsity-inducing approaches. These methods in particular can force many components of the solution to shrink to zero, thus cla...
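A minimal sketch of the reweighting idea: per-coordinate weights of roughly 1/(|w_i| + eps) are periodically recomputed and used as coordinate-wise thresholds inside the dual averaging proximal step, mimicking an l0-like penalty. The `grad_oracle` callable, the reweighting schedule, and the constants below are assumptions made for illustration, not the cited method.

```python
import numpy as np

def reweighted_l1_rda(grad_oracle, dim, n_iters, lam=0.1, gamma=1.0,
                      eps=1e-2, reweight_every=50, seed=0):
    """Illustrative reweighted-l1 dual averaging: smaller coordinates receive
    larger thresholds and are pushed harder toward exact zero."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    weights = np.ones(dim)                 # per-coordinate l1 weights
    g_bar = np.zeros(dim)
    for t in range(1, n_iters + 1):
        g = grad_oracle(w, rng)
        g_bar += (g - g_bar) / t           # running average of stochastic subgradients
        step = t / (gamma * np.sqrt(t))
        thresh = lam * weights             # coordinate-wise soft-thresholding levels
        w = -step * np.sign(g_bar) * np.maximum(np.abs(g_bar) - thresh, 0.0)
        if t % reweight_every == 0:
            weights = 1.0 / (np.abs(w) + eps)   # classic reweighting rule
    return w
```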
Journal title:
Volume / Issue:
Pages: -
Publication date: 2012