Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi
Abstract
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration–exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration–exploitation trade-offs aris...
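Since every entry on this page revolves around regret, a standard definition may help; this is the usual formulation in this literature (notation assumed here, not quoted from the monograph). With $K$ arms, the player picks arm $I_t$ at round $t$ and observes only the payoff $X_{I_t,t}$ of that arm; the regret after $n$ rounds is

R_n = \max_{i=1,\dots,K} \sum_{t=1}^{n} X_{i,t} \;-\; \sum_{t=1}^{n} X_{I_t,t},

and its expectation (or, in the stochastic setting, the pseudo-regret $\bar{R}_n = n\,\mu^* - \mathbb{E}\sum_{t=1}^{n} \mu_{I_t}$, where $\mu^*$ is the best arm's mean) is the quantity the bounds below control.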
Similar Papers

Nonstochastic Multi-Armed Bandit Approach to Stochastic Discrete Optimization
We present a sampling-based algorithm for solving stochastic discrete optimization problems based on Auer et al.’s Exp3 algorithm for “nonstochastic multi-armed bandit problems.” The algorithm solves the sample average approximation (SAA) of the original problem by iteratively updating and sampling from a probability distribution over the search space. We show that as the number of samples goes...
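Since this entry builds directly on Exp3, a minimal sketch of the textbook Exp3 update may help: exponential weighting over the arms combined with importance-weighted reward estimates. This follows Auer et al.'s standard scheme, not the paper's SAA variant; the function name and the exploration parameter gamma are illustrative.

import math
import random

def exp3(n_arms, gamma, reward_fn, horizon):
    """Minimal Exp3 sketch; rewards are assumed to lie in [0, 1]."""
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)  # only the pulled arm's payoff is observed
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights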
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it has been shown to achieve an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-pla...
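To make the multiple-play setting concrete, here is a minimal Beta-Bernoulli sketch of Thompson sampling with several plays per round: draw one posterior sample per arm and play the arms with the largest draws. This is an illustrative baseline under assumed Bernoulli rewards, not necessarily the exact algorithm proposed in the paper.

import random

def mp_thompson_sampling(n_arms, n_plays, reward_fn, horizon):
    """Sketch of multiple-play Thompson sampling with Beta(1, 1)
    priors over Bernoulli arms."""
    successes = [1] * n_arms  # Beta alpha parameters
    failures = [1] * n_arms   # Beta beta parameters
    for t in range(horizon):
        draws = [random.betavariate(successes[i], failures[i])
                 for i in range(n_arms)]
        # Play the n_plays arms with the largest posterior draws.
        chosen = sorted(range(n_arms), key=lambda i: draws[i])[-n_plays:]
        for arm in chosen:
            if reward_fn(arm):  # Bernoulli reward: 1/True or 0/False
                successes[arm] += 1
            else:
                failures[arm] += 1
    return successes, failures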
Thompson Sampling Based Mechanisms for Stochastic Multi-Armed Bandit Problems
This paper explores Thompson sampling in the context of mechanism design for stochastic multi-armed bandit (MAB) problems. The setting is that of an MAB problem where the reward distribution of each arm consists of a stochastic component as well as a strategic component. Many existing MAB mechanisms use upper confidence bound (UCB) based algorithms for learning the parameters of the reward dist...
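For context on the UCB-based learning mentioned here, the following is a minimal sketch of the standard UCB1 index policy of Auer et al. (an assumed illustration of the general approach, not the mechanism designed in the paper): play each arm once, then always pull the arm whose empirical mean plus confidence width is largest.

import math

def ucb1(n_arms, reward_fn, horizon):
    """Minimal UCB1 sketch; rewards are assumed to lie in [0, 1]."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Optimistic index: empirical mean + confidence width.
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return means, counts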
Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem
This paper is devoted to regret lower bounds in the classical stochastic multi-armed bandit model. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound for all consistent policies. We relax the notion of consistency and exhibit a generalisation of the logarithmic bound. We also show the non-existen...
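For reference, the Lai and Robbins result alluded to here states, in its standard form and under suitable regularity assumptions, that every consistent policy satisfies

\liminf_{n \to \infty} \frac{\mathbb{E}[R_n]}{\log n} \;\ge\; \sum_{i:\, \mu_i < \mu^*} \frac{\mu^* - \mu_i}{\mathrm{KL}(\nu_i, \nu^*)},

where $\nu_i$ is the reward distribution of arm $i$ with mean $\mu_i$, $\mu^* = \max_i \mu_i$, and $\mathrm{KL}$ is the Kullback–Leibler divergence. Burnetas and Katehakis extended this bound to more general families of distributions.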
Journal: Foundations and Trends® in Machine Learning
Year: 2012
ISSN: 1935-8237, 1935-8245
DOI: 10.1561/2200000024