Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi
Abstract
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration–exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration–exploitation trade-offs aris...
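Since every entry on this page revolves around regret, a standard definition may help; this is the usual formulation in this literature (notation assumed here, not quoted from the monograph). With $K$ arms, the player picks arm $I_t$ at round $t$ and observes only the payoff $X_{I_t,t}$ of that arm; the regret after $n$ rounds is

R_n = \max_{i=1,\dots,K} \sum_{t=1}^{n} X_{i,t} \;-\; \sum_{t=1}^{n} X_{I_t,t},

and its expectation (or, in the stochastic setting, the pseudo-regret $\bar{R}_n = n\,\mu^* - \mathbb{E}\sum_{t=1}^{n} \mu_{I_t}$, where $\mu^*$ is the best arm's mean) is the quantity the bounds below control.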
Similar Papers

Nonstochastic Multi-Armed Bandit Approach to Stochastic Discrete Optimization
We present a sampling-based algorithm for solving stochastic discrete optimization problems based on Auer et al.’s Exp3 algorithm for “nonstochastic multi-armed bandit problems.” The algorithm solves the sample average approximation (SAA) of the original problem by iteratively updating and sampling from a probability distribution over the search space. We show that as the number of samples goes...
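Since this entry builds directly on Exp3, a minimal sketch of the textbook Exp3 update may help: exponential weighting over the arms combined with importance-weighted reward estimates. This follows Auer et al.'s standard scheme, not the paper's SAA variant; the function name and the exploration parameter gamma are illustrative.

import math
import random

def exp3(n_arms, gamma, reward_fn, horizon):
    """Minimal Exp3 sketch; rewards are assumed to lie in [0, 1]."""
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)  # only the pulled arm's payoff is observed
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights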
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it has been shown to achieve an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-pla...
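To make the multiple-play setting concrete, here is a minimal Beta-Bernoulli sketch of Thompson sampling with several plays per round: draw one posterior sample per arm and play the arms with the largest draws. This is an illustrative baseline under assumed Bernoulli rewards, not necessarily the exact algorithm proposed in the paper.

import random

def mp_thompson_sampling(n_arms, n_plays, reward_fn, horizon):
    """Sketch of multiple-play Thompson sampling with Beta(1, 1)
    priors over Bernoulli arms."""
    successes = [1] * n_arms  # Beta alpha parameters
    failures = [1] * n_arms   # Beta beta parameters
    for t in range(horizon):
        draws = [random.betavariate(successes[i], failures[i])
                 for i in range(n_arms)]
        # Play the n_plays arms with the largest posterior draws.
        chosen = sorted(range(n_arms), key=lambda i: draws[i])[-n_plays:]
        for arm in chosen:
            if reward_fn(arm):  # Bernoulli reward: 1/True or 0/False
                successes[arm] += 1
            else:
                failures[arm] += 1
    return successes, failures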
Thompson Sampling Based Mechanisms for Stochastic Multi-Armed Bandit Problems
This paper explores Thompson sampling in the context of mechanism design for stochastic multi-armed bandit (MAB) problems. The setting is that of an MAB problem where the reward distribution of each arm consists of a stochastic component as well as a strategic component. Many existing MAB mechanisms use upper confidence bound (UCB) based algorithms for learning the parameters of the reward dist...
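For context on the UCB-based learning mentioned here, the following is a minimal sketch of the standard UCB1 index policy of Auer et al. (an assumed illustration of the general approach, not the mechanism designed in the paper): play each arm once, then always pull the arm whose empirical mean plus confidence width is largest.

import math

def ucb1(n_arms, reward_fn, horizon):
    """Minimal UCB1 sketch; rewards are assumed to lie in [0, 1]."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Optimistic index: empirical mean + confidence width.
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return means, counts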
Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem
This paper is devoted to regret lower bounds in the classical stochastic multi-armed bandit model. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound for all consistent policies. We relax the notion of consistency and exhibit a generalisation of the logarithmic bound. We also show the non-existen...
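For reference, the Lai and Robbins result alluded to here states, in its standard form and under suitable regularity assumptions, that every consistent policy satisfies

\liminf_{n \to \infty} \frac{\mathbb{E}[R_n]}{\log n} \;\ge\; \sum_{i:\, \mu_i < \mu^*} \frac{\mu^* - \mu_i}{\mathrm{KL}(\nu_i, \nu^*)},

where $\nu_i$ is the reward distribution of arm $i$ with mean $\mu_i$, $\mu^* = \max_i \mu_i$, and $\mathrm{KL}$ is the Kullback–Leibler divergence. Burnetas and Katehakis extended this bound to more general families of distributions.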
Journal: Foundations and Trends® in Machine Learning
Year: 2012
ISSN: 1935-8237, 1935-8245
DOI: 10.1561/2200000024