A linear response bandit problem
Authors
Abstract
Similar resources
A Linear Programming Relaxation and a Heuristic for the Restless Bandit Problem with General Switching Costs
We extend a relaxation technique due to Bertsimas and Niño-Mora for the restless bandit problem to the case where arbitrary costs penalize switching between the bandits. We also construct a one-step lookahead policy using the solution of the relaxation. Computational experiments and a bound for approximate dynamic programming provide some empirical support for the heuristic.
A Lemma on the Multiarmed Bandit Problem
We prove a lemma on the optimal value function for the multiarmed bandit problem which provides a simple direct proof of optimality of writeoff policies. This, in turn, leads to a new proof of optimality of the index rule.
Fast Generalized Stochastic Linear Bandit
We study a generalized stochastic linear bandit problem and propose an algorithm that enjoys fast update. The computational complexity of the update is O(d), where d is the dimension of a context space. In comparison with other stochastic linear bandit algorithms, our algorithm does not need to incrementally update the inverse of a matrix so that it can avoid the O(d^2) computations. Yet,...
The Nonstochastic Multiarmed Bandit Problem
In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff...
Journal
Journal title: Stochastic Systems
Year: 2013
ISSN: 1946-5238
DOI: 10.1214/11-ssy032