Two-armed bandits with a goal, I. One arm known
نویسندگان
چکیده
منابع مشابه
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regre...
متن کاملBest arm identification in multi-armed bandits with delayed feedback
We propose a generalization of the best arm identification problem in stochastic multiarmed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework ...
متن کاملOn ergodic two-armed bandits
A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e. with greatest Cesaro limit) with probability o...
متن کاملTop Arm Identification in Multi-Armed Bandits with Batch Arm Pulls
We introduce a new multi-armed bandit (MAB) problem in which arms must be sampled in batches, rather than one at a time. This is motivated by applications in social media monitoring and biological experimentation where such batch constraints naturally arise. This paper develops and analyzes algorithms for batch MABs and top arm identification, for both fixed confidence and fixed budget settings...
متن کاملProbability ON ERGODIC TWO - ARMED BANDITS
A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e. with greatest Cesaro limit) with probability o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Advances in Applied Probability
سال: 1980
ISSN: 0001-8678,1475-6064
DOI: 10.1017/s0001867800035497