Sequential Multi-Hypothesis Testing in Multi-Armed Bandit Problems: An Approach for Asymptotic Optimality


Abstract

We consider a multi-hypothesis testing problem involving a $K$-armed bandit. Each arm's signal follows a distribution from a vector exponential family. The actual parameters of the arms are unknown to the decision maker. The decision maker incurs a delay cost for the time until a decision and a switching cost whenever he switches from one arm to another. His goal is to minimise the total cost until a decision is reached on the true hypothesis. Of interest are policies that satisfy a given constraint on the probability of false detection. This is a sequential decision-making problem where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing which arm to observe at each stage. An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. Due to the vector exponential family assumption, the signal processing at each stage is simple; the associated conjugate prior model enables easy updates of the posterior distribution. The proposed policy, with a suitable threshold for stopping, is shown to satisfy the given constraint on the probability of false detection. Under a continuous selection assumption, it is also shown to be asymptotically optimal in terms of the total cost among all policies satisfying the constraint on the probability of false detection.
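The "easy updates of the posterior distribution" afforded by a conjugate prior can be illustrated with a minimal sketch. The following is not the paper's policy: it uses a Bernoulli-Beta pair (the simplest exponential-family/conjugate-prior combination) with hypothetical arm means and a placeholder round-robin sampling rule, purely to show that each observation updates the sampled arm's posterior in O(1).

```python
import random

# Minimal sketch (not the authors' algorithm): conjugate Bernoulli-Beta
# posterior updates for K bandit arms. A conjugate prior reduces each
# posterior update to a constant-time parameter increment, which is the
# kind of simple per-stage signal processing the abstract refers to.
# The true means and the round-robin selection rule are hypothetical.

K = 3
true_means = [0.2, 0.5, 0.8]      # unknown to the decision maker
alpha = [1.0] * K                  # Beta(1, 1) priors (uniform)
beta = [1.0] * K

random.seed(0)
for t in range(300):
    arm = t % K                    # round-robin sampling, for illustration only
    reward = 1 if random.random() < true_means[arm] else 0
    alpha[arm] += reward           # conjugate update: O(1) per observation
    beta[arm] += 1 - reward

posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
```

After enough samples, each arm's posterior mean concentrates around its true parameter; a real policy would replace the round-robin rule with an adaptive arm-selection rule and a likelihood-ratio-based stopping threshold.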



Similar articles

Multi-Armed Bandit for Pricing

This paper is about the study of Multi–Armed Bandit (MAB) approaches for pricing applications, where a seller needs to identify the selling price for a particular kind of item that maximizes her/his profit without knowing the buyer demand. We propose modifications to the popular Upper Confidence Bound (UCB) bandit algorithm exploiting two peculiarities of pricing applications: 1) as the selling...
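The UCB algorithm mentioned above can be sketched in its standard (unmodified) form. This is background only, not the pricing variant the paper proposes; the arm means and horizon are hypothetical placeholders.

```python
import math
import random

# Sketch of the standard UCB1 index policy: play the arm maximising
# empirical mean + sqrt(2 ln t / n_a). The pricing modifications the
# paper describes are not reproduced here; arm means are hypothetical.

def ucb1(means, horizon, seed=0):
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k               # pulls per arm
    sums = [0.0] * k               # cumulative reward per arm
    for t in range(horizon):
        if t < k:
            arm = t                # initialise: play each arm once
        else:
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.3, 0.7], horizon=2000)
```

With a clear gap between the arms, the index concentrates pulls on the better arm while still occasionally exploring the worse one.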


Algorithms for multi-armed bandit problems

The stochastic multi-armed bandit problem is an important model for studying the exploration-exploitation tradeoff in reinforcement learning. Although many algorithms for the problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important...


Pure Exploration for Multi-Armed Bandit Problems

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast...
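The simple-regret notion described above can be made concrete with a short sketch: explore for a fixed budget of rounds, recommend the empirically best arm, and measure the gap between the best mean and the recommended arm's mean. The uniform (round-robin) forecaster and the arm means here are hypothetical placeholders.

```python
import random

# Sketch of simple regret under a uniform-exploration forecaster:
# after `rounds` pulls, recommend the arm with the highest empirical
# mean; simple regret is the suboptimality of that recommendation.
# Arm means are hypothetical.

def simple_regret(means, rounds, seed=0):
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(rounds):
        arm = t % k                # uniform round-robin exploration
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
    best = max(range(k), key=lambda a: sums[a] / counts[a])
    return max(means) - means[best]

r = simple_regret([0.2, 0.5, 0.8], rounds=3000)
```

Unlike cumulative regret, simple regret charges nothing for the rewards earned during exploration; only the quality of the final recommendation matters.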


Multi-armed Bandit Problems with Strategic Arms

We study a strategic version of the multi-armed bandit problem, where each arm is an individual strategic agent and we, the principal, pull one arm each round. When pulled, the arm receives some private reward v_a and can choose an amount x_a to pass on to the principal (keeping v_a − x_a for itself). All non-pulled arms get reward 0. Each strategic arm tries to maximize its own utility over the cour...


Multi-armed Bandit Problems with History

In a multi-armed bandit problem, at each time step, an algorithm chooses one of the possible arms and observes its rewards. The goal is to maximize the sum of rewards over all time steps (or to minimize the regret). In the conventional formulation of the problem, the algorithm has no prior knowledge about the arms. Many applications, however, provide some data about the arms even before the alg...





Journal

Journal: IEEE Transactions on Information Theory

Year: 2022

ISSN: 0018-9448, 1557-9654

DOI: https://doi.org/10.1109/tit.2022.3159600