CEMAB: A Cross-Entropy-based Method for Large-Scale Multi-Armed Bandits
Authors
Abstract
The multi-armed bandit (MAB) problem is an important model for studying the exploration–exploitation tradeoff in sequential decision making. In this problem, a gambler must repeatedly choose among a number of slot-machine arms so as to maximize the total payout over a fixed number of plays. Although many methods have been proposed to solve the MAB problem, most are designed for problems with a small number of arms. To guarantee convergence to the optimal arm, many of these methods, including state-of-the-art methods such as UCB [2], must sweep over the entire set of arms; as a result, they perform poorly when the number of arms is large. This paper proposes a new method for solving such large-scale MAB problems. The method, called Cross-Entropy-based Multi-Armed Bandit (CEMAB), uses the Cross-Entropy method as a noisy optimizer to find the optimal arm at as little cost as possible. Experimental results indicate that CEMAB outperforms state-of-the-art methods on MABs with a large number of arms.
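The abstract gives only the high-level idea, so the following is a minimal sketch of how the Cross-Entropy method can act as a noisy optimizer over a discrete arm set: maintain a categorical sampling distribution over the arms, pull a batch of sampled arms, refit the distribution to the highest-reward ("elite") samples, and repeat. All names and parameters here (pull, batch, elite_frac, smooth) are illustrative assumptions, not the paper's actual interface.

    import numpy as np

    def ce_bandit_sketch(pull, n_arms, horizon, batch=50,
                         elite_frac=0.2, smooth=0.7, seed=0):
        # Illustrative cross-entropy loop over a discrete arm set; the real
        # CEMAB algorithm may differ in its sampling and update details.
        rng = np.random.default_rng(seed)
        probs = np.full(n_arms, 1.0 / n_arms)   # categorical distribution over arms
        for _ in range(horizon // batch):
            arms = rng.choice(n_arms, size=batch, p=probs)   # sample a batch of arms
            rewards = np.array([pull(a) for a in arms])      # one noisy pull per sample
            n_elite = max(1, int(elite_frac * batch))
            elite = arms[np.argsort(rewards)[-n_elite:]]     # highest-reward samples
            # Refit the distribution to the elite arms, smoothed to avoid collapse.
            probs = (smooth * np.bincount(elite, minlength=n_arms) / n_elite
                     + (1.0 - smooth) * probs)
        return int(np.argmax(probs))   # arm the distribution has concentrated on

For example, with pull = lambda a: np.random.default_rng().normal(true_means[a], 1.0), the loop concentrates probability mass on the best arm without ever sweeping all n_arms, which is the property the abstract emphasizes for bandits with very large arm sets.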
Similar Resources
Scalable Discrete Sampling as a Multi-Armed Bandit Problem
Drawing a sample from a discrete distribution is one of the basic building blocks of Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from a high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models...
Almost Optimal Exploration in Multi-Armed Bandits
We study the problem of exploration in stochastic multi-armed bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds on the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in today's large-scale applications. We present two novel, parameter-free algor...
An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space an...
Sequential Monte Carlo Bandits
In this paper we propose a flexible and efficient framework for handling multi-armed bandits, combining sequential Monte Carlo algorithms with hierarchical Bayesian modeling techniques. The framework naturally encompasses restless bandits, contextual bandits, and other bandit variants under a single inferential model. Despite the model’s generality, we propose efficient Monte Carlo algorithms t...
A Survey on Contextual Multi-armed Bandits
4 Stochastic Contextual Bandits
  4.1 Stochastic Contextual Bandits with Linear Realizability Assumption
    4.1.1 LinUCB/SupLinUCB
    4.1.2 LinREL/SupLinREL
    4.1.3 CofineUCB
    4.1.4 Thompson Sampling with Linear Payoffs...
Publication date: 2017