Photonic decision making for solving competitive multi-armed bandit problem using semiconductor laser networks
Abstract
Multi-armed bandit problems concern decision making when selecting a slot machine among many slot machines with initially uncertain hit probabilities to maximize the total reward; this is a fundamental problem of reinforcement learning. Furthermore, competitive multi-armed bandit problems involve multiple agents in play, manifesting concerns regarding social figures, not just individual rewards. A representative issue is selection conflict, in which multiple players select the same slot machine and may miss the reward as a whole. This study proposes a scheme for solving competitive multi-armed bandit problems using semiconductor laser networks by introducing an exclusive selection mechanism. We numerically implement our method and compare it with conventional algorithms. We show that our method outperforms conventional algorithms for the competitive multi-armed bandit problem.
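The selection conflict described in the abstract can be illustrated with a small simulation. The sketch below is not the laser-network scheme of the paper: it uses a plain epsilon-greedy learner per player, and the names `play_round` and `simulate`, the conflict rule (a contested machine pays at most one of the players who chose it), and the `exclusive` flag (players are forced onto distinct machines) are all assumptions made for illustration.

```python
import random


def play_round(choices, hit_probs, rng):
    """Return per-player rewards (0 or 1) for one round.

    Assumed conflict rule (illustrative only): each selected machine is
    drawn once, and if it hits, exactly one of the players who chose it,
    picked at random, receives the reward.
    """
    rewards = [0] * len(choices)
    players_by_machine = {}
    for player, machine in enumerate(choices):
        players_by_machine.setdefault(machine, []).append(player)
    for machine, players in players_by_machine.items():
        if rng.random() < hit_probs[machine]:
            rewards[rng.choice(players)] = 1
    return rewards


def simulate(hit_probs, n_players, n_rounds, exclusive, epsilon=0.1, seed=0):
    """Total reward of all players using epsilon-greedy selection.

    With exclusive=True, players choose in turn and may not pick a
    machine already taken in the same round (a hypothetical stand-in
    for an exclusive selection mechanism).
    """
    n_machines = len(hit_probs)
    if exclusive and n_players > n_machines:
        raise ValueError("exclusive selection needs n_players <= n_machines")
    rng = random.Random(seed)
    counts = [[0] * n_machines for _ in range(n_players)]
    wins = [[0] * n_machines for _ in range(n_players)]

    def estimate(player, machine):
        # Untried machines get an optimistic estimate of 1.0.
        tried = counts[player][machine]
        return wins[player][machine] / tried if tried else 1.0

    total = 0
    for _ in range(n_rounds):
        choices, taken = [], set()
        for player in range(n_players):
            allowed = [m for m in range(n_machines)
                       if not (exclusive and m in taken)]
            if rng.random() < epsilon:
                machine = rng.choice(allowed)  # explore
            else:
                machine = max(allowed, key=lambda m: estimate(player, m))
            choices.append(machine)
            taken.add(machine)
        rewards = play_round(choices, hit_probs, rng)
        for player, (machine, reward) in enumerate(zip(choices, rewards)):
            counts[player][machine] += 1
            wins[player][machine] += reward
        total += sum(rewards)
    return total


if __name__ == "__main__":
    probs = [0.9, 0.7, 0.2]
    print("independent:", simulate(probs, 2, 2000, exclusive=False))
    print("exclusive:  ", simulate(probs, 2, 2000, exclusive=True))
```

Running the script prints the total reward under both settings for one seed, which gives a rough feel for how much reward conflicts can cost under these assumed rules; it says nothing about the performance reported in the paper.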
Similar resources
Algorithms for the multi-armed bandit problem
The stochastic multi-armed bandit problem is an important model for studying the exploration-exploitation tradeoff in reinforcement learning. Although many algorithms for the problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important...
Combinatorial Multi-Objective Multi-Armed Bandit Problem
In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMOMAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously. In this setting, the goal of the learner is to choose an action at each time, whose reward vector is a linear combination of the reward vectors of the arms in the action, to learn the set of super Par...
Multi-Armed Bandit for Pricing
This paper is about the study of Multi–Armed Bandit (MAB) approaches for pricing applications, where a seller needs to identify the selling price for a particular kind of item that maximizes her/his profit without knowing the buyer demand. We propose modifications to the popular Upper Confidence Bound (UCB) bandit algorithm exploiting two peculiarities of pricing applications: 1) as the selling...
Bayesian and Approximate Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem
In this paper we investigate human exploration/exploitation behavior in sequential decision-making tasks. Previous studies have suggested that people are suboptimal at scheduling exploration, and heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subjects' knowledge and limitations into models of belief ...
Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem
In this paper we investigate human exploration/exploitation behavior in sequential decision-making tasks. Previous studies have suggested that people are suboptimal at scheduling exploration, and heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subjects' knowledge and limitations into models of belief ...
Journal
Journal title: Nonlinear Theory and Its Applications, IEICE
Year: 2022
ISSN: 2185-4106
DOI: https://doi.org/10.1587/nolta.13.582