Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams
Authors
Abstract
When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to collaborate effectively in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naive extensions of single-agent optimal MAB algorithms fail when applied to decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known Upper Confidence Bound algorithm. We analytically show that the proposed strategy achieves logarithmic regret, and we provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the partner-aware strategy outperforms other known methods, and our human subject studies suggest that humans prefer to collaborate with AI agents implementing the partner-aware strategy.
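For orientation, the following is a minimal sketch of the standard single-agent UCB1 algorithm, the kind of baseline the abstract says the Partner-Aware strategy extends. It is not the paper's partner-aware method: the function name `ucb1`, the Bernoulli reward model, and the exploration bonus `sqrt(2 ln t / n)` are assumptions chosen for illustration only.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Standard single-agent UCB1 on Bernoulli arms (illustrative assumption).

    Returns the per-arm pull counts and the cumulative expected regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Accumulate expected regret relative to the best arm.
        regret += best - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    pulls, total_regret = ucb1([0.1, 0.9], horizon=2000)
    print(pulls, total_regret)
```

In this single-agent setting each pull depends only on the agent's own statistics. The abstract's point is that running such a rule independently on each teammate can fail when rewards are coupled across agents.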
Similar resources
Safety-Aware Algorithms for Adversarial Contextual Bandit
Appendix A. Proof of Proposition 2.1 Proof. The proof is mainly about adapting the specific two-player game presented in (Mannor et al., 2009) to the general online convex programming setting with adversarial constraints. We closely follow the notations in the example from Proposition 4 in (Mannor et al., 2009). Let us define the decision set X = ([1, 2]), namely a 2-D simplex. We design two di...
Safety-Aware Algorithms for Adversarial Contextual Bandit
In this work we study the safe sequential decision making problem under the setting of adversarial contextual bandits with sequential risk constraints. At each round, nature prepares a context, a cost for each arm, and additionally a risk for each arm. The learner leverages the context to pull an arm and receives the corresponding cost and risk associated with the pulled arm. In addition to min...
Hybrid Decentralized Control System for Communication Aware Mobile Robotic Teams
Enabling a team of robots to self-organize into a multi-hop ad hoc network as it simultaneously completes a task requires a system architecture that controls both the motion of each robot and their communication variables. In this paper, we consider this objective and propose a hybrid architecture composed of both centralized and decentralized components. This novel architecture utilizes the st...
Stochastic cooperative advertising in a manufacturer–retailer decentralized supply channel
This work considers cooperative advertising in a manufacturer–retailer supply chain. The manufacturer is the Stackelberg leader and the retailer is the follower. Using the Sethi model, it captures the dynamic effect of the manufacturer's and retailer's advertising efforts on sales. It uses optimal control techniques and stochastic differential game theory to obtain the players’ advertising strategies a...
Spatially Aware Decentralized Computing
The Q-Machine is a spatially aware, decentralized massively parallel computer system that achieves latency reduction (as opposed to latency hiding) and enhanced fault tolerance through a virtual machine interface. Without obscuring the high-level architecture, details of the machine are hidden by the virtual machine interface, so that objects and threads can be efficiently migrated to reduce la...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i9.21158