Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

Abstract

When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to collaborate effectively in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naive extensions of single-agent optimal MAB algorithms fail when applied to decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We analytically show that our proposed strategy achieves logarithmic regret, and we provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest that humans prefer to collaborate with AI agents implementing our partner-aware strategy.
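The Partner-Aware strategy is described as an extension of the single-agent Upper Confidence Bound algorithm. For orientation, the sketch below shows only the standard single-agent UCB1 rule: pull the arm maximizing its empirical mean plus sqrt(2 ln t / n). It is not the paper's Partner-Aware algorithm, and it does not model coupled team rewards. The Bernoulli arms, the arm probabilities, and the helper names `ucb1_select` and `run_ucb1` are hypothetical choices made for illustration.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the arm maximizing the UCB1 index mean + sqrt(2 ln t / n).

    Any arm not yet pulled is chosen first, so every index is well defined.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(arm_probs, horizon, seed=0):
    """Simulate single-agent UCB1 on Bernoulli arms (hypothetical helper).

    Returns per-arm pull counts and the cumulative pseudo-regret, i.e. the
    summed gap between the best arm's mean and the pulled arm's mean.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    means = [0.0] * k
    best = max(arm_probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the arm's empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
        regret += best - arm_probs[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = run_ucb1([0.2, 0.5, 0.8], horizon=5000)
    print(counts, round(regret, 1))
```

For a single agent, UCB1's pseudo-regret grows logarithmically in the horizon. According to the abstract, simply running such a rule independently on each teammate is the kind of naive extension that fails once rewards are coupled.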

Related articles

Safety-Aware Algorithms for Adversarial Contextual Bandit

Appendix A. Proof of Proposition 2.1 Proof. The proof is mainly about adapting the specific two-player game presented in (Mannor et al., 2009) to the general online convex programming setting with adversarial constraints. We closely follow the notations in the example from Proposition 4 in (Mannor et al., 2009). Let us define the decision set X = Δ([1, 2]), namely a 2-D simplex. We design two di...

Safety-Aware Algorithms for Adversarial Contextual Bandit

In this work we study the safe sequential decision making problem under the setting of adversarial contextual bandits with sequential risk constraints. At each round, nature prepares a context, a cost for each arm, and additionally a risk for each arm. The learner leverages the context to pull an arm and receives the corresponding cost and risk associated with the pulled arm. In addition to min...

Hybrid Decentralized Control System for Communication Aware Mobile Robotic Teams

Enabling a team of robots to self-organize into a multi-hop ad hoc network as it simultaneously completes a task requires a system architecture that controls both the motion of each robot and their communication variables. In this paper, we consider this objective and propose a hybrid architecture composed of both centralized and decentralized components. This novel architecture utilizes the st...

Stochastic cooperative advertising in a manufacturer–retailer decentralized supply channel

This work considers cooperative advertising in a manufacturer–retailer supply chain. While the manufacturer is the Stackelberg leader, the retailer is the follower. Using the Sethi model, it models the dynamic effect of the manufacturer's and retailer’s advertising efforts on sales. It uses optimal control techniques and stochastic differential game theory to obtain the players’ advertising strategies a...

Spatially Aware Decentralized Computing

The Q-Machine is a spatially aware, decentralized massively parallel computer system that achieves latency reduction (as opposed to latency hiding) and enhanced fault tolerance through a virtual machine interface. Without obscuring the high-level architecture, details of the machine are hidden by the virtual machine interface, so that objects and threads can be efficiently migrated to reduce la...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i9.21158