Search results for: reward penalty scheme

Number of results: 265788

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence 2022

In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general model to address such problems, w...

Journal: Transactions of The Japanese Society for Artificial Intelligence 2021

In this paper, we propose a novel method of reward design for multi-agent reinforcement learning (MARL). One of the main uses of MARL is building cooperative policies between self-interested agents. We take inspiration from the concept of mechanism design from game theory to modify how agents are rewarded in MARL algorithms. We defined a payment that reflects the negative contribution to other agents’ valuation in the same manner as Vickrey-C...

Journal: Complex & Intelligent Systems 2021

The last years have seen a rapid growth of the takeaway delivery market, which has provided a lot of jobs for deliverymen. However, the increasing numbers of orders and the corresponding pickup and service points have made order selection and path planning a key challenging problem to deliverymen. In this paper, we present a problem integrating order selection and path planning for deliverymen, the objective of which is to maximize the revenue per unit time subject to maximum path length, overdue penalt...

2006
R. Clark A. El-Osery K. Wedeward

This paper describes two complementary algorithms developed for mobile robots operating within unknown, maze-type environments. The first is an environmental mapping and navigation algorithm which ensures complete coverage of a maze with a priori unknown wall locations, and the second a stochastic learning automaton approach for general obstacle avoidance within the maze. The environmental mappi...

2004
B. J. Oommen S. Sitharam Iyengar Nicte Andrade

We consider the problem of a robot manipulator operating in a noisy workspace. The robot is assigned the task of moving from Pi to Pf. Since Pi is its initial position, this position can be known fairly accurately. However, since Pf is usually obtained as a result of a sensing operation, possibly vision sensing, we assume that Pf is noisy. We propose a solution to achieve the motion which invo...

2010
Scott Alfeld Matthew E. Taylor Prateek Tandon Milind Tambe

Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phe...

Journal: IGTR 2005
Marco Slikker

A network is a graph where the nodes represent players and the links represent bilateral interaction between the players. A reward game assigns a value to every network on a fixed set of players. An allocation scheme specifies how to distribute the worth of every network among the players. This allocation scheme is link monotonic if extending the network does not decrease the payoff of any play...

2014
Fan Ye Enlu Zhou

Information relaxation and duality in Markov decision processes have been studied recently to derive upper bounds on the maximal expected reward (or lower bounds on the minimal expected cost). The idea is to relax the non-anticipativity constraint on the controls and impose a penalty to punish such a violation. In this paper we generalize this dual approach to controlled Markov diffusions. We d...
