Search results for: reward penalty scheme
Number of results: 265788
In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general model to address such problems, w...
In this paper, we propose a novel method of reward design for multi-agent reinforcement learning (MARL). One of the main uses of MARL is building cooperative policies between self-interested agents. We take inspiration from the concept of mechanism design in game theory to modify how agents are rewarded in MARL algorithms. We defined a payment that reflects the negative contribution to other agents’ valuation in the same manner as Vickrey-C...
The last years have seen a rapid growth of the takeaway delivery market, which has provided a lot of jobs for deliverymen. However, the increasing numbers of orders and the corresponding pickup and service points have made order selection and path planning a key challenging problem to deliverymen. In this paper, we present a method integrating order selection and path planning for deliverymen; the objective is to maximize the revenue per unit time subject to a maximum length, overdue penalt...
This paper describes two complementary algorithms developed for mobile robots operating within unknown, maze-type environments. The first is an environmental mapping and navigation algorithm which ensures complete coverage of a maze with a priori unknown wall locations, and the second is a stochastic learning automaton approach for general obstacle avoidance within the maze. The environmental mappi...
We consider the problem of a robot manipulator operating in a noisy workspace. The robot is assigned the task of moving from Pi to Pf. Since Pi is its initial position, this position can be known fairly accurately. However, since Pf is usually obtained as a result of a sensing operation, possibly vision sensing, we assume that Pf is noisy. We propose a solution to achieve the motion which invo...
Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phe...
A network is a graph where the nodes represent players and the links represent bilateral interaction between the players. A reward game assigns a value to every network on a fixed set of players. An allocation scheme specifies how to distribute the worth of every network among the players. This allocation scheme is link monotonic if extending the network does not decrease the payoff of any play...
Information relaxation and duality in Markov decision processes have been studied recently to derive upper bounds on the maximal expected reward (or lower bounds on the minimal expected cost). The idea is to relax the non-anticipativity constraint on the controls and impose a penalty to punish such a violation. In this paper we generalize this dual approach to controlled Markov diffusions. We d...