Search results for: q policy
Number of results: 381585
We consider a two-stage serial inventory system whose cost structure exhibits economies of scale in both stages. In the system, stage 1 faces Poisson demand and replenishes its inventory from stage 2, and the latter stage in turn orders from an outside supplier with unlimited stock. Each shipment, either to stage 2 or to stage 1, incurs a fixed setup cost. We derive important properties for a g...
Reinforcement Learning (RL) methods enable autonomous robots to learn skills from scratch by interacting with the environment. However, reinforcement learning can be very time consuming. This paper focuses on accelerating the reinforcement learning process on a mobile robot in an unknown environment. The presented algorithm is based on approximate policy iteration with a continuous state space ...
Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-a...
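As a minimal sketch of the potential-based shaping mentioned in the result above, assuming the standard single-agent form F(s, s') = gamma * Phi(s') - Phi(s): the discounted shaping terms along any trajectory telescope to a quantity that depends only on the first and last states, which is the intuition behind policy invariance. The potential table `phi`, the state names, and the helper functions are hypothetical illustrations, not taken from the paper.

```python
GAMMA = 0.9  # discount factor (assumed value for illustration)

def shaping_term(phi, s, s_next, gamma=GAMMA):
    """Potential-based shaping reward F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi[s_next] - phi[s]

def discounted_shaping_sum(phi, states, gamma=GAMMA):
    """Discounted sum of shaping terms along a sequence of visited states."""
    return sum(
        gamma ** t * shaping_term(phi, s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(states, states[1:]))
    )

# Hypothetical potential function over three states.
phi = {"a": 0.0, "b": 1.5, "c": 4.0}
trajectory = ["a", "b", "a", "c"]

total = discounted_shaping_sum(phi, trajectory)

# Telescoping identity: gamma^T * Phi(s_T) - Phi(s_0), independent of the path taken.
expected = GAMMA ** (len(trajectory) - 1) * phi[trajectory[-1]] - phi[trajectory[0]]
```

Because the extra return depends only on the endpoints, every policy's value from a given start state is shifted by the same amount, so the ranking of policies is unchanged.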
Hierarchical Reinforcement Learning (HRL) exploits temporal abstraction to solve large Markov Decision Processes (MDP) and provide transferable subtask policies. In this paper, we introduce an off-policy HRL algorithm: Hierarchical Q-value Iteration (HQI). We show that it is possible to effectively learn recursive optimal policies for any valid hierarchical decomposition of the original MDP, gi...
Reinforcement learning has its origins in animal learning theory. RL does not require prior knowledge; it can autonomously obtain an optimal policy using knowledge gained by trial and error and by continuous interaction with a dynamic environment. Owing to its self-improving and online-learning characteristics, reinforcement learning has become one of intelligent agent’s core tec...
In Levy (2004) I attempt to analyze whether political parties are effective. That is, whether they change the political outcome relative to the case in which they do not exist and candidates can only run independently. The main result shows that parties are not effective when the policy space is one dimensional but may become effective when the policy space has more than one dimension. To deriv...
Traditionally, a Reinforcement Learning (RL) policy is stored in a lookup table. From such a table it is difficult to observe the behavioral logic or to adjust that logic manually after learning. This paper shows how the behavioral logic of an RL controller can be presented in an insightful manner and adjusted using the Behavior Tree (BT) framework. It shows a method to approximate an RL ...
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value ...
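For reference, the Q-learning rule that the result above analyses can be sketched as follows. This is the standard tabular update only, not the paper's finite-sample analysis; the step size `alpha`, the discount `gamma`, and the state and action names are assumed values chosen for illustration.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_b Q(s', b) - Q(s, a))
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical two-action problem; unseen state-action pairs default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]

# One observed transition: in s0, taking "right" yields reward 1.0 and leads to s1.
new_value = q_learning_update(Q, "s0", "right", 1.0, "s1", actions)
```

The update uses only sampled transitions, in contrast with the model-based (indirect) approaches the abstract mentions, which first estimate next-state distributions from experience.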