Search results for: q policy
Number of results: 381,585
This paper presents a two-echelon, non-repairable spare-parts inventory system that consists of one warehouse and m identical retailers and implements the reorder-point, order-quantity (R, Q) inventory policy. We formulate the policy decision problem in order to minimize the total annual inventory investment subject to average annual ordering frequency and expected number of backorders constraints...
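The (R, Q) rule itself is easy to state: whenever the inventory position (on hand + on order - backorders) falls to the reorder point R or below, an order of fixed size Q is placed. A minimal single-location sketch, with toy demand and all names illustrative rather than from the paper:

```python
import random

def simulate_rq(R, Q, horizon=365, lead_time=5, seed=0):
    """Simulate the reorder-point / order-quantity (R, Q) rule at one location:
    whenever the inventory position (on hand + on order - backorders) drops to
    R or below, an order of fixed size Q is placed."""
    rng = random.Random(seed)
    on_hand, backorders, orders = R + Q, 0, 0
    pipeline = []                                  # (arrival_day, qty) of outstanding orders
    for day in range(horizon):
        on_hand += sum(q for t, q in pipeline if t == day)   # receive arrivals
        pipeline = [(t, q) for t, q in pipeline if t > day]
        demand = rng.randint(0, 6)                 # toy daily demand, mean ~3
        served = min(on_hand, backorders + demand)
        backorders += demand - served
        on_hand -= served
        position = on_hand + sum(q for _, q in pipeline) - backorders
        if position <= R:                          # the (R, Q) trigger
            pipeline.append((day + lead_time, Q))
            orders += 1
    return orders, backorders                      # ordering count and ending backorders

print(simulate_rq(R=10, Q=25))
```

The two returned quantities mirror the constrained terms in the abstract: how often orders are placed, and how many backorders accumulate.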
In this paper, we address multi-agent decision problems where all agents share a common goal. This class of problems is suitably modeled using finite-state Markov games with identical interests. We tackle the problem of coordination and contribute a new algorithm, coordinated Q-learning (CQL). CQL combines Q-learning with biased adaptive play, a coordination mechanism based on the principle of f...
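For reference, the Q-learning half of CQL operates on the joint action space of the common-payoff game; the sketch below shows that tabular update (the biased-adaptive-play coordination mechanism, which the abstract is cut off describing, is not reproduced, and all names are illustrative):

```python
from collections import defaultdict

def joint_q_update(Q, s, joint_a, r, s_next, joint_actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup over *joint* actions in an
    identical-interest Markov game: every agent receives the same reward r,
    so a single shared Q-table over (state, joint action) suffices.
    Selecting among several joint maximizers is the coordination problem
    that CQL hands to biased adaptive play."""
    best_next = max(Q[(s_next, a)] for a in joint_actions)
    Q[(s, joint_a)] += alpha * (r + gamma * best_next - Q[(s, joint_a)])

Q = defaultdict(float)                   # e.g. two agents, actions {0, 1} each
joint_actions = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1)]
joint_q_update(Q, s=0, joint_a=(1, 1), r=1.0, s_next=0, joint_actions=joint_actions)
```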
The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and runs its own reinforcement learning process. It supplies to a central arbitrator the Q-values (according to its own reward function) for each possible action. The arbitrator selects an action maximizing the sum of Q-v...
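The arbitration step is simple enough to show directly. In this sketch (names illustrative), each subagent reports its own Q-values for the current state and the arbitrator picks the action maximizing their sum:

```python
def arbitrate(subagent_q_values):
    """Q-decomposition arbitration: given one dict {action: Q_i(s, action)} per
    subagent, each computed from that subagent's own reward function, return
    the action that maximizes the sum of Q-values across subagents."""
    actions = set.intersection(*(set(q) for q in subagent_q_values))
    return max(actions, key=lambda a: sum(q[a] for q in subagent_q_values))

# Two subagents with different reward functions scoring the same action set:
q1 = {"left": 1.0, "right": 0.2}
q2 = {"left": 0.1, "right": 1.5}
assert arbitrate([q1, q2]) == "right"    # 1.7 > 1.1
```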
Efficient exploration in reinforcement learning (RL) can be achieved by incorporating uncertainty into model predictions. Bayesian deep Q-learning provides a principled way to do this by modeling Q-values as probability distributions. We propose an efficient algorithm for Bayesian deep Q-learning by posterior sampling of actions in the Q-function via continuous-time flows (CTFs), achieving efficient ...
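The high-level recipe (keep a distribution over Q-functions and act greedily with respect to a posterior sample) can be sketched without the continuous-time flow machinery; below, a small ensemble stands in for the CTF posterior, a deliberate simplification rather than the paper's method:

```python
import random

def thompson_action(q_ensemble, state, actions, rng=random):
    """Posterior-sampling action selection: draw one Q-function from an
    approximate posterior and act greedily under it. Here the posterior is a
    plain ensemble of callables q(state, action) -> float; the paper instead
    represents it with continuous-time flows (CTFs)."""
    q = rng.choice(q_ensemble)            # one posterior sample per decision
    return max(actions, key=lambda a: q(state, a))

# Toy ensemble: two Q-functions that disagree on the best action.
ensemble = [lambda s, a: float(a == "up"), lambda s, a: float(a == "down")]
print(thompson_action(ensemble, state=None, actions=["up", "down"]))
```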
The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the ongoing actor policy, and so can only explore local regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient al...
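The contrast the abstract draws can be made concrete: conventional methods perturb the actor's own action, while the meta-policy-gradient idea draws behavior from a separate, trainable exploration policy. The sketch below is schematic; `explorer_update` and the improvement signal are hypothetical stand-ins, not the paper's algorithm:

```python
import random

rng = random.Random(0)

def local_exploration(actor_action, sigma=0.1):
    """What the abstract criticizes: Gaussian noise on the actor's own action,
    so behavior stays in a local region around the actor policy."""
    return actor_action + rng.gauss(0.0, sigma)

def explorer_update(explorer_params, grad_logp, actor_improvement, lr=0.01):
    """Hypothetical meta-policy-gradient step: the exploration policy is scored
    by how much the data it generated improved the actor (actor_improvement),
    not by environment reward, and its parameters move up that signal."""
    return [p + lr * actor_improvement * g
            for p, g in zip(explorer_params, grad_logp)]
```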
We address the problem of optimally controlling stochastic environments that are partially observable. The standard method for tackling such problems is to define and solve a Partially Observable Markov Decision Process (POMDP). However, it is well known that exactly solving POMDPs is very costly computationally. Recently, Littman, Sutton and Singh (2002) have proposed an alternative representa...
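The standard POMDP method the abstract mentions maintains a belief state, a distribution over hidden states updated after every action and observation; exact planning must reason over this continuous belief space, which is where the computational cost comes from. A minimal discrete belief update (toy notation, not from the paper):

```python
def belief_update(belief, a, o, T, O):
    """Discrete POMDP belief update: b'(s') is proportional to
    O[a][s2][o] * sum_s T[a][s][s2] * b(s), where T holds transition
    probabilities and O observation probabilities."""
    n = len(belief)
    b = [O[a][s2][o] * sum(T[a][s][s2] * belief[s] for s in range(n))
         for s2 in range(n)]
    z = sum(b)
    return [p / z for p in b]
```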
We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using c...
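The first of the two novelties is easy to isolate: the critic's update is gated on whether the executed action was the policy's most probable one. A sketch under that reading (the actor update is truncated in the abstract and not reproduced here):

```python
def gated_critic_update(V, policy, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Update the critic value V[s] only when the executed action a is the
    most probable action under the current policy at s, matching the
    construction described above. `policy[s]` is a dict {action: probability}."""
    if a == max(policy[s], key=policy[s].get):
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
```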
This chapter is concerned with the application of approximate dynamic programming (ADP) techniques to solve for the value function, and hence the optimal control policy, in discrete-time nonlinear optimal control problems having continuous state and action spaces. ADP is a reinforcement learning approach (Sutton & Barto, 1998) based on adaptive critics (Barto et al., 1983), (Widrow et al., 1973...
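In outline, ADP replaces the exact dynamic-programming backup with a function approximator fitted to sampled backups. A minimal linear fitted-value sweep for a discrete-time problem (all names illustrative):

```python
def fitted_value_sweep(samples, features, weights, gamma=0.99, lr=0.01):
    """One approximate-dynamic-programming sweep: for each sampled transition
    (x, u, r, x_next), nudge a linear value function V(x) = w . phi(x) toward
    the one-step backup r + gamma * V(x_next). `features` maps a state to a
    list of floats and is a stand-in for the chosen approximator."""
    for x, u, r, x_next in samples:
        phi = features(x)
        v = sum(w * f for w, f in zip(weights, phi))
        target = r + gamma * sum(w * f for w, f in zip(weights, features(x_next)))
        weights = [w + lr * (target - v) * f for w, f in zip(weights, phi)]
    return weights
```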
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x_t, u_t, r_t, x_{t+1}), where x_t denotes the system state at time t, u_t the control action taken, r_t the instantaneous reward obtained, and x_{t+1} the succe...
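In batch mode this becomes fitted Q iteration: repeatedly regress a new Q-function onto one-step targets built from the stored four-tuples. A sketch for a finite action set; the tree-based regressor is one common choice in this line of work, assumed here rather than taken from the paper:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(tuples, actions, n_iters=50, gamma=0.95):
    """Batch RL from four-tuples (x_t, u_t, r_t, x_{t+1}): each iteration
    regresses Q_N onto targets r_t + gamma * max_u Q_{N-1}(x_{t+1}, u).
    States are 1-D arrays and `actions` a finite set of scalars."""
    X = np.array([np.append(x, u) for x, u, r, x1 in tuples])
    r = np.array([t[2] for t in tuples])
    q = None
    for _ in range(n_iters):
        if q is None:
            y = r                                   # Q_1 is the expected reward
        else:
            next_q = np.column_stack([
                q.predict(np.array([np.append(x1, u) for _, _, _, x1 in tuples]))
                for u in actions])
            y = r + gamma * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, y)
    return q
```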