Search results for: q policy
Number of results: 381585
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1...
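The algorithms in this abstract combine policy iteration with classical Q-learning/value iteration. The sketch below shows only the classical, undiscounted Q-learning component for a stochastic shortest path (SSP) problem. It is not the authors' method. The update is Q(i,u) ← Q(i,u) + α(g(i,u,j) + min_v Q(j,v) − Q(i,u)), and the termination state has cost-to-go 0. The simulator interface `sample(state, action, rng) -> (next_state, cost)`, the toy problem, and all parameter values are hypothetical.

```python
import random

def ssp_q_learning(sample, n_states, n_actions, terminal, start,
                   episodes=2000, alpha=0.1, eps=0.2, seed=0, max_steps=1000):
    """Tabular Q-learning for an undiscounted SSP (cost minimization).

    sample(state, action, rng) -> (next_state, cost) is a hypothetical
    user-supplied simulator; Q-values of the terminal state stay at 0.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, steps = start, 0
        while s != terminal and steps < max_steps:
            # Epsilon-greedy exploration; "greedy" means lowest expected cost.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = min(range(n_actions), key=Q[s].__getitem__)
            s2, cost = sample(s, a, rng)
            # Undiscounted target: stage cost plus best cost-to-go from s2.
            target = cost + (0.0 if s2 == terminal else min(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s, steps = s2, steps + 1
    return Q

# Hypothetical toy SSP: from state 0, action 0 terminates at cost 5,
# action 1 moves to state 1 at cost 1; state 1 always terminates at cost 1.
def toy(s, a, rng):
    if s == 0:
        return (2, 5.0) if a == 0 else (1, 1.0)
    return (2, 1.0)

Q = ssp_q_learning(toy, n_states=3, n_actions=2, terminal=2, start=0)
# The optimal cost-to-go from state 0 is 2, reached via action 1.
print(min(Q[0]), Q[0].index(min(Q[0])))
```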
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided ...
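Here is one common way to write a return that is "corrected with the current Q-function in terms of rewards." It is a minimal tabular sketch, not necessarily the paper's exact operator: G = Q(x₀,a₀) + Σₛ (γλ)ˢ δₛ, where δₛ = rₛ + γ E_{a∼π} Q(xₛ₊₁,a) − Q(xₛ,aₛ). This correction uses no importance ratios. The abstract's convergence conditions are truncated above, so this sketch does not check or enforce any of them. The function names, the list-based Q table, and the example numbers are assumptions.

```python
def expected_q(Q, pi, x):
    """E_{a ~ pi(.|x)} Q(x, a) for a tabular Q and target-policy probabilities pi."""
    return sum(p * q for p, q in zip(pi[x], Q[x]))

def corrected_return(states, actions, rewards, Q, pi, gamma, lam, ends_in_terminal=True):
    """Off-policy lambda-return for (states[0], actions[0]), corrected with the
    current Q-function via TD errors (hypothetical helper, tabular setting)."""
    T = len(rewards)
    assert len(states) == T + 1 and len(actions) == T
    G = Q[states[0]][actions[0]]
    weight = 1.0  # equals (gamma * lam) ** s at step s
    for s in range(T):
        last = s + 1 == T
        v_next = 0.0 if (last and ends_in_terminal) else expected_q(Q, pi, states[s + 1])
        delta = rewards[s] + gamma * v_next - Q[states[s]][actions[s]]
        G += weight * delta
        weight *= gamma * lam
    return G

# Illustrative numbers: state 2 is terminal; actions were taken by some behaviour policy.
Q = [[1.0, 2.0], [0.5, 3.0], [0.0, 0.0]]
pi = [[0.5, 0.5], [0.0, 1.0], [0.5, 0.5]]
states, actions, rewards = [0, 1, 2], [0, 1], [1.0, 2.0]
g0 = corrected_return(states, actions, rewards, Q, pi, gamma=0.9, lam=0.0)
g1 = corrected_return(states, actions, rewards, Q, pi, gamma=0.9, lam=1.0)
print(g0, g1)
```

With λ = 0 this reduces to the expected-SARSA target r₀ + γ E_π Q(x₁,·). With λ = 1, and when π agrees with the actions taken, it reduces to the Monte Carlo return.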
Some game-theoretic approaches to multiagent reinforcement learning in self-play, i.e. when all agents use the same algorithm to choose actions, employ equilibria, such as the Nash equilibrium, to compute the agents' policies. These approaches have been applied only to simple examples. In this paper, we present an extended version of Nash Q-Learning using the Stackelberg equilibrium to...
In this paper we apply reinforcement learning techniques to traffic light policies with the aim of increasing traffic flow through intersections. We model intersections with states, actions, and rewards, then use an industry-standard software platform to simulate and evaluate different policies against them. We compare various policies including fixed cycles, longest queue first (LQF), and the ...
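The two baseline controllers named in this abstract can be sketched in a few lines. Fixed-cycle control rotates through the phases on a timer. Longest queue first (LQF) is interpreted here as giving green to the phase that serves the single longest queue. That interpretation, the lane and phase names, and the `green_time` parameter are assumptions, not details from the paper.

```python
def fixed_cycle(t, phases, green_time):
    """Fixed-cycle control: each phase gets green for green_time steps in turn."""
    order = list(phases)
    return order[(t // green_time) % len(order)]

def longest_queue_first(queues, phases):
    """LQF (one interpretation): green for the phase serving the longest single queue.

    queues: lane -> number of waiting vehicles; phases: phase -> lanes it serves.
    Ties go to the phase listed first.
    """
    return max(phases, key=lambda p: max(queues[lane] for lane in phases[p]))

# Hypothetical four-approach intersection with two phases.
phases = {"NS": ["N", "S"], "EW": ["E", "W"]}
queues = {"N": 3, "S": 1, "E": 5, "W": 0}
print(longest_queue_first(queues, phases))           # EW serves the longest queue (E)
print(fixed_cycle(t=12, phases=phases, green_time=10))
```

An RL policy would replace either function with one that maps the observed queue state to a phase.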
We examine the effectiveness of the conventional (Q, r) model in managing production-inventory systems with finite capacity, stochastic demand, and stochastic order processing times. We show that, for systems with finite production capacity, order replenishment lead times are highly sensitive to loading and order quantity. Consequently, the choice of optimal order quantity and optimal reorder p...
Autonomous driving at intersections with traffic lights and stop signs can be handled by simple rules; however, unsignalized intersections remain a challenging problem. We explore the effectiveness of using Deep Q-Networks to handle such problems. By combining several recent advances in deep RL, we were able to learn policies that surpass the performance of a commonly used rule-based approach in se...
In this paper we consider a two-level inventory system with one warehouse and one retailer with information exchange. Transportation times are constant and the retailer faces independent Poisson demand. The retailer applies a continuous-review (R, Q) policy. The supplier starts with m initial batches (of size Q), and places an order with an outside source immediately after the retailer’s inventory posit...
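The retailer's continuous-review (R, Q) rule can be sketched as follows. Whenever the inventory position falls to R or below, the retailer orders just enough batches of size Q to lift it back above R. This sketch assumes the standard definition of inventory position: on hand plus outstanding orders minus backorders. It covers only the retailer's rule. The warehouse's initial-batch mechanism is truncated in the abstract and is not modelled. The function name and the demand sequence are illustrative.

```python
def rq_batches(inventory_position, R, Q):
    """Number of size-Q batches to order under a continuous-review (R, Q) policy."""
    if Q <= 0:
        raise ValueError("batch size Q must be positive")
    if inventory_position > R:
        return 0
    # Smallest n with inventory_position + n * Q > R.
    return (R - inventory_position) // Q + 1

# Illustrative trace: start at R + Q and apply a made-up demand sequence.
R, Q = 2, 4
ip = R + Q
orders = []
for t, demand in enumerate([1, 1, 3, 2, 1]):
    ip -= demand
    n = rq_batches(ip, R, Q)
    if n:
        orders.append((t, n))  # (time step, number of batches ordered)
        ip += n * Q
print(orders)
```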
One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge to an optimal policy as the state space grows. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function, and thus guides the Q-learner quickly ...
This paper looks at the convergence property of off-policy Monte Carlo agents with variable behaviour policies. It presents results about convergence and lack of convergence. Even if the agent generates every possible episode history infinitely often, the algorithm can fail to converge on the correct Q-values. On the other hand, it can converge on the correct Q-values under certain conditions. ...
Q methodology is seldom used by academics and practitioners in the field of administrative ethics, but it has important potential for empirical studies. Q offers a procedure and conceptual framework with which to study subjectivity in the social context. It has the advantage of bringing marginalized viewpoints to the fore but also has some drawbacks. The appendix provides a basic introduction t...