Search results for: q policy

Number of results: 381585

2014
Shalabh Bhatnagar

We present in this article a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm is convergent. Numerical results on a multi-stage stochastic shortest path problem show t...

2006
Raja Afandi Jianqing Zhang Carl A. Gunter

There are many problems hindering the design and development of Service-Oriented Architectures (SOAs), which can dynamically discover and compose multiple services so that the quality of the composite service is measured by its End-to-End (E2E) quality, rather than that of individual services in isolation. The diversity and complexity of QoS constraints further limit the widescale adoption of Q...

1999
Gang ZHAO Ruoying SUN

Reinforcement learning is an efficient method for solving Markov Decision Processes, in which an agent improves its performance using scalar reward values and gains a higher capability for reactive and adaptive behavior. Q-learning is a representative reinforcement learning method that is guaranteed to obtain an optimal policy but needs numerous trials to achieve it. k-Certainty Exploration Learning Sy...
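For readers unfamiliar with the Q-learning method this abstract refers to, the standard tabular update can be sketched as follows. This is a minimal illustration of textbook Q-learning, not the algorithm proposed in the paper; the function name `q_learning_update`, the dict-of-dicts table layout, and the default `alpha` and `gamma` values are assumptions made for the example.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a hypothetical table: q[state][action] -> estimated return.
    alpha is the learning rate, gamma the discount factor (illustrative defaults).
    """
    # Greedy value of the next state; a state with no actions is treated as terminal.
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    # Temporal-difference target: immediate reward plus discounted best future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Tiny usage example on a made-up two-state table.
table = {
    0: {"left": 0.0, "right": 0.0},
    1: {"left": 2.0, "right": 0.5},
}
new_value = q_learning_update(table, 0, "right", reward=1.0, next_state=1,
                              alpha=0.5, gamma=0.9)
print(new_value)
```

The update uses the maximum over next-state actions regardless of which action the agent actually takes next, which is what makes Q-learning an off-policy method.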

Journal: J. Inf. Sci. Eng. 2014
Yuan-Pao Hsu Wei-Cheng Jiang

In this paper, we present a rapid learning algorithm called Dyna-QPC. The proposed algorithm requires considerably less training time than Q-learning and the table-based Dyna-Q algorithm, making it applicable to real-world control tasks. The Dyna-QPC algorithm is a combination of existing learning techniques: CMAC, Q-learning, and prioritized sweeping. In a practical experiment, the Dyna-QPC algori...

2014
Duc Thien Nguyen William Yeoh Hoong Chuin Lau Shlomo Zilberstein Chongjie Zhang

In this document, we show the proofs for the theoretical results described in the paper titled “Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs” submitted to AAAI 2014. In this paper, we consider MDPs where a joint state can transition to any other joint state with non-zero probability, that is, the MDP is unichain. We are going to show the decomposability of th...

2009
N. Yazdan Shenas A. Eshraghniaye Jahromi M. Modarres Yazdi

In this paper, a continuous review inventory system is considered in which an order in a batch of size Q is placed immediately after the inventory position reaches R. Transportation time is constant and demands are assumed to be generated by a stationary Poisson process with one unit demand at a time. Demands not covered immediately from the inventory are backordered. In a recent paper, the exa...
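The (R, Q) policy described in this abstract can be illustrated with a small event-driven simulation. This is only a sketch under the stated assumptions (unit Poisson demands, constant lead time, full backordering); it does not reproduce the paper's analysis, and the function name `simulate_rq`, its parameters, and the returned fields are hypothetical.

```python
import random
from collections import deque


def simulate_rq(R, Q, demand_rate, lead_time, horizon, seed=0):
    """Simulate a continuous-review (R, Q) policy with backorders.

    Inventory position = net inventory + quantity on order.
    An order of size Q is placed as soon as the position reaches R.
    """
    rng = random.Random(seed)
    net_inventory = R + Q   # on-hand stock minus backorders (negative = backorders)
    position = R + Q        # inventory position
    pending = deque()       # arrival times of outstanding orders, in FIFO order
    positions = []          # inventory position observed after each demand
    orders = 0
    t = 0.0
    while True:
        # Poisson demand process: exponential inter-arrival times, one unit each.
        t += rng.expovariate(demand_rate)
        if t > horizon:
            break
        # Receive every order whose constant lead time has elapsed.
        while pending and pending[0] <= t:
            pending.popleft()
            net_inventory += Q
        # Serve the demand from stock, or backorder it if stock is exhausted.
        net_inventory -= 1
        position -= 1
        # Reorder rule: place a batch of size Q when the position hits R.
        if position == R:
            pending.append(t + lead_time)
            position += Q
            orders += 1
        positions.append(position)
    return {
        "orders": orders,
        "net_inventory": net_inventory,
        "position": position,
        "outstanding": len(pending),
        "positions": positions,
    }


result = simulate_rq(R=3, Q=5, demand_rate=2.0, lead_time=1.5, horizon=100.0)
print(result["orders"], result["net_inventory"])
```

Because each demand is for a single unit, the inventory position in this sketch always stays between R + 1 and R + Q, which is the property that makes such systems analytically tractable.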

2001
Gregory Z. Grudic Lyle H. Ungar

We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...

Journal: Annals of Statistics 2012
Yair Goldberg Michael R Kosorok

We develop methodology for a multistage-decision problem with flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the ...

2000
Junichiro Yoshimoto Shin Ishii Masa-aki Sato

In this article, we propose a new reinforcement learning (RL) method for a system having continuous state and action spaces. Our RL method has an architecture like the actor-critic model. The critic tries to approximate the Q-function, which is the expected future return for the current state-action pair. The actor tries to approximate a stochastic soft-max policy defined by the Q-function. The ...
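A soft-max policy defined by a Q-function, as mentioned in this abstract, assigns each action a probability proportional to the exponential of its Q-value. The sketch below assumes a finite list of candidate actions for simplicity, whereas the paper addresses continuous action spaces; the function name `softmax_policy` and the `temperature` parameter are assumptions of this example, not taken from the paper.

```python
import math


def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature).

    q_values is a list of Q-value estimates, one per candidate action.
    temperature is a hypothetical knob: lower values make the policy greedier.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


probs = softmax_policy([1.0, 2.0, 0.5], temperature=0.5)
print(probs)
```

Actions with higher Q-values receive higher probability, while every action keeps a non-zero chance of being selected, so the policy remains stochastic.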

Journal: International Journal of Industrial Engineering and Production Research
Rasoul Haji, Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran; Mohammadmohsen Moarefdoost, Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran; Seyed Babak Ebrahimi, Department of Industrial Engineering, Iran University of Science & Technology, Tehran, Iran

This paper aims to evaluate the inventory cost of a two-echelon serial supply chain system under a vendor managed inventory program with stochastic demand, and to examine the effect of environmental factors on the cost of the overall system. For this purpose, we consider a two-echelon serial supply chain with a manufacturer and a retailer. Under the vendor managed inventory program, the decision on inventory le...
