Search results for: q policy
Number of results: 381585
We present in this article a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm is convergent. Numerical results on a multi-stage stochastic shortest path problem show t...
There are many problems hindering the design and development of Service-Oriented Architectures (SOAs), which can dynamically discover and compose multiple services so that the quality of the composite service is measured by its End-to-End (E2E) quality, rather than that of individual services in isolation. The diversity and complexity of QoS constraints further limit the widescale adoption of Q...
Reinforcement learning is an efficient method for solving Markov Decision Processes in which an agent improves its performance using scalar reward values, giving it a high capacity for reactive and adaptive behavior. Q-learning is a representative reinforcement learning method that is guaranteed to obtain an optimal policy but needs numerous trials to achieve it. k-Certainty Exploration Learning Sy...
In this paper, we present a rapid learning algorithm called Dyna-QPC. The proposed algorithm requires considerably less training time than Q-learning and the table-based Dyna-Q algorithm, making it applicable to real-world control tasks. The Dyna-QPC algorithm is a combination of existing learning techniques: CMAC, Q-learning, and prioritized sweeping. In a practical experiment, the Dyna-QPC algori...
In this document, we show the proofs for the theoretical results described in the paper titled “Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs” submitted to AAAI 2014. In this paper, we consider MDPs where a joint state can transition to any other joint state with non-zero probability, that is, the MDP is unichain. We are going to show the decomposability of th...
In this paper, a continuous review inventory system is considered in which an order in a batch of size Q is placed immediately after the inventory position reaches R. Transportation time is constant and demands are assumed to be generated by a stationary Poisson process with one unit demand at a time. Demands not covered immediately from the inventory are backordered. In a recent paper, the exa...
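The (R, Q) ordering rule described in this abstract can be sketched as a small simulation. This is only an illustration of the rule as stated (order a batch of Q when the inventory position reaches R, constant lead time, unit Poisson demands, backordering), not the paper's method; the function name `simulate_rq`, its parameters, and the numeric values are hypothetical.

```python
import random

def simulate_rq(R, Q, lead_time, demand_rate, horizon, seed=0):
    """Simulate a continuous-review (R, Q) policy; return the inventory
    position recorded after each demand (hypothetical helper)."""
    rng = random.Random(seed)
    t = 0.0
    net_inventory = R + Q   # on-hand stock minus backorders
    pipeline = []           # arrival times of outstanding orders, oldest first
    positions = []
    while True:
        # Poisson demand process: exponential gaps between unit demands.
        t += rng.expovariate(demand_rate)
        if t > horizon:
            break
        # Receive every order whose constant lead time has elapsed.
        while pipeline and pipeline[0] <= t:
            pipeline.pop(0)
            net_inventory += Q
        # One unit of demand; a negative net inventory means backorders.
        net_inventory -= 1
        position = net_inventory + Q * len(pipeline)
        # Place a batch order as soon as the position reaches R.
        if position <= R:
            pipeline.append(t + lead_time)
            position += Q
        positions.append(position)
    return positions

# Illustrative values only.
positions = simulate_rq(R=3, Q=5, lead_time=2.0, demand_rate=1.0, horizon=200.0)
```

Under these assumptions the recorded inventory position always stays between R + 1 and R + Q, because each order lifts it back to R + Q the moment it reaches R.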
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...
We develop a methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the ...
In this article, we propose a new reinforcement learning (RL) method for a system having continuous state and action spaces. Our RL method has an architecture like the actor-critic model. The critic tries to approximate the Q-function, which is the expected future return for the current state-action pair. The actor tries to approximate a stochastic soft-max policy defined by the Q-function. The ...
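A soft-max policy defined by a Q-function can be illustrated with a short sketch. It assumes a finite list of candidate actions and a temperature parameter, neither of which comes from the abstract (the paper itself treats continuous action spaces); `softmax_policy` is a hypothetical name.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative Q-values for three candidate actions in one state.
probs = softmax_policy([1.0, 2.0, 0.5], temperature=0.5)
```

Lower temperatures concentrate probability on the action with the highest Q-value, while higher temperatures move the policy toward uniform random selection.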
This paper aims to evaluate the inventory cost of a two-echelon serial supply chain system under a vendor-managed inventory program with stochastic demand, and to examine the effect of environmental factors on the cost of the overall system. For this purpose, we consider a two-echelon serial supply chain with a manufacturer and a retailer. Under the vendor-managed inventory program, the decision on inventory le...