Search results for: policy iterations

Number of results: 276392

2012
Asma Al-tamimi Murad Abu-Khalaf Frank Lewis

This chapter is concerned with the application of approximate dynamic programming (ADP) techniques to solve for the value function, and hence the optimal control policy, in discrete-time nonlinear optimal control problems having continuous state and action spaces. ADP is a reinforcement learning approach (Sutton & Barto, 1998) based on adaptive critics (Barto et al., 1983), (Widrow et al., 1973...

2016
Shivaram Kalyanakrishnan Neeldhara Misra Aditya Gopalan

Policy Iteration (PI) (Howard 1960) is a classical method for computing an optimal policy for a finite Markov Decision Problem (MDP). The method is conceptually simple: starting from some initial policy, “policy improvement” is repeatedly performed to obtain progressively dominating policies, until eventually, an optimal policy is reached. Being remarkably efficient in practice, PI is often fav...
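
The loop this abstract describes (evaluate the current policy, improve it, stop when nothing changes) can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the two-state MDP, the names `P`, `R`, `evaluate` and `policy_iteration`, the discount factor 0.9, and the numerical tolerances are all assumptions made for the example.

```python
def evaluate(P, R, policy, gamma, tol=1e-12):
    """Iterative policy evaluation: sweep until V stops changing."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))


def policy_iteration(P, R, gamma=0.9, eps=1e-9):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)  # assumed initial policy: action 0 everywhere
    iterations = 0
    while True:
        V = evaluate(P, R, policy, gamma)
        changed = False
        for s in range(len(P)):
            best = max(range(len(P[s])),
                       key=lambda a: q_value(P, R, V, s, a, gamma))
            # Switch only on a strict improvement (eps guards against
            # floating-point noise causing endless switching).
            if q_value(P, R, V, s, best, gamma) > V[s] + eps:
                policy[s] = best
                changed = True
        iterations += 1
        if not changed:
            return policy, V, iterations


# Hypothetical 2-state MDP: P[s][a][t] is the probability of moving s -> t.
# State 1 pays reward 1 for staying; state 0 pays nothing.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: stay / go to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: stay / go to state 0
R = [[0.0, 0.0],
     [1.0, 0.0]]

policy, V, iterations = policy_iteration(P, R)
print(policy, [round(v, 3) for v in V], iterations)
```

In this toy problem the improved policy moves from state 0 to state 1 and then stays there. Evaluation is done by repeated sweeps rather than by solving the linear system exactly, which keeps the sketch dependency-free at the cost of needing the small tolerance `eps` in the improvement test.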

2011
Lucian Buşoniu Alessandro Lazaric Mohammad Ghavamzadeh Rémi Munos Robert Babuška Bart De Schutter

Approximate reinforcement learning deals with the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core, po...

2013
Bruno Scherrer

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advan...
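
The difference between the two switching rules named in this abstract can be illustrated with a short Python sketch. It follows only the one-sentence descriptions given above, so it should be read as an assumption-laden illustration rather than the paper's algorithms: the three-state MDP, the names `run_pi` and `advantages`, the discount factor, and the tolerances are all invented for the example.

```python
def evaluate(P, R, policy, gamma, tol=1e-12):
    """Iterative policy evaluation: sweep until V stops changing."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def advantages(P, R, V, gamma):
    """A[s][a] = Q(s, a) - V(s) for every state-action pair."""
    return [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a])) - V[s]
             for a in range(len(P[s]))]
            for s in range(len(P))]


def run_pi(P, R, gamma, rule, eps=1e-9):
    """Policy iteration with a 'howard' or 'simplex' switching rule.

    Returns the final policy and the number of improvement steps taken.
    """
    policy = [0] * len(P)  # assumed initial policy: action 0 everywhere
    steps = 0
    while True:
        V = evaluate(P, R, policy, gamma)
        A = advantages(P, R, V, gamma)
        best = [max(range(len(row)), key=row.__getitem__) for row in A]
        improvable = [s for s in range(len(P)) if A[s][best[s]] > eps]
        if not improvable:
            return policy, steps
        if rule == "simplex":
            # Keep only the single state with the largest advantage.
            improvable = [max(improvable, key=lambda s: A[s][best[s]])]
        for s in improvable:  # under "howard" this is every improvable state
            policy[s] = best[s]
        steps += 1


# Hypothetical MDP: three self-looping states; action 1 pays 1, 2 or 3.
P = [[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
     [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
     [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]
R = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]

howard_policy, howard_steps = run_pi(P, R, 0.9, "howard")
simplex_policy, simplex_steps = run_pi(P, R, 0.9, "simplex")
print(howard_policy, howard_steps)
print(simplex_policy, simplex_steps)
```

On this deliberately simple example both rules reach the same policy, but the Howard-style rule switches all three states in one step while the Simplex-style rule needs one step per state. The example says nothing about the general iteration bounds the paper studies.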

Journal: Automatica 2009
Junyu Zhang Xi-Ren Cao

In this paper, we study the nth-bias optimality problem for finite continuous-time Markov decision processes (MDPs) with a multichain structure. We first provide nth-bias difference formulas for two policies and present some interesting characterizations of an nth-bias optimal policy by using these difference formulas. Then, we prove the existence of an nth-bias optimal policy by using nth-bias...

Journal: Annals OR 2013
Huizhen Yu Dimitri P. Bertsekas

We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1...

2010
Amir Massoud Farahmand Rémi Munos Csaba Szepesvári

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Niko...

2015
Devesh K. Jha Minghui Zhu

This paper presents the problem of distributed feedback motion planning for multiple robots. The problem of feedback multi-robot motion planning is formulated as a differential noncooperative game. We leverage the existing sampling-based algorithms and value iterations to develop an incremental policy synthesizer. The proposed algorithm makes use of an iterative best response algorithm to incre...
