2 Finite State and Action MDPs
Author
Abstract
In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite horizon models. For the finite horizon model, the utility function of the total expected reward is commonly used. For the infinite horizon, the choice of utility function is less obvious. We consider several criteria: total discounted expected reward, average expected reward, and more sensitive optimality criteria including the Blackwell optimality criterion. We end with a variety of other subjects. The emphasis is on computational methods to compute optimal policies for these criteria. These methods are based on concepts like value iteration, policy iteration and linear programming. This survey covers about three hundred papers. Although the subject of finite state and action MDPs is classical, there are still open problems, and we mention some of them.
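As a concrete illustration of the value iteration method mentioned above, the following is a minimal sketch for a finite discounted MDP; the function name, array shapes, and stopping rule are illustrative choices and are not taken from the chapter.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, eps=1e-6):
    """Value iteration for a finite discounted MDP (illustrative sketch).

    P: (A, S, S) array with P[a, s, s2] = transition probability.
    R: (S, A) array of one-step rewards.
    gamma: discount factor in (0, 1).
    Returns an eps-optimal value function and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q[a, s] = r(s, a) + gamma * sum_{s2} P(s2 | s, a) * V(s2)
        Q = R.T + gamma * (P @ V)
        V_new = Q.max(axis=0)
        # Classical stopping rule that guarantees an eps-optimal greedy policy.
        if np.max(np.abs(V_new - V)) < eps * (1 - gamma) / (2 * gamma):
            return V_new, Q.argmax(axis=0)
        V = V_new
```

Policy iteration and the linear programming formulation treated in the chapter solve the same optimality equation V(s) = max_a [ r(s,a) + γ Σ_{s'} p(s'|s,a) V(s') ], but terminate after finitely many steps.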
Similar resources
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Nonlinear Policy Gradient Algorithms for Noise-Action MDPs
We develop a general theory of efficient policy gradient algorithms for Noise-Action MDPs (NMDPs), a class of MDPs that generalize Linearly Solvable MDPs (LMDPs). For finite horizon problems, these lead to simple update equations based on multiple rollouts of the system. We show that our policy gradient algorithms are faster than the PI algorithm, a state of the art policy optimization algorith...
Online Linear Regression and Its Application to Model-Based Reinforcement Learning
We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a model-based approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernelized) linearly parameterized dynamics. This result builds on Kearns and Singh's work that provides ...
An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning
We consider the problem of decision making in the context of unknown Markov decision processes with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic posterior sampling strategy based on the maximization of state-action value functions of MDPs sampled from the posterior. First experiments are promising.
Strong polynomiality of policy iterations for average-cost MDPs modeling replacement and maintenance problems
This note considers an average-cost Markov Decision Process (MDP) with finite state and action sets and satisfying the additional condition that there is a state to which the system jumps from any state and under any action with a positive probability. The main result is that the policy iteration algorithm is strongly polynomial for such MDPs, which are often used to model replacement and maintena...
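The policy iteration method referenced above can be sketched for the unichain average-reward case as follows; this is a generic Howard-style iteration with illustrative names and shapes, not the algorithm analyzed in that paper, and it does not exploit the jump-state condition described in the abstract.

```python
import numpy as np

def average_reward_policy_iteration(P, R, max_iter=100):
    """Howard-style policy iteration for a unichain average-reward MDP (sketch).

    P: (A, S, S) transition probabilities, R: (S, A) one-step rewards.
    Returns a policy, its gain g, and a bias vector h with h[0] = 0.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve g*1 + (I - P_pi) h = r_pi with h[0] = 0.
        P_pi = P[policy, np.arange(S), :]
        r_pi = R[np.arange(S), policy]
        M = np.empty((S, S))
        M[:, 0] = 1.0                          # column for the unknown gain g
        M[:, 1:] = (np.eye(S) - P_pi)[:, 1:]   # columns for h[1], ..., h[S-1]
        x = np.linalg.solve(M, r_pi)
        g, h = x[0], np.concatenate(([0.0], x[1:]))
        # Policy improvement on the bias: Q[a, s] = r(s, a) + sum_{s2} P(s2|s, a) h(s2).
        Q = R.T + P @ h
        best = Q.argmax(axis=0)
        # Change an action only on strict improvement to avoid cycling on ties.
        improve = Q[best, np.arange(S)] > Q[policy, np.arange(S)] + 1e-9
        if not improve.any():
            break
        policy = np.where(improve, best, policy)
    return policy, g, h
```

For MDPs with the special structure described in the abstract, the cited note shows that this type of iteration terminates in a strongly polynomial number of steps.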
Publication date: 2001