A Planning Algorithm for Predictive State Representations

Authors

  • Masoumeh T. Izadi
  • Doina Precup
Abstract

We address the problem of optimally controlling stochastic environments that are partially observable. The standard method for tackling such problems is to define and solve a Partially Observable Markov Decision Process (POMDP). However, it is well known that solving POMDPs exactly is computationally very costly. Recently, Littman, Sutton and Singh (2002) proposed an alternative representation of partially observable environments, called predictive state representations (PSRs). PSRs are grounded in the sequence of actions and observations of the agent, and hence relate the state representation directly to the agent's experience. In this paper, we present a policy iteration algorithm for finding policies using PSRs. In preliminary experiments, our algorithm produced good solutions.

1 Predictive State Representation

We assume that we are given a system consisting of a discrete, finite set of $n$ states $S$, a discrete, finite set of actions $A$, and a discrete, finite set of observations $O$. The interaction with the system takes place at discrete time intervals. The initial state of the system $s_0$ is drawn from an initial probability distribution over states $I$. On every time step $t$, an action $a_t$ is chosen according to some policy. Then the underlying state changes to $s_{t+1}$ and a next observation $o_{t+1}$ is generated. The system is Markovian, in the sense that for every action $a$, the transition to the next state is generated according to a probability distribution described by an $(n \times n)$ transition matrix $T^a$. Similarly, for a given observation $o$ and action $a$, the next observation is generated according to an $(n \times n)$ diagonal observation matrix $O^{a,o}$, where $O^{a,o}_{ii}$ is the probability of observation $o$ when action $a$ is selected and state $i$ is reached. Since we are interested in optimal control, rather than prediction, we also assume that there exists a set of reward vectors $R^a$, one for each action $a$, where $R^a_i$ is the reward for taking action $a$ in underlying state $i$.

PSRs are based on the notion of tests.
A test is an ordered sequence of action-observation pairs $q = a_1 o_1 \ldots a_k o_k$. The prediction for test $q$ is the probability of the sequence of observations $o_1 \ldots o_k$ being generated, given the sequence of actions $a_1 \ldots a_k$. The prediction for a test $q$ given prior history $h$, denoted $p(q \mid h)$, is the probability of seeing the sequence of observations of $q$ after seeing history $h$ and taking the sequence of actions specified by $q$. For any set of tests $Q = \{q_1, \ldots, q_k\}$, its prediction vector is $p(Q \mid h) = [p(q_1 \mid h), \ldots, p(q_k \mid h)]$. A set of tests $Q$ is a PSR if its prediction vector forms a sufficient statistic for the dynamical system, i.e., if all tests can be predicted based on $p(Q \mid h)$. Of particular interest is the case of linear PSRs, in which there exists a projection vector $m_q$ for any test $q$ such that $p(q \mid h) = p(Q \mid h)\, m_q^T$.

Littman et al. also define an outcome function $u$ mapping tests into $n$-dimensional vectors, defined recursively by $u(\varepsilon) = e_n$ and $u(aoq) = (T^a O^{a,o} u(q)^T)^T$, where $\varepsilon$ represents the null test and $e_n$ is the $(1 \times n)$ vector of all 1s. Each component $u_i(q)$ indicates the probability of the test $q$ when its sequence of actions is applied from state $s_i$. A set of tests $Q = \{q_i \mid i = 1, 2, \ldots, k\}$ is called linearly independent if the outcome vectors of its tests are linearly independent. Using this definition, such a set $Q$ can be found by a simple search algorithm in polynomial time, given the POMDP model of the environment. Littman, Sutton and Singh (2002) showed that the outcome vectors of the tests in $Q$ can be linearly combined to produce the outcome vector for any test.

2 Policy evaluation using PSRs

We assume that we are given a policy $\pi$ and that the initial state of the system, $s_0$, is drawn according to the starting probability distribution $I$. If we consider a given horizon $t$, only a finite number of tests of length $t$ are possible when starting from $I$. Consider the set of these possible tests.
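As an illustration of the definitions so far, the following Python sketch computes outcome vectors, test predictions from a start distribution, the tests of a given length that are possible from that distribution, and a linearly independent set of tests. It assumes the outcome recursion of Littman et al. in column form, u(aoq) = T^a O^{a,o} u(q), with u of the null test equal to the all-ones vector. The two-state model and all function names are hypothetical, and the independence search is a brute-force stand-in, not the polynomial-time search mentioned in the text.

```python
from itertools import product

# Hypothetical two-state model, for illustration only (not from the paper).
ACTIONS = ["a"]
OBSERVATIONS = ["x", "y"]
T = {"a": [[0.5, 0.5], [0.0, 1.0]]}   # T[a][i][j] = P(next state j | state i, action a)
O = {("a", "x"): [1.0, 0.0],          # diagonal of O^{a,o}: P(o | a, reached state i)
     ("a", "y"): [0.0, 1.0]}

def outcome(test, n):
    """Outcome vector u(q) for test = [(a1, o1), ..., (ak, ok)]."""
    u = [1.0] * n                      # u(null test) = vector of all ones
    for a, o in reversed(test):        # prepend pairs from the last one backwards
        scaled = [d * x for d, x in zip(O[(a, o)], u)]          # O^{a,o} u
        u = [sum(T[a][i][j] * scaled[j] for j in range(n)) for i in range(n)]
    return u

def prediction(I, test):
    """Probability of the test's observations when the start state is drawn from I."""
    return sum(p * ui for p, ui in zip(I, outcome(test, len(I))))

def possible_tests(I, t):
    """All tests of length t with non-zero prediction from I."""
    pairs = list(product(ACTIONS, OBSERVATIONS))
    return [list(q) for q in product(pairs, repeat=t) if prediction(I, list(q)) > 0]

def rank(rows, tol=1e-9):
    """Rank of a list of vectors by Gaussian elimination."""
    rows = [r[:] for r in rows]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def independent_tests(n, max_len):
    """Brute-force search: keep each test whose outcome vector raises the rank."""
    chosen, vectors = [], []
    pairs = list(product(ACTIONS, OBSERVATIONS))
    for length in range(1, max_len + 1):
        for q in product(pairs, repeat=length):
            u = outcome(list(q), n)
            if rank(vectors + [u]) > len(vectors):
                chosen.append(list(q))
                vectors.append(u)
            if len(chosen) == n:       # at most n independent n-dimensional vectors
                return chosen
    return chosen
```

Calling `possible_tests([1.0, 0.0], 2)` on this toy model enumerates the length-2 tests that can occur when the system starts in the first state; the enumeration grows exponentially with the horizon, so the sketch is only practical for very small models.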
The value of a memoryless policy $\pi$ with respect to a given start state distribution $I$ is the expected return over all possible tests that can occur when the starting state is drawn from $I$ and behavior is then generated according to $\pi$. Each test $q$ enters this expectation through its expected return, given that the initial state is drawn from $I$ and policy $\pi$ is followed.
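A minimal sketch of this kind of policy evaluation is given below. It is not the authors' algorithm: it works directly on a POMDP model, assumes an undiscounted return over a finite horizon, and assumes that a memoryless policy chooses actions from the last observation only (with `None` standing in before any observation has been seen). It accumulates expected reward over every possible test prefix, which yields the expected return over all tests of the given length. The model, the `policy` function and the `value` function are hypothetical names introduced for illustration.

```python
# Hypothetical two-state model, for illustration only (not from the paper).
ACTIONS = ["stay", "go"]
OBSERVATIONS = ["x", "y"]
T = {"stay": [[1.0, 0.0], [0.0, 1.0]],     # T[a][i][j] = P(next j | state i, action a)
     "go":   [[0.0, 1.0], [1.0, 0.0]]}
O = {(a, "x"): [1.0, 0.0] for a in ACTIONS}          # diagonal of O^{a,o}
O.update({(a, "y"): [0.0, 1.0] for a in ACTIONS})
R = {"stay": [0.0, 1.0], "go": [0.0, 0.0]}           # R[a][i]: reward for a in state i

def policy(last_obs):
    """Memoryless policy: action probabilities given only the last observation."""
    if last_obs == "y":
        return {"stay": 1.0}
    return {"go": 1.0}          # also used at the first step, when last_obs is None

def value(I, horizon):
    """Expected undiscounted return of `policy` over `horizon` steps, starting from I."""
    n = len(I)
    total = 0.0
    # Each frontier entry is one possible test prefix: its unnormalised state
    # weights (probability of the prefix times the state distribution after it)
    # and the last observation of the prefix.
    frontier = [(list(I), None)]
    for _ in range(horizon):
        next_frontier = []
        for alpha, last_obs in frontier:
            for a, pa in policy(last_obs).items():
                # Expected immediate reward contributed by this prefix and action.
                total += pa * sum(w * r for w, r in zip(alpha, R[a]))
                moved = [sum(alpha[i] * T[a][i][j] for i in range(n)) for j in range(n)]
                for o in OBSERVATIONS:
                    new = [pa * m * d for m, d in zip(moved, O[(a, o)])]
                    if sum(new) > 0:        # keep only prefixes that can occur
                        next_frontier.append((new, o))
        frontier = next_frontier
    return total
```

For example, `value([1.0, 0.0], 3)` starts in the first state, moves to the second state on the first step and then collects the `stay` reward on the two remaining steps.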

Publication date: 2003