Robust Planning with (L)RTDP

Authors

  • Olivier Buffet
  • Douglas Aberdeen
Abstract

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case where transition probabilities are known to lie in a given interval.
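
The idea of planning against the worst model within probability intervals can be illustrated with a small sketch. This is not taken from the paper: it assumes costs are minimized (so the worst model maximizes expected cost), and the function names `worst_case_distribution` and `robust_backup`, the state labels, and the numbers are hypothetical. It uses the standard greedy construction for interval-bounded distributions: give every successor its lower bound, then push the remaining probability mass toward the most costly successors.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]


def worst_case_distribution(
    intervals: Dict[str, Interval], values: Dict[str, float]
) -> Dict[str, float]:
    """Pick the distribution inside the intervals that maximizes expected cost.

    intervals maps each successor state to (lower, upper) probability bounds;
    values maps each successor state to its current cost-to-go estimate.
    """
    lo_sum = sum(lo for lo, _ in intervals.values())
    hi_sum = sum(hi for _, hi in intervals.values())
    if lo_sum > 1.0 + 1e-12 or hi_sum < 1.0 - 1e-12:
        raise ValueError("intervals admit no valid probability distribution")

    # Start from the lower bounds, then distribute the leftover mass.
    probs = {s: lo for s, (lo, _) in intervals.items()}
    slack = 1.0 - lo_sum

    # Most costly successors receive extra mass first (adversarial choice).
    for s in sorted(intervals, key=lambda s: values[s], reverse=True):
        lo, hi = intervals[s]
        extra = min(hi - lo, slack)
        probs[s] += extra
        slack -= extra
    return probs


def robust_backup(
    cost: float, intervals: Dict[str, Interval], values: Dict[str, float]
) -> float:
    """Worst-case Q-value of one action: immediate cost plus worst expected cost-to-go."""
    probs = worst_case_distribution(intervals, values)
    return cost + sum(p * values[s] for s, p in probs.items())


if __name__ == "__main__":
    # Hypothetical action with two successors and uncertain probabilities.
    intervals = {"goal": (0.5, 0.9), "trap": (0.1, 0.5)}
    values = {"goal": 0.0, "trap": 10.0}
    print(worst_case_distribution(intervals, values))
    print(robust_backup(1.0, intervals, values))
```

A robust variant of an RTDP-style update would then take, at each visited state, the minimum of `robust_backup` over the available actions instead of the usual expected-cost backup.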

Similar resources

Planning with Robust (L)RTDP

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case...

PAC Reinforcement Learning Bounds for RTDP and Rand-RTDP

Real-time Dynamic Programming (RTDP) is a popular algorithm for planning in a Markov Decision Process (MDP). It can also be viewed as a learning algorithm, where the agent improves the value function and policy while acting in an MDP. It has been empirically observed that an RTDP agent generally performs well when viewed this way, but past theoretical results have been limited to asymptotic con...

PAC Reinforcement Learning Bounds for RTDP and Rand-RTDP Technical Report

Real-time Dynamic Programming (RTDP) is a popular algorithm for planning in a Markov Decision Process (MDP). It can also be viewed as a learning algorithm, where the agent improves the value function and policy while acting in an MDP. It has been empirically observed that an RTDP agent generally performs well when viewed this way, but past theoretical results have been limited to asymptotic con...

Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming

RTDP is a recent heuristic-search DP algorithm for solving non-deterministic planning problems with full observability. In relation to other dynamic programming methods, RTDP has two benefits: first, it does not have to evaluate the entire state space in order to deliver an optimal policy, and second, it can often deliver good policies pretty fast. On the other hand, RTDP final convergence is s...

A Bounded Q-decomposition RTDP Approach to Resource Allocation

This paper contributes to solve effectively stochastic resource allocation problems known to be NP-Complete. To address this complex resource management problem, two approaches are adapted and merged in an effective way: the Q-decomposition and the bounded Real-time Dynamic Programming (bounded rtdp). The Q-decomposition allows to coordinate reward separated agents and thus permits to reduce th...

Publication date: 2005