On the Rate of Convergence and Error Bounds for LSTD(\(\lambda\))

Authors

  • Manel Tagorti
  • Bruno Scherrer
Abstract

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. From this we deduce a high-probability bound on the error of the algorithm, which extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
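As a rough illustration of the algorithm the abstract refers to, the following is a minimal pure-Python sketch of one common textbook formulation of LSTD(λ). It is not taken from the paper: the names `lstd_lambda`, `solve`, `transitions`, and `features`, the discount factor `gamma`, and the choice to start the eligibility trace at zero are all assumptions made for this example, and the paper's exact normalization and indexing may differ.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(transitions, features, gamma, lam):
    """Estimate weights theta such that features(s) . theta approximates V(s).

    transitions: list of (state, reward, next_state) along ONE trajectory
                 generated by the fixed policy (hypothetical input format).
    features:    function mapping a state to a list of d floats.
    gamma:       discount factor (assumed; not named in the abstract).
    lam:         eligibility-trace parameter lambda.
    """
    d = len(features(transitions[0][0]))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    z = [0.0] * d  # eligibility trace, assumed to start at zero
    for s, r, s_next in transitions:
        phi = features(s)
        phi_next = features(s_next)
        # Decay the trace and add the current feature vector.
        z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
        for i in range(d):
            b[i] += z[i] * r
            for j in range(d):
                A[i][j] += z[i] * (phi[j] - gamma * phi_next[j])
    # theta solves A theta = b.
    return solve(A, b)
```

With one-hot (tabular) features and deterministic transitions, each sample satisfies the Bellman equation exactly, so this sketch recovers the true value function whenever the accumulated matrix is invertible. With a restricted feature space and noisy samples, the estimate carries the kind of approximation and estimation error the abstract's bounds are concerned with.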

Similar resources

Rate of Convergence and Error Bounds for LSTD($\lambda$)

We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability estimate of the rate of convergence of this algorithm to its limit. W...

Approximate Policy Iteration: A Survey and Some New Methods

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced polic...

Convergence analysis of the global FOM and GMRES methods for solving matrix equations $AXB=C$ with SPD coefficients

In this paper, we study the convergence behavior of the global FOM (Gl-FOM) and global GMRES (Gl-GMRES) methods for solving the matrix equation $AXB=C$ where $A$ and $B$ are symmetric positive definite (SPD). We present some new theoretical results of these methods such as computable exact expressions and upper bounds for the norm of the error and residual. In particular, the obtained upper...

On the Spaces of $\lambda_{r}$-almost Convergent and $\lambda_{r}$-almost Bounded Sequences

The aim of the present work is to introduce the concept of $\lambda_{r}$-almost convergence of sequences. We define the spaces $f\left(\lambda_{r}\right)$ and $f_{0}\left(\lambda_{r}\right)$ of $\lambda_{r}$-almost convergent and $\lambda_{r}$-almost null sequences. We investigate some inclusion relations concerning those spaces with examples and we determine the $\beta$- and $\gamma$-duals of...

Cystoscopy Image Classification Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...

Publication date: 2015