On the Rate of Convergence and Error Bounds for LSTD(\(\lambda\))
Authors
Abstract
We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability bound on the rate of convergence of this algorithm to its limit. From this we deduce a high-probability bound on the error of the algorithm, which extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
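For readers unfamiliar with the algorithm, the following is a minimal sketch of the standard batch form of LSTD(λ) on a single trajectory. It is not taken from the paper: the function name `lstd_lambda`, the argument layout, and the explicit discount factor `gamma` are assumptions made for illustration, and the plain linear solve assumes the accumulated matrix is invertible.

```python
import numpy as np


def lstd_lambda(features, rewards, gamma, lam):
    """Illustrative LSTD(lambda) estimate from one trajectory (hypothetical helper).

    features: array of shape (T + 1, d), rows are phi(x_0), ..., phi(x_T)
    rewards:  array of shape (T,), entries r_0, ..., r_{T-1}
    gamma:    discount factor (assumed here; not stated in the abstract)
    lam:      eligibility-trace parameter lambda
    Returns the weight vector theta solving A theta = b.
    """
    features = np.asarray(features, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    d = features.shape[1]

    A = np.zeros((d, d))
    b = np.zeros(d)
    z = features[0].copy()  # eligibility trace, initialised to phi(x_0)

    for t in range(len(rewards)):
        # Accumulate A += z_t (phi(x_t) - gamma * phi(x_{t+1}))^T and b += z_t r_t.
        A += np.outer(z, features[t] - gamma * features[t + 1])
        b += z * rewards[t]
        # Decay the trace and add the features of the next state.
        z = lam * gamma * z + features[t + 1]

    # Assumes A is invertible; a regularised solve may be preferable in practice.
    return np.linalg.solve(A, b)
```

With `lam = 0` the trace reduces to the current feature vector and the sketch coincides with plain LSTD, the case covered by Lazaric et al. (2012).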
Similar resources
Rate of Convergence and Error Bounds for LSTD($\lambda$)
We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0, 1), a high-probability estimate of the rate of convergence of this algorithm to its limit. W...
Approximate Policy Iteration: A Survey and Some New Methods
We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced polic...
Convergence analysis of the global FOM and GMRES methods for solving matrix equations $AXB=C$ with SPD coefficients
In this paper, we study convergence behavior of the global FOM (Gl-FOM) and global GMRES (Gl-GMRES) methods for solving the matrix equation $AXB=C$ where $A$ and $B$ are symmetric positive definite (SPD). We present some new theoretical results of these methods such as computable exact expressions and upper bounds for the norm of the error and residual. In particular, the obtained upper...
On the Spaces of $\lambda_{r}$-almost Convergent and $\lambda_{r}$-almost Bounded Sequences
The aim of the present work is to introduce the concept of $\lambda_{r}$-almost convergence of sequences. We define the spaces $f\left(\lambda_{r}\right)$ and $f_{0}\left(\lambda_{r}\right)$ of $\lambda_{r}$-almost convergent and $\lambda_{r}$-almost null sequences. We investigate some inclusion relations concerning those spaces with examples and we determine the $\beta$- and $\gamma$-duals of...
Cystoscopy Image Classification Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...
Publication date: 2015