Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

Authors

Abstract

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such analysis for two fundamental RL algorithms, policy gradient (PG) and temporal difference (TD) learning, that incorporate AMSGrad updates (a standard alternative of Adam in theoretical analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover, our analysis focuses on Markovian sampling for both algorithms. We show that under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of O(1/T) (where T denotes the number of iterations), and with a diminishing stepsize it converges exactly to a stationary point at the rate of O(log^2 T/√T). Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of O(1/T), and with a diminishing stepsize it converges exactly to the global optimum at the rate of O(log T/√T). Our study develops new techniques for analyzing Adam-type RL algorithms under Markovian sampling.
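
The AMSGrad update at the heart of PG-AMSGrad and TD-AMSGrad differs from Adam in that the second-moment estimate is replaced by its running maximum. Below is a minimal, self-contained sketch of TD-AMSGrad with linear (one-hot) features on a toy random-walk chain; the environment, feature map, and all hyperparameter values are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def amsgrad_step(theta, grad, m, v, v_hat, alpha,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update: like Adam, but v_hat keeps the running
    maximum of the second-moment estimate, so the effective
    per-coordinate stepsize never increases between iterations."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_hat = np.maximum(v_hat, v)
    theta = theta - alpha * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Toy policy evaluation on a 5-state random walk with one-hot
# features, so V(s) = theta[s]; consecutive samples come from a
# single trajectory, i.e. Markovian (non-i.i.d.) sampling.
rng = np.random.default_rng(0)
n, gamma, T = 5, 0.9, 20_000
theta = np.zeros(n)
m, v, v_hat = np.zeros(n), np.zeros(n), np.zeros(n)
s = 2
for t in range(1, T + 1):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n - 1)
    r = 1.0 if s_next == n - 1 else 0.0
    td_error = r + gamma * theta[s_next] - theta[s]
    grad = np.zeros(n)
    grad[s] = -td_error              # semi-gradient of 0.5 * td_error^2
    alpha_t = 0.1 / np.sqrt(t)       # diminishing stepsize regime
    theta, m, v, v_hat = amsgrad_step(theta, grad, m, v, v_hat, alpha_t)
    s = s_next

print("estimated state values:", np.round(theta, 3))
```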


Similar Articles

Hq-learning: Discovering Markovian Subgoals for Non-markovian Reinforcement Learning

To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical extension of Q-learning. HQ-learning is based on an ordered sequence of subagents, each learning to identify and solve a Markovian subtask of the total task. Each agent learns (1) an appropriate subgoal (though there is no intermediate, external reinforcement for "good" subgoals), and (2) a Markovia...
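
To make the control flow concrete, here is a minimal sketch of the hand-off structure described above: an ordered sequence of tabular Q-learning subagents, each acting until its subgoal observation is reached. The corridor task, the fixed subgoal, and all hyperparameters are illustrative assumptions; HQ-learning itself also learns the subgoals, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 7-state corridor as a stand-in task; reward 1 at the far end.
N_STATES, GOAL = 7, 6

def step(s, a):                       # a in {0: left, 1: right}
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# Ordered sequence of two tabular subagents, each with its own
# Q-table. Agent 0 hands control to agent 1 once its subgoal
# observation is reached (the subgoal is fixed here for brevity).
agents = [np.zeros((N_STATES, 2)) for _ in range(2)]
subgoals = [3, GOAL]

for episode in range(2000):
    s, active, done = 0, 0, False
    while not done:
        Q = agents[active]
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Plain Q-learning inside the currently active subtask,
        # which is Markovian between consecutive subgoals.
        Q[s, a] += 0.1 * (r + 0.95 * Q[s2].max() - Q[s, a])
        if s2 == subgoals[active] and active + 1 < len(agents):
            active += 1               # subgoal reached: pass control
        s = s2
```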


Non-Markovian State Aggregation for Reinforcement Learning

Table of contents excerpt: 3 Feature Reinforcement Learning; 3.1 Feature Maps; 3.1.1 Notation; 3.1.2 State Aggregation and φ-uniformity; 3.2 Counterexamples to Open Problem 10 for V* Aggregation; 3.2.1 Transient counterexample; ...


Reinforcement Learning in Markovian and Non-Markovian Environments

This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. ...


Asymptotic Convergence Properties of EM Type Algorithms

We analyze the asymptotic convergence properties of a general class of EM-type algorithms for estimating an unknown parameter via alternating estimation and maximization. As examples, this class includes ML-EM, penalized ML-EM, Green's OSL-EM, and many other approximate EM algorithms. A theorem is given which provides conditions for monotone convergence with respect to a given norm and specifies an ...
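
As a concrete instance of the alternating estimation and maximization the snippet describes, here is a minimal EM loop for a two-component 1-D Gaussian mixture with known unit variances; the model, data, and initialization are illustrative assumptions, far simpler than the general class (ML-EM, penalized ML-EM, OSL-EM) the cited work analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data from two unit-variance Gaussians.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])
mu, pi = np.array([-1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(50):
    # E-step: posterior responsibility of each component for each
    # point (normalization constants cancel since variances match).
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximize the expected complete-data log-likelihood.
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print("mixing weights:", np.round(pi, 3), "means:", np.round(mu, 3))
```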


Convergence of reinforcement learning algorithms and acceleration of learning.

The techniques of reinforcement learning have been gaining increasing popularity recently. However, the question of their convergence rate is still open. We consider the problem of choosing the learning steps α(n), and their relation with the discount γ and the exploration degree ε. Appropriate choices of these parameters may drastically influence the convergence rate of the techniques. F...
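
To illustrate the parameter choices in question, here is a minimal tabular Q-learning loop with ε-greedy exploration and a per-visit polynomial stepsize α(n) = n^(-0.85), a schedule that satisfies the Robbins-Monro conditions and typically converges much faster than the classical 1/n choice; the toy MDP and all constants are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, eps = 0.9, 0.1                 # discount and exploration degree
n_s, n_a = 4, 2
Q = np.zeros((n_s, n_a))
visits = np.zeros((n_s, n_a))

def step(s, a):
    # Toy MDP: action 1 moves right (noisy reward 1 at the end),
    # action 0 resets to state 0 with no reward.
    if a == 0:
        return 0, 0.0
    s2 = min(s + 1, n_s - 1)
    return s2, (1.0 if s2 == n_s - 1 else 0.0) + 0.1 * rng.standard_normal()

s = 0
for _ in range(100_000):
    a = int(rng.integers(n_a)) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    visits[s, a] += 1
    alpha = visits[s, a] ** -0.85     # per-visit polynomial decay
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print("greedy policy per state:", np.argmax(Q, axis=1))
```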



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i12.17252