Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Authors
Abstract
Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning that incorporate AMSGrad updates (a standard alternative of Adam in theoretical analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover, our analysis focuses on Markovian sampling for both algorithms. We show that under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of O(1/T) (where T denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of O(log^2 T/√T). Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of O(1/T), and with a diminishing stepsize converges exactly to the global optimum at the rate of O(log T/√T). Our study develops new techniques for analyzing the Adam-type RL algorithms under Markovian sampling.
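To make the update rule concrete, below is a minimal sketch of a TD-AMSGrad step under linear function approximation along a single Markovian trajectory, in the spirit of the abstract. Everything here is illustrative: the helper names (env_step, phi), the initial state s0, the 1/sqrt(t) diminishing stepsize, and the hyperparameter defaults are assumptions, not the paper's implementation.

import numpy as np

def td_amsgrad(env_step, phi, s0, theta0, gamma=0.99,
               beta1=0.9, beta2=0.999, eps=1e-8, T=10_000):
    # One Markovian trajectory: each step consumes the previous state.
    theta = theta0.copy()
    m = np.zeros_like(theta)       # first-moment (momentum) estimate
    v = np.zeros_like(theta)       # second-moment estimate
    v_hat = np.zeros_like(theta)   # AMSGrad keeps the running max of v
    s = s0
    for t in range(1, T + 1):
        r, s_next = env_step(s)                                   # Markovian sample
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta  # TD error
        g = -delta * phi(s)                                       # pseudo-gradient (descent direction)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        v_hat = np.maximum(v_hat, v)   # the max step that distinguishes AMSGrad from Adam
        alpha = 1.0 / np.sqrt(t)       # diminishing stepsize, as in the exact-convergence result
        theta = theta - alpha * m / (np.sqrt(v_hat) + eps)
        s = s_next
    return theta

The only difference from a plain Adam-driven TD step is the np.maximum line, which prevents the effective stepsize from growing between iterations.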
Similar resources
Hq-learning: Discovering Markovian Subgoals for Non-markovian Reinforcement Learning
To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical extension of Q-learning. HQ-learning is based on an ordered sequence of subagents, each learning to identify and solve a Markovian subtask of the total task. Each agent learns (1) an appropriate subgoal (though there is no intermediate, external reinforcement for "good" subgoals), and (2) a Markovia...
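For context, each HQ-learning subagent runs a Q-learning-style update on its Markovian subtask. A minimal sketch of that base tabular update follows; the subgoal-selection machinery of HQ-learning itself is omitted, and the function name and defaults are illustrative only.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Tabular Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q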
Non-Markovian State Aggregation for Reinforcement Learning
Contents excerpt: 3 Feature Reinforcement Learning; 3.1 Feature Maps; 3.1.1 Notation; 3.1.2 State Aggregation and φ-uniformity; 3.2 Counterexamples to Open Problem 10 for V* Aggregation; 3.2.1 Transient counterexample ...
Reinforcement Learning in Markovian and Non-Markovian Environments
This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. ...
Asymptotic Convergence Properties of EM Type Algorithms
We analyze the asymptotic convergence properties of a general class of EM-type algorithms for estimating an unknown parameter via alternating estimation and maximization. As examples, this class includes ML-EM, penalized ML-EM, Green's OSL-EM, and many other approximate EM algorithms. A theorem is given which provides conditions for monotone convergence with respect to a given norm and specifies an ...
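A minimal sketch of the estimation/maximization alternation the snippet refers to, for a toy two-component 1-D Gaussian mixture with unit variances and equal weights; the model choice, function name, and initialization are assumptions for illustration, not the paper's setup.

import numpy as np

def em_gmm_1d(x, mu, n_iters=50):
    mu = np.asarray(mu, dtype=float).copy()
    for _ in range(n_iters):
        # E-step (estimation): posterior responsibility of component 0 for each point
        d0 = np.exp(-0.5 * (x - mu[0])**2)
        d1 = np.exp(-0.5 * (x - mu[1])**2)
        r0 = d0 / (d0 + d1)
        # M-step (maximization): each mean becomes a responsibility-weighted average
        mu[0] = np.sum(r0 * x) / np.sum(r0)
        mu[1] = np.sum((1 - r0) * x) / np.sum(1 - r0)
    return mu

# Usage: means drift toward roughly -2 and 3, and the likelihood increases monotonically
x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 200)])
print(em_gmm_1d(x, mu=[0.0, 1.0]))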
Convergence of reinforcement learning algorithms and acceleration of learning.
The techniques of reinforcement learning have been gaining increasing popularity recently. However, the question of their convergence rate is still open. We consider the problem of choosing the learning steps alpha(n), and their relation with discount gamma and exploration degree epsilon. Appropriate choices of these parameters may drastically influence the convergence rate of the techniques. F...
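The snippet's question of how to choose the learning steps alpha(n) is usually framed through the Robbins-Monro conditions: the sum of alpha(n) must diverge while the sum of alpha(n)^2 converges. Below is a brief sketch contrasting a schedule that satisfies them with one that does not; the schedule forms are standard textbook choices, not taken from this particular paper.

def alpha_harmonic(n, c=1.0):
    # alpha(n) = c / n: sum diverges, sum of squares converges -> exact convergence.
    return c / n

def alpha_constant(n, c=0.1):
    # alpha(n) = c: sum of squares diverges -> converges only to a neighborhood.
    return c

# First five steps of each schedule:
print([alpha_harmonic(n) for n in range(1, 6)])  # [1.0, 0.5, 0.33..., 0.25, 0.2]
print([alpha_constant(n) for n in range(1, 6)])  # [0.1, 0.1, 0.1, 0.1, 0.1]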
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i12.17252