Search results for: passive critic features

Number of results: 593035

2017
Paul Ozkohen Jelle Visser Martijn van Otterlo Marco Wiering

Neural networks and reinforcement learning have successfully been applied to various games, such as Ms. Pacman and Go. We combine multilayer perceptrons and a class of reinforcement learning algorithms known as actor-critic to learn to play the arcade classic Donkey Kong. Two neural networks are used in this study: the actor and the critic. The actor learns to select the best action given the g...

1996
Adel Bouhoula Michael Rusinowitch Pierre Lescanne David Basin Alan Bundy Miki Hermann Andrew Ireland

Inductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs attempting to identify diverging proof attempts. Divergence is recognized by means of a "difference matching" procedure. The critic then proposes lemmas and generalizations which "ripple" these differences away so that the proof can go through with...

Journal: :CoRR 2017
Baolin Peng Xiujun Li Jianfeng Gao Jingjing Liu Yun-Nung Chen Kam-Fai Wong

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the ...

2013
Roshan Shariff Travis Dick

We empirically investigate modifications and implementation techniques required to apply a policy-gradient actor-critic algorithm to reinforcement learning problems with continuous state and action spaces. As a test-bed, we introduce a new simulated task, which involves landing a lunar module in a simplified two-dimensional world. The empirical results demonstrate the importance of efficiently ...

Journal: :Automatica 2013
Shubhendu Bhasin R. Kamalapurkar Marcus Johnson Kyriakos G. Vamvoudakis Frank L. Lewis Warren E. Dixon

An online adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor–critic–identifier (ACI) is proposed to approximate the Hamilton–Jacobi–Bellman equation using three neural network (NN) structures—actor and critic NNs approximate the optimal control and the optimal value function,...

2017
Vivek Veeriah Harm van Seijen Richard S. Sutton

Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed learning approach, without eligibility traces, for handling the multi-step λ-return with nonlinear function approximators. However, this was limited to action-value methods. In thi...

Journal: :CoRR 2017
Ali Shafiekhani Mohammad J. Mahjoob Mehdi Akraminia

Abstract: Fuzzy critic-based learning forms a reinforcement learning method based on dynamic programming. In this paper, an adaptive critic-based neuro-fuzzy system is presented for an unmanned bicycle. The only information available for the critic agent is the system feedback which is interpreted as the last action performed by the controller in the previous state. The signal produced by the c...

2011
RONALD B. MILLER MARC KESSLER MARION BAUER SANDRA HOWELL KENNETH KREILING

This paper briefly describes the proceedings of the Panel of Inquiry held May 13, 2008 at Saint Michael’s College on the case of “Anna” (Podetz, 2008, 2011). It summarizes the advocate’s and critic’s positions on four claims and one counter-claim. The five judges independently voted to accept all four of the advocate’s claims (by votes of 5-0 or 4-1), and rejected the critic’s counterclaim by a...

Journal: :Expert Systems With Applications 2021

Portfolio management aims at maximizing the return on investment while minimizing risk by continuously reallocating the assets forming a portfolio. These assets are not independent but correlated over short time periods. A graph convolutional reinforcement learning framework called DeepPocket is proposed, whose objective is to exploit the time-varying interrelations between financial instruments. represented nod...

2011
Kyriakos G. Vamvoudakis Draguna Vrabie Frank L. Lewis

In this paper we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data based approach to the solution of the Hamilton-Jacobi-Bellman equation and it does not require explicit knowledge on the system’...
