passive critic features

Policy Search in Reproducing Kernel Hilbert Space

2016

Ngo Anh Vien, Peter Englert, Marc Toussaint

Modeling policies in reproducing kernel Hilbert space (RKHS) renders policy gradient reinforcement learning algorithms non-parametric. As a result, the policies become very flexible and have a rich representational potential without a predefined set of features. However, their performances might be either non-covariant under reparameterization of the chosen kernel, or very sensitive to step-siz...


Beyond Adaptive Critic - Creative Learning for Intelligent Autonomous Mobile Robots

2002

Xiaoqun Liao

Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that to extend the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are pote...


Using Meta-Data from Free-Text User-Generated Content to Improve Personalized Recommendation by Reducing Sparsity (Ph.D. defence public seminar, School of Computing, National University of Singapore)

2015

Xu Xiaoying

Recommender Systems (RS) have become increasingly essential in many domains for alleviating the "information overload" problem, but existing recommendation techniques suffer from the sparsity problem due to insufficient input data. In this thesis, we aim at extracting and incorporating meta-data from free-text User-Generated Content (UGC) to lessen the effects of sparsity and therefore improve t...


Boosting the Actor with Dual Critic

Journal: CoRR 2017

Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC. It is derived in a principled way from the Lagrangian dual form of the Bellman optimality equation, which can be viewed as a two-player game between the actor and a critic-like function, which is named as dual critic. Compared to its actor-critic relatives, Dual-AC has the desired property that the actor...


SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation

Journal: CoRR 2017

Yuan Xue, Tao Xu, Han Zhang, L. Rodney Long, Xiaolei Huang

Inspired by classic generative adversarial networks (GAN), we propose a novel end-to-end adversarial neural network, called SegAN, for the task of medical image segmentation. Since image segmentation requires dense, pixel-level labeling, the single scalar real/fake output of a classic GAN’s discriminator may be ineffective in producing stable and sufficient gradient feedback to the networks. In...


Extensions to a Generalization Critic for Inductive Proof

1996

Andrew Ireland, Alan Bundy

In earlier papers a critic for automatically generalizing conjectures in the context of failed inductive proofs was presented. The critic exploits the partial success of the search control heuristic known as rippling. Through empirical testing a natural generalization and extension of the basic critic emerged. Here we describe our extended generalization critic together with some promising expe...


An Actor/Critic Algorithm that is Equivalent to Q-Learning

1994

Robert H. Crites, Andrew G. Barto

We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using c...


Guide Actor-Critic for Continuous Control

2018

Abbas Abdolmaleki, Masashi Sugiyama

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only use values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GA...


On Passive Quadrupedal Bounding with Flexible Linear Torso

Journal: International Journal of Robotics 2015

Evangelos Papadopoulos, Konstantinos Koutsoukis

This paper studies the effect of flexible linear torso on the dynamics of passive quadruped bounding. A reduced-order passive and conservative model with linear flexible torso and springy legs is introduced. The model features extensive spine deformation during high-speed bounding, resembling those observed in a cheetah. Fixed points corresponding to cyclic bounding motions are found and calcul...


A Convergent Online Single Time Scale Actor Critic Algorithm

Journal: Journal of Machine Learning Research 2010

Dotan Di Castro, Ron Meir

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local m...
