Search results for: critic weighting

Number of results: 69016

Journal: CoRR 2017
Andrew Levy Robert Platt Kate Saenko

The ability to learn at different resolutions in time may help overcome one of the main challenges in deep reinforcement learning — sample efficiency. Hierarchical agents that operate at different levels of temporal abstraction can learn tasks more quickly because they can divide the work of learning behaviors among multiple policies and can also explore the environment at a higher level. In th...

Journal: CoRR 2017
Miao Liu Marlos C. Machado Gerald Tesauro Murray Campbell

Eigenoptions (EOs) have recently been introduced as a promising idea for generating a diverse set of options through the graph Laplacian, and have been shown to allow efficient exploration (Machado et al. [2017a]). Despite these initial promising results, a couple of issues in current algorithms limit their application, namely: 1) EO methods require two separate steps (eigenoption discovery and r...

2013
Philip S. Thomas William Dabney Stephen Giguere Sridhar Mahadevan

Natural actor-critics form a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability—their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of re...

Journal: CoRR 2012
Thomas Degris Martha White Richard S. Sutton

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning....
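The core idea this abstract rests on, learning about a target policy from actions sampled by a different behavior policy, relies on importance-sampling corrections. A minimal sketch of that correction (illustrative only, not the paper's algorithm; all names here are assumptions):

```python
import numpy as np

# Illustrative importance-sampling correction: reweight samples drawn
# from a behavior policy b so they estimate an expectation under the
# target policy pi. This is the basic mechanism behind off-policy
# actor-critic methods, shown here for a one-step bandit setting.

def importance_weight(pi_prob, b_prob):
    # Ratio pi(a|s) / b(a|s) for the sampled action.
    return pi_prob / b_prob

rng = np.random.default_rng(0)
b = np.array([0.5, 0.5])        # behavior policy over two actions
pi = np.array([0.9, 0.1])       # target policy we want to evaluate
rewards = np.array([1.0, 0.0])  # deterministic reward of each action

# Sample actions under b, but estimate E_pi[R] via reweighting.
actions = rng.choice(2, size=10_000, p=b)
weights = importance_weight(pi[actions], b[actions])
estimate = np.mean(weights * rewards[actions])
# True value under pi is 0.9; the weighted estimate should be close.
```

The same ratio appears, per time step, inside the actor and critic updates of off-policy actor-critic algorithms.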

Journal: CoRR 2017
Kavosh Asadi Cameron Allen Melrose Roderick Abdel-rahman Mohamed George Konidaris Michael L. Littman

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
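The gradient estimator the abstract describes, averaging over all action values rather than only sampled actions, can be sketched for a single state as follows (a hypothetical sketch of the idea, not the paper's implementation; the softmax parameterization and function names are assumptions):

```python
import numpy as np

# Sketch of a policy gradient that uses *all* action values, as MAC's
# abstract describes, instead of only the executed action. For a
# softmax policy over logits, the gradient of E_{a~pi}[Q(s,a)]
# w.r.t. logit i is pi_i * (Q_i - E_{a~pi}[Q(s,a)]).

def softmax(logits):
    z = logits - logits.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def all_actions_policy_gradient(logits, q_values):
    pi = softmax(logits)
    baseline = pi @ q_values            # expected value under the policy
    return pi * (q_values - baseline)   # gradient w.r.t. each logit

logits = np.zeros(3)                    # uniform policy over 3 actions
q = np.array([1.0, 0.0, -1.0])
grad = all_actions_policy_gradient(logits, q)
```

Because the expectation over actions is computed exactly, no single sampled action contributes noise to the update, which is the variance reduction the abstract refers to.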

Journal: CoRR 2017
Chang Xu Tao Qin Gang Wang Tie-Yan Liu

Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks. It is observed that the models trained by SGD are sensitive to learning rates and good learning rates are problem specific. We propose an algorithm to automatically learn lear...
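The update rule this abstract starts from is the standard SGD step: subtract the local gradient scaled by a learning rate. A minimal worked example of that step (illustrative only; the paper's learned-learning-rate mechanism is not shown here):

```python
# Plain SGD: theta <- theta - lr * gradient, applied to fitting
# y = 2 * x by minimizing the squared error (theta * x - y)^2.

def sgd_step(theta, grad, lr):
    return theta - lr * grad

theta = 0.0
x, y = 3.0, 6.0               # one training example; true theta is 2
for _ in range(100):
    grad = 2 * x * (theta * x - y)   # derivative of (theta*x - y)^2
    theta = sgd_step(theta, grad, lr=0.01)
# theta converges to 2.0
```

The sensitivity the abstract mentions is visible here: with `lr` above roughly `1/18` this iteration diverges, which is what motivates learning the learning rate automatically.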

Journal: IEEE Trans. Knowl. Data Eng. 1997
Andy Hon Wai Chun Edmund Ming-Kit Lai

This paper describes an intelligent computer-aided architectural design system (ICAAD) called ICADS. ICADS encapsulates different types of design knowledge into independent “critic” modules. Each “critic” module possesses expertise in evaluating an architect’s work in different areas of architectural design and can offer expert advice when needed. This research focuses on the representation of ...

2003
K. KrishnaKumar G. Limes K. Gundy-Burlet D. Bryant

Neural networks have been successfully used for implementing control architectures for different applications. In this work, we examine a neural network augmented adaptive critic as a Level 2 intelligent controller for a C-17 aircraft. This intelligent control architecture utilizes an adaptive critic to tune the parameters of a reference model, which is then used to define the angular rate comm...

Journal: Canadian Medical Association Journal 2009

Chart of the number of search results per year

Click on the chart to filter the results by publication year