Search results for: critic weighting

Number of results: 69016

2018
Piji Li Lidong Bing Wai Lam

We present a training framework for neural abstractive summarization based on actor-critic approaches from reinforcement learning. In the traditional neural network based methods, the objective is only to maximize the likelihood of the predicted summaries, no other assessment constraints are considered, which may generate low-quality summaries or even incorrect sentences. To alleviate this prob...

Journal: CoRR 2017
Yuhuai Wu Elman Mansimov Shun Liao Roger B. Grosse Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronec...

2017
Pierre-Luc Bacon Jean Harb Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new ...

2010
William Dabney Andrew G. Barto

In this paper, we address the critic optimization problem within the context of reinforcement learning. The focus of this problem is on improving an agent’s critic, so as to increase performance over a distribution of tasks. We use ordered derivatives, in a process similar to back propagation through time, to compute the gradient of an agent’s fitness with respect to its reward function. With e...

Journal: Automatica 2009
Shalabh Bhatnagar Richard S. Sutton Mohammad Ghavamzadeh Mark Lee

We present four new reinforcement learning algorithms based on actor–critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor–critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochasti...

2008
Francisco S. Melo Manuel Lopes

In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture [1] with a variant of fitted value iteration using importance sampling. The method thus obtained combines...

Journal: Neural networks : the official journal of the International Neural Network Society 2012
Feng Liu Jian Sun Jennie Si Wentao Guo Shengwei Mei

Approximate/adaptive dynamic programming (ADP) has been studied extensively in recent years for its potential scalability to solve large state and control space problems, including those involving continuous states and continuous controls. The applicability of ADP algorithms, especially the adaptive critic designs has been demonstrated in several case studies. Direct heuristic dynamic programmi...

2010
Angustae Vitae

The study of intertextuality, the shaping of a text’s meaning by other texts, remains a laborious process for the literary critic. Kristeva (Kristeva, 1986) suggests that "Any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another." The nature of these mosaics is widely varied, from direct quotations representing a simple and overt intertextualit...

2003
Stephen Shervais Thaddeus T. Shannon George G. Lendaris

This work was supported in part by the National Science Foundation under grant ECS-9904378. Abstract: Adaptive critic based approximate dynamic programming techniques are gradient based methods for finding optimal policies for multi-stage decision processes. We believe adaptive critic methods are now developed to the point that they can be applied to the full spectrum of decision and control problem...
