Search results for: passive critic features

Number of results: 593035

2011
Philip S. Thomas

We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules. We present, analyze theoretically, and empirically evaluate an update rule for each module, which requires only local information: the module’s input, output, and the TD error broadcast by a critic. Such updates are necessary when computation of compatible features becomes prohibitively dif...

2008
Francisco S. Melo, Manuel Lopes

In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture [1] with a variant of fitted value iteration using importance sampling. The method thus obtained combines...

1990
Ronald J. Williams, Leemon C. Baird

Combining elements of the theory of dynamic programming with features appropriate for on-line learning has led to an approach Watkins has called incremental dynamic programming. Here we adopt this incremental dynamic programming point of view and obtain some preliminary mathematical results relevant to understanding the capabilities and limitations of actor-critic learning systems. Examples of...

1993
Eugene H. Spafford, Chonchanok Viravan

A debugging oracle is a decision maker during a debugging process. Three major decisions during typical debugging sessions are on the identities, the locations, and the repairs of faults. A programmer usually acts as a debugging oracle. Our research objective is to help him in his decision-making process with a debugging oracle assistant. To enhance our understanding of both the debugging oracle ...

1997
Danil V. Prokhorov, Lee A. Feldkamp

We propose a simple framework for critic-based training of recurrent neural networks and feedback controllers. We term the critics that are used primitive adaptive critics, since we represent them with the simplest possible architecture (bias weight only). We derive this framework from two main premises. The first of these is a natural similarity between a form of approximate dynamic programmin...

Thesis: Ministry of Science, Research and Technology - Shahid Chamran University of Ahvaz - Faculty of Literature and Humanities, 1389 SH

This study purported to compare and contrast the use of self-mention and evidentials as two metadiscourse features in opinion columns of Persian and English newspapers. The theoretical basis of this study is the idea that metadiscourse features vary across cultural boundaries. For this purpose, 150 Persian and 150 English opinion columns were collected based on three factors of topic, audience a...

1996
Jing Peng, Ronald J. Williams

This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading ...

1999
F. L. Lewis

Two feedback control systems are designed that employ the adaptive critic architecture, which consists of two neural networks, one of which (the critic) tunes the other. The first application is a deadzone compensator, where it is shown that the adaptive critic structure is a natural consequence of the mathematical problem of inversion of an unknown function. In this situation the adaptive crit...

2011
Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a ...
