Search results for: weighting critic

Number of results: 69016

Journal: :Neurocomputing 2012
Haibo He Zhen Ni Jian Fu

In this paper, we propose a novel adaptive dynamic programming (ADP) architecture with three networks (an action network, a critic network, and a reference network) to develop an internal goal representation for online learning and optimization. Unlike the traditional ADP design, which normally has an action network and a critic network, our approach integrates the third network, a reference network, int...

Journal: :Soft Comput. 2013
Dongbin Zhao Bin Wang Derong Liu

A novel supervised Actor–Critic (SAC) approach for the adaptive cruise control (ACC) problem is proposed in this paper. The key elements required by the SAC algorithm, namely the Actor and the Critic, are each approximated by a feed-forward neural network. The output of the Actor and the state are input to the Critic to approximate the performance index function. A Lyapunov stability analysis approach has be...

Journal: :CoRR 2017
Zhewei Huang Shuchang Zhou BoEr Zhuang Xinyu Zhou

We introduce an Actor-Critic Ensemble (ACE) method for improving the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm. At inference time, our method uses a critic ensemble to select the best action from proposals of multiple actors running in parallel. By having a larger candidate set, our method can avoid actions that have fatal consequences, while staying deterministic. Using...
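The inference-time selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_action`, `Actor`, and `Critic` are hypothetical, actors and critics are plain callables rather than trained networks, and averaging the critics' scores is an assumption, since the snippet does not say how the ensemble's estimates are combined.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical type aliases for illustration; real actors and critics
# would be trained neural networks.
State = Sequence[float]
Action = Sequence[float]
Actor = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def select_action(state: State,
                  actors: List[Actor],
                  critics: List[Critic]) -> Action:
    """Return the actor proposal with the highest mean critic score.

    Assumption: the ensemble score of an action is the plain average
    of the individual critics' value estimates.
    """
    # Each actor proposes one candidate action for the current state.
    proposals = [actor(state) for actor in actors]
    # Score every candidate with the whole critic ensemble.
    scores = [mean(critic(state, action) for critic in critics)
              for action in proposals]
    # Deterministic choice: the first proposal with the best score.
    best = max(range(len(proposals)), key=scores.__getitem__)
    return proposals[best]
```

Because the choice is an argmax over a fixed set of proposals, the resulting policy stays deterministic, which matches the property the abstract highlights.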

Journal: :CoRR 2017
Ivo Danihelka Balaji Lakshminarayanan Benigno Uria Daan Wierstra Peter Dayan

We train a generator by maximum likelihood and we also train the same generator architecture by Wasserstein GAN. We then compare the generated samples, exact log-probability densities and approximate Wasserstein distances. We show that an independent critic trained to approximate Wasserstein distance between the validation set and the generator distribution helps detect overfitting. Finally, we...

Journal: :Bagh-e Nazar (scientific research quarterly) 2011
Seyed Abdolhadi Daneshpour Iman Raeisi

This study was carried out using a comparative method. First, the meaning and definition of the word "critic" were extracted from four well-known English dictionaries (Webster, Oxford, Longman, and American Heritage), and, while comparing the meanings with one another, the synonyms used in each dictionary were collected. Then, based on the frequency of each word, the five words analyse, judge, evaluate, appraise, and assess were selected from among these words and ...

2014
Abhijit Gosavi

Actor-critic algorithms are amongst the most well-studied reinforcement learning algorithms that can be used to solve Markov decision processes (MDPs) via simulation. Unfortunately, the parameters of the so-called “actor” in the classical actor-critic algorithm exhibit great volatility — getting unbounded in practice, whence they have to be artificially constrained to obtain solutions in practi...

Journal: :CoRR 2017
Flood Sung Li Zhang Tao Xiang Timothy M. Hospedales Yongxin Yang

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For sup...

1990
Ronald J. Williams Leemon C. Baird

Combining elements of the theory of dynamic programming with features appropriate for on-line learning has led to an approach Watkins has called incremental dynamic programming. Here we adopt this incremental dynamic programming point of view and obtain some preliminary mathematical results relevant to understanding the capabilities and limitations of actor-critic learning systems. Examples of...

1999
David J. Miller Lian Yan

We develop new rules for combining estimates obtained from each classifier in an ensemble. A variety of combination techniques have been previously suggested, including averaging probability estimates, as well as hard voting schemes. We introduce a critic associated with each classifier, whose objective is to predict the classifier's errors. Since the critic only tackles a two-class problem, its p...

2002
Ahad Harati Majid Nili Ahmadabadi

Traditionally, in much multiagent reinforcement learning research, qualifying each individual agent’s behavior is the responsibility of the environment’s critic. However, in most practical cases, the critic is not completely aware of the effects of all agents’ actions on team performance. Using the agents’ learning history, it is possible to judge the correctness of their actions. To do so, we use team common...
