critic weighting

Search results for: critic weighting

Number of results: 69016

Classification-based Policy Iteration with a Critic

2011

Victor Gabillon Alessandro Lazaric Mohammad Ghavamzadeh Bruno Scherrer

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a ...
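
The truncation idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the callables `step`, `policy`, and `critic`, the toy chain environment, and the discount factor are all assumptions made for illustration. The rollout is cut after `horizon` steps and the critic's value estimate stands in for the rest of the return, so a shorter horizon trades rollout variance for critic bias.

```python
def truncated_rollout_q(step, policy, critic, state, action, horizon, gamma=0.95):
    """Estimate Q(state, action) from one rollout truncated after `horizon` steps.

    Hypothetical interfaces (assumptions, not from the paper):
      step(state, action) -> (next_state, reward)
      policy(state)       -> action
      critic(state)       -> approximate state value
    """
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        s, r = step(s, a)        # take the current action
        total += discount * r    # accumulate the discounted reward
        discount *= gamma
        a = policy(s)            # follow the rollout policy afterwards
    # Replace the truncated tail of the return with the critic's estimate.
    return total + discount * critic(s)


# Toy deterministic chain, purely illustrative: every step yields reward 1.
def toy_step(state, action):
    return state + 1, 1.0

def toy_policy(state):
    return 0

def toy_critic(state):
    return 10.0

estimate = truncated_rollout_q(toy_step, toy_policy, toy_critic,
                               state=0, action=0, horizon=2, gamma=0.5)
```

With `horizon=0` the estimate is the critic's value alone; as `horizon` grows, the critic's contribution is discounted away and the estimate approaches a plain rollout return.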

On Actor-Critic Algorithms

Journal: SIAM J. Control and Optimization 2003

Vijay R. Konda John N. Tsitsiklis

In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a su...

Beyond Adaptive Critic - Creative Learning for Intelligent Autonomous Mobile Robots

2002

Xiaoqun Liao

Intelligent industrial and mobile robots may be considered proven technology in structured environments. Teach programming and supervised learning methods permit solutions to a variety of applications. However, we believe that to extend the operation of these machines to more unstructured environments requires a new learning method. Both unsupervised learning and reinforcement learning are pote...

Boosting the Actor with Dual Critic

Journal: CoRR 2017

Bo Dai Albert Shaw Niao He Lihong Li Le Song

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC. It is derived in a principled way from the Lagrangian dual form of the Bellman optimality equation, which can be viewed as a two-player game between the actor and a critic-like function, which is named as dual critic. Compared to its actor-critic relatives, Dual-AC has the desired property that the actor...

Efficacy of conservation tillage systems on soil quality in wheat-based cropping systems in dryland agroecosystems

2022

Background and objective: Soil health is one of the main components in achieving sustainable agricultural systems and is strongly affected by agronomic operations such as tillage. It can be quantified using physical, chemical, and biological parameters within specified algorithms. Consequently, assessing soil quality and fertility under different land management practices is essential for establishing suitable management for optimal production in cropping systems. The Soil Management Assessment Framework (SMAF), as a tool ...

Extensions to a Generalization Critic for Inductive Proof

1996

Andrew Ireland Alan Bundy

In earlier papers a critic for automatically generalizing conjectures in the context of failed inductive proofs was presented. The critic exploits the partial success of the search control heuristic known as rippling. Through empirical testing a natural generalization and extension of the basic critic emerged. Here we describe our extended generalization critic together with some promising expe...

An Actor/Critic Algorithm that is Equivalent to Q-Learning

1994

Robert H. Crites Andrew G. Barto

We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using c...

Identifying and weighting the factors affecting knowledge management from the viewpoint of managers and experts based on the AHP model in the SAIPA Industrial Group

Thesis: Ministry of Science, Research and Technology - Shahrood University of Technology, 1389 SH

Maryam Aliei, Bozorgmehr Ashrafi, Mohammad Mousavi Shahroudi

Organizations have numerous resources and assets for achieving their goals. Some of these resources and assets are highly valuable, unique, and distinctive, and play a central role in gaining competitive advantage. Knowledge is one of them, to the extent that it is regarded as the ultimate substitute for production, wealth, and monetary capital. Knowledge transfer across the organization has a positive effect on both the speed and the performance of the organization. Knowledge is nothing new, but its acceptance as capital ...

Guide Actor-Critic for Continuous Control

2018

Abbas Abdolmaleki Masashi Sugiyama

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only use values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GA...

A Convergent Online Single Time Scale Actor Critic Algorithm

Journal: Journal of Machine Learning Research 2010

Dotan Di Castro Ron Meir

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local m...
