passive critic features

Search results for: passive critic features

Number of results: 593035

Dynamic Control with Actor-Critic Reinforcement Learning

2009

Reinaldo A Uribe

Contents: 4 Actor-Critic Marble Control; 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...

Evaluating Operational Features of Three Unconventional Intersections under Heavy Traffic Based on CRITIC Method

Journal: Sustainability 2021

Conventional four-legged intersections are inefficient under heavy traffic requirements and prone to congestion problems. Unconventional intersections with innovative designs allow for more efficient operations and can, in some cases, increase the capacity of the intersection. Common unconventional intersections include the upstream signalized crossover intersection (USC), the continuous flow intersection (CFI), and the parallel flow intersection (PFI). At present, an increasi...

State-constrained agile missile control with adaptive-critic-based neural networks

Journal: IEEE Trans. Contr. Sys. Techn. 2002

Dongchen Han, S. N. Balakrishnan

In this study, we develop an adaptive-critic-based controller to steer an agile missile that has a constraint on the minimum flight Mach number from various initial Mach numbers to a given final Mach number in minimum time while completely reversing its flightpath angle. This class of bounded state space, free final time problems is very difficult to solve due to discontinuities in costates at ...

A Divergence Critic for Inductive Proof

Journal: J. Artif. Intell. Res. 1996

Toby Walsh

Inductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs attempting to identify diverging proof attempts. Divergence is recognized by means of a "difference matching" procedure. The critic then proposes lemmas and generalizations which "ripple" these differences away so that the proof can go through with...

Simulating Artist and Critic Dynamics - An Agent-based Application of an Evolutionary Art System

2009

Gary Greenfield, Penousal Machado

We describe an agent-based artist-critic simulation. Artist agents use a swarm-based evolutionary art system to evolve images that try to match their preferences. Preferred images are submitted to critic agents, who then decide, according to their own criteria, which images should be displayed in a public gallery. The purpose of our model is to enable the implementation of a variety of behavio...

A Critic Criticised

Journal: Nature 1897

Freeway Merging in Congested Traffic based on Multipolicy Decision Making with Passive Actor Critic

Journal: CoRR 2017

Tomoki Nishi, Prashant Doshi, Danil V. Prokhorov

Freeway merging in congested traffic is a significant challenge toward fully automated driving. Merging vehicles need to decide not only how to merge into a spot, but also where to merge. We present a method for freeway merging based on multi-policy decision making with a reinforcement learning method called passive actor-critic (pAC), which learns with less knowledge of the system and witho...

Stochastic Control Strategies and Adaptive Critic Methods

2008

Randa Herzallah, David Lowe

Adaptive critic methods have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, nonlinear and nonstationary environments. In this study, a novel probabilistic dual heuristic programming (DHP) based adaptive critic controller is proposed...

Robust Contextual Bandit via the Capped-$\ell_{2}$ norm

Journal: CoRR 2017

Feiyun Zhu, Xinliang Zhu, Sheng Wang, Jiawen Yao, Junzhou Huang

This paper considers the actor-critic contextual bandit for mobile health (mHealth) intervention. The state-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. Those methods use a least-squares-based algorithm to estimate the expected reward, which is sensitive to outliers. To deal with the issue...

An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function

1998

Hajime Kimura, Shigenobu Kobayashi

We present an analysis of actor/critic algorithms in which the actor updates its policy using eligibility traces of the policy parameters. Most of the theoretical results for eligibility traces have applied only to the critic's value iteration algorithms. This paper investigates what the actor's eligibility trace does. The results show that the algorithm is an extension of Williams' REINFORCE algori...
