discrete action reinforcement learning automata darla

Search results for: discrete action reinforcement learning automata darla

Number of results: 1357117

Multiple response learning automata

Journal: IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society, 1996

Anastasios A. Economides

Learning Automata update their action probabilities on the basis of the response they get from a random environment. They use a reward adaptation rate for a favorable environment's response and a penalty adaptation rate for an unfavorable environment's response. In this correspondence, we introduce Multiple Response learning automata by explicitly classifying the environment responses into a rew...
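The reward and penalty adaptation rates mentioned in this abstract can be illustrated with a minimal sketch of the classical linear reward-penalty scheme. This is not the Multiple Response scheme the paper introduces (the abstract is truncated before describing it); the function name `update_probabilities` and the default rates are assumptions chosen for illustration.

```python
def update_probabilities(probs, action, favorable,
                         reward_rate=0.1, penalty_rate=0.05):
    """One linear reward-penalty update of a learning automaton.

    probs        -- current action probabilities (must sum to 1, len >= 2)
    action       -- index of the action that was just tried
    favorable    -- True if the environment's response was favorable
    reward_rate  -- adaptation rate used on a favorable response (assumed value)
    penalty_rate -- adaptation rate used on an unfavorable response (assumed value)
    """
    r = len(probs)
    updated = []
    for j, p in enumerate(probs):
        if favorable:
            # Move probability mass toward the chosen action.
            if j == action:
                updated.append(p + reward_rate * (1.0 - p))
            else:
                updated.append((1.0 - reward_rate) * p)
        else:
            # Move probability mass away from the chosen action,
            # spreading it evenly over the other actions.
            if j == action:
                updated.append((1.0 - penalty_rate) * p)
            else:
                updated.append(penalty_rate / (r - 1) + (1.0 - penalty_rate) * p)
    return updated


if __name__ == "__main__":
    p = [0.5, 0.5]
    p = update_probabilities(p, action=0, favorable=True)
    print(p)
    p = update_probabilities(p, action=0, favorable=False)
    print(p)
```

Both branches keep the probabilities summing to 1, so the vector can be used directly to sample the next action.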

Multiscale Anticipatory Behavior by Hierarchical Reinforcement Learning

2008

Matthias Rungger, Hao Ding, Olaf Stursberg

In order to establish autonomous behavior for technical systems, the well known trade-off between reactive control and deliberative planning has to be considered. Within this paper, we combine both principles by proposing a two-level hierarchical reinforcement learning scheme to enable the system to autonomously determine suitable solutions to new tasks. The approach is based on a behavior repr...

DARLA: Improving Zero-Shot Transfer in Reinforcement Learning

2017

S in Fig 2. The setup was a seek-avoid style task, where one of the two object types in the room gave a reward of +1 and the other gave a reward of -1. The agent was allowed to pick up objects for 60 seconds after which the episode would terminate and a new one would begin; if the agent was able to pick up all the ‘good’ objects in less than 60 seconds, a new episode was begun immediately. The ...

Multi-robot concurrent learning of cooperative behaviours for the tracking of multiple moving targets

2007

Zheng Liu, Marcelo H. Ang, Winston Khoon Guan Seah

Reinforcement learning has been extensively studied and applied for generating cooperative behaviours in multi-robot systems. However, traditional reinforcement learning algorithms assume discrete state and action spaces with a finite number of elements. This limits the learning to discrete behaviours and cannot be applied to most real multi-robot systems that inherently require appropriate combi...

Girsanov Based Direct Policy Gradient Methods

2012

Evangelos A. Theodorou, Emo Todorov

Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of the work in this area relies on discrete time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state action spaces and continuous time. The derivation is based on successive application of Girsanov’s theor...

Designing a Procurement Mechanism Based on Q-Learning with an Action-Selection Policy Based on PSO Algorithm

Journal: Supply Chain Management

Zohreh Kaheh, Reza Baradaran Kazemzadeh

In this paper, tender problems in an automobile company for procuring needed items from potential suppliers are resolved using the Q-learning algorithm. In this setting, the purchaser considers the proposals received from potential suppliers, including price and delivery time, and assigns orders for the needed parts to suppliers. The buyer’s objective is minimizing the procurement costs thr...

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

Journal: CoRR, 2017

Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David K. Duvenaud

Gradient-based optimization is the foundation of deep learning and reinforcement learning, but is difficult to apply when the mechanism being optimized is unknown or not differentiable. We introduce a general framework for learning low-variance, unbiased gradient estimators, applicable to black-box functions of discrete or continuous random variables. Our method uses gradients of a surrogate ne...

A Learning Automata based Solution for Optimizing Dialogue Strategy in Spoken Dialogue System

2012

G. Kumaravelan, R. Sivakumar

Application of reinforcement learning methods in the development of dialogue strategies that support robust and efficient human–computer interaction using spoken language is a growing research area. In spoken dialogue systems, Markov Decision Processes (MDPs) provide a formal framework for making dialogue management decisions for planning. This framework enables the system to learn the value of ...

On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning

2016

Matthew Hausknecht

Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...
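As a rough illustration of what "mixing on-policy and off-policy update targets" could look like, the sketch below interpolates between a Monte Carlo return and a one-step bootstrapped target for each step of a finished episode. The abstract is truncated before it states the exact rule, so the linear interpolation, the weight `beta`, and the name `mixed_targets` are all assumptions and not necessarily the paper's method.

```python
def mixed_targets(rewards, next_q, gamma=0.99, beta=0.5):
    """Blend on-policy Monte Carlo returns with one-step bootstrapped targets.

    rewards -- rewards[t] is the reward received at step t of one finished episode
    next_q  -- next_q[t] is the critic's value estimate for the state reached
               after step t (use 0.0 for the terminal state)
    gamma   -- discount factor
    beta    -- assumed mixing weight: 1.0 is pure Monte Carlo,
               0.0 is pure one-step bootstrapping
    """
    n = len(rewards)

    # On-policy target: discounted return actually observed from step t onward.
    mc_returns = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        running = rewards[t] + gamma * running
        mc_returns[t] = running

    # Off-policy style target: immediate reward plus discounted value estimate.
    bootstrap = [rewards[t] + gamma * next_q[t] for t in range(n)]

    return [beta * mc_returns[t] + (1.0 - beta) * bootstrap[t] for t in range(n)]


if __name__ == "__main__":
    print(mixed_targets([1.0, 1.0], [0.0, 0.0], gamma=1.0, beta=0.5))
```

The resulting list would serve as regression targets for the critic in place of purely bootstrapped targets.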

An AsmL model for an Intelligent Vehicle Control System

2007

Florin Stoica

An abstract state machine (ASM) is a mathematical model of the system’s evolving, runtime state. ASMs can be used to faithfully capture the abstract structure and step-wise behaviour of any discrete system. An easy way to understand ASMs is to see them as defining a succession of states that may follow an initial state. We present a machine-executable model for an Intelligent Vehicle Control S...
