Search results for: discrete action reinforcement learning automata darla

Number of results: 1357117

Journal: IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society 1996
Anastasios A. Economides

Learning Automata update their action probabilities on the basis of the response they get from a random environment. They use a reward adaptation rate for a favorable environment's response and a penalty adaptation rate for an unfavorable environment's response. In this correspondence, we introduce Multiple Response learning automata by explicitly classifying the environment responses into a rew...
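The two-rate update described in the first sentences of this abstract can be sketched as follows. This is a minimal illustration of the textbook linear reward-penalty scheme, not the Multiple Response automaton the paper introduces; the function name `update_probabilities` and the default rates are assumptions made for the example.

```python
def update_probabilities(probs, chosen, favorable,
                         reward_rate=0.1, penalty_rate=0.05):
    """Return updated action probabilities after one environment response.

    Assumed scheme: standard linear reward-penalty update with separate
    reward and penalty adaptation rates (hypothetical defaults).
    """
    n_actions = len(probs)
    updated = []
    for i, p in enumerate(probs):
        if favorable:
            if i == chosen:
                # Move the chosen action's probability toward 1.
                updated.append(p + reward_rate * (1.0 - p))
            else:
                # Shrink every other action proportionally.
                updated.append((1.0 - reward_rate) * p)
        else:
            if i == chosen:
                # Shrink the chosen action's probability.
                updated.append((1.0 - penalty_rate) * p)
            else:
                # Redistribute the removed mass evenly over the others.
                updated.append(penalty_rate / (n_actions - 1)
                               + (1.0 - penalty_rate) * p)
    return updated


if __name__ == "__main__":
    probs = [0.5, 0.5]
    probs = update_probabilities(probs, chosen=0, favorable=True)
    print(probs)
```

Both branches keep the probabilities summing to one, so the result can be used directly to sample the next action.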

2008
Matthias Rungger Hao Ding Olaf Stursberg

In order to establish autonomous behavior for technical systems, the well-known trade-off between reactive control and deliberative planning has to be considered. Within this paper, we combine both principles by proposing a two-level hierarchical reinforcement learning scheme to enable the system to autonomously determine suitable solutions to new tasks. The approach is based on a behavior repr...

2017

S in Fig 2. The setup was a seek-avoid style task, where one of the two object types in the room gave a reward of +1 and the other gave a reward of -1. The agent was allowed to pick up objects for 60 seconds after which the episode would terminate and a new one would begin; if the agent was able to pick up all the ‘good’ objects in less than 60 seconds, a new episode was begun immediately. The ...

2007
Zheng Liu Marcelo H. Ang Winston Khoon Guan Seah

Reinforcement learning has been extensively studied and applied for generating cooperative behaviours in multi-robot systems. However, traditional reinforcement learning algorithms assume discrete state and action spaces with a finite number of elements. This limits the learning to discrete behaviours and cannot be applied to most real multi-robot systems that inherently require appropriate combi...

2012
Evangelos A. Theodorou Emo Todorov

Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of the work in this area relies on discrete time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state action spaces and continuous time. The derivation is based on successive application of Girsanov’s theor...

Journal: مدیریت زنجیره تأمین (Supply Chain Management)
زهره کاهه رضا برادران کاظم زاده

In this paper, tender problems in an automobile company for procuring needed items from potential suppliers are resolved using the Q-learning algorithm. In this setting the purchaser, considering the proposals received from potential suppliers (including price and delivery time), assigns orders for the needed parts to suppliers. The buyer's objective is minimizing the procurement costs thr...

Journal: CoRR 2017
Will Grathwohl Dami Choi Yuhuai Wu Geoffrey Roeder David K. Duvenaud

Gradient-based optimization is the foundation of deep learning and reinforcement learning, but is difficult to apply when the mechanism being optimized is unknown or not differentiable. We introduce a general framework for learning low-variance, unbiased gradient estimators, applicable to black-box functions of discrete or continuous random variables. Our method uses gradients of a surrogate ne...

2012
G. Kumaravelan R. Sivakumar

Application of reinforcement learning methods in the development of dialogue strategies that support robust and efficient human–computer interaction using spoken language is a growing research area. In spoken dialogue systems, Markov Decision Processes (MDPs) provide a formal framework for making dialogue management decisions for planning. This framework enables the system to learn the value of ...

2016
Matthew Hausknecht

Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...
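One way to picture mixing on-policy and off-policy update targets is a convex combination of the Monte Carlo return and the one-step bootstrapped target. The sketch below assumes that form; the mixing weight `beta`, the function names, and the list-based interface are illustrative assumptions and are not taken from the paper.

```python
def monte_carlo_returns(rewards, gamma):
    """Discounted return from each time step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mixed_targets(rewards, next_q_values, gamma=0.99, beta=0.5):
    """Blend on-policy Monte Carlo returns with off-policy one-step targets.

    next_q_values[t] is the critic's estimate for the state reached after
    step t (use 0.0 for a terminal state). beta=1 gives pure Monte Carlo
    targets, beta=0 gives pure one-step bootstrapped targets.
    The convex-combination form is an assumption made for this sketch.
    """
    mc = monte_carlo_returns(rewards, gamma)
    targets = []
    for reward, q_next, mc_return in zip(rewards, next_q_values, mc):
        one_step = reward + gamma * q_next
        targets.append(beta * mc_return + (1.0 - beta) * one_step)
    return targets


if __name__ == "__main__":
    print(mixed_targets([1.0, 1.0], [2.0, 0.0], gamma=0.5, beta=0.5))
```

In an actor-critic setup such as DDPG, targets of this kind would stand in for the usual one-step critic target when computing the critic loss.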

2007
Florin Stoica

An abstract state machine (ASM) is a mathematical model of the system’s evolving, runtime state. ASMs can be used to faithfully capture the abstract structure and step-wise behaviour of any discrete system. An easy way to understand ASMs is to see them as defining a succession of states that may follow an initial state. We present a machine-executable model for an Intelligent Vehicle Control S...
