Direct and indirect reinforcement learning

Abstract

Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. In this paper, we classify RL into direct and indirect RL according to how they seek the optimal policy of the Markov decision process problem. The former solves the optimal policy by directly maximizing an objective function using gradient descent methods, in which the objective function is usually the expectation of accumulative future rewards. The latter indirectly finds the optimal policy by solving the Bellman equation, which is the sufficient and necessary condition from Bellman's principle of optimality. We study policy gradient (PG) forms of direct and indirect RL and show that both of them can derive the actor–critic architecture and can be unified into a PG with the approximate value function and the stationary state distribution, revealing the equivalence of direct and indirect RL. We employ a Gridworld task to verify the influence of different forms of PG, suggesting their differences and relationships experimentally. Finally, we classify current mainstream RL algorithms using the direct and indirect taxonomy, together with other ones, including value-based and policy-based, model-based and model-free.
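
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's Gridworld or its algorithms: the five-cell corridor, the reward of 1 for reaching the goal, the discount factor, and every function name below are assumptions made for illustration. The indirect route solves the Bellman optimality equation by value iteration; the direct route performs exact gradient ascent on the expected discounted return of a tabular softmax policy (equivalently, gradient descent on its negative).

```python
import math

# Hypothetical toy MDP (an assumption, not the paper's task): a 1-D corridor.
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right

def step(s, a):
    """Deterministic transition; reward 1 only when the goal is entered."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def backup(s, a, v):
    """One-step lookahead: r(s, a) + gamma * v(s')."""
    s2, r = step(s, a)
    return r + GAMMA * v[s2]

def indirect_rl(sweeps=100):
    """Indirect RL: solve the Bellman optimality equation by value iteration."""
    v = [0.0] * N_STATES  # the goal is terminal, so its value stays 0
    for _ in range(sweeps):
        for s in range(GOAL):
            v[s] = max(backup(s, a, v) for a in range(2))
    policy = [max(range(2), key=lambda a: backup(s, a, v)) for s in range(GOAL)]
    return v, policy

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    z = sum(e)
    return [x / z for x in e]

def evaluate(pi, sweeps=200):
    """Iterative policy evaluation: V and Q of a stochastic policy pi[s][a]."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(GOAL):
            v[s] = sum(pi[s][a] * backup(s, a, v) for a in range(2))
    q = [[backup(s, a, v) for a in range(2)] for s in range(GOAL)]
    return v, q

def visitation(pi, horizon=200):
    """Discounted state visitation when every episode starts in state 0."""
    d = [0.0] * N_STATES
    p = [1.0] + [0.0] * (N_STATES - 1)  # state distribution at time t
    for t in range(horizon):
        nxt = [0.0] * N_STATES
        for s in range(GOAL):  # probability mass at the goal is absorbed
            d[s] += (GAMMA ** t) * p[s]
            for a in range(2):
                nxt[step(s, a)[0]] += p[s] * pi[s][a]
        p = nxt
    return d

def direct_rl(iters=500, lr=1.0):
    """Direct RL: exact policy-gradient ascent on the expected return."""
    theta = [[0.0, 0.0] for _ in range(GOAL)]  # softmax logits per state
    for _ in range(iters):
        pi = [softmax(row) for row in theta]
        v, q = evaluate(pi)
        d = visitation(pi)
        for s in range(GOAL):
            for a in range(2):
                # Softmax policy gradient: d(s) * pi(a|s) * (Q(s,a) - V(s)).
                theta[s][a] += lr * d[s] * pi[s][a] * (q[s][a] - v[s])
    pi = [softmax(row) for row in theta]
    policy = [max(range(2), key=lambda a: pi[s][a]) for s in range(GOAL)]
    return evaluate(pi)[0], policy

if __name__ == "__main__":
    print("indirect:", indirect_rl())
    print("direct:  ", direct_rl())
```

In this toy setting both routes should select "move right" in every non-terminal cell, with the softmax policy's start-state value approaching the value-iteration result from below as the number of iterations grows. The sketch computes the gradient exactly from the known model; a sampled, model-free estimator would be the more common choice in practice.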

Related resources

Response acquisition under direct and indirect contingencies of reinforcement.

We compared the effects of direct and indirect reinforcement contingencies on the performance of 6 individuals with profound developmental disabilities. Under both contingencies, completion of identical tasks (opening one of several types of containers) produced access to identical reinforcers. Under the direct contingency, the reinforcer was placed inside the container to be opened; under the ...

Direct gradient-based reinforcement learning

Models agent interacting with its environment.

Attention and Reinforcement Learning: Constructing Representations from Indirect Feedback

Reinforcement learning (RL) shows great promise as a theory of learning in complex, dynamic tasks. However, the learning performance of RL models depends strongly on how stimuli are represented, because this determines how knowledge is generalized among stimuli. We propose a mechanism by which RL autonomously constructs representations that suit its needs, using selective attention among stimul...

Fast and Stable Learning in Direct-Vision-Based Reinforcement Learning

Direct-Vision-Based Reinforcement Learning has been proposed not only for the motion planning but for the learning of the whole process from sensors to motors in robots, including recognition, attention and so on. In this learning, raw visual sensory signals are put into a layered neural network directly, and the network is trained by the training signals generated based on reinforcement learni...

Direct Uncertainty Estimation in Reinforcement Learning

Optimal probabilistic approach in reinforcement learning is computationally infeasible. Its simplification consisting in neglecting difference between true environment and its model estimated using limited number of observations causes exploration vs exploitation problem. Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertain...

Journal

Journal title: International Journal of Intelligent Systems

Year: 2021

ISSN: 1098-111X, 0884-8173

DOI: https://doi.org/10.1002/int.22466