Q-learning for Robots

Author

  • Claude F. Touzet
Abstract

Robot learning is a challenging – and somewhat unique – research domain. If a robot behavior is defined as a mapping between situations that occurred in the real world and actions to be accomplished, then the supervised learning of a robot behavior requires a set of representative examples (situation, desired action). In order to gather such a learning base, the human operator must have a deep understanding of the robot-world interaction (i.e., a model). But there are many application domains where such models cannot be obtained, either because detailed knowledge of the robot's world is unavailable (e.g., spatial or underwater exploration, nuclear or toxic waste management), or because it would be too costly. In this context, the automatic synthesis of a representative learning base is an important issue. It can be sought using reinforcement learning techniques – in particular Q-learning, which does not require a model of the robot-world interaction. Compared to supervised learning, Q-learning examples are triplets (situation, action, Q value), where the Q value is the utility of executing the action in the situation. The supervised learning base is obtained by recruiting the triplets with the highest utility. Because it allows the synthesis of behaviors despite the absence of a robot-world interaction model, Q-learning (Watkins 1989) has become the most widely used learning algorithm for autonomous robotics. Although the convergence theorem does not apply to the robotics domain (due to the limited number of situation-action pairs that can be explored during the lifetime of the robot's batteries), heuristically adapted Q-learning has proved successful in applications such as obstacle avoidance, wall following, go-to-the-nest, etc. This is mostly due to neural-based implementations such as multilayer perceptrons trained with backpropagation, or self-organizing maps. Such implementations provide efficient generalization, i.e., fast learning, and designate the critic – the reinforcement function definition – as the real issue. The articles REINFORCEMENT LEARNING and REINFORCEMENT LEARNING IN MOTOR CONTROL provide background information on reinforcement learning. Kaelbling (1996) and Sutton (1998) are two other sources of information. For more detailed treatments, the reader should consult Touzet (1997).
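As a concrete illustration of the triplets described above, the sketch below shows tabular one-step Q-learning on a discretized robot interface, followed by the recruitment of the highest-utility (situation, action) pairs into a supervised learning base. This is a minimal Python sketch, not Touzet's implementation: the environment interface (get_situation, execute, reinforcement) and the sizes N_SITUATIONS and N_ACTIONS are hypothetical placeholders standing in for the robot's sensor discretization, actuator commands, and reinforcement function.

    import random

    N_SITUATIONS = 64          # hypothetical: coarsely quantized sensor readings
    N_ACTIONS = 5              # hypothetical: turn left/right, go forward, ...
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

    # Q[s][a] estimates the utility of executing action a in situation s.
    Q = [[0.0] * N_ACTIONS for _ in range(N_SITUATIONS)]

    def q_learning_step(s, get_situation, execute, reinforcement):
        """One-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        if random.random() < EPSILON:                     # explore
            a = random.randrange(N_ACTIONS)
        else:                                             # exploit the current estimates
            a = max(range(N_ACTIONS), key=lambda k: Q[s][k])
        execute(a)                                        # act on the real robot
        s_next = get_situation()                          # observe the resulting situation
        r = reinforcement(s, a, s_next)                   # the critic (reinforcement function)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        return s_next

    def recruit_learning_base():
        """Keep, for each situation, the action with the highest utility (highest Q value)."""
        return [(s, max(range(N_ACTIONS), key=lambda k: Q[s][k])) for s in range(N_SITUATIONS)]

The recruited (situation, best action) pairs could then serve as the learning base for a supervised learner, e.g., a multilayer perceptron trained with backpropagation, as mentioned above.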

Similar resources

Using BELBIC based optimal controller for omni-directional three-wheel robots model identified by LOLIMOT

In this paper, an intelligent controller is applied to control the motion of omni-directional robots. First, the dynamics of the three-wheel robots, as a nonlinear plant with considerable uncertainties, is identified using an efficient training algorithm named LoLiMoT. Then, an intelligent controller based on the brain emotional learning algorithm is applied to the identified model. This emotional l...

Online Evolution for Cooperative Behavior in Group Robot Systems

In distributed mobile robot systems, autonomous robots accomplish complicated tasks through intelligent cooperation with each other. This paper presents behavior learning and online distributed evolution for cooperative behavior of a group of autonomous robots. Learning and evolution capabilities are essential for a group of autonomous robots to adapt to unstructured environments. Behavior lear...

A Study on Multi-Dimensional Fuzzy Q-learning for Intelligent Robots

Reinforcement learning is one of the most important learning methods for intelligent robots working in unknown/uncertain environments. Multi-dimensional fuzzy Q-learning, an extension of the Q-learning method, has been proposed in this study. The proposed method has been applied for an intelligent robot working in a dynamic environment. The rewards from the evaluation functions and the fuzzy Q-...

Compact Q-learning optimized for micro-robots with processing and memory constraints

Scaling down robots to miniature size introduces many new challenges, including memory and program size limitations, low processor performance, and low power autonomy. In this paper we describe the concept and implementation of learning a safe-wandering task with the autonomous micro-robots, Alice. We propose a simplified reinforcement learning algorithm based on one-step Q-learning that is op...
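For readers unfamiliar with how one-step Q-learning can be fitted onto very small hardware, the sketch below shows one generic way to shrink the table; it is only an illustration under stated assumptions, not the Alice implementation described in the paper above. Q values are stored as unsigned bytes, the learning rate is approximated by an integer shift, and N_SITUATIONS, N_ACTIONS, and the reward range are hypothetical.

    N_SITUATIONS, N_ACTIONS = 16, 3            # hypothetical coarse discretization
    Q = bytearray(N_SITUATIONS * N_ACTIONS)    # one byte per (situation, action) pair

    def update(s, a, r, s_next):
        """One-step Q-learning with integer arithmetic; r is assumed to lie in 0..63."""
        best_next = max(Q[s_next * N_ACTIONS + k] for k in range(N_ACTIONS))
        target = r + (best_next * 3) // 4      # gamma = 3/4 with integer arithmetic
        old = Q[s * N_ACTIONS + a]
        new = old + ((target - old) >> 2)      # alpha = 1/4 as a right shift
        Q[s * N_ACTIONS + a] = max(0, min(255, new))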

Co-Operative Strategy for an Interactive Robot Soccer System by Reinforcement Learning Method

This paper presents a cooperation strategy between a human operator and autonomous robots for an interactive robot soccer game. The interactive robot soccer game has been developed to allow humans to join the game dynamically and to reinforce its entertainment characteristics. In order to make these games more interesting, a cooperation strategy between humans and autonomous robots on a team is v...

Hexagon-Based Q-Learning Algorithm and Applications

This paper presents a hexagon-based Q-learning algorithm to find a hidden target object with multiple robots. An experimental environment was designed with five small mobile robots, obstacles, and a target object. The robots went in search of the target object while navigating in a hallway where obstacles were strategically placed. This experiment employed two control algorithms: an area-based action ...

Journal title:

Volume:   Issue:

Pages:  -

Publication date: 2001