CS229 Lecture Notes: Reinforcement Learning and Control

Author

  • Andrew Ng
Abstract

We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y given in the training set. In that setting, the labels gave an unambiguous “right answer” for each of the inputs x. In contrast, for many sequential decision-making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. For example, if we have just built a four-legged robot and are trying to program it to walk, then initially we have no idea which actions are the “correct” ones to make it walk, and so we do not know how to provide explicit supervision for a learning algorithm to mimic. In the reinforcement learning framework, we will instead provide our algorithms only a reward function, which indicates to the learning agent when it is doing well and when it is doing poorly. In the four-legged walking example, the reward function might give the robot positive rewards for moving forwards, and negative rewards for either moving backwards or falling over. It will then be the learning algorithm’s job to figure out how to choose actions over time so as to obtain large rewards. Reinforcement learning has been successful in applications as diverse as autonomous helicopter flight, robot legged locomotion, cell-phone network routing, marketing strategy selection, factory control, and efficient web-page indexing. Our study of reinforcement learning will begin with a definition of Markov decision processes (MDPs), which provide the formalism in which RL problems are usually posed.
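
As a rough illustration of the walking example in the abstract, a reward function of that kind might be sketched as follows. The function name, its parameters, and the size of the fall penalty are all assumptions made here for illustration; they do not come from the lecture notes.

```python
def walking_reward(forward_displacement: float,
                   fell_over: bool,
                   fall_penalty: float = 10.0) -> float:
    """Hypothetical per-step reward for a walking robot.

    forward_displacement: distance moved along the forward axis during
        this step (negative when the robot moves backwards).
    fell_over: whether the robot fell during this step.
    fall_penalty: assumed magnitude of the negative reward for falling.
    """
    if fell_over:
        # Falling over is penalized regardless of any progress made.
        return -fall_penalty
    # Positive when moving forwards, negative when moving backwards.
    return forward_displacement


# The learning algorithm only ever sees these scalar signals, never
# the "correct" action itself.
step_rewards = [
    walking_reward(0.2, False),   # moved forwards
    walking_reward(-0.1, False),  # moved backwards
    walking_reward(0.3, True),    # fell over
]
```

Under these assumptions, a learning agent would try to pick actions that make the sum of such rewards over time large, without ever being told which individual action was right.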


Similar resources

CS229 Supplemental Lecture Notes: Hoeffding's Inequality

A basic question in probability, statistics, and machine learning is the following: given a random variable Z with expectation E[Z], how likely is Z to be close to its expectation? And more precisely, how close is it likely to be? With that in mind, these notes give a few tools for computing bounds of the form P(Z ≥ E[Z] + t) and P(Z ≤ E[Z]− t) (1) for t ≥ 0. Our first bound is perhaps the most...


CS229 Lecture Notes: Support Vector Machines

This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) “off-the-shelf” supervised learning algorithm. To tell the SVM story, we’ll need to first talk about margins and the idea of separating data with a large “gap.” Next, we’ll talk about the optimal margin classifier, which will lead us into a digression on...


Factors affecting students tendency of Univercity students to Lecture Notes

Introduction: Many studies detected factors contributing to the students’ tendency to lecture notes. This study aimed at evaluating the factors affecting students’ tendency to lecture notes in Hormozgan University of Medical Sciences. Methods: In this descriptive study, 179 students from medicine, nursing & midwifery, health, and Paramedicine schools were selected through stratified random...


CS229 Final Report Deep Q-Learning to Play Mario

In this paper, I study applying and adjusting DeepMind’s Atari Deep Q-Learning model to train an automatic agent to play the 1985 Nintendo game Super Mario Bros. The agent learns control policies from raw pixel data using deep reinforcement learning. The model is a convolutional neural network that is trained using only raw frames of the game and basic information such as score and motion.


Optimal and Learning Control for Autonomous Robots

Optimal and Learning Control for Autonomous Robots has been taught in the Robotics, Systems and Controls Masters at ETH Zurich with the aim of teaching optimal control and reinforcement learning for closed loop control problems from a unified point of view. The starting point is the formulation of an optimal control problem and deriving the different types of solutions and algorithms from there...



Publication date: 2012