Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation

Authors

  • Rémi Munos
  • Leemon C. Baird
  • Andrew W. Moore
Abstract

In this paper we investigate new approaches to dynamic-programming-based optimal control of continuous time-and-space systems. We use neural networks to approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation, which is, in the deterministic case studied here, a first-order, non-linear partial differential equation. We derive the gradient descent rule for integrating this equation inside the domain, given the conditions on the boundary. We apply this approach to the "Car-on-the-Hill" problem, a two-dimensional, highly non-linear control problem. We discuss the results obtained and point out the low quality of the approximation of the value function and of the derived control. We attribute this poor approximation to the fact that the HJB equation has many generalized solutions (i.e. functions differentiable almost everywhere) other than the value function, and our gradient descent method converges to one of these functions, thus possibly failing to find the correct value function. We illustrate this limitation on a simple one-dimensional control problem.
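
The limitation described in the abstract can be illustrated with a minimal sketch. The problem below is an assumption chosen for simplicity and is not necessarily the one-dimensional example used in the paper: the state x lies in [0, 1], the dynamics are dx/dt = u with u in {-1, +1}, and the cost is the time needed to reach the boundary. The HJB equation then reduces to |V'(x)| - 1 = 0 inside the domain, with V(0) = V(1) = 0. The function names (`value_function`, `sawtooth`, `hjb_residual`, `mean_squared_residual`) are hypothetical. The sketch checks that a second, incorrect function has the same vanishing HJB residual as the true value function, so a loss built only from that residual cannot tell them apart.

```python
# Illustrative 1D problem (an assumption, not taken from the paper):
# x in [0, 1], dx/dt = u with u in {-1, +1}, cost = time to reach the boundary.
# HJB equation: |V'(x)| - 1 = 0 on (0, 1), with V(0) = V(1) = 0.


def value_function(x):
    """True value function: minimum time to exit [0, 1]."""
    return min(x, 1.0 - x)


def sawtooth(x):
    """A different generalized solution: slope +/-1 almost everywhere,
    zero on the boundary, but with extra kinks at 0.25, 0.5 and 0.75."""
    y = x % 0.5
    return min(y, 0.5 - y)


def hjb_residual(v, x, h=1e-6):
    """HJB residual |V'(x)| - 1, using a central finite difference for V'."""
    dv = (v(x + h) - v(x - h)) / (2.0 * h)
    return abs(dv) - 1.0


def mean_squared_residual(v, n=1000):
    """Average squared residual over cell midpoints of a uniform grid.
    Midpoints are used so that no sample falls on a kink of either function."""
    points = [(i + 0.5) / n for i in range(n)]
    return sum(hjb_residual(v, x) ** 2 for x in points) / n


if __name__ == "__main__":
    print("residual, value function:", mean_squared_residual(value_function))
    print("residual, sawtooth      :", mean_squared_residual(sawtooth))
    print("V(0.5):", value_function(0.5), " sawtooth(0.5):", sawtooth(0.5))
```

Both functions satisfy the boundary conditions and the equation wherever they are differentiable, yet they differ in the interior (for instance at x = 0.5). Under these assumptions, a gradient descent method that only minimizes the residual has no reason to prefer the true value function.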

Similar resources

Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-

In this paper we investigate new approaches to dynamic-programming-based optimal control of continuous time-and-space systems. We use neural networks to approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation which is, in the deterministic case studied here, a first-order, non-linear, partial differential equation. We derive the gradient descent rule for integrating this equation in...

Extended Applicability of the Symplectic Pontryagin Method

The Symplectic Pontryagin method was introduced in a previous paper. This work shows that the method is applicable under less restrictive assumptions. Solutions to the Symplectic Pontryagin scheme are shown to exist without the previous assumption of a bounded gradient of the discrete dual variable. The convergence proof uses the representation of solutions to a Hamilton...

A Study of Reinforcement Learning in the Continuous Case by the Means

This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear first (or sec...

Nonlinear Optimal Control Techniques Applied to a Launch Vehicle Autopilot

This paper presents an application of nonlinear optimal control techniques to the design of launch vehicle autopilots. The optimal control is given by the solution to the Hamilton-Jacobi-Bellman (HJB) equation, which in this case cannot be solved explicitly. A method based upon Successive Galerkin Approximation (SGA) is used to obtain an approximate optimal solution. Simulation results invo...

Using Neural Networks for Fast Reachable Set Computations

To sidestep the curse of dimensionality when computing solutions to Hamilton-Jacobi-Bellman partial differential equations (HJB PDE), we propose an algorithm that leverages a neural network to approximate the value function. We show that our final approximation of the value function generates near optimal controls which are guaranteed to successfully drive the system to a target state. Our fram...

Publication date: 1999