INTEGRATING DECISION-THEORETIC PLANNING AND PROGRAMMING FOR ROBOT CONTROL IN HIGHLY DYNAMIC DOMAINS
Abstract
Yet, the application of options/macros has so far only been discussed at an intuitive level. One of the models of usage proposed in [21] is the following:

Definition 3.2.3 Let $\Pi = \{S_1, \ldots, S_n\}$ be a decomposition of MDP $M = \langle A, S, Tr, R \rangle$, and let $A = \{A_i : i \leq n\}$ be a collection of macro-action sets, where $A_i = \{\pi_i^1, \ldots, \pi_i^{k_i}\}$ is a set of macros for region $S_i$. The abstract MDP $M' = \langle A', S', Tr', R' \rangle$ induced by $\Pi$ and $A$ is given by:

• $S' = Per_\Pi(S) = \bigcup_{i \leq n} EPer(S_i)$
• $A' = \bigcup_{i} A_i$, with $\pi_i^j \in A_i$ feasible only at states $s \in EPer(S_i)$
• $Tr'(s, \pi_i^j, s')$ is given by the discounted transition model for $\pi_i^j$, for any $s \in EPer(S_i)$ and $s' \in XPer(S_i)$; $Tr'(s, \pi_i^j, s') = 0$ for any $s' \notin XPer(S_i)$
• $R'(s, \pi_i^j)$ is given by the discounted reward model for $\pi_i^j$, for any $s \in EPer(S_i)$.

This captures our intuition about the general idea of options: we abstract from the original state space to a much smaller one, namely the set of peripheral states, which form a kind of interface between the regions. The actions used for planning are the options (macros) defined for the different regions. The transition and reward models are the discounted models presented above; recall that these form the crucial part of planning at the level of options. We point out that this reduction of complexity, which, as we will see, ultimately speeds up computation, comes at the cost of possibly finding only a sub-optimal solution.

7. Alternatively, one can define the local MDP without the additional state α and instead make all exit states absorbing with an empty feasible set.
8. [21] also allows for the goal of staying in a region, modeled by low values for all exits.

Figure 3.6 (taken from [21]): The example environment for testing the abstract MDP against the original MDP. (a) Maze121: the agent can move in any compass direction or not move at all. Moving is uncertain: with some small probability the agent moves in a direction other than the intended one. Each cell yields a negative reward, except the upper-right cell, which is absorbing and thus forms the goal. Shaded squares have a higher negative reward; on patterned squares moving is even more uncertain (the probability of failure is higher). Shaded circles denote absorbing states with high negative reward. (b) The peripheral states for a decomposition into 11 regions (rooms).

The examples in [21] were conducted in the grid world depicted in Figure 3.6. This navigation task, in which negative rewards are to be minimized, was solved with the original MDP as well as with the abstract MDP. (An augmented MDP was also tested, which we leave out of consideration here.) For the abstract MDP, the set of macros was created with the heuristic approach described above: one macro for each region-exit-state combination, plus one for staying in the room. Value iteration was applied to solve the different MDPs. Figure 3.7 shows the value of one particular state and how it improves over time. Clearly, with the abstract MDP the value function converges much faster. But, as the limit of the value function reveals, the abstract MDP finds only a suboptimal solution: it yields a policy that takes the agent to the goal at an expected cost (negative reward) of over 20, whereas the original MDP finds a way with an expected cost below 20. Nevertheless, the computational savings seem worth this loss in solution quality.
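To make the construction concrete, the following is a minimal Python sketch, not the implementation used in [21] or in this thesis; the names (Macro, abstract_mdp, value_iteration) are hypothetical, and the discounted transition and reward models of each macro are assumed to be precomputed as described above. It builds the abstract MDP of Definition 3.2.3 over the peripheral states and solves it with value iteration:

    from dataclasses import dataclass

    @dataclass
    class Macro:
        """A macro pi_i^j for region S_i, given by its discounted models."""
        region: int
        # Discounted transition model Tr'(s, pi, s'): for each entrance
        # state s in EPer(S_i), a distribution over exits s' in XPer(S_i).
        trans: dict   # {s: {s_exit: discounted probability}}
        # Discounted reward model R'(s, pi) for each s in EPer(S_i).
        reward: dict  # {s: float}

    def abstract_mdp(macros):
        """S' is the union of all peripheral states; each macro is
        feasible exactly at the entrance states of its region."""
        states, feasible = set(), {}
        for m in macros:
            for s, dist in m.trans.items():
                states.add(s)
                feasible.setdefault(s, []).append(m)
                states.update(dist)  # exit states are peripheral too
        return states, feasible

    def value_iteration(states, feasible, eps=1e-6):
        """Plain value iteration on the abstract MDP. Discounting is
        already folded into the macro models, so no explicit gamma
        appears here."""
        V = dict.fromkeys(states, 0.0)
        while True:
            delta = 0.0
            for s in states:
                if s not in feasible:   # absorbing state, e.g. the goal
                    continue
                best = max(m.reward[s]
                           + sum(p * V[s2] for s2, p in m.trans[s].items())
                           for m in feasible[s])
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < eps:
                return V

Since the rewards are negative costs, maximizing expected discounted reward minimizes expected cost; run on a decomposition like the one in Figure 3.6(b), such a sketch would exhibit the qualitative behaviour described above: far fewer states and thus faster convergence, at the price of a possibly suboptimal policy.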
Similar resources
Logic-based robot control in highly dynamic domains
In this paper we present the robot programming and planning language Readylog, a Golog dialect which was developed to support the decision making of robots acting in dynamic real-time domains like robotic soccer. The formal framework of Readylog, which is based on the situation calculus, features imperative control structures like loops and procedures, allows for decision-theoretic planning, an...
Repairing Decision-Theoretic Policies Using Goal-Oriented Planning
In this paper we address the problem of how decision-theoretic policies can be repaired. This work is motivated by observations made in robotic soccer where decision-theoretic policies become invalid due to small deviations during execution; and repairing might pay off compared to re-planning from scratch. Our policies are generated with Readylog, a derivative of Golog based on the situation ca...
Planning and Control of Two-Link Rigid Flexible Manipulators in Dynamic Object Manipulation Missions
This research proposes an optimal trajectory planning and control method for two-link rigid-flexible manipulators (TLRFM) in Dynamic Object Manipulation (DOM) missions. For the first time, achieving a DOM task with a rotating single-flexible-link robot was considered in [20]. The authors do not aim to contribute to either trajectory tracking or vibration control of the End-...
Markov Decision Processes: Concepts and Algorithms
Situated in between supervised learning and unsupervised learning, the paradigm of reinforcement learning deals with learning in sequential decision making problems in which there is limited feedback. This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. Fir...
Integrating Planning into Reactive High-Level Robot Programs
IndiGolog is a high-level programming language for robots and intelligent agents that supports on-line planning and plan execution in dynamic and incompletely known environments. Programs may perform sensing actions that acquire information at runtime and react to exogenous actions. In this paper, we show how IndiGolog can be used to write robot control programs that combine planning, sensing, ...
Coping with uncertainty in control and planning for a mobile robot
This paper describes a decision theoretic approach to real-time obstacle avoidance and path planning for a mobile robot. The mobile robot navigates in a semistructured environment in which unexpected obstacles may appear at random locations. Twelve sonar sensors are currently used to report the presence and location of the obstacles. To handle the uncertainty of an obstacle’s appearance, we ado...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Publication date: 2003