Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization

Authors

Abstract

Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters to become transferable to the real world in a zero-shot setting. However, a huge number of samples are often required to learn an effective policy when the range of randomized parameters is extensive, due to the instability of policy updates. To alleviate this problem, we propose a sample-efficient method named cyclic policy distillation (CPD). CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each one. Then the local policies are learned while cyclically transitioning between sub-domains. CPD accelerates learning through knowledge transfer based on expected performance improvements. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfer. CPD's effectiveness and sample efficiency are demonstrated through simulations with four tasks (Pendulum from OpenAIGym and Pusher, Swimmer, and HalfCheetah from Mujoco), and a real-robot, ball-dispersal task. We published code and videos from our experiments at https://github.com/yuki-kadokawa/cyclic-policy-distillation.
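The loop described in the abstract (split the randomized range, train local policies in a cyclic order, transfer knowledge between neighbours, distill into one global policy) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: each "policy" is a single scalar, the reward is -(theta - p)^2 for a randomized parameter p, the transfer rule simply copies the previously visited policy when a Monte Carlo estimate says it does better, and distillation is an average. All function names are hypothetical.

```python
import random

def split_range(low, high, n):
    """Split [low, high] into n equal sub-domains, returned as (lo, hi) pairs."""
    width = (high - low) / n
    return [(low + i * width, low + (i + 1) * width) for i in range(n)]

def cyclic_order(n, sweeps):
    """Hypothetical schedule: 0..n-1, then back down to 1, repeated `sweeps` times."""
    cycle = list(range(n)) + list(range(n - 2, 0, -1))
    return cycle * sweeps

def expected_reward(theta, domain, rng, samples=32):
    """Monte Carlo estimate of the toy reward -(theta - p)^2, p drawn from the sub-domain."""
    lo, hi = domain
    return sum(-(theta - rng.uniform(lo, hi)) ** 2 for _ in range(samples)) / samples

def train_local(theta, domain, rng, steps=50, lr=0.1):
    """A few stochastic gradient-ascent steps on the toy reward inside one sub-domain."""
    lo, hi = domain
    for _ in range(steps):
        p = rng.uniform(lo, hi)
        theta += lr * 2.0 * (p - theta)  # gradient of -(theta - p)^2 w.r.t. theta
    return theta

def cpd_sketch(low=0.0, high=2.0, n=4, sweeps=2, seed=0):
    rng = random.Random(seed)
    domains = split_range(low, high, n)
    local = [0.0] * n  # one scalar "local policy" per sub-domain
    prev = None
    for k in cyclic_order(n, sweeps):
        # Toy stand-in for knowledge transfer: adopt the previously visited
        # policy only if it is estimated to perform better in sub-domain k.
        if prev is not None:
            if expected_reward(local[prev], domains[k], rng) > expected_reward(local[k], domains[k], rng):
                local[k] = local[prev]
        local[k] = train_local(local[k], domains[k], rng)
        prev = k
    # Toy stand-in for distillation: the constant that best matches all
    # local policies in the least-squares sense is their mean.
    global_theta = sum(local) / n
    return domains, local, global_theta

if __name__ == "__main__":
    domains, local, global_theta = cpd_sketch()
    for (lo, hi), theta in zip(domains, local):
        print(f"sub-domain [{lo:.2f}, {hi:.2f}] -> local theta {theta:.3f}")
    print(f"distilled global theta {global_theta:.3f}")
```

In the actual method the local and global policies would be neural networks trained with a deep RL algorithm in randomized simulators, and the transfer and distillation steps are defined in the paper and the linked repository; the sketch only shows the control flow.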


Similar resources

Sample Efficient Reinforcement Learning with Gaussian Processes

This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL). We show that GPs are KWIK learnable, proving for the first time that a model-based RL approach using GPs, GP-Rmax, is sample efficient (PAC-MDP). However, we then show that previous approaches to model-free RL using GPs take an exponential number of step...


Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

Simulations are attractive environments for training agents as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real world counterpa...


Flexible Robotic Grasping with Sim-to-Real Transfer based Reinforcement Learning

Robotic manipulation requires a highly flexible and compliant system. Task-specific heuristics are usually not able to cope with the diversity of the world outside of specific assembly lines and cannot generalize well. Reinforcement learning methods provide a way to cope with uncertainty and allow robots to explore their action space to solve specific tasks. However, this comes at a cost of hig...


Application of Reinforcement Learning to Batch Distillation

An important amount of work exists on the topic of optimal operation and control of batch distillation though it is still based on the assumption of an accurate process model being available. While this assumption is valid from a theoretical point of view, there will always remain the challenge of practical applications. Reinforcement Learning (RL) has been recognised already as a particularly ...


Safe and Efficient Off-Policy Reinforcement Learning

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of “off-policyness”; and (3) it is efficient as it makes the b...



Journal

Journal title: Robotics and Autonomous Systems

Year: 2023

ISSN: 0921-8890, 1872-793X

DOI: https://doi.org/10.1016/j.robot.2023.104425