Off-Policy Evaluation With Online Adaptation for Robot Exploration in Challenging Environments

Abstract

Autonomous exploration has many important applications. However, classic information gain-based or frontier-based exploration relies only on the robot's current state to determine the immediate exploration goal, which lacks the capability of predicting the value of future states and thus leads to inefficient exploration decisions. This letter presents a method to learn how "good" states are, measured by the state value function, to provide guidance for robot exploration in real-world challenging environments. We formulate our work as an off-policy evaluation (OPE) problem for robot exploration (OPERE). It consists of offline Monte-Carlo training on real-world data and performs Temporal Difference (TD) online adaptation to optimize the trained value estimator. We also design an intrinsic reward function based on sensor information coverage to enable the robot to gain more information with sparse extrinsic rewards. Results show that our method enables the robot to predict the value of future states so as to better guide robot exploration. The proposed algorithm achieves better prediction and exploration performance compared with the state-of-the-arts. To the best of our knowledge, this work for the first time demonstrates value function prediction on a real-world dataset for robot exploration in challenging subterranean and urban environments.
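The two-stage idea in the abstract (offline Monte-Carlo training, then Temporal Difference online adaptation, with a coverage-based intrinsic reward) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the abstract does not specify the estimator, so the lookup table, the function names `monte_carlo_fit`, `td0_update`, and `coverage_reward`, the discount factor, and the step size are all assumptions made for illustration. The sketch also omits any off-policy correction.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed; not given in the abstract)


def coverage_reward(seen_cells, observed_cells):
    """Hypothetical intrinsic reward: number of map cells newly covered by the sensor."""
    new_cells = set(observed_cells) - seen_cells
    seen_cells |= new_cells  # remember what has been covered so far
    return float(len(new_cells))


def monte_carlo_fit(episodes, gamma=GAMMA):
    """Offline stage: every-visit Monte-Carlo estimate of the state value.

    Each episode is a list of (state, reward) pairs, where the reward is the
    one received after leaving that state.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so the discounted return accumulates in one pass.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


def td0_update(values, state, reward, next_state, alpha=0.5, gamma=GAMMA):
    """Online stage: one TD(0) step that adapts the offline estimate in place."""
    target = reward + gamma * values.get(next_state, 0.0)
    old = values.get(state, 0.0)
    values[state] = old + alpha * (target - old)
    return values[state]


# Offline: fit values from one logged episode.
logged = [[("a", 0.0), ("b", 1.0)]]
values = monte_carlo_fit(logged)

# Online: the robot at "a" newly covers two cells, then moves to "b".
seen = {(0, 0)}
r = coverage_reward(seen, [(0, 0), (0, 1), (1, 1)])
td0_update(values, "a", r, "b")
```

In this toy setup the offline pass gives each state the average discounted return seen in the logged data, and each online step nudges that estimate toward the one-step bootstrapped target. A learned function approximator would replace the dictionary in any realistic setting.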

Similar resources

Searching for Optimal Off-Line Exploration Paths in Grid Environments for a Robot with Limited Visibility

Robotic exploration is an on-line problem in which autonomous mobile robots incrementally discover and map the physical structure of initially unknown environments. Usually, the performance of exploration strategies used to decide where to go next is not compared against the optimal performance obtainable in the test environments, because the latter is generally unknown. In this paper, we prese...

Eligibility Traces for Off-Policy Policy Evaluation

Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference methods. Here we generalize eligibility traces to off-policy learning, in which one learns about a policy different from the policy that generates the data. Off-policy methods can greatly multiply learning, as many policie...

Incremental Online Evolution and Adaptation of Neural Networks for Robot Control in Dynamic Environments

Many approaches have been developed to tackle the design complexity of modern robotic systems by using evolutionary processes. Starting with an initial solution, the evolutionary process tries to adapt to a given scenario and in the end produces an improved solution. Previous work showed that incremental evolution, a stepwise increase in the scenario difficulty, can increase the success of evol...

Online policy adaptation for ensemble classifiers

Ensemble algorithms can improve the performance of a given learning algorithm through the combination of multiple base classifiers into an ensemble. In this paper, the idea of using an adaptive policy for training and combining the base classifiers is put forward. The effectiveness of this approach for online learning is demonstrated by experimental results on several UCI benchmark databases.

Toward Reliable Off Road Autonomous Vehicles Operating in Challenging Environments

The DARPA PerceptOR program implements a rigorous evaluative test program which fosters the development of field relevant outdoor mobile robots. Autonomous ground vehicles are deployed on diverse test courses throughout the USA and quantitatively evaluated on such factors as autonomy level, waypoint acquisition, failure rate, speed, and communications bandwidth. Our efforts over the three year ...

Journal

Journal title: IEEE Robotics and Automation Letters

Year: 2023

ISSN: 2377-3766

DOI: https://doi.org/10.1109/lra.2023.3271520