Expectation-Maximization for Inverse Reinforcement Learning with Hidden Data
Authors
Abstract
We consider the problem of performing inverse reinforcement learning when the trajectory of the agent being observed is partially occluded from view. Motivated by robotic scenarios in which limited sensor data is available to a learner, we treat the missing information as hidden variables and present an algorithm based on expectation-maximization to solve the non-linear, non-convex problem. Previous work in this area simply removed the occluded portions from consideration when computing feature expectations; in contrast, our technique takes expectations over the missing values, enabling learning even in the presence of dynamic occlusion. We evaluate our new algorithm in a simulated reconnaissance scenario in which the visible portion of the state space varies. Finally, we show that our approach enables apprenticeship learning by observing a human performing a sorting task in spite of key information missing from the observations.
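The core idea, taking expectations over occluded observations instead of discarding them, can be sketched in a small toy example. This is a hypothetical illustration only (one-hot state features, a Boltzmann distribution standing in for the true posterior over hidden states), not the paper's actual algorithm:

```python
import math

# Hypothetical toy: 3 states with one-hot features. A trajectory may
# contain occluded steps (None). Rather than dropping those steps, the
# E-step takes an expectation over the hidden state; the M-step nudges
# reward weights so model feature counts match the empirical target.

N_STATES = 3

def state_posterior(weights):
    """Boltzmann distribution over states under the current reward
    weights (a stand-in for the true posterior given the dynamics)."""
    exps = [math.exp(w) for w in weights]
    z = sum(exps)
    return [e / z for e in exps]

def expected_feature_counts(trajectory, weights):
    """E-step: expected one-hot feature counts, marginalizing over
    occluded observations instead of discarding them."""
    counts = [0.0] * N_STATES
    post = state_posterior(weights)
    for s in trajectory:
        if s is None:                      # occluded step
            for i in range(N_STATES):
                counts[i] += post[i]       # expectation over hidden state
        else:
            counts[s] += 1.0
    return counts

def em_irl(trajectory, iters=200, lr=0.1):
    """Alternate the E-step above with a gradient M-step that matches
    empirical feature counts (scaled up from the visible portion)."""
    weights = [0.0] * N_STATES
    visible = [s for s in trajectory if s is not None]
    target = [visible.count(i) * len(trajectory) / len(visible)
              for i in range(N_STATES)]
    for _ in range(iters):
        model = expected_feature_counts(trajectory, weights)
        for i in range(N_STATES):
            weights[i] += lr * (target[i] - model[i])
    return weights

traj = [0, None, 0, 1, None, 0]   # two occluded steps
w = em_irl(traj)
```

After training, the weight for the frequently visited state 0 exceeds that of the never-observed state 2, reflecting how the expectation over hidden steps fills in the occluded evidence.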
Similar Resources
Inverse Reinforcement Learning Under Noisy Observations (Extended Abstract)
We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, noisy observations of the trajectory are available. We generalize the previous method of expectation-maximization for inverse reinforcement learning, which allows the trajectory of the expert to be partially hidden from the learner, to incorpo...
Expectation Maximization for Weakly Labeled Data
We call data weakly labeled if it has no exact label but rather a numerical indication of the correctness of the label "guessed" by the learning algorithm, a situation commonly encountered in problems of reinforcement learning. The term emphasizes similarities of our approach to the known techniques for solving unsupervised and transductive problems. In this paper we present an on-line algorithm that...
Inverse Reinforcement Learning Under Noisy Observations
We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, a noisy continuoustime observation of the trajectory is provided to the learner. This problem exhibits wide-ranging applications and the specific application we consider here is the scenario in which the learner seeks to penetrate a perimeter ...
Scaling Expectation-Maximization for Inverse Reinforcement Learning to Multiple Robots under Occlusion
We consider inverse reinforcement learning (IRL) when portions of the expert’s trajectory are occluded from the learner. For example, two experts performing tasks in close proximity may block each other from the learner’s view or the learner is a robot observing mobile robots from a fixed position with limited sensor range. Previous methods mitigate this challenge by either focusing on the obse...
Inverse Reinforcement Learning with Locally Consistent Reward Functions
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors. Solving our generali...
Journal:
Volume Issue
Pages -
Publication date: 2016