In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on a predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting ...