The combination of model predictive control (MPC) and learning methods has been gaining increasing attention as a tool to control systems that may be difficult to model. Using MPC as a function approximator in reinforcement learning (RL) is one approach to reduce the reliance on accurate models. RL depends on exploration to learn, and currently, simple heuristics based on random perturbations are most common. This paper considers va...