The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without...