In this paper, we consider the stochastic iterative counterpart of value iteration scheme wherein only noisy and possibly biased approximations Bellman operator are available. We call approximate (AVI) scheme. Neural networks often used as function approximators, in order to counter Bellman’s curse dimensionality. they operator. Because neural typically trained using sample data, errors biases ...