In this paper, we consider the linear programming (LP) formulation for deep reinforcement learning. The number of constraints depends on the size of the state and action spaces, which makes the problem intractable in large or continuous environments. The general augmented Lagrangian method suffers from the double-sampling obstacle when solving this program. Motivated by the updates of the multipliers, we overcome the obstacles in minimizing the augmented Lagrangian function ...