This research reports on the recent development of black-box optimization methods based single-step deep reinforcement learning and their conceptual similarity to evolution strategy (ES) techniques. It formally introduces policy-based (PBO), a policy-gradient-based algorithm that relies policy network describe density function its forthcoming evaluations, uses covariance estimation steer improv...