In this article, we study the design of controllers in the context of stochastic optimal control under the assumption that a model of the system is not available. That is, we aim to control a Markov decision process whose transition probabilities we do not know, but from which we can obtain sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability throughout its operation time. The drawbacks form...