The Burer-Monteiro [1] decomposition (X = Y Y T ) with stochastic gradient descent is commonly employed to speed up and scale up matrix problems including matrix completion, subspace tracking, and SDP relaxation. Although it is widely used in practice, there exist no known global convergence results for this method. In this paper, we prove that, under broad sampling conditions, a first-order ra...