The most popular framework for distributed training of machine learning models is the (synchronous) parameter server (PS). This paradigm consists of n workers, which iteratively compute updates to the model parameters, and a stateful PS, which waits for and aggregates all n updates to generate a new estimate of the parameters and sends it back to the workers at every iteration. Transient computation slowdowns or transmission delays can intolerably lengthen each iteration, since the synchronous PS must wait for the slowest worker before proceeding.
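To make the round structure concrete, the following is a minimal Python sketch of one synchronous PS iteration; the function names, the learning rate, and the toy quadratic objective are illustrative assumptions, not the interface of any particular framework.

```python
import numpy as np

def parameter_server_step(params, worker_grads, lr=0.1):
    """One synchronous PS round (illustrative sketch).

    Synchronous barrier: the PS proceeds only after ALL n worker
    updates have arrived, so a single straggler delays every worker.
    """
    avg_grad = np.mean(worker_grads, axis=0)  # aggregate the n updates
    return params - lr * avg_grad             # new estimate, sent back to all workers

# Toy run: n = 4 workers minimizing f(x) = ||x - c||^2 / 2.
rng = np.random.default_rng(0)
c = rng.normal(size=3)          # optimum of the toy objective
params = np.zeros(3)
for _ in range(100):            # one loop pass = one PS iteration
    worker_grads = [params - c for _ in range(4)]  # each worker's local gradient
    params = parameter_server_step(params, worker_grads)
print(np.allclose(params, c, atol=1e-3))  # True: iterates converge to c
```

In this sketch the aggregation is a plain average of gradients, which corresponds to synchronous data-parallel SGD; the key structural point is the barrier inside `parameter_server_step`, which is what makes the paradigm sensitive to transient slowdowns.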