Abstract Automatic speech recognition (ASR) is a crucial technology for man-machine interaction. End-to-end models have recently been studied in deep learning-based ASR. However, these models are not suitable for practical ASR applications because of their large model sizes and computation costs. To address this issue, we propose a novel mutual-learning sequence-level knowledge distillation framework enjoying d...