Video moment retrieval aims to retrieve a golden moment in a video for a given natural language query. The main challenges of this task include 1) accurately localizing the relevant moment (i.e., its start time and end time) in an untrimmed video stream, and 2) bridging the semantic gap between the textual query and the video contents. To tackle these problems, early approaches adopt sliding windows or uniform sampling to collect clips ...
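As a rough illustration of the sliding-window strategy mentioned above, the sketch below enumerates candidate clips at a few window lengths and scores them against a ground-truth moment with temporal IoU. It is a minimal sketch under stated assumptions, not the procedure of any particular method: the function names, the window sizes, and the 50% stride are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def sliding_window_candidates(
    duration: float,
    window_sizes: List[float],
    stride_ratio: float = 0.5,  # assumed overlap setting, not from the source
) -> List[Segment]:
    """Enumerate candidate clips with multi-scale sliding windows."""
    if stride_ratio <= 0:
        raise ValueError("stride_ratio must be positive")
    candidates: List[Segment] = []
    for w in window_sizes:
        if w <= 0 or w > duration:
            continue  # skip windows that cannot fit in the video
        stride = w * stride_ratio
        # Number of window positions that fit entirely inside the video.
        n = int((duration - w) / stride + 1e-9) + 1
        for i in range(n):
            start = i * stride
            candidates.append((start, start + w))
    return candidates


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 10-second video with a ground-truth moment at 4.5s to 8.0s.
clips = sliding_window_candidates(duration=10.0, window_sizes=[4.0])
golden = (4.5, 8.0)
best = max(clips, key=lambda c: temporal_iou(c, golden))
```

In this toy setting the best candidate overlaps the ground truth well but cannot match its boundaries exactly, because candidate start and end times are restricted to the grid defined by the window length and stride.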