We aim to address the problem of Natural Language Video Localization (NLVL): localizing the video segment that corresponds to a natural language description in a long, untrimmed video. State-of-the-art NLVL methods almost all follow a one-stage fashion and can typically be grouped into two categories: 1) anchor-based approach: it first pre-defines a series of candidates (e.g., by a sliding window), then does classi...