Convolutional Neural Networks with 3D kernels (3D CNNs) currently achieve state-of-the-art results in video recognition tasks due to their supremacy extracting spatiotemporal features within frames. There have been many successful CNN architectures surpassing successively. However, nearly all of them are designed operate offline, creating several serious handicaps during online operation. First...