Temporal relational modeling in video is essential for human action understanding, such as recognition and segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages relation reasoning on many tasks, it still a challenge to apply graph convolution networks long sequences effectively. The main reason that large number of nodes (i.e., frames) makes GCNs hard capture ...