Indoor action recognition plays an important role in modern society, for example in intelligent healthcare in large mobile cabin hospitals. With the wide use of depth sensors such as Kinect, multimodal information, including skeleton and RGB modalities, offers a promising way to improve performance. However, existing methods either focus on a single data modality or fail to take advantage of multiple modal...