Real-world objects in our physical environment present diverse information and multimodal features, including 3D shapes (geometry and topology), 2D images (appearance and semantics), etc. Representing and correlating them effectively in a unified way remains very challenging because the modalities use different representations. In this paper, we propose a novel method to learn an effective latent space for joint representatio...