A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which can improve sample efficiency and generalization by providing additional learning signals and inductive biases. However, while the real world is inherently 3D, prior efforts have largely focused on leveraging 2D computer vision techniques as aux...