We propose a learning-based method for estimating the compatibility between vocal and accompaniment audio tracks, i.e., how well they go together when played simultaneously. This task is challenging because it is difficult to formulate hand-crafted rules or construct large labeled dataset p...