Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody. Nonetheless, without sufficient data, seq2seq VC models can suffer from unstable training and mispronunciation problems in the converted speech, and are thus far from practical. To tackle these shortcomings, we propose to transfer knowledge from other speech processing tasks where large-scale corpora are easily a...