Improving sequence-to-sequence Tibetan speech synthesis with prosodic information
Abstract
There are about 6,000 languages worldwide, most of which are low-resource languages. Although current speech synthesis (or text-to-speech, TTS) for major languages (e.g., Mandarin, English, French) has achieved good results, the voice quality of TTS for low-resource languages (e.g., Tibetan) still needs to be further improved. Because prosody plays a significant role in natural speech, this article proposes two sequence-to-sequence (seq2seq) Tibetan TTS models with prosodic information fusion to improve the voice quality of synthesized speech. We first constructed a large-scale Tibetan corpus for seq2seq TTS. Then we designed a prosody generator to extract prosodic information from Tibetan sentences. Finally, we trained the seq2seq models by fusing prosodic information, using both feature-level and model-level fusion. The experimental results showed that the proposed models, which fuse prosodic information, could effectively improve the voice quality of synthesized speech. Furthermore, with prosodic information fusion, only 60% to 70% of the training data was needed to synthesize speech similar to that of the baseline. Therefore, the proposed methods can improve the voice quality of TTS for low-resource languages.
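The abstract does not say how prosodic information is encoded or fused. As a hedged illustration only, feature-level fusion can be pictured as appending a prosody label to every input unit before it reaches the seq2seq encoder. The sketch below assumes syllable-level inputs, a hypothetical four-way boundary tag set (`NONE`, `PW`, `PP`, `IP`), and made-up toy embeddings; the function names `one_hot` and `feature_level_fusion` are illustrative and do not come from the article.

```python
from typing import Dict, List

# Hypothetical prosodic-boundary label set (an assumption for illustration):
# no boundary, prosodic word, prosodic phrase, intonational phrase.
PROSODY_TAGS: List[str] = ["NONE", "PW", "PP", "IP"]


def one_hot(tag: str) -> List[float]:
    """Encode one prosody tag as a one-hot vector over PROSODY_TAGS."""
    vec = [0.0] * len(PROSODY_TAGS)
    vec[PROSODY_TAGS.index(tag)] = 1.0  # raises ValueError for an unknown tag
    return vec


def feature_level_fusion(
    syllables: List[str],
    tags: List[str],
    embeddings: Dict[str, List[float]],
) -> List[List[float]]:
    """Append each syllable's one-hot prosody tag to its embedding vector.

    In this sketch, the fused vectors would be fed to a seq2seq encoder
    in place of the plain syllable embeddings.
    """
    if len(syllables) != len(tags):
        raise ValueError("expected exactly one prosody tag per syllable")
    # List concatenation builds a new vector; the embedding table is not mutated.
    return [embeddings[s] + one_hot(t) for s, t in zip(syllables, tags)]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for three Tibetan syllables (made-up values).
    toy_embeddings = {"བོད": [0.1, 0.2], "ཀྱི": [0.3, 0.4], "སྐད": [0.5, 0.6]}
    fused = feature_level_fusion(
        ["བོད", "ཀྱི", "སྐད"], ["NONE", "PW", "IP"], toy_embeddings
    )
    for row in fused:
        print(row)
```

Concatenation keeps the text and prosody features in separate dimensions, so the encoder can learn how much weight to give each; a model-level variant would instead combine the outputs of separate networks, which this sketch does not attempt to show.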
Similar Resources
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Prosodic Modeling in Text-to-Speech Synthesis
This paper discusses three broad obstacles that must be overcome to improve prosodic quality in text-to-speech systems. First, direct and indirect limits set by the signal processing (“synthesis”) components. Second, combinatorial and statistical constraints inherent in generalizing from training corpora to unrestricted domains, and that require the integration of content-specific knowledge and ...
Prosodic modelling in text-to-speech synthesis
This paper discusses three broad obstacles that must be overcome to improve prosodic quality in text-to-speech systems. First, direct and indirect limits set by the signal processing (“synthesis”) components. Second, combinatorial and statistical constraints inherent in generalizing from training corpora to unrestricted domains, and that require the integration of content-specific knowledge and...
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks
The Fisher information metric is an important foundation of information geometry, wherein it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as the Sequence-to-Sequence (Seq2Seq) networks that have lately been used to yield state-of-the-art performance on speech translation or image captioning have so far ignored the geometry of the late...
Improving Sequence to Sequence Neural Machine Translation by Utilizing Syntactic Dependency Information
Sequence to Sequence Neural Machine Translation has achieved significant performance in recent years. Yet, there are some existing issues that Neural Machine Translation still does not solve completely. Two of them are translation of long sentences and “over-translation”. To address these two problems, we propose an approach that utilizes more grammatical information such as syntactic dependenci...
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2023
ISSN: 2375-4699, 2375-4702
DOI: https://doi.org/10.1145/3616012