Improving sequence-to-sequence Tibetan speech synthesis with prosodic information

Abstract

There are about 6,000 languages worldwide, most of which are low-resource languages. Although current speech synthesis (text-to-speech, TTS) for major languages (e.g., Mandarin, English, French) has achieved good results, the voice quality of TTS for low-resource languages (e.g., Tibetan) still needs further improvement. Because prosody plays a significant role in natural speech, this article proposes two sequence-to-sequence (seq2seq) Tibetan TTS models with prosodic information fusion to improve the voice quality of synthesized speech. We first constructed a large-scale Tibetan corpus for seq2seq TTS. Then we designed a prosody generator to extract prosodic information from Tibetan sentences. Finally, we trained the two seq2seq Tibetan TTS models by fusing prosodic information, including feature-level and model-level fusion. The experimental results showed that the proposed models, which fuse prosodic information, could effectively improve the voice quality of synthesized speech. Furthermore, only 60% to 70% of the training data is needed to synthesize speech of similar quality to the baseline. Therefore, the proposed methods can ...
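As a rough illustration of what feature-level fusion can mean in general, the sketch below appends a one-hot prosody vector to each token's text features before they would reach a seq2seq encoder. This is not the article's architecture: the label set `PROSODY_LABELS`, the helper names, and the toy feature values are all hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical prosodic boundary labels; the article's actual label set is not given here.
PROSODY_LABELS = ["none", "word", "phrase", "sentence"]


def one_hot(label: str) -> List[float]:
    """Encode a prosodic label as a one-hot vector over PROSODY_LABELS."""
    return [1.0 if label == name else 0.0 for name in PROSODY_LABELS]


def feature_level_fusion(text_features: List[List[float]],
                         prosody_labels: List[str]) -> List[List[float]]:
    """Concatenate each token's text features with its prosody one-hot vector."""
    if len(text_features) != len(prosody_labels):
        raise ValueError("one prosody label is required per token")
    return [feats + one_hot(label)
            for feats, label in zip(text_features, prosody_labels)]


# Toy input: three tokens with 2-dimensional text features (made-up numbers).
text_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
prosody_labels = ["none", "word", "sentence"]
fused = feature_level_fusion(text_features, prosody_labels)
```

Each fused vector has the original feature dimensions followed by the prosody dimensions. A model-level variant would instead combine prosody inside the network, for example through a separate prosody encoder; that is likewise an assumption about the general technique, not a description of the article's models.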

Similar resources

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Prosodic Modeling in Text-to-Speech Synthesis

This paper discusses three broad obstacles that must be overcome to improve prosodic quality in text-to-speech systems. First, direct and indirect limits set by the signal processing (“synthesis”) components. Second, combinatorial and statistical constraints inherent in generalizing from training corpora to unrestricted domains, and that require the integration of content-specific knowledge and ...

Prosodic modelling in text-to-speech synthesis

This paper discusses three broad obstacles that must be overcome to improve prosodic quality in text-to-speech systems. First, direct and indirect limits set by the signal processing (“synthesis”) components. Second, combinatorial and statistical constraints inherent in generalizing from training corpora to unrestricted domains, and that require the integration of content-specific knowledge and...

GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks

The Fisher information metric is an important foundation of information geometry, wherein it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as the Sequence-to-Sequence (Seq2Seq) networks that have lately been used to yield state-of-the-art performance on speech translation or image captioning have so far ignored the geometry of the late...

Improving Sequence to Sequence Neural Machine Translation by Utilizing Syntactic Dependency Information

Sequence to Sequence Neural Machine Translation has achieved significant performance in recent years. Yet, there are some existing issues that Neural Machine Translation still does not solve completely. Two of them are translation of long sentences and “over-translation”. To address these two problems, we propose an approach that utilizes more grammatical information such as syntactic dependenci...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3616012