Towards speaking style transplantation in speech synthesis

Authors

Jaime Lorenzo-Trueba

Roberto Barra-Chicote

Junichi Yamagishi

Oliver Watts

Juan Manuel Montero-Martínez

Abstract

One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. The resulting voice must not only be of sufficiently high quality but must also capture the natural expressiveness of human speech. This paper focuses on the expressiveness problem by proposing a set of techniques for extrapolating the expressiveness of proven high-quality speaking style models onto neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, so they can be used with little training data (around 15 minutes of training data per style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementations of the 5 techniques were tested through a perceptual evaluation, which shows that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended.
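The idea of learning the deviation between a neutral average model and a speaking style average model, then applying it to a neutral target speaker, can be illustrated with a toy sketch. This is only one simple reading of the abstract, not any of the paper's 5 techniques: it treats each model as a single mean vector, and the function name `transplant_style` and the `scale` parameter are hypothetical, introduced here for illustration.

```python
def transplant_style(neutral_avg, style_avg, target_neutral, scale=1.0):
    """Shift a target speaker's neutral mean vector by a style deviation.

    Assumption: each model is reduced to one mean vector (a list of floats).
    Real HMM-based systems hold many state-level distributions; this sketch
    only shows the "learn the deviation, then apply it" arithmetic.
    """
    if not (len(neutral_avg) == len(style_avg) == len(target_neutral)):
        raise ValueError("all mean vectors must have the same dimension")
    # Deviation of the style average model from the neutral average model,
    # added (optionally scaled) to the target speaker's neutral mean.
    return [
        t + scale * (s - n)
        for n, s, t in zip(neutral_avg, style_avg, target_neutral)
    ]


# Toy values, chosen only to make the arithmetic easy to follow.
neutral_avg = [1.0, 2.0]
style_avg = [1.5, 1.0]
target_neutral = [0.0, 4.0]
styled_target = transplant_style(neutral_avg, style_avg, target_neutral)
print(styled_target)  # [0.5, 3.0]
```

Setting `scale` to 0 leaves the target speaker unchanged, so the parameter acts as a crude control for how strongly the style is applied in this toy setting.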

Similar resources

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation

One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identifi...

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

In this work, we propose “global style tokens” (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable “labels” they generate can be used to con...

HMM-Based Expressive Speech Synthesis: Towards TTS with Arbitrary Speaking Styles and Emotions

This paper describes recent progress in our approach to generating expressive speech. A goal of text-to-speech (TTS) synthesis is the ability to generate natural-sounding speech with arbitrary speaker voice characteristics, speaking styles and emotional expressions. To change voice and speaking style and/or emotion of the synthetic speech arbitrarily while maintaining its naturalness, i...

Discrete/Continuous Modelling of Speaking Style in HMM-Based Speech Synthesis: Design and Evaluation

This paper assesses the ability of an HMM-based speech synthesis system to model the speech characteristics of various speaking styles. A discrete/continuous HMM is presented to model the symbolic and acoustic speech characteristics of a speaking style. The proposed model is used to model the average characteristics of a speaking style that is shared among various speakers, depending on specifi...

Recent Development of HMM-Based Expressive Speech Synthesis and Its Applications

This paper describes the recent development of HMM-based expressive speech synthesis. Although the expressive speech includes a wide variety of expressions such as emotions, speaking styles, intention, attitude, emphasis, focus, and so on, we mainly refer to the speech synthesis techniques for emotions and speaking styles, which would be the most primary expressions in human speech communicatio...

Publication date: 2013
