Advances in Spectral Parameterization for Statistical (HMM-Based) TTS
Abstract
HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when a low footprint and a general speech domain are required. The majority of speech parameterization models used in state-of-the-art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the source-filter model; it has been applied successfully in speech coding, unit-selection TTS, and voice conversion, but is rarely used in HMM TTS systems. In this paper we use Regularized Cepstral Coefficients (RCC) estimated on the mel-frequency scale to model the sinusoidal amplitude spectrum envelope within an HMM-based TTS framework. Improved subjective quality is reported for mel-frequency RCC (MRCC) combined with sinusoidal-model-based reconstruction, compared with the state-of-the-art MGC-LSP parameters.
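As a rough illustration of the idea named in the abstract, the sketch below fits cepstral coefficients to a set of sinusoidal peak amplitudes on a mel-warped frequency axis, using a ridge-style smoothness penalty. It is not the authors' implementation: the function names, the mel formula used for warping, the penalty matrix diag(n^2), and the parameter `lam` are all assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel formula; the paper's exact warping is not stated here (assumption).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def warped_angles(freqs_hz, fs_hz):
    # Map frequencies to [0, pi] on the mel-warped axis (Nyquist maps to pi).
    return np.pi * hz_to_mel(freqs_hz) / hz_to_mel(fs_hz / 2.0)

def cepstral_basis(angles, order):
    # Log-amplitude model: log A(w) = c0 + 2 * sum_{n=1..order} c_n * cos(n * w)
    n = np.arange(order + 1)
    basis = 2.0 * np.cos(np.outer(angles, n))
    basis[:, 0] = 1.0
    return basis

def fit_regularized_cepstrum(freqs_hz, amps, fs_hz, order, lam):
    # Penalized least squares on log-amplitudes. The penalty diag(n^2)
    # discourages large high-order coefficients (hypothetical choice).
    basis = cepstral_basis(warped_angles(freqs_hz, fs_hz), order)
    penalty = np.diag(np.arange(order + 1, dtype=float) ** 2)
    lhs = basis.T @ basis + lam * penalty
    rhs = basis.T @ np.log(amps)
    return np.linalg.solve(lhs, rhs)

def envelope(coeffs, freqs_hz, fs_hz):
    # Evaluate the amplitude envelope implied by the coefficients.
    basis = cepstral_basis(warped_angles(freqs_hz, fs_hz), len(coeffs) - 1)
    return np.exp(basis @ coeffs)
```

With harmonic frequencies and amplitudes from a sinusoidal analysis of one frame, `fit_regularized_cepstrum` returns the coefficient vector, and `envelope` evaluates the fitted spectral envelope at arbitrary frequencies. A larger `lam` gives a smoother envelope at the cost of a looser fit to the measured peaks.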
Related resources
Sinusoidal model parameterization for HMM-based TTS system
A sinusoidal representation of speech is an alternative to the source-filter model. It is widely used in speech coding and unit-selection TTS, but is less common in statistical TTS frameworks. In this work we utilize Regularized Cepstral Coefficients (RCC) estimated in mel-frequency scale for amplitude spectrum envelope modeling within an HMM-based TTS platform. Improved subjective quality for ...
F0 parameterization of glottalized tones for HMM-based Vietnamese TTS
A conventional HMM-based TTS system for Hanoi Vietnamese often suffers from hoarse quality due to the incomplete F0 parameterization of glottalized tones. As estimating F0 in glottalization is rather problematic for usual F0 extractors, we propose a pitch marking algorithm where the pitch marks are propagated from regular regions of the speech signal to glottalized ones, from which the complete ...
A hybrid TTS between unit selection and HMM-based TTS under limited data conditions
The intelligibility of HMM-based TTS can reach that of the original speech. However, HMM-based TTS is far from natural. In contrast, unit selection TTS is currently the most natural-sounding TTS. However, its intelligibility and its naturalness in segmental duration and timing are not stable. Additionally, unit selection needs to store a huge amount of data for concatenation. Recently, hybrid a...
A novel irregular voice model for HMM-based speech synthesis
State-of-the-art text-to-speech (TTS) synthesis is often based on statistical parametric methods. Particular attention is paid to hidden Markov model (HMM) based text-to-speech synthesis. HMM-TTS is optimized for ideal voices and may not produce high quality synthesized speech with voices having frequent non-ideal phonation. Such a voice quality is irregular phonation (also called as glottaliza...
Publication date: 2011