Implementing an SSML-compliant concatenative TTS system
Authors
Abstract
The W3C Speech Synthesis Markup Language (SSML) unifies a number of recent, related markup languages that emerged to meet the perceived need for greater, and standardized, user control over Text-to-Speech (TTS) engines. One of the main drivers for markup has been the increasing use of TTS engines as embedded components of specific applications, which puts them in a position to exploit additional knowledge about the text. Although SSML allows improved control over the text normalization process, most attention has focused on prosody markup, since predicting prosody is generally acknowledged as one of the most significant problems in TTS synthesis. Prosody control is by no means simple, because of the strong cross-dependency between its related aspects, and it is particularly complex for concatenative TTS systems. SSML is about much more than prosody control, though: it also allows high-level engine control, such as language switching and voice switching, and low-level control, such as phonetic input for words. We discuss our experiences in implementing these diverse requirements of the SSML standard.
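The abstract lists the kinds of control an SSML-compliant engine must honor: prosody markup, language switching, voice switching, and phonetic input. The Python sketch below is not the authors' implementation; it only illustrates, under stated assumptions, one way a front end might walk an SSML 1.0 document and flatten it into text chunks annotated with the inherited language, voice, prosody, and phonetic settings. The function `flatten`, its state keys, and the small subset of elements handled are hypothetical; a real engine would also process elements such as `break`, `say-as`, `sub`, and `emphasis`, and would map prosody labels like `slow` or `high` to numeric targets for unit selection or signal modification.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/10/synthesis}"            # SSML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # the xml:lang attribute

# Example input (hypothetical) exercising prosody, language, voice and phoneme markup.
SSML = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <s>The meeting starts <prosody rate="slow" pitch="high">at nine sharp</prosody>.</s>
  <s xml:lang="fr-FR">Bonjour tout le monde.</s>
  <voice gender="female">Please buy a
    <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.</voice>
</speak>"""


def flatten(elem, state, out):
    """Hypothetical front-end pass: emit (text, settings) chunks in document order.

    Settings are inherited from ancestors; copying the dict per element means
    a child's changes never leak into its siblings or the parent's tail text.
    Whitespace is simply stripped here, which a real engine would not do.
    """
    state = dict(state)
    tag = elem.tag.replace(NS, "")
    if XML_LANG in elem.attrib:                  # language switching
        state["lang"] = elem.attrib[XML_LANG]
    if tag == "voice":                           # voice switching (gender only here)
        state["voice"] = elem.get("gender", state.get("voice"))
    elif tag == "prosody":                       # keep only the attributes present
        state.update({k: elem.get(k) for k in ("rate", "pitch") if elem.get(k)})
    if tag == "phoneme":
        # Phonetic input: the ph string would replace normal letter-to-sound rules.
        out.append((elem.text or "", {**state, "ph": elem.get("ph")}))
    elif elem.text and elem.text.strip():
        out.append((elem.text.strip(), state))
    for child in elem:
        flatten(child, state, out)
        # Text after a child's closing tag belongs to the current element's scope.
        if child.tail and child.tail.strip():
            out.append((child.tail.strip(), state))
    return out


if __name__ == "__main__":
    root = ET.fromstring(SSML)
    for text, settings in flatten(root, {"lang": None, "voice": "default"}, []):
        print(repr(text), settings)
```

Copying the state at each element is one simple way to reproduce SSML's scoping, where a setting applies to an element's content and is restored for whatever follows its closing tag.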
Similar resources
SSML Extensions Aimed To Improve Asian Language TTS Rendering
Both formant-synthesis-based and concatenative acoustic-unit-based TTS systems have been developed at Nokia. Many non-English languages have been considered in the development work, and Nokia's Mandarin Chinese TTS system is under continuous development within the TC-STAR framework (www.tc-star.org). To meet the needs of the TTS evaluations in TC-STAR, common interfaces for the input and all t...
Syllable HMM based Mandarin TTS and comparison with concatenative TTS
This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of Mandarin syllable to improve the voiced/unvoiced decision of HMM states. Evaluation results s...
A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units (Irwin and Joan Jacobs Center for Communication and Information Technologies)
Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores natural speech features segments, selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in t...
Maximum-likelihood dynamic intonation model for concatenative text-to-speech system
In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...
SSML Goes International – A Standard Story
Since September 2004, the SSML 1.0 [1] specification has been a W3C Recommendation. SSML is the standard way that a Voice Browser controls speech synthesis engine. Given that it is a standard, actions to define the language of the text to be rendered, to change between several voices, to insert pauses, to perform simple text normalization (e.g. acronym expansions, such as reading W3C as “World ...