The NII speech synthesis entry for Blizzard Challenge 2016

Authors

  • Lauri Juvela
  • Xin Wang
  • Shinji Takaki
  • SangJin Kim
  • Manu Airaksinen
  • Junichi Yamagishi
Abstract

This paper describes the NII speech synthesis entry for Blizzard Challenge 2016, where the task was to build a voice from audiobook data. The synthesis system is built using the NII parametric speech synthesis framework, which uses a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) for acoustic modeling. For this entry, we first built a voice using a large data set, and then used the audiobook data to adapt the acoustic model to the target speaker. Additionally, the recent fullband glottal vocoder GlottDNN was used in the system with a DNN-based excitation model for generating glottal waveforms. The vocoder estimates the vocal tract in a band-wise manner, using Quasi Closed Phase (QCP) inverse filtering in the low band. At the synthesis stage, the excitation model is used to generate voiced excitation from acoustic features, after which a vocal tract filter is applied to generate synthetic speech. The Blizzard Challenge listening test results show that the proposed system achieves quality comparable to that of the benchmark parametric synthesis systems.
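The last synthesis step in the abstract (an excitation signal passed through a vocal tract filter) follows the general source-filter idea, which can be sketched in a few lines. This is only an illustrative toy, not the GlottDNN vocoder: a plain impulse train stands in for the DNN-generated glottal excitation, and a single two-pole resonator stands in for the estimated vocal tract filter. All function names and parameter values (`pulse_train`, `resonator`, `all_pole_filter`, the 100 Hz pitch, the 500 Hz formant) are assumptions chosen for illustration.

```python
import math

def pulse_train(f0_hz, sample_rate, num_samples):
    """Unit impulses one pitch period apart (toy stand-in for glottal excitation)."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]

def resonator(center_hz, bandwidth_hz, sample_rate):
    """Denominator coefficients [1, a1, a2] of a two-pole resonator (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def all_pole_filter(excitation, a):
    """Apply y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], assuming a[0] == 1."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Hypothetical settings: 16 kHz audio, 100 Hz pitch, one formant near 500 Hz.
sample_rate = 16000
excitation = pulse_train(f0_hz=100.0, sample_rate=sample_rate, num_samples=800)
vocal_tract = resonator(center_hz=500.0, bandwidth_hz=80.0, sample_rate=sample_rate)
speech = all_pole_filter(excitation, vocal_tract)
```

In the system described above, the excitation would instead come from the DNN-based excitation model conditioned on acoustic features, and the filter from the vocoder's vocal tract estimate; the sketch only shows how the two parts combine.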

Similar resources

The CSTR entry to the Blizzard Challenge 2016

This paper describes the text-to-speech system entered by The Centre for Speech Technology Research into the 2016 Blizzard Challenge. This system is a hybrid synthesis system which uses output from a recurrent neural network to drive a unit selection synthesiser. The annual Blizzard Challenge conducts side-by-side testing of a number of speech synthesis systems trained on a common set of speech ...

Expressive Speech Synthesis for Storytelling: The INNOETICS' Entry to the Blizzard Challenge 2016

This paper describes INNOETICS' Speech Synthesis System entry for the Blizzard Challenge 2016, along with the corresponding results and some relevant discussion. We provide a description of the underlying system and techniques used in our TTS platform, as well as some detailed information regarding the voice building process. Based on the obtained results from the listening experiments, we atte...

Speech Database Speech Analysis Training of MSD - HSMM Excitation parameters Spectral parameters Speech signal Context - dependent MSD - HSMMs and duration models Speech Parameter Generation

This paper describes the text-to-speech synthesis system developed for the Blizzard Challenge 2016 by members of the ADAPT centre and colleagues from associated projects. The task was to build a synthetic voice for reading audiobooks to children, from a speech database of audiobooks around 5 hours long. Our entry system is an HMM-based parametric speech synthesizer which was built using a subse...

The CSTR entry to the Blizzard Challenge 2017

The annual Blizzard Challenge conducts side-by-side testing of a number of speech synthesis systems trained on a common set of speech data. Similar to the 2016 Blizzard Challenge, the task for this year is to train on expressively-read children’s story-books, and to synthesise speech in the same domain. The Challenge therefore presents an opportunity to investigate the effectiveness of several tech...

The RACAI Text-to-Speech Synthesis System

This paper describes the RACAI Text-to-Speech (TTS) entry for the Blizzard Challenge 2013. The development of the RACAI TTS started during the Metanet4U project and the system is currently part of the METASHARE platform. This paper describes the work carried out for preparing the RACAI entry during the Blizzard Challenge 2013 and provides a detailed description of our system and future developm...

Publication date: 2016