Hierarchical modeling of F0 contours for voice conversion

Authors

Gerard Sanchez

Hanna Silén

Jani Nurminen

Moncef Gabbouj

Abstract

Voice conversion systems transform a speech signal so that it sounds as if it were uttered by another speaker. The conversion of spectral features has attracted a lot of research attention, but the conversion of pitch, which models the speaker-dependent prosody, is often achieved by just controlling the F0 level and range. However, the detailed prosody, which spans different linguistic units at several distinct temporal scales, can carry a significant amount of information related to speaker identity. This paper introduces a new method for prosody conversion that uses wavelets to decompose the pitch contour into ten temporal scales, ranging from microprosody to the utterance level, which allows the different timings of prosodic phenomena to be modeled. The prosody conversion is carried out in the wavelet domain, using regression techniques originally developed for the spectral conversion of speech. The performance of the proposed prosody conversion method is evaluated within a real voice conversion system. The results for cross-gender conversion indicate a significant improvement in naturalness compared to the traditional approach of shifting and scaling the F0 to match the target speaker's mean and variance.
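The two ideas contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the scale spacing, or the domain in which F0 is normalized, so the Mexican hat wavelet, the dyadic (factor of two) scale spacing, the log-F0 domain, the convention that unvoiced frames are marked with 0, and all function names below are assumptions made for illustration. The regression step that maps source to target wavelet coefficients is not shown.

```python
import math


def mean_variance_f0_conversion(f0_src, src_stats, tgt_stats):
    """Baseline: shift and scale F0 to match the target mean and variance.

    Assumptions: statistics are (mean, std) of log-F0 over voiced frames,
    and unvoiced frames are marked with 0 and passed through unchanged.
    """
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    converted = []
    for f0 in f0_src:
        if f0 <= 0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f0) - mu_s) / sigma_s  # normalize with source stats
            converted.append(math.exp(z * sigma_t + mu_t))
    return converted


def mexican_hat(t):
    """Unnormalized Mexican hat (Ricker) wavelet, an assumed mother wavelet."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def wavelet_decompose(contour, num_scales=10, base_scale=1.0):
    """Filter a continuous contour with wavelets at dyadic scales.

    Assumption: `contour` is an already interpolated (gap-free) log-F0
    sequence. Returns `num_scales` rows, each as long as the input.
    Frames outside the contour are treated as zero.
    """
    n = len(contour)
    rows = []
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i  # scale doubles at every level
        half = min(int(math.ceil(5 * scale)), n - 1)  # truncate the kernel
        kernel = [mexican_hat(j / scale) / math.sqrt(scale)
                  for j in range(-half, half + 1)]
        row = []
        for k in range(n):
            acc = 0.0
            for j in range(-half, half + 1):
                idx = k + j
                if 0 <= idx < n:
                    acc += contour[idx] * kernel[j + half]
            row.append(acc)
        rows.append(row)
    return rows


if __name__ == "__main__":
    f0 = [0.0, 100.0, 120.0, 0.0, 140.0]  # toy contour in Hz, 0 = unvoiced
    print(mean_variance_f0_conversion(f0, (math.log(120.0), 0.15),
                                      (math.log(220.0), 0.20)))
    toy_log_f0 = [math.sin(0.1 * k) for k in range(64)]
    print(len(wavelet_decompose(toy_log_f0)))
```

The baseline changes only the overall level and range of the contour, whereas the decomposition exposes one component per temporal scale, so that each scale could in principle be converted separately before the contour is reconstructed.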

Related papers

Superpositional Modeling of Fundamental Frequency Contours for HMM-based Speech Synthesis

Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based ones, gain special attention from researchers because of their ability in generating speech in various voice qualities and styles. In these methods, all acoustic parameters (except durational ones) are handled in a frame-by-frame manner, which is not appropriate for prosodic features. Although relation of adja...

A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components

We present a novel stochastic model of singing voice fundamental frequency (F0) contours for characterizing expressive dynamic components, such as vibrato and portamento. Although dynamic components can be important features for any singing voice applications, modeling and extracting these components from a raw F0 contour have yet to be accomplished. Therefore, we describe a process for generat...

Generative modeling of speech F0 contours

This paper introduces our ongoing work on generative modeling of speech fundamental frequency (F0) contours for estimating prosodic features from raw speech data. The present F0 contour model is formulated by translating the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration, into a probabilistic model described as a discrete-time stocha...

The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion

In our previous work, we proposed a speaking-aid system converting electrolaryngeal speech (EL speech) to normal speech using a statistical voice conversion technique. The main weakness of our system is the difficulty of estimating natural contours of the fundamental frequency (F0) from EL speech including only built-in F0 contours. This paper proposes another speaking-aid system with an air-pr...

Vae-space: Deep Generative Model of Voice Fundamental Frequency Contours

Modeling the speech generation process can provide flexible and interpretable ways to generate intended synthetic speech. In this paper, we present a deep generative model of fundamental frequency (F0) contours of normal speech and singing voices. The generative model we propose in this paper 1) is able to accurately decompose an F0 contour into the sum of phrase and accent components of the Fu...

Publication date: 2014
