Hierarchical modeling of F0 contours for voice conversion
Abstract
Voice conversion systems convert a speech signal so that it sounds as if it were uttered by another speaker. The conversion of spectral features has attracted considerable research attention, but the conversion of pitch, which models speaker-dependent prosody, is often achieved simply by controlling the F0 level and range. However, detailed prosody, which involves different linguistic units at several distinct temporal scales, can carry a significant amount of speaker-identity information. This paper introduces a new method for prosody conversion that uses wavelets to decompose the pitch contour into ten temporal scales, ranging from microprosody to the utterance level, so that the different timings of prosodic phenomena can be modeled. The prosody conversion is carried out in the wavelet domain, using regression techniques originally developed for the spectral conversion of speech. The performance of the proposed method is evaluated within a real voice conversion system. The results for cross-gender conversion indicate a significant improvement in naturalness compared with the traditional approach of shifting and scaling F0 to match the target speaker’s mean and variance.
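The two F0 representations contrasted above can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' implementation. It assumes that F0 is given per frame with 0 marking unvoiced frames, that the baseline shift-and-scale is applied to log-F0, and that the wavelet analysis is a Mexican-hat continuous wavelet transform with ten scales spaced one octave apart. The function names (`mv_convert_f0`, `interpolate_unvoiced`, `cwt_mexican_hat`, `decompose_f0`), the mirror padding, and the `base_scale=2.0` frames default are hypothetical choices.

```python
import numpy as np


def mv_convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Baseline: shift and scale log-F0 of voiced frames to the target
    speaker's mean and standard deviation (statistics in the log domain)."""
    f0 = np.asarray(f0, dtype=float)
    out = np.zeros_like(f0)              # unvoiced frames (F0 == 0) stay 0
    voiced = f0 > 0
    z = (np.log(f0[voiced]) - src_mean) / src_std
    out[voiced] = np.exp(z * tgt_std + tgt_mean)
    return out


def interpolate_unvoiced(logf0, voiced):
    """Linearly interpolate log-F0 across unvoiced gaps (edges are held
    constant) so the wavelet transform sees a continuous contour."""
    idx = np.arange(len(logf0))
    return np.interp(idx, idx[voiced], logf0[voiced])


def cwt_mexican_hat(x, scales):
    """Continuous wavelet transform with a Mexican-hat wavelet, computed in
    the frequency domain (up to a constant factor). One row per scale."""
    n = len(x)
    padded = np.concatenate([x[::-1], x, x[::-1]])     # mirror padding softens edge effects
    omega = 2.0 * np.pi * np.fft.fftfreq(len(padded))  # angular frequency, rad/sample
    spectrum = np.fft.fft(padded)
    coeffs = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Fourier transform of (1 - t^2) exp(-t^2 / 2) dilated by s; it is
        # zero at omega = 0, so no scale captures the contour's mean.
        psi_hat = (s * omega) ** 2 * np.exp(-0.5 * (s * omega) ** 2)
        coeffs[i] = np.real(np.fft.ifft(spectrum * np.sqrt(s) * psi_hat))[n:2 * n]
    return coeffs


def decompose_f0(f0, n_scales=10, base_scale=2.0):
    """Split a frame-level F0 contour into its log-F0 mean plus n_scales
    wavelet components one octave apart (base_scale is in frames)."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    logf0 = np.log(np.where(voiced, f0, 1.0))  # placeholder 1.0 avoids log(0)
    logf0 = interpolate_unvoiced(logf0, voiced)
    mean = logf0.mean()
    scales = base_scale * 2.0 ** np.arange(n_scales)
    return mean, cwt_mexican_hat(logf0 - mean, scales)


if __name__ == "__main__":
    f0 = np.array([0, 0, 120, 125, 130, 0, 0, 140, 135, 0] * 30, dtype=float)
    mean, coeffs = decompose_f0(f0)
    print(coeffs.shape)  # (10, 300)
    lf = np.log(f0[f0 > 0])
    converted = mv_convert_f0(f0, lf.mean(), lf.std(), np.log(200.0), 0.2)
```

In a conversion system built along these lines, the `coeffs` matrix (one row per temporal scale) would replace the single F0 value as the feature passed to the regression model, while the baseline only matches two global statistics. The regression step itself is not shown.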
Similar resources
Superpositional Modeling of Fundamental Frequency Contours for HMM-based Speech Synthesis
Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based ones, gain special attention from researchers because of their ability to generate speech in various voice qualities and styles. In these methods, all acoustic parameters (except durational ones) are handled in a frame-by-frame manner, which is not appropriate for prosodic features. Although relation of adja...
A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components
We present a novel stochastic model of singing voice fundamental frequency (F0) contours for characterizing expressive dynamic components, such as vibrato and portamento. Although dynamic components can be important features for any singing voice applications, modeling and extracting these components from a raw F0 contour have yet to be accomplished. Therefore, we describe a process for generat...
Generative modeling of speech F0 contours
This paper introduces our ongoing work on generative modeling of speech fundamental frequency (F0) contours for estimating prosodic features from raw speech data. The present F0 contour model is formulated by translating the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration, into a probabilistic model described as a discrete-time stocha...
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion
In our previous work, we proposed a speaking-aid system converting electrolaryngeal speech (EL speech) to normal speech using a statistical voice conversion technique. The main weakness of our system is the difficulty of estimating natural contours of the fundamental frequency (F0) from EL speech including only built-in F0 contours. This paper proposes another speaking-aid system with an air-pr...
VAE-SPACE: Deep Generative Model of Voice Fundamental Frequency Contours
Modeling the speech generation process can provide flexible and interpretable ways to generate intended synthetic speech. In this paper, we present a deep generative model of fundamental frequency (F0) contours of normal speech and singing voices. The generative model we propose in this paper 1) is able to accurately decompose an F0 contour into the sum of phrase and accent components of the Fu...
Publication date: 2014