Techniques for accurate automatic annotation of speech waveforms

نویسندگان

  • Stephen Cox
  • Richard Brady
  • Peter Jackson
چکیده

We describe techniques used in the development of an automatic annotation system for use with a concatenative text-to-speech synthesis system. The goal of the system is to generate automatically from word-level transcriptions annotations that result in synthetic speech of the same quality as that produced from hand-labelled speech. Our approach in this work has been to use the standard technique of “forced-alignment” to each utterance and to refine both acoustic and pronunciation modelling to achieve greater alignment accuracy. Acoustic models were improved by Bayesian speaker adaptation and the use of confidence measures from N-Best decodings to produce speaker dependent HMMs. Pronunciation modelling improvements involved the use of a large pronunciation dictionary containing multiple pronunciations for many words, pronunciation probabilities, the accommodation of interword silences and using information derived from existing manual annotations to guide the recogniser during decoding. At present, the system can reliably produce time-aligned phonetic alignments for UK accents in which the automatic and manual alignments agree on the segmental labelling 93% of the time. It places boundaries with an r.m.s. error of 14.5 ms from the manual boundary. Subjectively, speech produced using automatic alignments is highly intelligible if not quite as good as that produced from manual alignments.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fuzzy Neighbor Voting for Automatic Image Annotation

With quick development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of themost challenging fields in image processing. Automatic image annotation (AIA) or refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...

متن کامل

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

متن کامل

A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images

Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...

متن کامل

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...

متن کامل

Semi-automatic labeling of the UCU accents speech corpus

The annotation and labeling of speech tasks in large multitask speech corpora is a necessary part of preparing a corpus for distribution. This paper addresses three approaches to annotation and labeling, namely manual, semi automatic and automatic procedures for labeling the UCU Accent Project speech data, at multilingual multitask longitudinal speech corpus. Accuracy and minimal time investmen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998