BREF, a large vocabulary spoken corpus for French
نویسندگان
چکیده
This paper presents some of the design considerations of BREF, a large read-speech corpus for French. BREF was designed to provide continuous speech data for the development of dictation machines, for the evaluation of continuous speech recognition systems (both speaker-dependent and speakerindependent), and for the study of phonological variations. The texts to be read were selected from 5 million words of the French newspaper, Le Monde. In total, 11,000 texts were selected, with selection criteria that emphasisized maximizing the number of distinct triphones. Separate text materials were selected for training and test corpora. Ninety speakers have been recorded, each providing between 5,000 and 10,000 words (approximately 40-70 min.) of speech.
منابع مشابه
Issues in Large Vocabulary, Multilingual Speech Recognition
In this paper we report on our activities in multilingual, speaker-independent,large vocabulary continuous speech recognition. The multilingual aspect of this work is of particular importance in Eu-rope, where each country has its own national language. Our existing recognizer for American English and French, has been ported to British English and German. It has been assessed in the context of ...
متن کاملBesoins Lexicaux A La Lumiere De L'Analyse Statistique Du Corpus De Textes Du Projet "Bref" - Le Lexique "BDLEX" Du Francais Ecrit Et Oral
In this paper, we describe lexical needs for spoken and written French surface processing, like automatic text correction, speech recognition and synthesis. We present statistical observations made on a vocabulary compiled from real texts like articles. These texts have been used for building a recorded speech database called BREF. Machine Communication), this database is intended for dictation...
متن کاملAdaptation Experiments on French Mediaparl Asr
This document summarizes adaptation experiments done on French MediaParl corpus and other French corpora. Baseline adaptation techniques are briefly presented and evaluated in the MediaParl task for speaker adaptation, speaker adaptive training, database combination and environmental adaptation. Results show that by applying baseline adaptation techniques, a relative WER reduction of up to 22.8...
متن کاملSpeaker-Independent Phone Recognition Using BREF
A series of experiments on speaker-independent phone recognition of continuous speech have been carried out using the recently recorded BREF corpus. These experiments are the first to use this large corpus, and are meant to provide a baseline performance evaluation for vocabulary-independent phone recognition of French. The HMM-based recognizer was trained with hand-verified data from 43 speake...
متن کاملContinuous speech dictation in French
A major research activity at LIMSI is multilingual, speaker-independent, large vocabulary speech dictation. In this paper we report on efforts in large vocabulary, speaker-independent continuous speech recognition of French using the BREF corpus. Recognition experiments were carried out with vocabularies containing up to 20k words. The recognizer makes use of continuous density HMM with Gaussia...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1991