Matching Inconsistently Spelled Names in Automatic Speech Recognizer Output for Information Retrieval

Authors

  • Hema Raghavan
  • James Allan
Abstract

Many proper names are spelled inconsistently in speech recognizer output, posing a problem for applications where locating mentions of named entities is critical. We model the distortion in the spelling of a name due to the speech recognizer as the effect of a noisy channel. The models follow the framework of the IBM translation models. The model is trained using a parallel text of closed caption and automatic speech recognition output. We also test a string edit distance based method. The effectiveness of these models is evaluated on a name query retrieval task. Our methods result in a 60% improvement in F1. We also demonstrate why the problem has not been critical in TREC and TDT tasks.
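As an illustration of the string edit distance idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes unit-cost Levenshtein distance, normalized by the longer string's length, and a hypothetical threshold `max_ratio`; the function names and the example vocabulary are invented for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def match_variants(query: str, vocabulary: list[str], max_ratio: float = 0.3) -> list[str]:
    """Return vocabulary words whose normalized edit distance to the query
    is at most max_ratio (a hypothetical threshold, not taken from the paper)."""
    q = query.lower()
    matches = []
    for word in vocabulary:
        w = word.lower()
        longest = max(len(q), len(w))
        if longest == 0:
            continue  # skip empty strings to avoid dividing by zero
        if edit_distance(q, w) / longest <= max_ratio:
            matches.append(word)
    return matches


# Hypothetical spelling variants as they might appear in recognizer output.
vocabulary = ["gaddafi", "kadhafi", "gandhi", "qaddafi"]
print(match_variants("qaddafi", vocabulary))
```

In a retrieval setting, the matched variants could be added to the name query before searching the recognizer output. The threshold trades recall against false matches and would need tuning on held-out data.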

Similar resources

Robustness improvements in continuously spelled names over the telephone

A speaker-independent speech recognizer for continuously spelled names, implemented for a switchboard call-routing task, is analyzed for sources of error. Results indicate most errors are due to extraneous speech and end-point detection errors. Strategies are proposed for improving the robustness of recognition, including tolerance for speech with pauses, and a letter-spotting strategy to handl...

Integrating spelling into spoken dialogue recognition

Recognition of spelled letter sequences is essential for many real-world applications which involve arbitrary names or addresses. Often the letter sequences carry the sentence's crucial information; therefore, it is important to correctly localize and recognize the spelled string. However, large vocabulary speech recognizers tend to perform poorly on spelled letters, especially if they have to ...

Spanish recognizer of continuously spelled names over the telephone

In this paper we present a hypothesis-verification approach for a Spanish Recognizer of continuously spelled names over the telephone. We give a detailed description of the spelling task for Spanish where the most confusable letter sets are described. We introduce a new HMM topology with contextual silences incorporated into the letter model to deal with pauses between letters, increasing the L...

Speech recognition with automatic punctuation

We present a method of speech recognition with automatic punctuation based on a combination of acoustic and lexical evidence. In the recognizer vocabulary, punctuation marks are treated as word entries. By assigning the acoustic baseforms of silence, breath, and other non-speech sounds to punctuation marks, and using a properly processed N-gram language model, unpronounced punctuation marks of ...

An Isolated Letter Recognizer for Proper Name Identification Over the Telephone

Spelled letter recognition over the telephone line is essential for applications that involve names or addresses. In this paper we discuss the implementation and present results of a speaker independent spelled letter recognizer, trained and tested on the European project SPEECHDAT corpus. The system was implemented using HTK V2.0 (Hidden Markov Model Toolkit) software development tool and the ...

Publication date: 2005