Efficient Access to Lecture Audio Archives through Spoken Language Processing

Authors

Tatsuya Kawahara

Tasuku Kitade

Kazuya Shitaoka

Hiroaki Nanjo

Abstract

This paper first addresses the current state of speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It is shown that the large-scale corpus had a strong impact on training acoustic and language models that take into account the morphological and pronunciation variations characteristic of spontaneous Japanese. Unsupervised adaptation of these models and the speaking rate is also effective, and we obtained a word accuracy of 78.0%. Then, an intelligent archiving system for lectures based on automatic transcription and indexing is introduced. Transcriptions are automatically edited to improve readability, and key sentences are indexed based on statistically derived discourse markers and topic words. Thus, we realize efficient browsing of lecture audio archives.
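The abstract reports a word accuracy figure without defining it. As a rough illustration only, the sketch below assumes the common definition, word accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions and insertions in a minimum-edit alignment. The function name `word_accuracy` and the whitespace tokenization are assumptions made for this example; the paper's own scoring procedure is not described here, and Japanese text would first need morphological segmentation.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumed definition (N - S - D - I) / N.

    The total error count S + D + I is taken as the word-level
    Levenshtein distance between reference and hypothesis.
    Tokens are split on whitespace (an assumption for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    errors = prev[-1]
    return (len(ref) - errors) / len(ref)
```

Because insertions count as errors, this measure can fall below zero when the hypothesis is much longer than the reference.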

Similar resources

The MIT Spoken Lecture Processing Project

We will demonstrate the MIT Spoken Lecture Processing Server and an accompanying lecture browser that students can use to quickly locate and browse lecture segments that apply to their query. We will show how lecturers can upload recorded lectures and companion text material to our server for automatic processing. The server automatically generates a time-aligned word transcript of the lecture ...

Analysis And Processing Of Lecture Audio Data: Preliminary Investigations

In this paper we report on our recent efforts to collect a corpus of spoken lecture material that will enable research directed towards fast, accurate, and easy access to lecture content. Thus far, we have collected a corpus of 270 hours of speech from a variety of undergraduate courses and seminars. We report on an initial analysis of the spontaneous speech phenomena present in these data and ...

A browsing system for classroom lecture speech

Developing technologies to summarize and retrieve huge quantities of spoken documents, recorded during classroom lectures, for the purpose of e-Learning or self-learning is important. In this paper, we describe an adaptation method of a language model to recognize keywords in given slides. Next, we propose a summarization method for spoken classroom lectures using prosodic features and linguis...

Recent progress in the MIT spoken lecture processing project

In this paper we discuss our research activities in the area of spoken lecture processing. Our goal is to improve the access to on-line audio/visual recordings of academic lectures by developing tools for the processing, transcription, indexing, segmentation, summarization, retrieval and browsing of this media. In this paper, we provide an overview of the technology components and systems that ...

Information Access in Large Spoken Archives

Digital archives have emerged as the pre-eminent method for capturing the human experience. Before such archives can be used efficiently, their contents must be described. The scale of such archives along with the associated content mark up cost make it impractical to provide access via purely manual means, but automatic technologies for search in spoken materials still have relatively limited ...

Publication date: 2003
