An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News

Authors

  • Niraj Shrestha
  • Ivan Vulic
  • Marie-Francine Moens
Abstract

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method is able to find named entities missing in the transcribed speech data, and additionally to correct incorrectly assigned named entity tags. Consequently, our novel approach improves state-of-the-art NER results from speech data both in terms of recall and precision.
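The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple bag-of-words cosine similarity for the alignment step and replaces the term expansion step with a plain lookup of entities that are tagged in the aligned written story but absent from the transcript's tags. The function names, the `text` and `entities` keys, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return Counter(tok for tok in tokens if tok)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(transcript, written_docs, threshold=0.3):
    """Step 1 (assumed form): index of the most similar written story, or None."""
    query = bag_of_words(transcript)
    scores = [cosine(query, bag_of_words(doc["text"])) for doc in written_docs]
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    # The threshold is an arbitrary illustrative value, not taken from the paper.
    return best if scores[best] >= threshold else None


def recover_entities(transcript_entities, written_entities):
    """Step 2 (simplified): entities tagged in the written story but not in the transcript."""
    seen = {name.lower() for name, _ in transcript_entities}
    return [(name, tag) for name, tag in written_entities if name.lower() not in seen]


# Toy data, invented for this example.
transcript = "the president barack obama met angela merkel in berlin today"
transcript_entities = [("berlin", "LOC")]
written_docs = [
    {"text": "Stock markets fell sharply in Tokyo on Monday.",
     "entities": [("Tokyo", "LOC")]},
    {"text": "President Barack Obama met Angela Merkel in Berlin today to discuss trade.",
     "entities": [("Barack Obama", "PER"), ("Angela Merkel", "PER"), ("Berlin", "LOC")]},
]

match = align(transcript, written_docs)
recovered = (
    recover_entities(transcript_entities, written_docs[match]["entities"])
    if match is not None else []
)
print(match, recovered)
```

A real system would need a stronger similarity model and a proper expansion step, since speech transcripts often misspell or drop the very names that the written story contains.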

Similar resources

Named Entity Recognition in Broadcast News Using Similar Written Texts

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we detect block alignments between highly similar blocks of the speech data and corresponding written news data that are easily obtainable from the Web, (2) we employ term expansion techniques commonly used in information retrieval to recover named ent...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three kinds of methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...

Construction and Analysis of Japanese-English Broadcast News Corpus with Named Entity Tags

We aim to acquire named entity (NE) translation knowledge from nonparallel, content-aligned corpora by utilizing NE extraction techniques. For this research, we are constructing a Japanese-English broadcast news corpus with NE tags. The tags represent not only NE class information but also coreference information within the same monolingual document and between corresponding Japanese-Eng...

Summarization of Broadcast News Video through Link Analysis of Named Entities

This paper describes the use of connections between named entities for summarization of broadcast news. We first extract named entities from a transcript of a news story, and find related entities nearby. In the context of a query, a link graph of relevant entities is rendered in an interactive display, allowing the user to manipulate, browse and examine the components, including the ability to...

A Cluster-based Approach to Broadcast News

We present an approach to detection and tracking of topics in multilingual broadcast news based upon a dynamic clustering scheme. Our approach derives from a system used to filter Web searches from multiple sources, with extensions for pipelining document clusters, part-of-speech tagging and extraction of named entities for use in an extended similarity measure.

Publication date: 2013