Towards High Performance Multilingual Event Extraction: Language Specific Issue and Feature Exploration

Authors

  • Zheng Chen
  • Heng Ji
Abstract

We present an Information Extraction (IE) system that combines traditional IE techniques and cross-document event ranking. Our final goal is to provide the user with ranked events in response to a query. The first step is to set up a trainable event extraction engine to glean all possible events from multiple documents. The second step is to construct our data warehouse with refined events. The third step is to build an IR-like engine to produce ranked events in response to a user’s query. As the first step, we developed a multilingual event extraction engine using a modularized approach. In this paper, we focus on Chinese event extraction. We point out a language-specific issue in Chinese trigger labeling, and then discuss the contributions of lexical, syntactic and semantic features applied to the trigger labeling and argument labeling tasks. As a result, we achieved high performance comparable to state-of-the-art English event extraction.

Similar resources

Language Specific Issue and Feature Exploration in Chinese Event Extraction

In this paper, we present a Chinese event extraction system. We point out a language-specific issue in Chinese trigger labeling, and then discuss the contributions of lexical, syntactic and semantic features applied to trigger labeling and argument labeling. As a result, we achieved competitive performance, specifically, F-measure of 59.9 in trigger labeling and F-measure of 43.8 i...

Towards Multilingual Event Extraction Evaluation: A Case Study for the Czech Language

This paper presents a multilingual corpus of news, annotated with event metadata information. The events in our corpus are from the domain of violence, natural and man-made disasters. The main goal of the corpus is automatic evaluation of event detection and extraction systems in different languages. As a use case, we take a rule-based event extraction system, extend it to cover a new language, ...

Towards High Performance Phonotactic Feature for Spoken Language Recognition

With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...

Leveraging Multilingual Training for Limited Resource Event Extraction

Event extraction has become one of the most important topics in information extraction, but to date, there is very limited work on leveraging cross-lingual training to boost performance. We propose a new event extraction approach that trains on multiple languages using a combination of both language-dependent and language-independent features, with particular focus on the case where target doma...

Multilingual Extraction Ontologies

The growth of multilingual web content and increasing internationalization portends the need for cross-language query processing. We offer ML-OntoES (a MultiLingual Ontology-based Extraction System) as a solution for narrow-domain/data-rich applications. Based on language-independent extraction ontologies (Embley, Liddle, & Lonsdale, 2011), ML-OntoES enables semantic search over domain-specific,...

Publication date: 2009