Using Conditional Random Fields to Exploit Token Structure and Labels for Accurate Semantic Annotation

Authors

  • Aman Goel
  • Craig A. Knoblock
  • Kristina Lerman
Abstract

Automatic semantic annotation of structured data enables unsupervised integration of data from heterogeneous sources but is difficult to perform accurately due to the presence of many numeric fields and proper-noun fields that do not allow reference-based approaches and the absence of natural language text that prevents the use of language-based approaches. In addition, several of these semantic types have multiple heterogeneous representations, while sharing syntactic structure with other types. In this work, we propose a new approach to use conditional random fields (CRFs) to perform semantic annotation of structured data that takes advantage of the structure and labels of the tokens for higher accuracy of field labeling, while still allowing the use of exact inference techniques. We compare our approach with a linear-CRF-based model that only labels fields and also with a regular-expression-based approach.
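
As a rough illustration of what exact inference can look like for a linear-chain model over tokens, the sketch below runs Viterbi decoding to pick the highest-scoring label sequence. It is not the authors' model: the label set, the `emission_score` function, the `TRANSITION` table, and every weight are hypothetical values chosen for illustration, standing in for the feature weights a trained CRF would learn.

```python
import re

# Hypothetical token labels; the paper's actual label set is not given here.
LABELS = ["NUMBER", "UNIT", "WORD"]


def emission_score(token, label):
    """Hand-set, illustrative scores standing in for learned CRF feature weights."""
    is_number = re.fullmatch(r"\d+(\.\d+)?", token) is not None
    is_unit = token.lower() in {"mph", "km/h", "kts"}
    if label == "NUMBER":
        return 2.0 if is_number else -1.0
    if label == "UNIT":
        return 2.0 if is_unit else -1.0
    return 0.5 if not (is_number or is_unit) else -0.5


# Hypothetical transition scores between adjacent labels (default 0.0).
TRANSITION = {("NUMBER", "UNIT"): 1.0, ("UNIT", "NUMBER"): -0.5}


def viterbi(tokens):
    """Return the highest-scoring label sequence by exact dynamic programming."""
    if not tokens:
        return []
    # best[i][label] = (score of the best path ending in label at position i,
    #                   label chosen at position i - 1 on that path)
    best = [{lab: (emission_score(tokens[0], lab), None) for lab in LABELS}]
    for tok in tokens[1:]:
        column = {}
        for lab in LABELS:
            prev, score = max(
                ((p, best[-1][p][0] + TRANSITION.get((p, lab), 0.0)) for p in LABELS),
                key=lambda pair: pair[1],
            )
            column[lab] = (score + emission_score(tok, lab), prev)
        best.append(column)
    # Trace back from the best final label.
    label = max(LABELS, key=lambda lab: best[-1][lab][0])
    path = [label]
    for column in reversed(best[1:]):
        label = column[label][1]
        path.append(label)
    return path[::-1]
```

Calling `viterbi(["wind", "12", "mph"])` labels each token of a field value. Because the search covers every label sequence through dynamic programming, the result is the exact maximum under these toy scores, not an approximation.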

Similar Resources

Exploiting Structure within Data for Accurate Labeling using Conditional Random Fields

Automatically assigning semantic class labels such as WindSpeed, Flight Number and Address to data obtained from structured sources including databases or web pages is an important problem in data integration since it enables the researchers to identify the contents of these sources. Automatic semantic annotation is difficult because of the variety of formats used for each semantic type (e.g., ...

XML Document Transformation with Conditional Random Fields

We address the problem of structure mapping that arises in XML data exchange or XML document transformation. Our approach relies on XML annotation with semantic labels that describe local tree editions. We propose XML Conditional Random Fields (XCRFs), a framework for building conditional models for labeling XML documents. We equip XCRFs with efficient algorithms for inference and parameter est...

Sentence and Token Splitting Based On Conditional Random Fields

Natural language processing systems which deal with real-world documents require several low-level tasks such as splitting a text into its constituent sentences, and splitting each sentence into its constituent tokens. These basic text segmentation services are usually supplied by some preprocessor prior to linguistic analysis. While this task is often considered as unsophisticated clerical wor...

Toward the automatic extraction of knowledge of usable goods

Knowledge of usable goods (e.g., toothbrush is used to clean the teeth and treadmill is used for exercise) is ubiquitous and in constant demand. This study proposes semantic labels to capture aspects of knowledge of usable goods and builds a benchmark corpus, Usable Goods Corpus, to explore this new semantic labeling task. Our human annotation experiment shows that human annotators can generall...

Word Co-occurrence and Markov Random Fields for Improving Automatic Image Annotation

In this paper a novel approach for improving automatic image annotation methods is proposed. The approach is based on the fact that accuracy of current image annotation methods is low if we look at the most confident label only. Instead, accuracy is improved if we look for the correct label within the set of the top-k candidate labels. We take advantage of this fact and propose a Markov random ...

Publication date: 2011