Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference

Authors

  • Andrew McCallum
  • Ben Wellner
Abstract

Coreference analysis, also known as record linkage or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces several discriminative, conditional-probability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approaches to coreference, the models presented here are relational: they do not assume that pairwise coreference decisions should be made independently from each other. Unlike other relational models of coreference that are generative, the conditional model here can incorporate a great variety of features of the input without having to be concerned about their dependencies, paralleling the advantages of conditional random fields over hidden Markov models. We present experiments on proper noun coreference in two text data sets, showing results in which we reduce error by nearly 28% or more over traditional thresholded record-linkage, and by up to 33% over an alternative coreference technique previously used in natural language processing.
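
The contrast between independent pairwise decisions and a relational decision can be illustrated with a minimal sketch. This is not the authors' model: the mention strings, the hand-set pairwise scores, and the helper names (independent_links, is_transitive, best_partition) are all assumptions made for illustration, and the exhaustive search over partitions is only feasible for a handful of mentions.

```python
from itertools import combinations, permutations

# Hypothetical mentions and hand-set pairwise compatibility scores
# (positive = evidence for coreference, negative = evidence against).
mentions = ["Mr. Powell", "Powell", "she"]
scores = {
    frozenset({"Mr. Powell", "Powell"}): 2.0,
    frozenset({"Powell", "she"}): 1.0,
    frozenset({"Mr. Powell", "she"}): -4.0,
}

def independent_links(scores, threshold=0.0):
    """Threshold each pair on its own, ignoring all other decisions."""
    return {pair for pair, s in scores.items() if s > threshold}

def is_transitive(links, mentions):
    """Check whether the linked pairs form a consistent equivalence relation."""
    for a, b, c in permutations(mentions, 3):
        if (frozenset({a, b}) in links and frozenset({b, c}) in links
                and frozenset({a, c}) not in links):
            return False
    return True

def partitions(items):
    """Yield every way of splitting a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + part

def partition_score(part, scores):
    """Sum the pairwise scores of all mentions placed in the same block."""
    return sum(scores[frozenset({a, b})]
               for block in part
               for a, b in combinations(block, 2))

def best_partition(mentions, scores):
    """Pick the partition with the highest joint score (exhaustive search)."""
    best = max(partitions(list(mentions)),
               key=lambda p: partition_score(p, scores))
    return {frozenset(block) for block in best}

links = independent_links(scores)
print("independent links:", links)
print("consistent?", is_transitive(links, mentions))
print("joint partition:", best_partition(mentions, scores))
```

In this toy setting, thresholding each pair separately can link two pairs while rejecting the third, which is not a valid set of entities. Scoring whole partitions lets the strong negative evidence on one pair influence the decisions on the others.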


Similar resources

Conditional Models of Identity Uncertainty with Application to Noun Coreference

Coreference analysis, also known as record linkage or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces several discriminative, conditional-probability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approaches to coreference, the...


Object Consolodation by Graph Partitioning with a Conditionally-Trained Distance Metric

Coreference analysis, also known as record linkage, object consolidation or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces several discriminative, conditional-probability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approach...


Revisiting the Effects of Growth Uncertainty on Inflation in Iran: An Application of GARCH-in-Mean Models

This paper investigates the relationship between inflation and growth uncertainty in Iran for the period 1988-2008 using quarterly data. We employ a Generalized Autoregressive Conditional Heteroscedasticity in Mean (GARCH-M) model to estimate the time-varying conditional residual variance of growth as a standard measure of growth uncertainty. The empirical evidence shows that growth uncertain...


Fuzzy Coreference Resolution for Summarization

We present a fuzzy-theory based approach to coreference resolution and its application to text summarization. Automatic determination of coreference between noun phrases is fraught with uncertainty. We show how fuzzy sets can be used to design a new coreference algorithm which captures this uncertainty in an explicit way and allows us to define varying degrees of coreference. The algorithm is e...


An Integrated, Conditional Model of Information Extraction and Coreference with Application to Citation Matching

Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training, and of a coreference model struct...



Publication date: 2003