On Coreference Resolution Performance Metrics
Author
Abstract
The paper proposes a Constrained Entity-Alignment F-Measure (CEAF) for evaluating coreference resolution. The metric is computed by aligning reference and system entities (or coreference chains) with the constraint that a system (reference) entity is aligned with at most one reference (system) entity. We show that finding the best alignment is a maximum bipartite matching problem which can be solved by the Kuhn-Munkres algorithm. Comparative experiments are conducted to show that the widely-known MUC F-measure has serious flaws in evaluating a coreference system. The proposed metric is also compared with the ACE-Value, the official evaluation metric in the Automatic Content Extraction (ACE) task, and we conclude that the proposed metric possesses properties, such as symmetry and better interpretability, that the ACE-Value lacks.
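The constrained alignment at the heart of CEAF is easy to prototype. The following is a minimal sketch, not the authors' implementation: it assumes entities are given as sets of mention identifiers, uses a simple entity-overlap similarity φ(R, S) = |R ∩ S|, and delegates the Kuhn-Munkres step to SciPy's linear_sum_assignment; the function name ceaf_m and the toy entities are made up for illustration.

```python
# Minimal sketch of mention-based CEAF, assuming entities (coreference chains)
# are sets of mention ids. The Kuhn-Munkres (Hungarian) algorithm finds the
# one-to-one entity alignment that maximizes total similarity.
from scipy.optimize import linear_sum_assignment


def ceaf_m(reference, system):
    """Return (recall, precision, F1) under the overlap similarity |R & S|."""
    # Similarity matrix: phi(R_i, S_j) = number of shared mentions.
    sim = [[len(r & s) for s in system] for r in reference]

    # Best alignment subject to the constraint that each reference entity
    # is matched with at most one system entity (and vice versa).
    rows, cols = linear_sum_assignment(sim, maximize=True)
    best = sum(sim[i][j] for i, j in zip(rows, cols))

    total_ref = sum(len(r) for r in reference)  # sum of phi(R_i, R_i)
    total_sys = sum(len(s) for s in system)     # sum of phi(S_j, S_j)

    recall = best / total_ref if total_ref else 0.0
    precision = best / total_sys if total_sys else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


# Toy example: one reference entity is split in two by the system.
ref = [{"a", "b", "c"}, {"d"}]
sys_out = [{"a", "b"}, {"c"}, {"d"}]
print(ceaf_m(ref, sys_out))
```

In the toy example the reference entity {a, b, c} is split across two system entities, so the alignment can credit only one of them, and recall, precision, and F all come out at 0.75.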
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or finding all expressions in a text that refer to the same entity, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Linguistically Aware Coreference Evaluation Metrics
Virtually all the commonly-used evaluation metrics for entity coreference resolution are linguistically agnostic, treating the mentions to be clustered as generic rather than linguistic objects. We argue that the performance of an entity coreference resolver cannot be accurately reflected when it is evaluated using linguistically agnostic metrics. Consequently, we propose a framework for incorp...
Critical Reflections on Evaluation Practices in Coreference Resolution
In this paper we revisit the task of quantitative evaluation of coreference resolution systems. We review the most commonly used metrics (MUC, B3, CEAF and BLANC) on the basis of their evaluation of coreference resolution in five texts from the OntoNotes corpus. We examine both the correlation between the metrics and the degree to which our human judgement of coreference resolution agrees with t...
Instance Sampling for Multilingual Coreference Resolution
In this paper we investigate the effect of downsampling negative training instances on a multilingual memory-based coreference resolution approach. We report results on the SemEval-2010 task 1 data sets for six different languages (Catalan, Dutch, English, German, Italian and Spanish) and for four evaluation metrics (MUC, B3, CEAF, BLANC). Our experiments show that downsampling negative training...
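As a rough illustration of the technique named above, the sketch below downsamples negative mention-pair training instances; the function name, the fixed keep ratio, and the (features, label) instance layout are assumptions for illustration, not the paper's actual setup.

```python
# Keep every positive (coreferent) instance and only a random fraction of the
# far more numerous negative ones.
import random


def downsample_negatives(instances, keep_ratio=0.2, seed=13):
    """instances: list of (features, label) pairs with label 1 = coreferent."""
    rng = random.Random(seed)
    positives = [inst for inst in instances if inst[1] == 1]
    negatives = [inst for inst in instances if inst[1] == 0]
    kept_negatives = [inst for inst in negatives if rng.random() < keep_ratio]
    return positives + kept_negatives
```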
Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments revea...
Evaluation Metrics For End-to-End Coreference Resolution Systems
Commonly used coreference resolution evaluation metrics can only be applied to key mentions, i.e. already annotated mentions. We here propose two variants of the B3 and CEAF coreference resolution evaluation algorithms which can be applied to coreference resolution systems dealing with system mentions, i.e. automatically determined mentions. Our experiments show that our variants lead to intuiti...
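For context, here is a minimal sketch of the standard key-mention B3 computation that such variants start from; it assumes every mention appears in exactly one gold and one system chain (exactly the restriction that work on system mentions relaxes), and the function name and data layout are illustrative.

```python
# Standard B-cubed over key mentions: per-mention precision/recall from the
# overlap of the gold chain and the system chain containing that mention.
def b_cubed(gold_chains, sys_chains):
    """gold_chains, sys_chains: lists of sets of mention ids over the same mentions."""
    gold_of = {m: chain for chain in gold_chains for m in chain}
    sys_of = {m: chain for chain in sys_chains for m in chain}
    mentions = list(gold_of)
    if not mentions:
        return 0.0, 0.0, 0.0

    recall = sum(len(gold_of[m] & sys_of[m]) / len(gold_of[m]) for m in mentions) / len(mentions)
    precision = sum(len(gold_of[m] & sys_of[m]) / len(sys_of[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


gold = [{"a", "b", "c"}, {"d"}]
pred = [{"a", "b"}, {"c"}, {"d"}]
print(b_cubed(gold, pred))
```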
Journal:
Volume, Issue:
Pages: -
Publication date: 2005