Annotation of all coreference in biomedical text: Guideline selection and adaptation
Abstract
This paper describes an effort to build a corpus of full-text journal articles in which every co-referring noun phrase is annotated. The identity and appositive relations were marked up. Several annotation schemas were evaluated and are described here; the OntoNotes guidelines were selected. Biomedical journal articles required a number of adaptations to the OntoNotes guidelines—mainly doing away with the notion of generics, which also had implications for the handling of nominal modifiers. Domain experts and linguists were evaluated with respect to their ability to function as annotators, and both were found to be effective. Progress is reported with about one third of the project done; inter-annotator agreement at this stage is 0.684 by the MUC metric.
Similar resources
Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text
Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of coreference resolution for biomedical text analysis applications has increasingly been acknowledge...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Annotation of Coreference Relations Among Linguistic Expressions and Images in Biological Articles
In this paper, we propose an annotation scheme which can be used not only for annotating coreference relations between linguistic expressions, but also those among linguistic expressions and images, in scientific texts such as biomedical articles. Images in biomedical domain often contain important information for analyses and diagnoses, and we consider that linking images to textual descriptio...
What Is Coreference, And What Should Coreference Annotation Be?
In this paper, it is argued that 'coreference annotation', as currently performed in the MUC community, goes well beyond annotation of the relation of coreference as it is commonly understood. As a result, it is not always clear what semantic relation these annotations are actually encoding. The paper discusses a number of interrelated problems with coreference annotation and concludes that re...
Domain Adaptation with Active Learning for Coreference Resolution
In the literature, most prior work on coreference resolution centered on the newswire domain. Although a coreference resolution system trained on the newswire domain performs well on newswire texts, there is a huge performance drop when it is applied to the biomedical domain. In this paper, we present an approach integrating domain adaptation with active learning to adapt coreference resolution...
Publication date: 2010