Chemical Entity Recognition and Resolution to ChEBI
Abstract
Chemical entities are ubiquitous throughout the biomedical literature, and text-mining systems that can efficiently identify them are needed. Owing to the lack of available corpora and data resources, the community has focused its efforts on developing gene and protein named entity recognition systems. With the release of ChEBI and the availability of an annotated corpus, however, chemical entity recognition can now be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution, and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an F-measure improvement of 20% for entity recognition, 2-5% for entity resolution, and 15% for combined entity recognition and resolution.
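The abstract does not state which lexical similarity measure drives the resolution step. As a minimal sketch, assuming Jaccard similarity over character trigrams (a swapped-in measure, not necessarily the authors') and a small hypothetical dictionary standing in for ChEBI names and synonyms, resolving a recognized mention to a dictionary entry could look like this:

```python
from typing import Dict, List, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two n-gram sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical toy dictionary (assumption): entry label -> names and synonyms.
# A real system would load names and synonyms from a ChEBI release instead.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "water": ["water", "H2O", "dihydrogen oxide"],
    "ethanol": ["ethanol", "ethyl alcohol"],
    "acetylsalicylic acid": ["acetylsalicylic acid", "aspirin"],
}


def resolve(
    mention: str,
    dictionary: Dict[str, List[str]] = TOY_DICTIONARY,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Map a recognized mention to its most similar entry, or None below threshold."""
    mention_grams = char_ngrams(mention)
    best_entry: Optional[str] = None
    best_score = 0.0
    for entry, names in dictionary.items():
        for name in names:
            score = jaccard(mention_grams, char_ngrams(name))
            if score > best_score:
                best_entry, best_score = entry, score
    if best_score < threshold:
        return None, best_score
    return best_entry, best_score


if __name__ == "__main__":
    print(resolve("Aspirin"))    # exact synonym match, score 1.0
    print(resolve("ethanoll"))   # misspelling still resolves to "ethanol"
    print(resolve("xyz"))        # no overlap, unresolved
```

Scoring against every synonym, rather than only the preferred name, lets a mention such as "aspirin" resolve to its canonical entry; the threshold trades recall for precision and would need tuning on annotated data.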
Similar resources
Identifying Chemical Entities based on ChEBI
This software demonstration paper presents Identifying Chemical Entities (ICE), a platform composed of algorithms for chemical entity recognition, entity resolution to a reference database, namely ChEBI, and validation using chemical semantic similarity. It aims to provide users with an improved display of entity recognition results, exposing outliers which are possible recognition errors a...
LASIGE: using Conditional Random Fields and ChEBI ontology
For our participation in the SemEval 2013 challenge on the recognition and classification of drug names, we adapted our chemical entity recognition approach, which consists of Conditional Random Fields for recognizing chemical terms and lexical similarity for entity resolution to the ChEBI ontology. We obtained promising results, with a best F-measure of 0.81 for the partial matching task when using post-pr...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of molecule names and their properties. Improvements in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
The ChEBI Ontology: An Ontology for Chemistry within a Biological Context
Chemical Entities of Biological Interest (ChEBI) is a freely available database of molecular entities and chemical concepts, which is manually annotated to a high standard of quality and is non-redundant, thus differentiating it from other publicly available chemistry resources. It focuses specifically on those chemical entities which are of interest to the life sciences community, including me...
Chemical compound and drug name recognition using CRFs and semantic similarity based on ChEBI
This document presents our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task). We developed a system based on Conditional Random Fields for recognizing chemical terms, and on ChEBI resolution and semantic similarity techniques for validating the recognition results. Our system created multiple classifiers according to different training data...
Volume: 2012
Publication date: 2012