Context-based Term Disambiguation in Biomedical Literature
Authors
Abstract
The huge volume of unstructured text available online drives an increasing need for automated techniques to analyze and extract knowledge from these repositories of information. Resolving the ambiguity in these texts is an important step for any subsequent analysis task. In this paper, we present a new method for resolving one type of ambiguity: term disambiguation. The method is based on machine learning and can be viewed as a context-based classification approach. In our experiments, we apply it to gene and protein name disambiguation. We have extensively evaluated our method using around 600,000 Medline abstracts and three different classifiers. The results show that our technique achieves high accuracy, precision, and recall, and outperforms recently published results on this problem. The paper includes the details of the method and the experimental design. We plan to apply our technique to the general domain of word sense disambiguation in the future.
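To make the idea of context-based classification concrete, here is a minimal sketch, not the authors' implementation. It assumes a bag-of-words context window around the ambiguous term and a multinomial naive Bayes classifier with add-one smoothing; the abstract does not name the three classifiers used, so this choice is an assumption. The term `abc1`, the training sentences, and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def context_features(text, term, window=3):
    """Return lowercase tokens within `window` positions of each occurrence of `term`."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    term = term.lower()
    feats = []
    for i, tok in enumerate(tokens):
        if tok == term:
            feats.extend(tokens[max(0, i - window):i])   # left context
            feats.extend(tokens[i + 1:i + 1 + window])   # right context
    return feats


class NaiveBayesDisambiguator:
    """Multinomial naive Bayes over context words (an assumed classifier choice)."""

    def __init__(self):
        self.class_counts = Counter()            # number of training examples per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()

    def fit(self, examples, term, window=3):
        self.term, self.window = term, window
        for text, label in examples:
            feats = context_features(text, term, window)
            self.class_counts[label] += 1
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = context_features(text, self.term, self.window)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Add-one (Laplace) smoothing over the training vocabulary.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in feats:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: the same term used as a gene or a protein name.
TRAIN = [
    ("expression of the abc1 gene was measured by mrna levels", "gene"),
    ("the abc1 gene promoter region contains a mutation", "gene"),
    ("abc1 protein binds the receptor and is phosphorylated", "protein"),
    ("phosphorylated abc1 protein interacts with the kinase", "protein"),
]

model = NaiveBayesDisambiguator().fit(TRAIN, "abc1")
print(model.predict("a mutation in the abc1 gene promoter"))
print(model.predict("the kinase phosphorylated abc1 protein"))
```

A realistic setup would train on a large labeled corpus and might use richer context features; the sketch only shows how the words surrounding a term can drive the choice of sense.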
Similar resources
Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation
Word sense disambiguation helps identify the proper sense of ambiguous words in text. With large terminologies such as the UMLS Metathesaurus, ambiguities appear and highly effective disambiguation methods are required. Supervised learning methods are one of the approaches used to perform disambiguation. Features extracted from the context of an ambiguous word are used to identif...
Sense-Based Biomedical Indexing and Retrieval
This paper tackles the problem of term ambiguity, especially for biomedical literature. We propose and evaluate two methods of Word Sense Disambiguation (WSD) for biomedical terms and integrate them to a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and semantically...
Biomedical Word Sense Disambiguation with Neural Word and Concept Embeddings
Addressing ambiguity issues is an important step in natural language processing (NLP) pipelines designed for information extraction and knowledge discovery. This problem is also common in biomedicine where NLP applications have become indispensable to exploit latent information from biomedical literature and ...
A Learning-Based Approach for Biomedical Word Sense Disambiguation
In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it is not commensurate and leaves room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation with supervised methods is the need for a corpus of manually disambiguated insta...
Semantic Relatedness for Biomedical Word Sense Disambiguation
This paper presents a graph-based method for all-word word sense disambiguation of biomedical texts using semantic relatedness as edge weight. Semantic relatedness is derived from a term-topic co-occurrence matrix. The sense inventory is generated by the MetaMap program. Word sense disambiguation is performed on a disambiguation graph via a vertex centrality measure. The proposed method achieve...
Publication date: 2006