Exploring Word Embedding for Drug Name Recognition
Authors
Abstract
This paper describes a machine learning-based approach that uses word embedding features to recognize drug names in biomedical texts. As a starting point, we developed a baseline system based on a Conditional Random Field (CRF) trained with the standard features used in current Named Entity Recognition (NER) systems. We then extended the system with new features: word vectors and word clusters generated by the Word2Vec tool, and a lexicon feature derived from the DINTO ontology. We trained Word2Vec on two different corpora: Wikipedia and MedLine. Our main goal is to study how effectively word embeddings, used as features, improve the performance of our baseline system, and to analyze whether the DINTO ontology can serve as a valuable complementary data source in a machine learning NER system. To evaluate our approach and compare it with previous work, we conducted a series of experiments on the dataset of SemEval-2013 Task 9.1, Drug Name Recognition.
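As a rough illustration of the feature design described in the abstract, the sketch below builds a per-token feature dictionary that combines standard NER-style features with a word-cluster ID and a lexicon flag. It is a minimal sketch under stated assumptions, not the authors' implementation: `WORD_CLUSTERS`, `DRUG_LEXICON`, `token_features`, and `sentence_features` are hypothetical names, and the toy values stand in for clusters derived from Word2Vec vectors and for a drug-name list derived from DINTO.

```python
from typing import Dict, List, Set

# Hypothetical toy resources (assumptions for illustration only).
# A real system would load cluster IDs computed from Word2Vec vectors
# and a drug-name lexicon extracted from the DINTO ontology.
WORD_CLUSTERS: Dict[str, int] = {"aspirin": 7, "ibuprofen": 7, "patient": 2, "took": 4}
DRUG_LEXICON: Set[str] = {"aspirin", "ibuprofen"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for token i, in the form CRF toolkits commonly accept."""
    word = tokens[i]
    lower = word.lower()
    features: Dict[str, object] = {
        # Standard NER-style features (the baseline in this sketch).
        "lower": lower,
        "suffix3": lower[-3:],
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Embedding-derived feature: cluster ID, or -1 for unseen words.
        "cluster": WORD_CLUSTERS.get(lower, -1),
        # Lexicon feature: membership in the drug-name list.
        "in_lexicon": lower in DRUG_LEXICON,
    }
    if i > 0:
        # Context feature: the previous token, lowercased.
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        # Marks the beginning of the sentence.
        features["BOS"] = True
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Compute feature dicts for every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Calling `sentence_features(["Patient", "took", "Aspirin", "daily"])` yields one dictionary per token. Using discrete cluster IDs rather than raw vector components is one common way to feed embedding information to a CRF, which works most naturally with categorical features.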
Similar resources
A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text
One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text is special and has unique characteristics. In addition, medical text mining poses more challenges, e.g., more unstructured text, the rapid addition of new terms, and a wide range of name variations for the same drug. The mining...
LSTM-CRF for Drug-Named Entity Recognition
Drug-Named Entity Recognition (DNER) for biomedical literature is a fundamental facilitator of Information Extraction. For this reason, the DDIExtraction2011 (DDI2011) and DDIExtraction2013 (DDI2013) challenges each introduced a task aimed at the recognition of drug names. State-of-the-art DNER approaches rely heavily on hand-engineered features and domain-specific knowledge, which are difficult to col...
Recurrent neural networks with specialized word embedding for Chinese Clinical Named Entity Recognition
Extracting clinically related entity mentions from patient clinical records is an essential step in clinical research. Recently, many researchers have employed neural architectures to tackle the similar tasks of clinical concept extraction or drug name recognition from English clinical records, and have made prominent progress. However, most previous systems on Chinese Clinical Named Entity Recognit...
A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text
One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text mining poses more challenges, for example, more unstructured text, the rapid addition of new terms, a wide range of name variations for the same drug, the lack of labeled dataset sources and external knowledge, and the multiple...
Acquisition of cleft structures in L1 and L2
The present study aims at exploring the processing difficulty of cleft structures as a type of relative clause for EFL and Persian-as-first-language learners. The impact of head nouns with various functions, as well as that of embedding, on the processing of Persian and English cleft structures has been investigated in the present study. The participants were 68 Iranian male and female students...
Publication date: 2015