Analysis of Word Embeddings and Sequence Features for Clinical Information Extraction

Authors

  • Lance De Vine
  • Mahnoosh Kholghi
  • Guido Zuccon
  • Laurianne Sitbon
  • Anthony N. Nguyen
Abstract

This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that the use of word embeddings significantly improves the effectiveness of concept extraction models; in addition, we determine the influence of the corpora used to generate such features. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.

Similar resources

The Benefits of Word Embeddings Features for Active Learning in Clinical Information Extraction

This study investigates the use of unsupervised word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text. The objective is to further reduce the manual annotation effort while achieving higher effectiveness compared to a set of baseline features. Unsupervised features are derived from skip-gram wor...

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Background: Neural word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications as they provide vector representations of words capturing the semantic properties of words and the linguistic relationship between words. Many biomedical applications use different textual resources (e.g., Wikipedia and biomedical articles) to train word embeddings and apply thes...

Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing

We explore new methods of improving Curriculum Vitæ (CV) parsing for German documents by applying recent research on the application of word embeddings in Natural Language Processing (NLP). Our approach integrates the word embeddings as input features for a probabilistic sequence labeling model that relies on the Conditional Random Field (CRF) framework. Best-performing word embeddings are gene...

Combining Word Embeddings and Feature Embeddings for Fine-grained Relation Extraction

Compositional embedding models build a representation for a linguistic structure based on its component word embeddings. While recent work has combined these word embeddings with hand crafted features for improved performance, it was restricted to a small number of features due to model complexity, thus limiting its applicability. We propose a new model that conjoins features and word embedding...

Convolutional Neural Network Based Semantic Tagging with Entity Embeddings

Unsupervised word embeddings provide rich linguistic and conceptual information about words. However, they may provide weak information about domain specific semantic relations for certain tasks such as semantic parsing of natural language queries, where such information about words or phrases can be valuable. To encode the prior knowledge about the semantic word relations, we extended the neur...

Publication date: 2015