Research Paper: BioTagger-GM: A Gene/Protein Name Recognition System

Authors

  • Manabu Torii
  • Zhang-Zhi Hu
  • Cathy H. Wu
  • Hongfang Liu
Abstract

OBJECTIVES: Biomedical named entity recognition (BNER) is a critical component of automated systems that mine biomedical knowledge from free text. Among the entity types in the domain, genes and proteins are arguably the most studied in BNER. Our goal is to develop a gene/protein name recognition system, BioTagger-GM, that exploits rich information in terminology sources using powerful machine learning frameworks and system combination.

DESIGN: BioTagger-GM consists of four main components: (1) dictionary lookup: gene/protein names in BioThesaurus and biomedical terms in the UMLS Metathesaurus are tagged in text; (2) machine learning: machine learning systems are trained using the dictionary lookup results as one type of feature; (3) post-processing: heuristic rules are used to correct recognition errors; and (4) system combination: a voting scheme is used to combine recognition results from multiple systems.

MEASUREMENTS: The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus, modified for gene/protein name recognition. System performance was evaluated through cross-validation tests and measured using precision, recall, and F-measure.

RESULTS: BioTagger-GM achieved an F-measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus.

CONCLUSION: The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
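The abstract does not give implementation details, so the following is only a minimal Python sketch of three of the ideas it names: dictionary lookup, voting-based system combination, and precision/recall/F-measure scoring. Everything in it is an assumption made for illustration: the function names, the toy lexicon format (tuples of lowercased tokens), the binary "G"/"O" token tags, the simple-majority rule, and exact-span matching for scoring. It should not be read as the authors' actual lookup method, voting scheme, or the official BioCreAtIvE II scoring procedure.

```python
from collections import Counter


def dictionary_lookup(tokens, lexicon):
    """Flag tokens covered by a lexicon entry, preferring the longest match.

    `lexicon` is a hypothetical set of lowercased token tuples standing in
    for a terminology source; the flags could serve as one feature type.
    """
    flags = [False] * len(tokens)
    max_len = max((len(entry) for entry in lexicon), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in lexicon:
                matched = n
                break
        for j in range(i, i + matched):
            flags[j] = True
        i += matched if matched else 1
    return flags


def majority_vote(tag_sequences):
    """Combine per-token tags from several systems by strict majority.

    Assumed rule: a token keeps a tag only if more than half of the
    systems agree on it; otherwise it falls back to "O" (outside).
    """
    combined = []
    for tags in zip(*tag_sequences):
        tag, count = Counter(tags).most_common(1)[0]
        combined.append(tag if count > len(tags) / 2 else "O")
    return combined


def tags_to_spans(tags):
    """Turn a "G"/"O" tag sequence into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "G" and start is None:
            start = i
        elif tag != "G" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def precision_recall_f(gold_spans, predicted_spans):
    """Score predictions by exact span match (an assumed matching rule)."""
    gold, pred = set(gold_spans), set(predicted_spans)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

In this sketch, the tag sequences produced by several independently trained taggers would be passed to `majority_vote`, converted to spans with `tags_to_spans`, and compared against gold-standard spans with `precision_recall_f`. The F-measure computed here is the balanced harmonic mean of precision and recall.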

Similar resources

PCR-mediated Expression of the Human GM-CSF Gene in Escherichia coli

Four exons of the human genomic GM-CSF gene were assembled using the gene splicing by overlap extension (SOE) method. The resulting nucleotide sequence was cloned into the pET23a(+) expression vector under the control of strong bacteriophage T7 transcription and translation signals. The resulting construct was transferred into the E. coli strain BL21(DE3) pLysS, and IPTG was used for inducti...

Expression of a Chimeric Protein Containing the Catalytic Domain of Shiga-Like Toxin and Human Granulocyte Macrophage Colony-Stimulating Factor (hGM-CSF) in Escherichia coli and Its Recognition by Reciprocal Antibodies

Fusion of two genes at DNA level produces a single protein, known as a chimeric protein. Immunotoxins are chimeric proteins composed of specific cell targeting and cell killing moieties. Bacterial or plant toxins are commonly used as the killing moieties of the chimeric immunotoxins. In this investigation, the catalytic domain of Shiga-like toxin (A1) was fused to human granulocyte macrophage ...

PURIFICATION AND CHARACTERIZATION OF THE CLONED HUMAN GM-CSF GENE EXPRESSED IN ESCHERICHIA COLI

The human granulocyte-macrophage colony-stimulating factor (hGM-CSF) gene was cloned into the pET23a(+) expression vector under the control of strong bacteriophage T7 transcription and translation signals. The hGM-CSF gene was transferred into E. coli strain BL21(DE3)pLysS, and IPTG was used to induce the GM-CSF gene. Production of the target protein was confirmed by ELISA and ...

Cloning, Expression and Purification of Truncated Chlamydia Trachomatis Outer Membrane Protein 2 (Omp2) and its Application in an ELISA Assay

Background: Although a simple and direct method does not exist for the detection of chlamydial infections, there are situations in which reliable serological tests, with sensitivity related to a specific antigen, could be helpful. Objective: The aim of this study was to clone the first 1100 bp of the C. trachomatis outer membrane protein 2 (omp2) gene in order to prepare a recombinant protein ...

Evaluating Protein Name Recognition: An Automatic Approach

In some domains, named entity recognition might be considered a solved problem. This does not hold for biological text mining, where protein and gene name recognition are still open research problems [4, 6]. In this paper, we compare two current approaches to the problem of protein name recognition, KeX [5] and Yapex [4]. Unlike manual evaluation which relies on domain experts’ judgement concer...

Journal:
  • Journal of the American Medical Informatics Association: JAMIA

Volume 16, Issue 2

Pages: -

Publication date: 2009