A Preliminary Work on Symptom Name Recognition from Free-Text Clinical Records of Traditional Chinese Medicine using Conditional Random Fields and Reasonable Features
Abstract
This paper describes preliminary work on recognizing symptom names in free-text clinical records (FCRs) of traditional Chinese medicine (TCM). The problem is cast as labeling each character in an FCR with one of three predefined tags, “B-SYC”, “I-SYC” or “O-SYC”, which mark the character as the beginning of, inside, or outside a symptom name. Labeling is performed with Conditional Random Fields (CRFs) using two types of features. Under our experimental settings, symptom name recognition reaches an F-measure of up to 62.829%, with a recognition rate of 93.403% and a recognition error rate of 52.665%. These results verify the feasibility and effectiveness of the method and its reasonable features, and yield several interesting and useful observations. Finally, a detailed analysis of symptom name recognition from TCM FCRs is presented, based on the CRF labeling results.
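To make the character tagging scheme and the reported metrics concrete, here is a minimal Python sketch. It is not the authors' implementation and does not include the CRF itself: the helper names (`spans_to_tags`, `tags_to_spans`, `evaluate`, `f_measure`) and the sample record 头痛三天伴恶心 ("headache for three days, with nausea") are hypothetical. It assumes exact span matching, reads the recognition rate as recall, and reads the recognition error rate as the share of predicted symptom names that are wrong (1 − precision). Under that assumed reading, the reported 93.403% and 52.665% do reproduce the 62.829% F-measure.

```python
from typing import List, Tuple

# A symptom name is a half-open [start, end) range of character offsets.
Span = Tuple[int, int]


def spans_to_tags(text: str, spans: List[Span]) -> List[str]:
    """Label every character with B-SYC, I-SYC or O-SYC."""
    tags = ["O-SYC"] * len(text)
    for start, end in spans:
        tags[start] = "B-SYC"
        for i in range(start + 1, end):
            tags[i] = "I-SYC"
    return tags


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Recover symptom-name spans from a (possibly predicted) tag sequence."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-SYC":
            if start is not None:  # a B-SYC directly closes the previous name
                spans.append((start, i))
            start = i
        elif tag == "I-SYC":
            if start is None:  # tolerate an I-SYC with no preceding B-SYC
                start = i
        else:  # O-SYC ends any open name
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f_measure(recognition_rate: float, error_rate: float) -> float:
    """Harmonic mean of recall and precision, ASSUMING precision = 1 - error_rate."""
    precision = 1.0 - error_rate
    if precision + recognition_rate == 0:
        return 0.0
    return 2 * precision * recognition_rate / (precision + recognition_rate)


def evaluate(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Exact-match recognition rate, error rate and F-measure (assumed definitions)."""
    correct = len(set(gold) & set(predicted))
    recognition_rate = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    error_rate = 1.0 - precision
    return recognition_rate, error_rate, f_measure(recognition_rate, error_rate)


if __name__ == "__main__":
    text = "头痛三天伴恶心"  # hypothetical record: 头痛 (headache), 恶心 (nausea)
    tags = spans_to_tags(text, [(0, 2), (5, 7)])
    print(list(zip(text, tags)))
    print(tags_to_spans(tags))  # [(0, 2), (5, 7)]
    print(f"{f_measure(0.93403, 0.52665):.3%}")  # 62.829%
```

In a real pipeline, a CRF toolkit would predict the tag sequence for each record, and `tags_to_spans` plus `evaluate` would score those predictions against the annotated symptom names. Whether the paper counts partial matches or uses exactly these definitions is not stated here, so the scoring rules above should be treated as an assumption.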
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three popular methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid methods. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: An empirical study
Clinical records of traditional Chinese medicine (TCM) are documented by TCM doctors during their routine diagnostic work. These records contain abundant knowledge and reflect the clinical experience of TCM doctors. In recent years, with the modernization of TCM clinical practice, these clinical records have begun to be digitized. Data mining (DM) and machine learning (ML) methods provide an op...
De-identification of health records using Anonym: Effectiveness and robustness across datasets
OBJECTIVE: Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random fields classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of...
Releasing a Swedish Clinical Corpus after Removing all Words – De-identification Experiments with Conditional Random Fields and Random Forests
Patient records contain valuable information in the form of both structured data and free text; however, this information is sensitive since it can reveal the identity of patients. In order to allow new methods and techniques to be developed and evaluated on real world clinical data without revealing such sensitive information, researchers could be given access to de-identified records without p...
Chinese Word Segmentation based on Mixing Multiple Preprocessor and CRF
This paper describes the Chinese Word Segmenter for our participation in the CIPS-SIGHAN-2010 bake-off task of Chinese word segmentation. We formalize the tasks as sequence tagging problems and implement them using a conditional random fields (CRFs) model. The system contains two modules: multiple preprocessor and basic segmenter. The basic segmenter is designed as a problem of character-based tagg...
Publication date: 2012