Revealing the Structure of Medical Dictations with Conditional Random Fields
نویسندگان
چکیده
Automatic processing of medical dictations poses a significant challenge. We approach the problem by introducing a statistical framework capable of identifying types and boundaries of sections, lists and other structures occurring in a dictation, thereby gaining explicit knowledge about the function of such elements. Training data is created semiautomatically by aligning a parallel corpus of corrected medical reports and corresponding transcripts generated via automatic speech recognition. We highlight the properties of our statistical framework, which is based on conditional random fields (CRFs) and implemented as an efficient, publicly available toolkit. Finally, we show that our approach is effective both under ideal conditions and for real-life dictation involving speech recognition errors and speech-related phenomena such as hesitation and repetitions.
منابع مشابه
Conditional Random Fields for Airborne Lidar Point Cloud Classification in Urban Area
Over the past decades, urban growth has been known as a worldwide phenomenon that includes widening process and expanding pattern. While the cities are changing rapidly, their quantitative analysis as well as decision making in urban planning can benefit from two-dimensional (2D) and three-dimensional (3D) digital models. The recent developments in imaging and non-imaging sensor technologies, s...
متن کاملCONDITIONAL EXPECTATION IN THE KOPKA'S D-POSETS
The notion of a $D$-poset was introduced in a connection withquantum mechanical models. In this paper, we introduce theconditional expectation of random variables on theK^{o}pka's $D$-Poset and prove the basic properties ofconditional expectation on this structure.
متن کاملConditional Random Fields for image segmentation in Intravascular Ultrasound
We present a Conditional Random Fields based approach for segmenting Intravascular Ultrasound (IVUS) images. The presented method uses a contextual discriminative graphical model to deal with the presence of distortions and artifacts in IVUS images, that turns the segmentation of interesting regions into a difficult task. An accurate lumen segmentation on IVUS longitudinal images is achieved.
متن کاملModeling Filled Pauses in Medical Dictations
Filled pauses are characteristic of spontaneous speech and can present considerable problems for speech recognition by being often recognized as short words. An um can be recognized as thumb or arm if the recognizer's language model does not adequately represent FP's. Recognition of quasi-spontaneous speech (medical dictation) is subject to this problem as well. Results from medical dictations ...
متن کاملReleasing a Swedish Clinical Corpus after Removing all Words – De-identification Experiments with Conditional Random Fields and Random Forests
Patient records contain valuable information in the form of both structured data and free text; however this information is sensitive since it can reveal the identity of patients. In order to allow new methods and techniques to be developed and evaluated on real world clinical data without revealing such sensitive information, researchers could be given access to de-identified records without p...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008