text database

A scoping review of medical professionalism research published in the Chinese language

2016

Xin Wang, Julie Shih, Fen-Ju Kuo, Ming-Jung Ho

BACKGROUND The Chinese Medical Doctors Association (CMDA) adopted the Charter of Medical Professionalism in the New Millennium (Charter) and published the Chinese Medical Doctor Declaration (Declaration). This is an important step to re-building medical professionalism in China at a time when the commercialization of health care has led to a decline in physician accountability and public trust ...

Consistency Learning and Multiple Rankings Combination for Text Retrieval

2007

Nikolay Jetchev

Text retrieval is one of the most basic tasks in the field of information retrieval. This paper deals with retrieving relevant documents for text-based queries from a database. Several different methods for retrieving text are explored, and show widely differing performance on different queries. It is shown how each of those methods may be improved through a “consistency learning” framework, wh...

Automatic Question Generation from Swedish Documents as a Tool for Information Extraction

2011

Kenneth Wilhelmsson

An implementation of automatic question generation (QG) from raw Swedish text is presented. QG is here chosen as an alternative to natural query systems where any query can be posed and no indication is given of whether the current text database includes the information sought for. The program builds on parsing with grammatical functions from which corresponding questions are generated and it i...

Research Statement — Dani Yogatama

2015

Dani Yogatama

I design algorithms for intelligent processing of natural language texts—for example, to extract factual information into a structured database (e.g., extracting headquarters locations, CEOs, and phone numbers of companies from text into a database) or to predict real-world events from text (e.g., scientific trends, disease outbreaks). These applications require models of text that scale to lar...

Integrating INQUERY with an RDBMS to Support Text Retrieval

Journal: IEEE Data Eng. Bull., 1996

S. R. Vasanthakumar, James P. Callan, W. Bruce Croft

Information is a combination of structured data and unstructured data. Traditionally, relational database management systems (RDBMS) have been designed to handle structured data. IR systems can handle text (unstructured data) very well but are not designed to handle structured data. With present day information being a combination of structured and unstructured data, there is an increasing dema...

Text Search in an NFS-Proxy: A Case Study in Extensible File Systems

2005

Kristen LeFevre, Kevin Roundy

This paper describes the design of an extensible 3-tiered semantic file system, backed by an existing extensible object-relational database. The system is designed to export the standard NFS interface, while providing indexing and query support for user-defined file types using the virtual directory abstraction. To illustrate the feasibility of the proposed architecture, we describe its impleme...

Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization

2017

Itsumi Saito, Kyosuke Nishida, Kugatsu Sadamitsu, Kuniko Saito, Junji Tomita

Social media texts, such as tweets from Twitter, contain many types of nonstandard tokens, and the number of normalization approaches for handling such noisy text has been increasing. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalizatio...

OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles

2016

Pierre Lison, Jörg Tiedemann

We present a new major release of the OpenSubtitles collection of parallel corpora. The release is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages. The release also incorporates a number of enhancements in the preprocessing and alignment of the subtitles, such as the automatic correction of OCR erro...

Language-Independent Methods for Compiling Monolingual Lexical Data

2004

Christian Biemann, Stefan Bordag, Gerhard Heyer, Uwe Quasthoff, Christian Wolff

In this paper we describe a flexible, portable and language-independent infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed on the basis of a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various que...

A Combined Resource of Biomedical Terminology and its Statistics

2015

Tilia Ellendorff, Adrian Van der Lek, Lenz Furrer, Fabio Rinaldi

In this paper, we present a large biomedical term resource automatically compiled from the terminology of a selection of biomedical databases. The resource has a very simple and intuitive format and therefore can be easily embedded into a system for biomedical text mining and used as a linguistic resource. It is continuously updated and a user interface makes it possible to compile a new term r...
