text domain

Automatically Learned vs. Hand-crafted Text Analysis Rules

1997

David Fisher Wendy Lehnert Donald E. Martella Jack Harper

As vast quantities of on-line text become available, there is an increasing need for systems that automatically analyze the conceptual content of natural language text. Systems that operate on narrowly defined domains show promise, but require a different set of domain-specific rules for each application. This paper describes CRYSTAL, a system that learns text analysis rules automatically from exam...


A Study on the Relation between Linguistics-oriented and Domain-specific Semantics

Journal: CoRR 2010

He Tan

In this paper we dealt with the comparison and linking between lexical resources with domain knowledge provided by ontologies. It is one of the issues for the combination of the Semantic Web Ontologies and Text Mining. We investigated the relations between the linguistics-oriented and domain-specific semantics, by associating the GO biological process concepts to the FrameNet semantic frames. Th...


SRA: description of the SOLOMON system as used for MUC-5

1993

Chinatsu Aone Sharon Flank Douglas McKee Paul Krause

SRA used a language-independent, domain-independent, multipurpose text understanding system as the core of the MUC-5 system for extraction from English and Japanese joint venture texts. SRA's NLP core system, SOLOMON, has been under development since 1986. It has been used for a variety of domains, and was aimed from the start to be language-independent, domain-independent, and application-...


Acquisition of Domain-specific Patterns for Single Document Summarization and Information Extraction

2015

Mian Du Roman Yangarber

Single-document summarization aims to reduce the size of a text document while preserving the most important information. Much work has been done on open-domain summarization. This paper presents an automatic way to mine domain-specific patterns from text documents. With a small amount of effort required for manual selection, these patterns can be used for domain-specific scenario-based documen...


Enhancing Medical Named Entity Recognition with Features Derived from Unsupervised Methods

2014

Maria Skeppstedt

Creating the annotated corpus for training a named entity recognition model is expensive, particularly in specialised domains, such as medicine, which require expert annotators. Moreover, a model trained on text from one medical sub-domain often shows a drop in performance when applied on texts from another sub-domain, and annotated text from this other sub-domain might be required. When incorp...


A Simplification-Translation-Restoration Framework for Cross-Domain SMT Applications

2012

Han-Bin Chen Hen-Hsen Huang Hsin-Hsi Chen Ching-Ting Tan

Integration of domain specific knowledge into a general purpose statistical machine translation (SMT) system poses challenges due to insufficient bilingual corpora. In this paper we propose a simplification-translation-restoration (STR) framework for domain adaptation in SMT by simplifying domain specific segments of a text. For an in-domain text, we identify the critical segments and modify th...


The effect of domain and text type on text prediction quality

2012

Suzan Verberne Antal van den Bosch Helmer Strik Lou Boves

Text prediction is the task of suggesting text while the user is typing. Its main aim is to reduce the number of keystrokes that are needed to type a text. In this paper, we address the influence of text type and domain differences on text prediction quality. By training and testing our text prediction algorithm on four different text types (Wikipedia, Twitter, transcriptions of conversational ...


Exploring Text Classification for Messy Data: An Industry Use Case for Domain-Specific Analytics

2016

Laura Kassner Bernhard Mitschang

Industrial enterprise data present classification problems which are different from those problems typically discussed in the scientific community – with larger amounts of classes and with domain-specific, often unstructured data. We address one such problem through an analytics environment which makes use of domain-specific knowledge. Companies are beginning to use analytics on large amounts o...


How Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives

2011

Bruno Cartoni Sandrine Zufferey Thomas Meyer Andrei Popescu-Belis

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general meas...


Neural Regularized Domain Adaptation for Chinese Word Segmentation

2017

Zuyi Bao Si Li Weiran Xu Sheng Gao

For Chinese word segmentation, the large-scale annotated corpora mainly focus on newswire, and only a handful of annotated data is available in other domains such as patents and literature. Considering the limited amount of annotated target-domain data, it is a challenge for segmenters to learn domain-specific information while avoiding overfitting at the same time. In this paper, we propose...
