Search results for: text domain

Number of results: 558891

1997
David Fisher Wendy Lehnert Donald E. Martella Jack Harper

As vast quantities of on-line text become available, there is an increasing need for systems that automatically analyze the conceptual content of natural language text. Systems that operate on narrowly defined domains show promise, but require a different set of domain-specific rules for each application. This paper describes CRYSTAL, a system that learns text analysis rules automatically from exam...

Journal: CoRR 2010
He Tan

In this paper we deal with the comparison and linking of lexical resources with domain knowledge provided by ontologies. It is one of the issues in combining Semantic Web ontologies and text mining. We investigated the relations between linguistics-oriented and domain-specific semantics by associating the GO biological process concepts with the FrameNet semantic frames. Th...

1993
Chinatsu Aone Sharon Flank Douglas McKee Paul Krause

SRA used a language-independent, domain-independent, multipurpose text understanding system as the core of the MUC-5 system for extraction from English and Japanese joint venture texts. SRA's NLP core system, SOLOMON, has been under development since 1986. It has been used for a variety of domains, and was aimed from the start to be language-independent, domain-independent, and application-...

2015
Mian Du Roman Yangarber

Single-document summarization aims to reduce the size of a text document while preserving the most important information. Much work has been done on open-domain summarization. This paper presents an automatic way to mine domain-specific patterns from text documents. With a small amount of effort required for manual selection, these patterns can be used for domain-specific scenario-based documen...

2014
Maria Skeppstedt

Creating the annotated corpus for training a named entity recognition model is expensive, particularly in specialised domains, such as medicine, which require expert annotators. Moreover, a model trained on text from one medical sub-domain often shows a drop in performance when applied on texts from another sub-domain, and annotated text from this other sub-domain might be required. When incorp...

2012
Han-Bin Chen Hen-Hsen Huang Hsin-Hsi Chen Ching-Ting Tan

Integration of domain specific knowledge into a general purpose statistical machine translation (SMT) system poses challenges due to insufficient bilingual corpora. In this paper we propose a simplification-translation-restoration (STR) framework for domain adaptation in SMT by simplifying domain specific segments of a text. For an in-domain text, we identify the critical segments and modify th...

2012
Suzan Verberne Antal van den Bosch Helmer Strik Lou Boves

Text prediction is the task of suggesting text while the user is typing. Its main aim is to reduce the number of keystrokes that are needed to type a text. In this paper, we address the influence of text type and domain differences on text prediction quality. By training and testing our text prediction algorithm on four different text types (Wikipedia, Twitter, transcriptions of conversational ...

2016
Laura Kassner Bernhard Mitschang

Industrial enterprise data present classification problems which are different from those problems typically discussed in the scientific community – with larger amounts of classes and with domain-specific, often unstructured data. We address one such problem through an analytics environment which makes use of domain-specific knowledge. Companies are beginning to use analytics on large amounts o...

2011
Bruno Cartoni Sandrine Zufferey Thomas Meyer Andrei Popescu-Belis

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general meas...

2017
Zuyi Bao Si Li Weiran Xu Sheng Gao

For Chinese word segmentation, the large-scale annotated corpora mainly focus on newswire, and only a small amount of annotated data is available in other domains such as patents and literature. Considering the limited amount of annotated target-domain data, it is a challenge for segmenters to learn domain-specific information while avoiding overfitting. In this paper, we propose...
