text domain

Development of Bilingual Domain-Specific Ontology for Automatic Conceptual Indexing

2004

Natalia V. Loukachevitch, Boris V. Dobrov

In the paper we describe development, means of evaluation and applications of Russian–English Sociopolitical Thesaurus specially developed as a linguistic resource for automatic text processing applications. The Sociopolitical domain is not a domain of social research but a broad domain of social relations including economic, political, military, cultural, sports and other subdomains. The knowl...

A Testbed for Cross-Dataset Analysis

2014

Tatiana Tommasi, Tinne Tuytelaars

Despite the increasing interest towards domain adaptation and transfer learning techniques to generalize over image collections and overcome their biases, the visual community misses a large scale testbed for cross-dataset analysis. In this paper we discuss the challenges faced when aligning twelve existing image databases in a unique corpus, and we propose two cross-dataset setups that introdu...

A News Editorial Corpus for Mining Argumentation Strategies

2016

Khalid Al Khatib, Henning Wachsmuth, Johannes Kiesel, Matthias Hagen, Benno Stein

Many argumentative texts, and news editorials in particular, follow a specific strategy to persuade their readers of some opinion or attitude. This includes decisions such as when to tell an anecdote or where to support an assumption with statistics, which is reflected by the composition of different types of argumentative discourse units in a text. While several argument mining corpora have re...

Automatic expansion of abbreviations by using context and character information

Journal: Inf. Process. Manage. 2004

Akira Terada, Takenobu Tokunaga, Hozumi Tanaka

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read/process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies dictionaries were used to search for abbreviation ex...

Introduction to multilingual corpus-based concatenative speech synthesis

2007

Filip Deprez, Jan Odijk, Jan De Moortel

This tutorial paper addresses foreign-language support in corpus-based concatenative text-to-speech systems. We give an overview of application domains where strictly monolingual speech synthesis is not sufficient and where multilingual text-to-speech is required or highly desirable. We describe two approaches to multilingual corpus-based speech synthesis: phoneme mapping on the one hand, and t...

Early Deletion of Fillers In Processing Conversational Speech

2006

Matthew Lease, Mark Johnson

This paper evaluates the benefit of deleting fillers (e.g. you know, like) early in parsing conversational speech. Readability studies have shown that disfluencies (fillers and speech repairs) may be deleted from transcripts without compromising meaning (Jones et al., 2003), and deleting repairs prior to parsing has been shown to improve its accuracy (Charniak and Johnson, 2001). We explore whe...
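To make "early deletion" concrete, here is a deliberately naive sketch, assuming a hypothetical hand-written filler lexicon. It removes filler phrases from the token sequence before any parser sees it. It is not the paper's approach: words such as "like" (and sometimes "you know") are not always fillers, so a practical system would use a trained filler detector instead of a fixed list. The `parser` argument stands in for any parsing function.

```python
from typing import Callable, List, Sequence

# Hypothetical filler lexicon; multi-word fillers are token tuples.
# "like" is left out because it is often a verb or preposition ("I like tea"),
# which a plain lexicon lookup cannot distinguish from the filler use.
FILLERS = [("you", "know"), ("uh",), ("um",)]


def delete_fillers(tokens: Sequence[str], fillers=FILLERS) -> List[str]:
    """Drop every filler occurrence, trying longer filler phrases first."""
    ordered = sorted(fillers, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for filler in ordered:
            window = tuple(t.lower() for t in tokens[i:i + len(filler)])
            if window == filler:
                i += len(filler)  # skip the whole filler phrase
                break
        else:  # no filler starts at position i: keep the token
            out.append(tokens[i])
            i += 1
    return out


def parse_with_early_deletion(tokens: Sequence[str],
                              parser: Callable[[List[str]], object]) -> object:
    """Delete fillers first, then hand the cleaned tokens to the parser."""
    return parser(delete_fillers(tokens))


print(delete_fillers("uh we you know went home".split()))
```

The print shows `['we', 'went', 'home']`. Keeping deletion as a separate preprocessing function makes it easy to compare parser accuracy with and without the step, which is the kind of comparison the abstract describes.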

Corpora for the Evaluation of Robust Speaker Recognition Systems

2016

Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Joseph P. Campbell

The goal of this paper is to describe significant corpora available to support speaker recognition research and evaluation, along with details about the corpora collection and design. We describe the attributes of high-quality speaker recognition corpora. Considerations of the application, domain, and performance metrics are also discussed. Additionally, a literature survey of corpora used in s...

Log-hyperconvexity index and Bergman kernel

Journal: International Journal of Mathematics 2022

We obtain a quantitative estimate of the Bergman distance when [Formula: see text] is a bounded domain with log-hyperconvexity index [Formula: see text], as well as the [Formula: see text]-integrability of the Bergman kernel [Formula: see text].

A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering

Journal: CoRR 2017

Bin Bi, Hao Ma

This paper proposes a novel neural machine reading model for open-domain question answering at scale. Existing machine comprehension models typically assume that a short piece of relevant text containing answers is already identified and given to the models, from which the models are designed to extract answers. This assumption, however, is not realistic for building a large-scale open-domain q...

Biomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges

2014

Andreas Holzinger, Johannes Schantl, Miriam Schroettner, Christin Seifert, Karin M. Verspoor

Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text which has been entered in a non-standardized format, consequently posing a lot of challenges to processing of such data. For the clinical doctor the written text in the medical findings is still the basis for decision making – neither images nor multimedia data. However...