Search results for: linguistic corpus

Number of results: 113027

2017
Behrouz Bokharaeian Alberto Díaz Nasrin Taghizadeh Hamidreza Chitsaz Ramyar Chavoshinejad

BACKGROUND Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variations influencing common diseases and phenotypes. Recently, some corpora and methods have been developed with the purpose of extracting mutations and diseases from texts. However, there is no available corpus, for extracting associations from texts, that is annotated with linguistic-based negati...

Journal: : 2023

Developers of linguistic corpora need to spot and eliminate text duplicates. This article presents an overview of approaches to searching for near-duplicate texts. An algorithm and a program for finding near-duplicate texts (based on the number of common bigrams) have been developed. Experiments were carried out with texts from the Veps and Karelian Open Corpus (VepKar). The program found the 100 most similar pairs of texts and offered them to an expert, wh...
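The abstract says only that the program ranks text pairs by the number of common bigrams. The following is a minimal Python sketch of that idea, assuming lowercased, whitespace-tokenized word bigrams; the original program may use character bigrams or different normalization. The function names `word_bigrams`, `common_bigrams`, and `most_similar_pairs` are hypothetical and are not the VepKar program's interface.

```python
from itertools import combinations


def word_bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of adjacent word pairs in a text.

    Assumption: lowercase the text and split on whitespace; the
    original program's tokenization is not described in the abstract.
    """
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def common_bigrams(a: str, b: str) -> int:
    """Count the distinct bigrams shared by two texts."""
    return len(word_bigrams(a) & word_bigrams(b))


def most_similar_pairs(texts: dict[str, str], top_n: int = 100) -> list[tuple[int, str, str]]:
    """Score every pair of texts by shared-bigram count and return the top_n, highest first."""
    # Compute each text's bigram set once instead of once per pair.
    bigrams = {name: word_bigrams(body) for name, body in texts.items()}
    scored = [
        (len(bigrams[x] & bigrams[y]), x, y)
        for x, y in combinations(sorted(texts), 2)
    ]
    # Python's sort is stable, so tied pairs keep their alphabetical order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    corpus = {
        "a": "the cat sat on the mat",
        "b": "the cat sat on a mat",
        "c": "completely different words here",
    }
    print(most_similar_pairs(corpus, top_n=2))  # → [(3, 'a', 'b'), (0, 'a', 'c')]
```

The candidate pairs would then go to a human expert, as the abstract describes. A raw count favors long texts, so a length-normalized score such as Jaccard similarity is a common alternative. The sketch keeps the raw count to match the abstract's wording. Comparing all pairs is quadratic in the number of texts, which is acceptable for a corpus-cleaning pass but not for very large collections.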

Journal: :Speech Communication 2021

Speech perception and spoken word recognition are affected not only by what is being said, but also by who is speaking. Currently, publicly available corpora of Dutch do not offer a wide variety of linguistic materials produced by multiple talkers. The VariaNTS (Variatie in Nederlandse Taal en Sprekers) corpus was developed to maximize both linguistic and talker variability. It contains 1000 items from 11 subca...

2004
Christophe Van Bael Henk van den Heuvel Helmer Strik

In the past, linguistic research was typically conducted on relatively small datasets that were specifically designed for the research at hand. Whereas to date many large spoken language corpora have become available, the usefulness of these corpora is still not fully established in linguistic research. The research reported on in this paper was conducted to illustrate the potential of large mu...

2001
Elke Teich

There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long standing practice has been to work on raw, i.e., unannotated text. While raw...

2011
Monica Tamariz

Quantitative analysis has usually highlighted the random nature of linguistic forms (Zipf, 1949). We zoom in on three structured samples of language (numerals; playing cards; and a corpus of artificial languages from Kirby, Cornish & Smith, 2008) to quantitatively explore and illustrate the idea that linguistic forms are nonrandom in that their structure reflects the structure of the meanings they...

1998
Nancy Ide

The Corpus Encoding Standard (CES) is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language), conformant to the TEI Guidelines for Electronic Text Encoding and Interchange (Sperberg-McQueen and Burnard, 1994). It provides encoding conventions for linguistic corpora designed to be optimally suited for use in language engineer...

Journal: :Journal of Namibian Studies : History Politics Culture 2023

Various factors, including culture, indigenous languages, and social norms, shape English as it is spoken and written in a foreign context. To trace the influence of these various factors, several registers/genres can be investigated. However, the newspaper register is the most suitable for studying how the language has undergone change, as it is close to everyday events. This study seeks to investigate the factors acting on the English used in this context and explores patt...

2009
Detmar Meurers Holger Wunsch

Linguistically annotated corpora that are stored in standardized digital form can be a valuable source of empirical insight. They can help verify linguistic generalizations and support the formulation of new hypotheses. The linguistic annotation of such corpora often is crucial for their effective exploration from a linguistic perspective. The annotation essentially serves as an index to the li...
