Search results for: linguistic corpus
Number of results: 113027
BACKGROUND Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variations influencing common diseases and phenotypes. Recently, some corpora and methods have been developed with the purpose of extracting mutations and diseases from texts. However, there is no available corpus, for extracting associations from texts, that is annotated with linguistic-based negati...
Developers of linguistic corpora need to spot and eliminate text duplicates. An overview of approaches to searching for near-duplicate texts in various corpora is presented in this article. An algorithm and a program for near-duplicate search (based on the number of common bigrams) have been developed. Experiments were carried out with texts from the Veps and Karelian Open Corpus VepKar. The program found the 100 pairs of most similar texts and offered them to an expert, wh...
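The snippet above gives only the similarity criterion (the number of common bigrams), so the following is a minimal sketch of that idea under stated assumptions, not the authors' program. It assumes word bigrams over lowercased, whitespace-split text; the source does not say whether word or character bigrams are used. The function names `word_bigrams`, `common_bigram_count`, and `most_similar_pairs` are hypothetical.

```python
from itertools import combinations


def word_bigrams(text):
    """Return the set of adjacent word pairs in a text.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    words = text.lower().split()
    return set(zip(words, words[1:]))


def common_bigram_count(text_a, text_b):
    """Count the distinct bigrams that two texts share."""
    return len(word_bigrams(text_a) & word_bigrams(text_b))


def most_similar_pairs(texts, top_n=100):
    """Rank all text pairs by the number of shared bigrams.

    texts: dict mapping a text identifier to its content.
    Returns up to top_n tuples (shared_count, id_a, id_b),
    highest count first, ties broken by identifier.
    """
    bigrams = {text_id: word_bigrams(body) for text_id, body in texts.items()}
    scored = [
        (len(bigrams[a] & bigrams[b]), a, b)
        for a, b in combinations(sorted(bigrams), 2)
    ]
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:top_n]
```

Comparing every pair is quadratic in the number of texts, which is acceptable for a sketch but may need an inverted index over bigrams for a large corpus. A raw shared-bigram count also favors long texts, so a normalized measure could be substituted if that bias matters.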
Speech perception and spoken word recognition are not only affected by what is being said, but also by who is speaking. Currently, publicly available corpora of Dutch do not offer a wide variety of linguistic materials produced by multiple talkers. The VariaNTS (Variatie in Nederlandse Taal en Sprekers) corpus is a corpus that was developed to maximize both linguistic and talker variability. It contains 1000 items from 11 subca...
In the past, linguistic research was typically conducted on relatively small datasets that were specifically designed for the research at hand. Whereas to date many large spoken language corpora have become available, the usefulness of these corpora is still not fully established in linguistic research. The research reported on in this paper was conducted to illustrate the potential of large mu...
There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long standing practice has been to work on raw, i.e., unannotated text. While raw...
Quantitative analysis has usually highlighted the random nature of linguistic forms (Zipf, 1949). We zoom in on three structured samples of language (numerals; playing cards; and a corpus of artificial languages from Kirby, Cornish & Smith 2008) to quantitatively explore and illustrate the idea that linguistic forms are nonrandom in that their structure reflects the structure of the meanings they...
The Corpus Encoding Standard (CES) is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language), conformant to the TEI Guidelines for Electronic Text Encoding and Interchange (Sperberg-McQueen and Burnard, 1994). It provides encoding conventions for linguistic corpora designed to be optimally suited for use in language engineer...
Various factors, including culture, indigenous languages, and social norms, shape English as it is spoken and written in a foreign context. To trace the influence of these various factors, several registers/genres can be investigated. However, the newspaper register is the most suitable for studying how language has undergone change, as it is close to everyday events. This study seeks to investigate the influence of these factors on English used in a foreign context and explores patt...
Linguistically annotated corpora that are stored in standardized digital form can be a valuable source of empirical insight. They can help verify linguistic generalizations and support the formulation of new hypotheses. The linguistic annotation of such corpora often is crucial for their effective exploration from a linguistic perspective. The annotation essentially serves as an index to the li...