linguistic corpus

Search results for: linguistic corpus

Number of results: 113027

SNPPhenA: a corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature

2017

Behrouz Bokharaeian, Alberto Díaz, Nasrin Taghizadeh, Hamidreza Chitsaz, Ramyar Chavoshinejad

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variations influencing common diseases and phenotypes. Recently, some corpora and methods have been developed with the purpose of extracting mutations and diseases from texts. However, there is no available corpus for extracting associations from texts that is annotated with linguistic-based negati...

Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources

Journal: Computational Intelligence and Neuroscience, 2016

Search for near-duplicate texts in the linguistic corpus VepKar

2023

Developers of linguistic corpora need to spot and eliminate text duplicates. This article presents an overview of approaches to searching for near-duplicate texts in various corpora. An algorithm and a program, nearduplicate (based on the number of common bigrams), have been developed. Experiments were carried out with texts from the Veps and Karelian Open Corpus (VepKar). The program found the 100 most similar pairs of texts and offered them to an expert, wh...
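A minimal Python sketch of the common-bigram idea summarized above, offered only as an illustration. It assumes lowercased, whitespace-tokenized word bigrams and uses the raw count of shared bigrams as the similarity score; the abstract does not state the tokenization or any normalization, and the function names (`word_bigrams`, `common_bigram_count`, `most_similar_pairs`) are hypothetical, not taken from the nearduplicate program.

```python
from itertools import combinations


def word_bigrams(text: str) -> set[tuple[str, str]]:
    # Hypothetical helper (assumption): lowercase, split on whitespace,
    # and collect the distinct pairs of adjacent words.
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def common_bigram_count(a: str, b: str) -> int:
    # Assumed similarity score: number of distinct bigrams both texts share.
    return len(word_bigrams(a) & word_bigrams(b))


def most_similar_pairs(texts: dict[str, str], top_n: int = 100) -> list[tuple[str, str, int]]:
    # Precompute each text's bigram set once, then score every unordered pair.
    bigrams = {name: word_bigrams(body) for name, body in texts.items()}
    scored = [
        (x, y, len(bigrams[x] & bigrams[y]))
        for x, y in combinations(bigrams, 2)
    ]
    # Highest overlap first: these are the candidate near-duplicates
    # that would be passed on for manual (expert) review.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    corpus = {
        "t1": "the cat sat on the mat",
        "t2": "the cat sat on a mat",
        "t3": "dogs bark loudly",
    }
    print(most_similar_pairs(corpus, top_n=1))  # [('t1', 't2', 3)]
```

Comparing every pair is quadratic in the number of texts, which is acceptable for a sketch; a large corpus would usually need an index or hashing step to avoid scoring all pairs.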

Development and structure of the VariaNTS corpus: A spoken Dutch corpus containing talker and linguistic variability

Journal: Speech Communication, 2021

Speech perception and spoken word recognition are affected not only by what is being said but also by who is speaking. Currently, publicly available corpora of Dutch do not offer a wide variety of linguistic materials produced by multiple talkers. The VariaNTS (Variatie in Nederlandse Taal en Sprekers) corpus was developed to maximize both linguistic and talker variability. It contains 1000 items from 11 subca...

Investigating speech style specific pronunciation variation in large spoken language corpora

2004

Christophe Van Bael, Henk van den Heuvel, Helmer Strik

In the past, linguistic research was typically conducted on relatively small datasets that were specifically designed for the research at hand. Whereas to date many large spoken language corpora have become available, the usefulness of these corpora is still not fully established in linguistic research. The research reported on in this paper was conducted to illustrate the potential of large mu...

Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora

2001

Elke Teich

There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long standing practice has been to work on raw, i.e., unannotated text. While raw...

Linguistic Structure Evolves to Match Meaning Structure

2011

Monica Tamariz

Quantitative analysis has usually highlighted the random nature of linguistic forms (Zipf, 1949). We zoom in on three structured samples of language (numerals; playing cards; and a corpus of artificial languages from Kirby, Cornish & Smith 2008) to quantitatively explore and illustrate the idea that linguistic forms are nonrandom in that their structure reflects the structure of the meanings they...

Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora

1998

Nancy Ide

The Corpus Encoding Standard (CES) is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language), conformant to the TEI Guidelines for Electronic Text Encoding and Interchange (Sperberg-McQueen and Burnard, 1994). It provides encoding conventions for linguistic corpora designed to be optimally suited for use in language engineer...

Tracing Linguistic Evidence of Cultural Diversity: A Corpus-based Study

Journal: Journal of Namibian Studies: History Politics Culture, 2023

Various factors, including culture, indigenous languages, and social norms, shape the English that is spoken and written in a foreign context. To trace the influence of these various factors, several registers/genres can be investigated. However, the newspaper register is the most suitable for studying how language has undergone change, as it is close to everyday events. This study seeks to investigate the factors influencing the English used in this context and explores patt...

Linguistically Annotated Learner Corpora: Aspects of a Layered Linguistic Encoding and Standardized Representation

2009

Detmar Meurers, Holger Wunsch

Linguistically annotated corpora that are stored in standardized digital form can be a valuable source of empirical insight. They can help verify linguistic generalizations and support the formulation of new hypotheses. The linguistic annotation of such corpora often is crucial for their effective exploration from a linguistic perspective. The annotation essentially serves as an index to the li...