Search results for: linguistic corpus

Number of results: 113027

2012
Andrew Rosenberg

The corpus is an invaluable resource in Spoken and Natural Language Processing. Consistent data sets have allowed for empirical evaluation of competing algorithms. The sharing of high-quality annotated linguistic data has enabled participation and experimentation by a wide range of researchers. However, despite dubbing these annotations as "gold-standard", many corpora contain labeling error...

2010
Lun-Wei Ku Ting-Hao Huang Hsin-Hsi Chen

In this paper, we build on the syntactically structured Chinese Treebank corpus to construct the Chinese Opinion Treebank for research on opinion analysis. We introduce the tagging scheme and develop a tagging tool for constructing this corpus. Annotated samples are described. Information including opinions (yes or no), their polarities (positive, neutral or negative), types (expression, status, or...

2012
Ismaïl El Maarouf Jeanne Villaneau

Fairy tales, folktales and, more generally, children's stories have lately attracted the Natural Language Processing (NLP) community. As such, very few corpora exist and linguistic resources are lacking. The work presented in this paper aims at filling this gap by presenting a syntactically and semantically annotated corpus. It focuses on the linguistic analysis of a Fairy Tales Corpus, and provide...

Writers ensure the expected (re)construction of concepts in the mind of the audience through effective signposting of the inevitably linear linguistic stream, which is a main aspect of metadiscourse called interactive metadiscourse. In this study, the use of interactive metadiscourse markers in research articles was investigated to examine any probable difference existing between different disc...

Journal: Research in Corpus Linguistics 2021

Corpus Linguistics has proved of great value as a methodological tool in shedding light on how discourse is constructed in different text types. This opening contribution to the special issue "Corpus-linguistic perspectives on textual variation" provides an account of some of the most common applications of Corpus Linguistics, describes widely used corpora, and pins down influential corpus-based research works. In so do...

2016
Susanne Haaf

This paper asks how linguistic corpus-based research may be enriched by exploiting conceptual text structures and layout as provided via TEI annotation. Examples of possible areas of research and usage scenarios are provided based on the German historical corpus of the Deutsches Textarchiv (DTA) project, which has been consistently tagged in accordance with the TEI Guidelines...

The Research Article (RA) genre has been a significant area of research in academic writing over the past decades. However, authors’ identity in RAs has not received much attention, especially in soft sciences like applied linguistics. This paper reports a corpus analysis of Iranian writers’ authorial presence markers in RAs in the field of applied linguistics. The corpus comprised 30 RAs (200,000 word...

2005
Silvia Hansen Elke Teich

The present paper discusses an application of multilingual, multi-layer corpus analysis from translation studies. The concrete context is the empirical testing of hypotheses about the specific properties of translations, such as explicitation, simplification, sanitization or normalization. While some of these assumed properties can be tested using some rather shallow measures that operate at th...

2006
Silvia Hansen-Schirra Stella Neumann Mihaela Vela

This paper presents the compilation of the CroCo Corpus, an English-German translation corpus. Corpus design, annotation and alignment are described in detail. In order to guarantee the searchability and exchangeability of the corpus, XML stand-off mark-up is used as representation format for the multi-layer annotation. On this basis it is shown how the corpus can be queried using XQuery. Furth...

2004
Jörg Tiedemann Lars Nygaard

The OPUS corpus is a growing collection of translated documents collected from the internet. The current version contains about 30 million words in 60 languages. The entire corpus is sentence aligned and it also contains linguistic markup for certain languages.
