Search results for: corpora creation

Number of results: 147847

2012
Elizaveta Loginova Anita Gojun Helena Blancafort Marie Guégan Tatiana Gornostay Ulrich Heid

In this paper, we discuss practical and methodological issues of the creation of reference term lists (RTLs) for the evaluation of monolingual and bilingual term candidate extraction from comparable corpora in the domains of wind energy and mobile technology. These reference term lists are intended to serve as a "gold standard" for the qualitative and quantitative evaluation of automatic term e...

2004
Stefan Breuer Julia Abresch

A multi-phone unit specification for unit selection speech synthesis is introduced and tested with regard to its qualitative aspects by means of a listening experiment. This different concept of unit definition aims to prevent spectral discontinuities at highly critical points of concatenation and to allow for a faster creation of speech corpora, as well as a speed-up of cost calculation and un...

2014
Antoni Oliver

This paper presents a set of methodologies and algorithms to create WordNets following the expand model. We explore dictionary and BabelNet based strategies, as well as methodologies based on the use of parallel corpora. Evaluation results for six languages are presented: Catalan, Spanish, French, German, Italian and Portuguese. Along with the methodologies and evaluation we present an implemen...

2017
Karan Singla Evgeny A. Stepanov Ali Orkan Bayer Giuseppe Carenini Giuseppe Riccardi

Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment with automatic community creation usi...

2016
Martin Brümmer Milan Dojchinovski Sebastian Hellmann

The ever increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need in large-scale training and evaluation corpora. Due to its size, its openness and relative quality, the Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated W...

2004
Panagiotis Tzevelekos Georgios Kouroupetroglou

In this paper, we present the development of a framework of methodologies that allow the creation of acoustic analyses from corpora of woodwind musical instrument recordings, as well as the implementation of virtual instruments by physical modeling. We focus on traditional instruments, starting with the zournas. By analysis, acoustical aspects of the instrument are derived (attack-rel...

Journal: Information 2021

Up until today, research in various educational and linguistic domains such as learner corpus research, writing research, or second language acquisition has produced a substantial amount of data in the form of L1 and L2 corpora. However, a multitude of individual solutions combined with domain-inherent obstacles to sharing have so far hampered the comparability, reusability and reproducibility of results. In this article, we present ...

2008
Claire Cardie Cynthia Farina Matt Rawding Adil Aijaz

We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators...

2014
Veronika Vincze Katalin Ilona Simkó Viktor Varga

Uncertainty detection has been a popular topic in natural language processing, which manifested in the creation of several corpora for English. Here we show how the annotation guidelines originally developed for English standard texts can be adapted to Hungarian webtext. We annotated a small corpus of Facebook posts for uncertainty phenomena and we illustrate the main characteristics of such te...

2008
Cleber Gouvêa Stanley Loh Luís Fernando Fortes Garcia Evandro Brasil da Fonseca Igor Wendt

This paper presents an approach that identifies Location Indicators related to geographical locations, by analyzing texts of news published in the Web. The goal is to semi-automatically create Gazetteers with the identified relations and then perform geo-referencing of news. Location Indicators include non-geographical entities that are dynamic and may change along the time. The use of news pub...
