Search results for: corpora creation
Number of results: 147847
In this paper, we discuss practical and methodological issues of the creation of reference term lists (RTLs) for the evaluation of monolingual and bilingual term candidate extraction from comparable corpora in the domains of wind energy and mobile technology. These reference term lists are intended to serve as a "gold standard" for the qualitative and quantitative evaluation of automatic term e...
A multi-phone unit specification for unit selection speech synthesis is introduced and tested with regard to its qualitative aspects by means of a listening experiment. This different concept of unit definition aims to prevent spectral discontinuities at highly critical points of concatenation and to allow for a faster creation of speech corpora, as well as a speed-up of cost calculation and un...
This paper presents a set of methodologies and algorithms to create WordNets following the expand model. We explore dictionary- and BabelNet-based strategies, as well as methodologies based on the use of parallel corpora. Evaluation results for six languages are presented: Catalan, Spanish, French, German, Italian and Portuguese. Along with the methodologies and evaluation we present an implemen...
Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment with automatic community creation usi...
The ever-increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need for large-scale training and evaluation corpora. Due to its size, openness and relative quality, Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated W...
In this paper, we present the development of a framework of methodologies that allow the creation of acoustic analyses from corpora of woodwind musical instrument recordings, as well as the implementation of virtual instruments by physical modeling. We focus on traditional instruments, starting with the zournas. Through analysis, acoustical aspects of the instrument are derived (attack-rel...
To date, research in various educational and linguistic domains such as learner corpus research, writing research or second language acquisition has produced a substantial amount of data in the form of L1 and L2 corpora. However, a multitude of individual solutions combined with domain-inherent obstacles to data sharing have so far hampered the comparability, reusability and reproducibility of results. In this article, we present ...
We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators...
Uncertainty detection has been a popular topic in natural language processing, as manifested in the creation of several corpora for English. Here we show how the annotation guidelines originally developed for English standard texts can be adapted to Hungarian webtext. We annotated a small corpus of Facebook posts for uncertainty phenomena and we illustrate the main characteristics of such te...
This paper presents an approach that identifies Location Indicators related to geographical locations by analyzing the texts of news published on the Web. The goal is to semi-automatically create Gazetteers with the identified relations and then perform geo-referencing of news. Location Indicators include non-geographical entities that are dynamic and may change over time. The use of news pub...