Search results for: corpora creation

Number of results: 147847

Journal: Journal of Neuropathology and Experimental Neurology 1996

Journal: International Journal of Information Science and Management
Zahra Abdolhosseini (Department of Computer Engineering, Alzahra University, Tehran, Iran); Mohammad Reza Keyvanpour (Department of Computer Engineering, Alzahra University, Tehran, Iran)

Persian natural language processing (NLP) researchers have limited access to linguistic tools suitable for text processing. Therefore, research in Persian text processing is very limited. Since a dataset is an important requirement for experiments and their evaluation, we aimed to create appropriate corpora for information retrieval and natural language processing in Persian. Th...

2012
Niels Ott Ramon Ziai

We discuss the collection and analysis of a cross-sectional and longitudinal learner corpus consisting of answers to reading comprehension questions written by adult second language learners of German. We motivate the need for such task-based learner corpora and identify the properties which make reading comprehension exercises a particularly interesting task. In terms of the creation of the co...

Journal: TAL 2008
Grégory Beller Christophe Veaux Gilles Degottex Nicolas Obin Pierre Lanchantin Xavier Rodet

Corpus based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These usages range from unit selection speech synthesis to statistical modeling of speech phenomena like prosody or expressivity. In all cases, these usages require a wide range of tools for corpus creation, labeling, symbolic and acoustic ...

2006
Alison Alvarez Lori S. Levin Robert E. Frederking Simon Fung Donna Gates Jeff Good

This paper describes a small, structured English corpus that is designed for translation into Less Commonly Taught Languages (LCTLs), and a set of re-usable tools for creation of similar corpora. The corpus systematically explores meanings that are known to affect morphology or syntax in the world’s languages. Each sentence is associated with a feature structure showing the elements of meanin...

1974
Maja Popović Hermann Ney

The performance of a statistical machine translation system depends on the size of the available task-specific bilingual training corpus. On the other hand, acquisition of a large high-quality bilingual parallel text for the desired domain and language pair requires a lot of time and effort, and, for some language pairs, is not even possible. Besides, small corpora have certain advantages like ...

2007
Marina Santini Serge Sharoff David Lee

Genres of spoken and written texts are being intensively studied from various angles, e.g., communication studies, discourse analysis, computational linguistics, without arriving at a generally accepted definition. Many corpora have been built to represent the language, but very few large corpora indicate genres, and when they do the typology of genres varies widely. For instance, the Brown cor...

Journal: Research in Computing Science 2013
Wiktoria Golik Robert Bossy Zorana Ratkovic Claire Nedellec

This paper presents a linguistic-based approach to term extraction from corpora in the biomedical domain. The method is based on an analysis of terms and their context that verify linguistic constraints. It focuses on participles and prepositional complements. The purpose of our approach is to obtain terms that are relevant for knowledge acquisition applications, such as the creation and enrich...

2004
Hanne Fersøe Elviira Hartikainen Henk van den Heuvel Giulio Maltese Asunción Moreno Shaunie Shammass Ute Ziegenhain

This paper presents specifications and requirements for creation and validation of large lexica that are needed in automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The prepared language resources are created and validated within the scope of the EU-project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Component...

Journal: CoRR 2015
Maxim Rabinovich Cédric Archambeau

Access to web-scale corpora is gradually bringing robust automatic knowledge base creation and extension within reach. To exploit these large unannotated—and extremely difficult to annotate—corpora, unsupervised machine learning methods are required. Probabilistic models of text have recently found some success as such a tool, but scalability remains an obstacle in their application, with stand...
