Search results for: corpora creation
Number of results: 147847
Persian natural language processing (NLP) researchers face many limitations in accessing linguistic tools suitable for text processing. As a result, research in Persian text processing is very limited. Since datasets are an important requirement for experiments and their evaluation, we aimed to create appropriate corpora for information retrieval and natural language processing in Persian. th...
We discuss the collection and analysis of a cross-sectional and longitudinal learner corpus consisting of answers to reading comprehension questions written by adult second language learners of German. We motivate the need for such task-based learner corpora and identify the properties which make reading comprehension exercises a particularly interesting task. In terms of the creation of the co...
Corpus-based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These uses range from unit selection speech synthesis to statistical modeling of speech phenomena like prosody or expressivity. In all cases, these uses require a wide range of tools for corpus creation, labeling, symbolic and acoustic ...
This paper describes a small, structured English corpus that is designed for translation into Less Commonly Taught Languages (LCTLs), and a set of re-usable tools for creation of similar corpora. The corpus systematically explores meanings that are known to affect morphology or syntax in the world’s languages. Each sentence is associated with a feature structure showing the elements of meanin...
The performance of a statistical machine translation system depends on the size of the available task-specific bilingual training corpus. On the other hand, acquisition of a large high-quality bilingual parallel text for the desired domain and language pair requires a lot of time and effort, and, for some language pairs, is not even possible. Besides, small corpora have certain advantages like ...
Genres of spoken and written texts are being intensively studied from various angles, e.g., communication studies, discourse analysis, computational linguistics, without arriving at a generally accepted definition. Many corpora have been built to represent the language, but very few large corpora indicate genres, and when they do the typology of genres varies widely. For instance, the Brown cor...
This paper presents a linguistic-based approach to term extraction from corpora in the biomedical domain. The method is based on an analysis of terms and their context that verify linguistic constraints. It focuses on participles and prepositional complements. The purpose of our approach is to obtain terms that are relevant for knowledge acquisition applications, such as the creation and enrich...
This paper presents specifications and requirements for the creation and validation of large lexica that are needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The prepared language resources are created and validated within the scope of the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Component...
Access to web-scale corpora is gradually bringing robust automatic knowledge base creation and extension within reach. To exploit these large unannotated—and extremely difficult to annotate—corpora, unsupervised machine learning methods are required. Probabilistic models of text have recently found some success as such a tool, but scalability remains an obstacle in their application, with stand...