Search results for: corpora creation
Number of results: 147847
In this article we illustrate and evaluate an approach to create high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The trans...
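The transfer idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the word alignment is already available as (source index, target index) pairs. The `project_annotations` helper is hypothetical. When several source words align to one target word, it keeps the leftmost source word's tag, which is a simplifying assumption.

```python
from typing import Optional


def project_annotations(
    source_tags: list[str],
    target_len: int,
    alignment: list[tuple[int, int]],
) -> list[Optional[str]]:
    """Copy each source token's tag onto the target tokens it is aligned to.

    `alignment` holds (source_index, target_index) pairs. Target tokens with no
    alignment link stay None. If several source tokens align to the same target
    token, the tag of the leftmost source token wins (a simplifying assumption).
    """
    projected: list[Optional[str]] = [None] * target_len
    # Sorting by source index first makes the leftmost source token win conflicts.
    for src_i, tgt_i in sorted(alignment):
        if projected[tgt_i] is None:
            projected[tgt_i] = source_tags[src_i]
    return projected


# Illustrative pair: "the black cat sleeps" -> "il gatto nero dorme".
tags = project_annotations(
    ["DET", "ADJ", "NOUN", "VERB"], 4, [(0, 0), (1, 2), (2, 1), (3, 3)]
)
print(tags)  # ['DET', 'NOUN', 'ADJ', 'VERB']
```

Unaligned target tokens are left as None rather than guessed. A real projection pipeline would need a policy for them and for many-to-one links.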
Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at Linguistic Data Consortium to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged a...
In this paper we illustrate and evaluate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. Th...
There are currently two streams that dominate research on knowledge federation. One is the trend towards Linked Data, which leads to fine-grained, machine-readable structuring of information. The other is the reuse and co-creation of information, which spreads the burden of its creation across the public and makes large knowledge corpora available. In this contribution we outli...
The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and specialised domains. It is argued that the presented approach ensures the creation of high-quality termbases even with limited available resources. The paper touches upon the methods and problems of dataset (corpora) compilation, term...
This paper presents a guideline for the construction of a formal grammar for the Malagasy language. The method used is based on a deterministic approach, given that reliable corpora are not yet available for Malagasy. The main purpose is language recognition, which will be the keystone of an automatic checker. Jointly with the existing part-of-speech tagger, the checker will bring us further by facilitating corpora creation, which will in turn boost processi...
Corpora of different languages but similar genre allow language comparison. Applying the same methods to corpora of the same language but of different genre or origin yields corpus comparison. Given many corpora in identical formats, these statistical methods generate a variety of data for manual or automatic analysis. The introduced system reports more than 150 results per corpus, for app...
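As a minimal illustration of applying the same statistics to corpora stored in one format, the sketch below computes a handful of simple measures per corpus so that two corpora can be compared side by side. The five measures and the `corpus_profile` name are hypothetical choices. The abstract does not list the system's 150+ statistics. The corpora are assumed to be pre-tokenised lists of sentences.

```python
def corpus_profile(sentences: list[list[str]]) -> dict[str, float]:
    """Compute a few simple size and diversity statistics for a tokenised corpus."""
    tokens = [token for sentence in sentences for token in sentence]
    if not tokens:
        raise ValueError("corpus must contain at least one token")
    types = set(tokens)
    return {
        "sentences": float(len(sentences)),
        "tokens": float(len(tokens)),
        "types": float(len(types)),
        "type_token_ratio": len(types) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }


# The same function applied to two toy corpora (made-up data) gives comparable figures.
news = [["the", "cat", "sat"], ["the", "dog", "ran"]]
forum = [["lol", "the", "cat"], ["cat", "cat", "cat", "lol"]]
for name, corpus in [("news", news), ("forum", forum)]:
    print(name, corpus_profile(corpus))
```

Type-token ratio falls as a corpus grows, so a comparison like this is only meaningful between corpora of similar size.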
It is well understood that speech databases play a very important role in speech recognition. Speech recognition researchers have long aimed to create more useful databases with less effort. To achieve this goal, a database should be well designed from the outset, and tools and supplementary information should be provided so that the database can be fully exploited. This paper will illustrate th...
Data used for training statistical machine translation methods are usually prepared from three kinds of resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. The scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, and a mere 0.187 for Telugu-Kannada. There exist comp...
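Mean reciprocal rank, the metric quoted above, averages the reciprocal of the rank at which the first correct translation appears across the source words: MRR = (1/|Q|) · Σ_q 1/rank_q. The sketch below assumes the common convention that a word with no correct candidate in its list contributes 0 and that a word may have several acceptable gold translations. The abstract does not say exactly how its figures were computed. The function name and word lists are hypothetical.

```python
def mean_reciprocal_rank(
    ranked_candidates: dict[str, list[str]],
    gold: dict[str, set[str]],
) -> float:
    """Average of 1/rank of the first correct candidate per source word (0 if none)."""
    if not ranked_candidates:
        raise ValueError("need at least one source word")
    total = 0.0
    for word, candidates in ranked_candidates.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold.get(word, set()):
                total += 1.0 / rank
                break  # only the first correct candidate counts
    return total / len(ranked_candidates)


# Illustrative English -> Italian candidate lists (made-up data).
ranked = {
    "dog": ["cane", "lupo"],        # correct at rank 1 -> 1.0
    "house": ["edificio", "casa"],  # correct at rank 2 -> 0.5
    "tree": ["foglia", "ramo"],     # no correct candidate -> 0.0
}
gold = {"dog": {"cane"}, "house": {"casa"}, "tree": {"albero"}}
print(mean_reciprocal_rank(ranked, gold))  # 0.5
```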