Search results for: mizan english persian parallel corpus

Number of results: 413519

2011
Morteza Okhovvat Behrouz Minaei-Bidgoli

One of the important tasks in language processing is part-of-speech tagging. Despite this importance, and although numerous models have been presented for different languages, few works have addressed the Persian language. In this paper, a part-of-speech tagging system for a Persian corpus using a hidden Markov model is proposed. To achieve this goal, the main aspects of Pers...

2003
Tracy Lin Jason S. Chang

We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on Class Based Sense Definition Model (CBSDM) that generates the glosses and translations for a class of word senses. The model can be applied to resolve sense amb...

2009
Tatsuya Ishisaka Kazuhide Yamamoto Masao Utiyama Eiichiro Sumita

To address the shortage of Japanese-English parallel corpora, we developed a parallel corpus by collecting open source software manuals from the Web. The constructed corpus contains approximately 500 thousand sentence pairs that were aligned automatically by an existing method. We also conducted statistical machine translation (SMT) experiments with the corpus and confirmed that the corpus is u...

2000
Marko Tadic

The contribution gives a survey of procedures and formats used in building the Croatian-English parallel corpus which is being collected in the Institute of Linguistics at the Philosophical Faculty, University of Zagreb. The primary text source is the newspaper Croatia Weekly, which has been published since the beginning of 1998 by HIKZ (Croatian Institute for Information and Culture). After quick su...

Journal: CoRR 2017
Anoop Kunchukuttan Pratik Mehta Pushpak Bhattacharyya

The IIT Bombay English-Hindi corpus contains a parallel corpus for English-Hindi compiled from a variety of existing sources as well as corpora developed at the Center for Indian Language Technology, IIT Bombay over the years. The training corpus consists of sentences, phrases as well as dictionary entries, spanning many applications and domains. The details of the training corpus are shown in T...

2012
Thoudam Doren Singh

Statistical Machine Translation (SMT) systems are developed using sentence-aligned parallel corpora. The difficulty is that no parallel corpus of the required size exists for many language pairs. The preparation of a large-scale parallel corpus takes time and demands linguistic skill. In the present work, the various issues of a quality parallel corpus and a technique that extracts p...

2012
Septina Dian Larasati

This paper describes the creation process of an Indonesian-English parallel corpus (IDENTIC). The corpus contains 45,000 sentences collected from different sources in different genres. Several manual text preprocessing tasks, such as alignment and spelling correction, are applied to the corpus to assure its quality. We also apply language specific text processing such as tokenization on both si...

2012
Petra Galuscáková Ondrej Bojar

The amount of training data in statistical machine translation critically affects translation quality. In this paper, we demonstrate how to increase translation quality for one language pair by introducing parallel data from a closely related language. Specifically, we improve English→Slovak translation using a large Czech–English parallel corpus and a shallow MT system for Czech→Slovak transl...

2009
Guy De Pauw Peter Waiganjo Wagacha Gilles-Maurice de Schryver

Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To investigate this, a parallel corpus is typically aligned by linking linguistic tokens in the sour...

Thesis: Ministry of Science, Research and Technology - Payame Noor University - Payame Noor University, Central Branch - Faculty of Foreign Languages 1391

The primary goal of the current project was to examine the effect of three different treatments, namely, models with explicit instruction, models with implicit instruction, and models alone, on differences between the three groups of subjects in the use of the elements of argument structures in terms of Toulmin's (2003) model (i.e., claim, data, counterargument claim, counterargument data, rebutt...
