MWE-sensitive Word Alignment in Factored Translation Model

Authors

  • Tsuyoshi Okita
  • Andy Way
Abstract

The factored translation model in Moses (Koehn et al. 2007; Avramidis and Koehn 2008; Koehn 2010), which consists of translation processes followed by a generation process, is intended to handle morphologically rich languages by integrating additional linguistic markup at the word level. Each type of additional word-level information is called a factor, with the independence assumptions shown in (1):
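Equation (1) is not reproduced in this excerpt. As a rough illustration of what word-level factors look like in practice, the sketch below parses a sentence written in the pipe-separated factored notation that Moses uses and splits it into one stream per factor. The factor inventory (surface form, lemma, part-of-speech tag), the example sentence, and the names `FactoredToken`, `parse_factored_sentence`, and `factor_stream` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, NamedTuple


class FactoredToken(NamedTuple):
    """One word with its factors (hypothetical inventory: surface, lemma, POS)."""
    surface: str
    lemma: str
    pos: str


def parse_factored_sentence(line: str, sep: str = "|") -> List[FactoredToken]:
    """Parse whitespace-separated tokens of the form surface|lemma|pos."""
    tokens = []
    for raw in line.split():
        parts = raw.split(sep)
        if len(parts) != 3:
            raise ValueError(f"expected 3 factors in {raw!r}, got {len(parts)}")
        tokens.append(FactoredToken(*parts))
    return tokens


def factor_stream(tokens: List[FactoredToken], name: str) -> List[str]:
    """Extract a single factor (e.g. all lemmas) as its own sequence."""
    return [getattr(tok, name) for tok in tokens]


# Illustrative sentence, not from the paper.
sentence = "houses|house|NNS are|be|VBP big|big|JJ"
tokens = parse_factored_sentence(sentence)
lemmas = factor_stream(tokens, "lemma")
tags = factor_stream(tokens, "pos")
```

Keeping each factor as a separate stream mirrors the idea that separate translation steps can operate on different factors (for example, lemmas and tags) before a generation step produces the surface forms.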


Similar resources

Given Bilingual Terminology in Statistical Machine Translation: MWE-Sensitive Word Alignment and Hierarchical Pitman-Yor Process-Based Translation Model Smoothing

This paper considers a scenario in which we are given almost perfect knowledge of the bilingual terminology for a test corpus in Statistical Machine Translation (SMT). When the given terminology is part of a training corpus, one natural strategy in SMT is to use the trained translation model and ignore the given terminology. Two questions then arise. 1) Can a word aligner capture the gi...


MWE Alignment in Phrase-Based Statistical Machine Translation

Multiword expressions (MWEs) contribute to major lexical ambiguity problems in any language and pose a big challenge in statistical machine translation. This paper presents the role of MWEs in improving the performance of a phrase-based statistical machine translation (PB-SMT) system. We preprocess the parallel corpus by single-tokenizing the MWEs on both sides, which leads to significant improve...


Statistical Approach With Factored Translation Models For Indian Languages

Factored translation models are an extension of phrase-based statistical translation models that integrate additional annotation at the word level. Here we present a study of statistical models and approaches for translating Hindi to English. Experiments were also conducted on alignment models using various word groupings and using GIZA++ to predict their English translations and fertility. TAJ A new...


Discriminative Modeling of Extraction Sets for Machine Translation

We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate featur...


Exploiting Translational Correspondences for Pattern-Independent MWE Identification

Based on a study of verb translations in the Europarl corpus, we argue that a wide range of MWE patterns can be identified in translations that exhibit a correspondence between a single lexical item in the source language and a group of lexical items in the target language. We show that these correspondences can be reliably detected on dependency-parsed, word-aligned sentences. We propose an ex...




Publication date: 2010