Refining Kazakh Word Alignment Using Simulation Modeling Methods for Statistical Machine Translation

Author

  • Amandyk Kartbayev
Abstract

Word alignment plays an important role in training statistical machine translation systems. We present a technique to refine word alignments at the phrase level after collecting sentences from Kazakh-English parallel corpora. The estimation technique extracts phrase pairs from the word alignment and then incorporates them into the translation system for subsequent steps. Although word alignment is a crucial step in the training procedure, it often raises practical difficulties for agglutinative languages. We consider an approach that is a step toward an improved statistical translation model that incorporates morphological information and achieves better translation performance. Our goal is to present a statistical model of the morphology-dependent procedure, which was evaluated on the Kazakh-English language pair and obtained an improved BLEU score over state-of-the-art models.
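The abstract does not spell out how phrase pairs are extracted from the word alignment. For reference, the following is a minimal sketch of the standard consistency-based phrase-pair extraction used in phrase-based SMT, not necessarily the author's exact procedure. It assumes alignments are given as a set of (source index, target index) links; the function name `extract_phrase_pairs`, the `max_len` limit, and the toy Kazakh-English sentence pair are illustrative assumptions. For brevity it also omits the usual extension of spans over unaligned words at the phrase boundaries.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each link is (source_index, target_index).
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Alignment, max_len: int = 4) -> Set[Tuple[str, str]]:
    """Collect phrase pairs consistent with the word alignment.

    A pair (source span, target span) is consistent when no word inside
    either span is aligned to a word outside the other span. Simplified:
    unaligned boundary words are not used to grow extra pairs.
    """
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_points = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not tgt_points:
                continue
            t_start, t_end = min(tgt_points), max(tgt_points)
            if t_end - t_start + 1 > max_len:
                continue
            # Every link touching the target span must come from the source span.
            if all(s_start <= s <= s_end
                   for (s, t) in alignment if t_start <= t <= t_end):
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


if __name__ == "__main__":
    # Toy example (assumed): "Мен кітап оқыдым" = "I read a book".
    src = ["мен", "кітап", "оқыдым"]
    tgt = ["I", "read", "a", "book"]
    links = {(0, 0), (1, 3), (2, 1)}  # "a" is left unaligned
    for pair in sorted(extract_phrase_pairs(src, tgt, links)):
        print(pair)
```

In this toy case the pair ("мен кітап", …) is rejected, because the target span it would cover also contains "read", which is aligned to "оқыдым" outside that source span. The agglutinative verb form "оқыдым" aligning to a single English word illustrates why morphology matters for Kazakh alignment.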


Similar sources

Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation

Languages that have no explicit word delimiters often have to be segmented for statistical machine translation (SMT). This is commonly performed by automated segmenters trained on manually annotated corpora. However, the word segmentation (WS) schemes of these annotated corpora are handcrafted for general usage, and may not be suitable for SMT. An analysis was performed to test this hypothesis ...

Statistical machine translation: from single word models to alignment templates

In this work, new approaches for machine translation using statistical methods are described. In addition to the standard source-channel approach to statistical machine translation, a more general approach based on the maximum entropy principle is presented. Various methods for computing single-word alignments using statistical or heuristic models are described. Various smoothing techniques, me...

The NICT Translation System for IWSLT

This paper describes NICT’s participation in the IWSLT 2014 evaluation campaign for the TED Chinese-English translation shared-task. Our approach used a combination of phrase-based and hierarchical statistical machine translation (SMT) systems. Our focus was in several areas, specifically system combination, word alignment, and various language modeling techniques including the use of neural ne...

A Gold Standard for English-Swedish Word Alignment

Word alignment gold standards are an important resource for developing and evaluating word alignment methods. In this paper we present a free English–Swedish word alignment gold standard consisting of texts from Europarl with manually verified word alignments. The gold standard contains two sets of word aligned sentences, a test set for the purpose of evaluation and a training set that can be u...

It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervise...

Publication date: 2015