Rule-based Syntactic Preprocessing for Syntax-based Machine Translation

Authors

Yuto Hatakoshi

Graham Neubig

Sakriani Sakti

Tomoki Toda

Satoshi Nakamura

Abstract

Several preprocessing techniques using syntactic information and linguistically motivated rules have been proposed to improve the quality of phrase-based machine translation (PBMT) output. On the other hand, there has been little work on similar techniques in the context of other translation formalisms such as syntax-based SMT. In this paper, we examine whether the sort of rule-based syntactic preprocessing approaches that have proved beneficial for PBMT can contribute to syntax-based SMT. Specifically, we tailor a highly successful preprocessing method for English-Japanese PBMT to syntax-based SMT, and find that while the gains achievable are smaller than those for PBMT, significant improvements in accuracy can be realized.

Similar Resources

EUSMT: Incorporating Linguistic information to Statistical Machine Translation for a morphologically rich language. Its use in preliminary SMT-RBMT-EBMT hybridization

We have proposed and successfully tested new techniques to deal with the problems found in applying Statistical Machine Translation (SMT) to language pairs with great morphological and syntactical differences. These techniques are based on segmentation and reordering and we have evaluated them in the context of Spanish-Basque translation. Dealing with morphology, we first proved that the qualit...

Rule Selection with Soft Syntactic Features for String-to-Tree Statistical Machine Translation

In syntax-based machine translation, rule selection is the task of choosing the correct target side of a translation rule among rules with the same source side. We define a discriminative rule selection model for systems that have syntactic annotation on the target language side (string-to-tree). This is a new and clean way to integrate soft source syntactic constraints into string-to-tree syste...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Syntax-Based Word Reordering in Phrase-Based Statistical Machine Translation: Why Does it Work?

Most natural language applications have some degree of preprocessing of data: tokenisation, stemming and so on. In the domain of Statistical Machine Translation (SMT) it has been shown that word reordering as a preprocessing step can help the translation process, but it is unclear why. We propose two possible reasons for the observed improvement: (1) that the reordering explicitly matches the s...

Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages

We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, as sets of structural, boundary and labeling related constraints. Factoring syntax in this manner empowers our framework to work with independent annotations coming from multiple resources and not necessarily a single syntactic structure. We then explore the issue of lexical cov...

Publication date: 2014
