Rule-based Syntactic Preprocessing for Syntax-based Machine Translation
نویسندگان
چکیده
Several preprocessing techniques using syntactic information and linguistically motivated rules have been proposed to improve the quality of phrase-based machine translation (PBMT) output. On the other hand, there has been little work on similar techniques in the context of other translation formalisms such as syntax-based SMT. In this paper, we examine whether the sort of rule-based syntactic preprocessing approaches that have proved beneficial for PBMT can contribute to syntax-based SMT. Specifically, we tailor a highly successful preprocessing method for EnglishJapanese PBMT to syntax-based SMT, and find that while the gains achievable are smaller than those for PBMT, significant improvements in accuracy can be realized.
منابع مشابه
EUSMT: Incorporating Linguistic information to Statistical Machine Translation for a morphologically rich language. Its use in preliminary SMT-RBMT-EBMT hybridization
We have proposed and successfully tested new techniques to deal with the problems found in applying Statistical Machine Translation (SMT) to language pairs with great morphological and syntactical differences. These techniques are based on segmentation and reordering and we have evaluated them in the context of Spanish-Basque translation. Dealing with morphology, we first proved that the qualit...
متن کاملRule Selection with Soft Syntactic Features for String-to-Tree Statistical Machine Translation
In syntax-based machine translation, rule selection is the task of choosing the correct target side of a translation rule among rules with the same source side. We define a discriminative rule selection model for systems that have syntactic annotation on the target language side (stringto-tree). This is a new and clean way to integrate soft source syntactic constraints into string-to-tree syste...
متن کاملA Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
متن کاملSyntax-Based Word Reordering in Phrase-Based Statistical Machine Translation: Why Does it Work?
Most natural language applications have some degree of preprocessing of data: tokenisation, stemming and so on. In the domain of Statistical Machine Translation (SMT) it has been shown that word reordering as a preprocessing step can help the translation process, but it is unclear why. We propose two possible reasons for the observed improvement: (1) that the reordering explicitly matches the s...
متن کاملExtraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, as sets of structural, boundary and labeling related constraints. Factoring syntax in this manner empowers our framework to work with independent annotations coming from multiple resources and not necessarily a single syntactic structure. We then explore the issue of lexical cov...
متن کامل