UPC’s Bilingual N-gram Translation System
Abstract
This paper describes UPC’s bilingual n-gram approach to statistical machine translation, which implements a log-linear combination of a bilingual n-gram translation model with six additional feature functions. The complete system is described briefly, with special attention to the novel features and reordering strategies that were recently implemented. Translation results for the Spanish-to-English and English-to-Spanish tasks of TC-STAR’s second evaluation campaign are presented and discussed. Finally, the improvements in translation accuracy over the previous year’s system are evaluated and discussed.
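As a rough illustration of the log-linear combination mentioned in the abstract, the sketch below scores each translation hypothesis as a weighted sum of feature log-scores and keeps the highest-scoring one. The feature names, weights, and hypotheses are hypothetical placeholders chosen for the example; the abstract does not list the six additional feature functions, and this is not the UPC system's implementation.

```python
def log_linear_score(feature_log_scores, weights):
    """Weighted sum of feature log-scores for one translation hypothesis."""
    return sum(weights[name] * value for name, value in feature_log_scores.items())


def best_hypothesis(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: log_linear_score(h["features"], weights))


# Hypothetical feature names and weights (assumptions, not taken from the paper).
weights = {"bilingual_ngram": 1.0, "target_lm": 0.5, "word_bonus": 0.2}

# Toy hypotheses with made-up feature values.
hypotheses = [
    {"text": "the house is small",
     "features": {"bilingual_ngram": -4.0, "target_lm": -6.0, "word_bonus": 4.0}},
    {"text": "the home is little",
     "features": {"bilingual_ngram": -3.5, "target_lm": -9.0, "word_bonus": 4.0}},
]

best = best_hypothesis(hypotheses, weights)
print(best["text"])
```

In a real system the weights would typically be tuned on held-out data rather than set by hand.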
Similar resources
Wider Context by Using Bilingual Language Models in Machine Translation
In past Evaluations for Machine Translation of European Languages, it could be shown that the translation performance of SMT systems can be increased by integrating a bilingual language model into a phrase-based SMT system. In the bilingual language model, target words with their aligned source words build the tokens of an n-gram based language model. We analyzed the effect of bilingual languag...
Statistical Machine Translation of Euparl Data by using Bilingual N-grams
This work discusses translation results for the four Euparl data sets which were made available for the shared task “Exploiting Parallel Texts for Statistical Machine Translation”. All results presented were generated by using a statistical machine translation system which implements a log-linear combination of feature functions along with a bilingual n-gram translation model.
Ncode: an Open Source Bilingual N-gram SMT Toolkit
This paper describes Ncode, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review...
Smooth Bilingual N-Gram Translation
We address the problem of smoothing translation probabilities in a bilingual N-gram-based statistical machine translation system. It is proposed to project the bilingual tuples onto a continuous space and to estimate the translation probabilities in this representation. A neural network is used to perform the projection and the probability estimation. Smoothing probabilities is most important fo...
A Comparative Study on Translation Units for Bilingual Lexicon Extraction
This paper presents ongoing research on the automatic extraction of bilingual lexicons from English-Japanese parallel corpora. The main objective of this paper is to examine various N-gram models for generating translation units for bilingual lexicon extraction. Three N-gram models, a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram) are compared...
Publication date: 2006