Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski
Similar resources
Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski
In this paper we present Chaski, an open-source machine translation toolkit capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline, including distributed word alignment, word clustering, and phrase extraction. It also provides an extended error-tolerance mechanism on top of the standard Hadoop error-tolerance framework. ...
NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers...
Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context-free grammars (SCFGs): chart parsing, n-gram language model integration, beam and cube pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed c...
SampleRank Training for Phrase-Based Machine Translation
Statistical machine translation systems are normally optimised for a chosen gain function (metric) by using MERT to find the best model weights. This algorithm suffers from stability problems and cannot scale beyond 20-30 features. We present an alternative algorithm for discriminative training of phrase-based MT systems, SampleRank, which scales to hundreds of features, equals or beats MERT on b...
Continuous Space Translation Models for Phrase-Based Statistical Machine Translation
This paper presents a new approach to estimating the translation model probabilities of a phrase-based statistical machine translation system. We use neural networks to directly learn the translation probability of phrase pairs using continuous representations. The system can be easily trained on the same data used to build standard phrase-based systems. We provide experimental e...
Journal
Journal title: The Prague Bulletin of Mathematical Linguistics
Year: 2010
ISSN: 1804-0462,0032-6585
DOI: 10.2478/v10108-010-0004-8