Retranslating number expression unknown word in Chinese-Vietnamese statistical machine translation

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Approach for Handling Unknown Word Problem in Chinese-Vietnamese Machine Translation

For languages where space cannot be a boundary of a word, such as Chinese and Vietnamese, word segmentation is always the task to be done first in a statistical machine translation system (SMT). The word segmentation increases the translation quality, but it causes many unknown words (UKW) in the target translation. In this paper, we will present a novel approach to translate UKW. Based on the ...

متن کامل

Integrated Chinese word segmentation in statistical machine translation

A Chinese sentence is represented as a sequence of characters, and words are not separated from each other. In statistical machine translation, the conventional approach is to segment the Chinese character sequence into words during the pre-processing. The training and translation are performed afterwards. However, this method is not optimal for two reasons: 1. The segmentations may be erroneou...

متن کامل

Improved Statistical Machine Translation by Multiple Chinese Word Segmentation

Chinese word segmentation (CWS) is a necessary step in Chinese-English statistical machine translation (SMT) and its performance has an impact on the results of SMT. However, there are many settings involved in creating a CWS system such as various specifications and CWS methods. This paper investigates the effect of these settings to SMT. We tested dictionarybased and CRF-based approaches and ...

متن کامل

Do We Need Chinese Word Segmentation for Statistical Machine Translation?

In Chinese texts, words are not separated by white spaces. This is problematic for many natural language processing tasks. The standard approach is to segment the Chinese character sequence into words. Here, we investigate Chinese word segmentation for statistical machine translation. We pursue two goals: the first one is the maximization of the final translation quality; the second is the mini...

متن کامل

Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation

Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Computer Science and Cybernetics

سال: 2014

ISSN: 1813-9663,1813-9663

DOI: 10.15625/1813-9663/30/2/2589