Robust Data Augmentation for Neural Machine Translation through EVALNET

Abstract

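Since building Neural Machine Translation (NMT) systems requires a large parallel corpus, various data augmentation techniques have been adopted, especially for low-resource languages. In order to achieve the best performance through data augmentation, NMT systems should be able to evaluate the quality of the augmented data. Several studies have addressed data weighting techniques to assess data quality. The basic idea of data weighting adopted in previous studies is to use the loss value that a system calculates when learning from training data. The weight derived from the loss value of the data, through simple heuristic rules or neural models, can adjust the loss used in the next step of the learning process. In this study, we propose EvalNet, a data evaluation network, to assess parallel data for NMT. EvalNet exploits a loss value, a cross-attention map, and the semantic similarity between parallel data as its features. The cross-attention map is an encoded representation of the cross-attention layers of the Transformer, which is the base architecture of an NMT system. The semantic similarity is the cosine distance between two semantic embeddings of a source sentence and a target sentence. Owing to the parallelism of the data, the combination of the cross-attention map and the semantic similarity proved to be effective features for data quality evaluation, besides the loss value. EvalNet is the first NMT data evaluator network that introduces the cross-attention map and the semantic similarity as its features. Through experiments, we conclude that EvalNet is simple yet beneficial for robust training of an NMT system and outperforms the previous data evaluator.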
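
Two of the ingredients named in the abstract, the cosine-based similarity between a source and a target sentence embedding and the use of per-sample weights to adjust the training loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `cosine_distance`, `weighted_batch_loss`) are hypothetical, the embeddings are assumed to be plain lists of floats produced by some sentence encoder, and the weights are taken as given rather than predicted by an evaluator network such as EvalNet.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption made for this sketch: a zero vector has similarity 0.
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def weighted_batch_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-sentence-pair losses.

    The weights stand in for the output of a data evaluator; how they are
    computed is outside the scope of this sketch.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total
```

For example, `weighted_batch_loss([2.0, 4.0], [1.0, 0.0])` ignores the second sentence pair entirely, which is the effect a data evaluator would aim for when it judges an augmented pair to be of low quality.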

Similar resources

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, syn...

Dynamic Data Selection for Neural Machine Translation

Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection ...

Improving Machine Translation through Linked Data

With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a resource. Unknown words and ambiguous translat...

Pre-Translation for Neural Machine Translation

Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases, the NMT system produces translations that have a completely different meaning. This is especially the case when...

Neural Name Translation Improves Neural Machine Translation

In order to control computational complexity, neural machine translation (NMT) systems convert all rare words outside the vocabulary into a single unk symbol. A previous solution (Luong et al., 2015) resorts to using multiple numbered unks to learn the correspondence between source and target rare words. However, test words unseen in the training corpus cannot be handled by this method. And it a...

Journal

Journal title: Mathematics

Year: 2022

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math11010123