A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
Authors
Abstract
Similar resources
Data Augmentation for Low-Resource Neural Machine Translation
The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, syn...
Neural machine translation for low-resource languages
Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate that NMT can be used for low-resource languages as well, by introducing more local dependencies and using word alignments to learn sentence reordering during t...
Copied Monolingual Data Improves Low-Resource Neural Machine Translation
We train a neural machine translation (NMT) system to both translate source-language text and copy target-language text, thereby exploiting monolingual corpora in the target language. Specifically, we create a bitext from the monolingual text in the target language so that each source sentence is identical to the target sentence. This copied data is then mixed with the parallel corpus and the NM...
Multilingual Neural Machine Translation for Low Resource Languages
Neural Machine Translation (NMT) has been shown to be more effective in translation tasks compared to the Phrase-Based Statistical Machine Translation (PBMT). However, NMT systems are limited in translating low-resource languages (LRL), due to the fact that neural methods require a large amount of parallel data to learn effective mappings between languages. In this work we show how so-called mu...
Universal Neural Machine Translation for Extremely Low Resource Languages
In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence-level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual...
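The copied-monolingual-data approach summarized in the related abstract above builds a bitext in which each source sentence is identical to its target sentence. That bitext is then mixed with the parallel corpus. The idea can be illustrated with a minimal Python sketch, under several assumptions:

- The helper names `copied_bitext` and `mix_with_parallel` are hypothetical.
- Sentences are treated as plain strings.
- Copying every monolingual sentence once and shuffling it into the parallel data is an illustrative choice. The truncated abstract does not state the mixing ratio or the training schedule.

```python
import random
from typing import List, Tuple

# A sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]


def copied_bitext(monolingual_target: List[str]) -> List[Pair]:
    """Build pseudo-parallel pairs whose source side is an exact copy of the target side."""
    return [(sentence, sentence) for sentence in monolingual_target]


def mix_with_parallel(
    parallel: List[Pair], monolingual_target: List[str], seed: int = 0
) -> List[Pair]:
    """Concatenate real and copied pairs, then shuffle them (hypothetical helper).

    Assumption: every monolingual sentence is copied exactly once and the
    combined data is shuffled; this is illustrative, not the paper's recipe.
    The input lists are not modified.
    """
    mixed = parallel + copied_bitext(monolingual_target)
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    parallel = [("das Haus", "the house"), ("ein Hund", "a dog")]
    monolingual = ["the cat sleeps", "a bird sings"]
    for src, tgt in mix_with_parallel(parallel, monolingual):
        print(f"{src!r} -> {tgt!r}")
```

The copied pairs add target-language text to training without requiring any new translations. This is how the approach exploits target-side monolingual corpora.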
Journal
Journal title: Information
Year: 2020
ISSN: 2078-2489
DOI: 10.3390/info11050255