Bilingual Subword Segmentation for Neural Machine Translation

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Morpheme-Aware Subword Segmentation for Neural Machine Translation

Neural machine translation together with subword segmentation has recently produced state-of-the-art translation performance. The commonly used segmentation algorithm based on byte-pair encoding (BPE) does not consider the morphological structure of words. This occasionally causes misleading segmentation and incorrect translation of rare words. In this thesis we explore the use of morphological...

متن کامل

Bridging Neural Machine Translation and Bilingual Dictionaries

Neural Machine Translation (NMT) has become the new state-of-the-art in several language pairs. However, it remains a challenging problem how to integrate NMT with a bilingual dictionary which mainly contains words rarely or never seen in the bilingual training data. In this paper, we propose two methods to bridge NMT and the bilingual dictionaries. The core idea behind is to design novel model...

متن کامل

Neural Machine Translation of Rare Words with Subword Units

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as seq...

متن کامل

Bilingual Segmentation for Alignment and Translation

We propose a method that bilingually segments sentences in languages with no clear delimiter for word boundaries. In our model, we first convert the search for the segmentation into a sequential tagging problem, allowing for a polynomial-time dynamic-programming solution, and incorporate a control to balance monolingual and bilingual information at hand. Our bilingual segmentation algorithm, th...

متن کامل

Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text

A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system, by extracting bilingual entries from aligned bilingual text. The proposed method only relies on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences. 1 I ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Natural Language Processing

سال: 2021

ISSN: 1340-7619,2185-8314

DOI: 10.5715/jnlp.28.632