Tensor2Tensor for Neural Machine Translation

Authors

  • Ashish Vaswani
  • Samy Bengio
  • Eugene Brevdo
  • Francois Chollet
  • Aidan N. Gomez
  • Stephan Gouws
  • Llion Jones
  • Lukasz Kaiser
  • Nal Kalchbrenner
  • Niki Parmar
  • Ryan Sepassi
  • Noam Shazeer
  • Jakob Uszkoreit
Abstract

Tensor2Tensor is a library for deep learning models that is well suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

arXiv:1803.07416v1 [cs.LG] 16 Mar 2018

1 Neural Machine Translation Background

Machine translation using deep neural networks achieved great success with sequence-to-sequence models (Sutskever et al., 2014; Bahdanau et al., 2014; Cho et al., 2014) that used recurrent neural networks (RNNs) with LSTM cells (Hochreiter and Schmidhuber, 1997). The basic sequence-to-sequence architecture is composed of an RNN encoder, which reads the source sentence one token at a time and transforms it into a fixed-size state vector, followed by an RNN decoder, which generates the target sentence, one token at a time, from that state vector.

While a pure sequence-to-sequence recurrent neural network can already obtain good translation results (Sutskever et al., 2014; Cho et al., 2014), it suffers from the fact that the whole input sentence needs to be encoded into a single fixed-size vector. This clearly manifests itself in the degradation of translation quality on longer sentences and was partially overcome in Bahdanau et al. (2014) by using a neural model of attention.

Convolutional architectures have been used to obtain good results in word-level neural machine translation starting from Kalchbrenner and Blunsom (2013) and later in Meng et al. (2015). These early models used a standard RNN on top of the convolution to generate the output, which creates a bottleneck and hurts performance. Fully convolutional neural machine translation without this bottleneck was first achieved in Kaiser and Bengio (2016) and Kalchbrenner et al. (2016). The Extended Neural GPU model (Kaiser and Bengio, 2016) used a recurrent stack of gated convolutional layers, while the ByteNet model (Kalchbrenner et al., 2016) did away with recursion and used left-padded convolutions in the decoder. This idea, introduced in WaveNet (van den Oord et al., 2016), significantly improves the efficiency of the model. The same technique was recently refined in a number of neural translation models, including Gehring et al. (2017) and Kaiser et al. (2017).

2 Self-Attention

Instead of convolutions, one can use stacked self-attention layers. This approach was introduced in the Transformer model (Vaswani et al., 2017) and has significantly improved the state of the art in machine translation and language modeling while also improving the speed of training.
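To make the fixed-size-vector bottleneck from the background section concrete, here is a minimal NumPy sketch of an RNN encoder. It uses a plain tanh recurrence rather than an LSTM cell, and the dimensions and weights are toy assumptions rather than values from the paper; the point is only that the decoder receives a single state vector of the same size no matter how long the source sentence is.

    import numpy as np

    def rnn_encode(token_embeddings, w_in, w_rec):
        # Reads the source sentence one token at a time and compresses it
        # into a single fixed-size state vector (the seq2seq bottleneck).
        state = np.zeros(w_rec.shape[0])
        for x in token_embeddings:            # x: [d_emb] embedding of one token
            state = np.tanh(w_in @ x + w_rec @ state)
        return state                          # [d_state], independent of sentence length

    rng = np.random.default_rng(0)
    d_emb, d_state = 8, 16                    # toy sizes
    w_in = rng.normal(size=(d_state, d_emb))
    w_rec = rng.normal(size=(d_state, d_state))
    sentence = rng.normal(size=(30, d_emb))   # embeddings of a 30-token sentence
    print(rnn_encode(sentence, w_in, w_rec).shape)  # (16,) for any sentence length

For the self-attention layers mentioned in Section 2, the following is a minimal single-head sketch of scaled dot-product self-attention. It only illustrates the mechanism from Vaswani et al. (2017) and is not the Tensor2Tensor implementation; the projection matrices and shapes are again toy assumptions.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, w_q, w_k, w_v):
        # x: [seq_len, d_model]; w_q, w_k, w_v: [d_model, d_k] projections.
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(k.shape[-1])  # [seq_len, seq_len] attention logits
        weights = softmax(scores)                # each position attends to every position
        return weights @ v                       # [seq_len, d_k] new representations

    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 5, 16, 8             # toy sizes
    x = rng.normal(size=(seq_len, d_model))
    w_q, w_k, w_v = [rng.normal(size=(d_model, d_k)) for _ in range(3)]
    print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)

In the full Transformer these layers are stacked, use multiple attention heads, and are combined with residual connections, layer normalization, and position-wise feed-forward networks (Vaswani et al., 2017).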


Similar resources

Training Tips for the Transformer Model

This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability, and training time, concluding each experiment with a set of recommendations for fellow researchers...


A Comparative Study of English-Persian Translation of Neural Google Translation

Many studies abroad have focused on neural machine translation, and almost all concluded that this method was much closer to human translation than machine translation. Therefore, this paper aimed to investigate whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...


Neural Machine Translation

A draft textbook chapter on neural machine translation: a comprehensive treatment of the topic, ranging from an introduction to neural networks and computation graphs to a description of the currently dominant attentional sequence-to-sequence model, recent refinements, alternative architectures, and challenges. Written as a chapter for the textbook Statistical Machine Translation. Used in the JHU Fall 2017...


The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engines, as engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...


Byte-based Neural Machine Translation

This paper presents experiments comparing character-based and byte-based neural machine translation systems. The main motivation of the byte-based neural machine translation system is to build multilingual neural machine translation systems that can share the same vocabulary. We compare the performance of both systems on several language pairs and see that test performance is similar ...




Publication date: 2018