SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks

نویسندگان

  • Kyuhong Shim
  • Minjae Lee
  • Iksoo Choi
  • Yoonho Boo
  • Wonyong Sung
چکیده

We propose a fast approximation method of a softmax function with a very large vocabulary using singular value decomposition (SVD). SVD-softmax targets fast and accurate probability estimation of the topmost probable words during inference of neural network language models. The proposed method transforms the weight matrix used in the calculation of the output vector by using SVD. The approximate probability of each word can be estimated with only a small part of the weight matrix by using a few large singular values and the corresponding elements for most of the words. We applied the technique to language modeling and neural machine translation and present a guideline for good approximation. The algorithm requires only approximately 20% of arithmetic operations for an 800K vocabulary case and shows more than a three-fold speedup on a GPU.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient softmax approximation for GPUs

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computational complexity. Our approach further reduces the computation...

متن کامل

Self-organized Hierarchical Softmax

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and s...

متن کامل

Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models

Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora but they are extremely slow as they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation an...

متن کامل

Exploring convolutional neural network structures and optimization techniques for speech recognition

Recently, convolutional neural networks (CNNs) have been shown to outperform the standard fully connected deep neural networks within the hybrid deep neural network / hidden Markov model (DNN/HMM) framework on the phone recognition task. In this paper, we extend the earlier basic form of the CNN and explore it in multiple ways. We first investigate several CNN architectures, including full and ...

متن کامل

Strategies for Training Large Vocabulary Neural Language Models

Training neural network language models over large vocabularies is computationally costly compared to count-based models such as Kneser-Ney. We present a systematic comparison of neural strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation and self normalization. We extend self normalization to be a proper es...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017