Search results for: neural document embedding

Number of results: 520,398

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...

2016
Pintu Lohar, Debasis Ganguly, Haithem Afli, Andy Way, Gareth J.F. Jones

FaDA is a free/open-source tool for aligning multilingual documents. It employs a novel cross-lingual information retrieval (CLIR)-based document-alignment algorithm involving the distances between embedded word vectors in combination with the word overlap between the source-language and the target-language documents. In this approach, we initially construct a pseudo-query from a source-languag...
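A toy sketch of the scoring idea described above, combining embedding similarity with word overlap. Everything here is illustrative: the shared bilingual vectors, the `alpha` weight, and the token lists are invented stand-ins, not FaDA's actual data, scoring formula, or API.

```python
import numpy as np

# Hypothetical word vectors in a shared cross-lingual space; a real system
# would load these from trained bilingual embeddings.
vecs = {
    "cat":  np.array([1.0, 0.1]), "chat":  np.array([0.95, 0.15]),
    "dog":  np.array([0.1, 1.0]), "chien": np.array([0.12, 0.9]),
    "2016": np.array([0.5, 0.5]),   # numbers/names often survive translation
}

def centroid(tokens):
    # Represent a document as the mean of its word vectors.
    return np.mean([vecs[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def overlap(src, tgt):
    # Surface word overlap (Jaccard) between the two token sets.
    s, t = set(src), set(tgt)
    return len(s & t) / len(s | t)

def alignment_score(src, tgt, alpha=0.7):
    # Blend embedding-space similarity with raw word overlap.
    return alpha * cosine(centroid(src), centroid(tgt)) + (1 - alpha) * overlap(src, tgt)

src = ["cat", "2016"]
candidates = [["chat", "2016"], ["chien"]]
best = max(candidates, key=lambda t: alignment_score(src, t))
```

With these toy vectors, the candidate sharing both semantics and the surface token "2016" wins the alignment.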

Journal: CoRR, 2015
Devendra Singh Sachan, Shailesh Kumar

Distributed representations of words and paragraphs as semantic embeddings in high-dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we propose "Class Vectors", a framework for learning a vector per class in the same embedding space as the word and paragraph embeddings. Similarity between these clas...
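One crude way to picture a "vector per class in the same embedding space" is to take the centroid of the word vectors occurring in each class's documents and classify by nearest class vector. This is a simplified stand-in, assuming toy hand-made word vectors; the paper learns class vectors jointly with the embeddings rather than averaging.

```python
import numpy as np

# Toy word vectors; in practice these come from a trained embedding model.
word_vecs = {"good": np.array([1.0, 0.2]), "great": np.array([0.9, 0.3]),
             "bad": np.array([-1.0, 0.1]), "awful": np.array([-0.8, 0.0])}

labeled_docs = {"pos": [["good", "great"], ["great"]],
                "neg": [["bad", "awful"]]}

def class_vector(docs):
    # Crude approximation of a learned class vector: the centroid of the
    # word vectors of all tokens in the class's documents.
    vecs = [word_vecs[w] for doc in docs for w in doc]
    return np.mean(vecs, axis=0)

class_vecs = {c: class_vector(d) for c, d in labeled_docs.items()}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(doc):
    # Assign the class whose vector is most cosine-similar to the doc centroid.
    v = np.mean([word_vecs[w] for w in doc], axis=0)
    return max(class_vecs, key=lambda c: cosine(v, class_vecs[c]))
```

Because the class vectors live in the word-vector space, class-to-word similarities (e.g. "which words are closest to the pos vector?") come for free.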

2017
Ofir Press, Lior Wolf

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied mo...
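The tying itself can be shown in a minimal NumPy sketch: one matrix serves as both the input lookup table and the output projection. Shapes and names here are illustrative assumptions; the paper analyzes trained neural language models, not this toy.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 100, 16

# A single matrix E is shared between the input embedding (row lookup)
# and the output embedding (projection onto vocabulary logits).
E = rng.normal(size=(vocab_size, dim))

def embed(token_ids):
    # Input side: look up the rows of E for each token id.
    return E[token_ids]

def logits(hidden):
    # Output side: score every vocabulary word with the same matrix E,
    # so any gradient step on the output layer also moves the input lookup.
    return hidden @ E.T

h = embed(np.array([3, 7])).mean(axis=0)  # stand-in for a model's hidden state
scores = logits(h)
```

Tying halves the number of embedding parameters, since the model stores one `vocab_size × dim` matrix instead of two.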

2017
Irina Illina, Dominique Fohr

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase the vocabulary coverage, a huge amount of text data should be used. In this paper, we extend previously proposed neural-network word embedding models: the word vector representation proposed...

2007
Irina Matveeva, Gina-Anne Levow

Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity information based on a large-scale statistical analysis. Cl...
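A hybrid index of this shape can be sketched by concatenating raw bag-of-words counts with a low-rank spectral embedding of the term-document matrix. This is a minimal sketch under assumptions: truncated SVD (LSA-style) stands in for the paper's spectral method, and `k` and the toy corpus are invented.

```python
import numpy as np

# Toy corpus; a real collection would be far larger.
docs = ["apple banana apple", "banana fruit", "car engine", "engine fuel car"]
vocab = sorted({w for d in docs for w in d.split()})

# Bag-of-words term counts (documents x vocabulary).
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Spectral embedding via truncated SVD; k is an illustrative choice.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
spectral = U[:, :k] * S[:k]          # k-dimensional document coordinates

# Hybrid index: exact term counts side by side with dense spectral features.
hybrid = np.hstack([X, spectral])
```

The bag-of-words block preserves exact term matching, while the spectral block lets documents with no shared terms still land near each other when their terms co-occur across the collection.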

2016
Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative to...

Journal: CoRR, 2016
Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative to...

2017
Shenghui Wang, Rob Koopman

Capturing semantics in a computable way is desirable for many applications, such as information retrieval, document clustering, or classification. Embedding words or documents in a vector space is a common first step. Different embedding techniques have their own characteristics, which makes it difficult to choose one for a given application. In this paper, we compared a few off-the-shel...

[Chart: number of search results per publication year]