Search results for: neural document embedding

Number of results: 520,398

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...

2016
Pintu Lohar, Debasis Ganguly, Haithem Afli, Andy Way, Gareth J.F. Jones

FaDA is a free/open-source tool for aligning multilingual documents. It employs a novel cross-lingual information retrieval (CLIR)-based document-alignment algorithm involving the distances between embedded word vectors in combination with the word overlap between the source-language and the target-language documents. In this approach, we initially construct a pseudo-query from a source-languag...
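A toy sketch of the scoring idea described above, combining embedding similarity with word overlap. Everything here is illustrative: the shared bilingual vectors, the `alpha` weight, and the token lists are invented stand-ins, not FaDA's actual data, scoring formula, or API.

```python
import numpy as np

# Hypothetical word vectors in a shared cross-lingual space; a real system
# would load these from trained bilingual embeddings.
vecs = {
    "cat":  np.array([1.0, 0.1]), "chat":  np.array([0.95, 0.15]),
    "dog":  np.array([0.1, 1.0]), "chien": np.array([0.12, 0.9]),
    "2016": np.array([0.5, 0.5]),   # numbers/names often survive translation
}

def centroid(tokens):
    # Represent a document as the mean of its word vectors.
    return np.mean([vecs[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def overlap(src, tgt):
    # Surface word overlap (Jaccard) between the two token sets.
    s, t = set(src), set(tgt)
    return len(s & t) / len(s | t)

def alignment_score(src, tgt, alpha=0.7):
    # Blend embedding-space similarity with raw word overlap.
    return alpha * cosine(centroid(src), centroid(tgt)) + (1 - alpha) * overlap(src, tgt)

src = ["cat", "2016"]
candidates = [["chat", "2016"], ["chien"]]
best = max(candidates, key=lambda t: alignment_score(src, t))
```

With these toy vectors, the candidate sharing both semantics and the surface token "2016" wins the alignment.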

Journal: CoRR, 2015
Devendra Singh Sachan, Shailesh Kumar

Distributed representations of words and paragraphs as semantic embeddings in high-dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we propose "Class Vectors", a framework for learning a vector per class in the same embedding space as the word and paragraph embeddings. Similarity between these clas...
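One crude way to picture a "vector per class in the same embedding space" is to take the centroid of the word vectors occurring in each class's documents and classify by nearest class vector. This is a simplified stand-in, assuming toy hand-made word vectors; the paper learns class vectors jointly with the embeddings rather than averaging.

```python
import numpy as np

# Toy word vectors; in practice these come from a trained embedding model.
word_vecs = {"good": np.array([1.0, 0.2]), "great": np.array([0.9, 0.3]),
             "bad": np.array([-1.0, 0.1]), "awful": np.array([-0.8, 0.0])}

labeled_docs = {"pos": [["good", "great"], ["great"]],
                "neg": [["bad", "awful"]]}

def class_vector(docs):
    # Crude approximation of a learned class vector: the centroid of the
    # word vectors of all tokens in the class's documents.
    vecs = [word_vecs[w] for doc in docs for w in doc]
    return np.mean(vecs, axis=0)

class_vecs = {c: class_vector(d) for c, d in labeled_docs.items()}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(doc):
    # Assign the class whose vector is most cosine-similar to the doc centroid.
    v = np.mean([word_vecs[w] for w in doc], axis=0)
    return max(class_vecs, key=lambda c: cosine(v, class_vecs[c]))
```

Because the class vectors live in the word-vector space, class-to-word similarities (e.g. "which words are closest to the pos vector?") come for free.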

2017
Ofir Press, Lior Wolf

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied mo...
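The tying itself can be shown in a minimal NumPy sketch: one matrix serves as both the input lookup table and the output projection. Shapes and names here are illustrative assumptions; the paper analyzes trained neural language models, not this toy.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 100, 16

# A single matrix E is shared between the input embedding (row lookup)
# and the output embedding (projection onto vocabulary logits).
E = rng.normal(size=(vocab_size, dim))

def embed(token_ids):
    # Input side: look up the rows of E for each token id.
    return E[token_ids]

def logits(hidden):
    # Output side: score every vocabulary word with the same matrix E,
    # so any gradient step on the output layer also moves the input lookup.
    return hidden @ E.T

h = embed(np.array([3, 7])).mean(axis=0)  # stand-in for a model's hidden state
scores = logits(h)
```

Tying halves the number of embedding parameters, since the model stores one `vocab_size × dim` matrix instead of two.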

2017
Irina Illina, Dominique Fohr

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase the vocabulary coverage, a huge amount of text data should be used. In this paper, we extend previously proposed neural-network word embedding models: the word vector representation proposed...

2007
Irina Matveeva, Gina-Anne Levow

Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity information based on a large-scale statistical analysis. Cl...
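A hybrid index of this shape can be sketched by concatenating raw bag-of-words counts with a low-rank spectral embedding of the term-document matrix. This is a minimal sketch under assumptions: truncated SVD (LSA-style) stands in for the paper's spectral method, and `k` and the toy corpus are invented.

```python
import numpy as np

# Toy corpus; a real collection would be far larger.
docs = ["apple banana apple", "banana fruit", "car engine", "engine fuel car"]
vocab = sorted({w for d in docs for w in d.split()})

# Bag-of-words term counts (documents x vocabulary).
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Spectral embedding via truncated SVD; k is an illustrative choice.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
spectral = U[:, :k] * S[:k]          # k-dimensional document coordinates

# Hybrid index: exact term counts side by side with dense spectral features.
hybrid = np.hstack([X, spectral])
```

The bag-of-words block preserves exact term matching, while the spectral block lets documents with no shared terms still land near each other when their terms co-occur across the collection.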

2016
Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative to...

Journal: CoRR, 2016
Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative to...

2017
Shenghui Wang, Rob Koopman

Capturing semantics in a computable way is desirable for many applications, such as information retrieval, document clustering, or classification. Embedding words or documents in a vector space is a common first step. Different embedding techniques have their own characteristics, which makes it difficult to choose one for a given application. In this paper, we compared a few off-the-shel...

[Chart: number of search results per publication year]