نتایج جستجو برای: topic modeling

تعداد نتایج: 543140  

Journal: :CoRR 2013
Neil Houlsby Massimiliano Ciaramita

We present an LDA approach to entity disambiguation. Each topic is associated with a Wikipedia article and topics generate either content words or entity mentions. Training such models is challenging because of the topic and vocabulary size, both in the millions. We tackle these problems using a novel distributed inference and representation framework based on a parallel Gibbs sampler guided by...

Journal: :CoRR 2014
Do-kyum Kim

We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy. We explore a parametric approach to this problem, assuming that the number of topics is known or can be estimated by cross-validation. The models we consider can be viewed as special (finite-dimensional) instances of hierarchical Dirichlet processes (HDPs). For these models we show that t...

2011
Tze-I Yang Andrew Torget Rada Mihalcea

In this paper, we explore the task of automatic text processing applied to collections of historical newspapers, with the aim of assisting historical research. In particular, in this first stage of our project, we experiment with the use of topical models as a means to identify potential issues of interest for historians. 1 Newspapers in Historical Research Surviving newspapers are among the ri...

2009
Konstantin Salomatin Yiming Yang Abhimanyu Lad

Popular methods for probabilistic topic modeling like the Latent Dirichlet Allocation (LDA, [1]) and Correlated Topic Models (CTM, [2]) share an important property, i.e., using a common set of topics to model all the data. This property can be too restrictive for modeling complex data entries where multiple fields of heterogeneous data jointly provide rich information about each object or event...

2011
Kumar Shubhankar Aditya Pratap Singh Vikram Pudi

In this paper we introduce a novel and efficient approach to detect and rank topics in a large corpus of research papers. With rapidly growing size of academic literature, the problem of topic detection and topic ranking has become a challenging task. We present a unique approach that uses closed frequent keywordset to form topics. We devise a modified time independent PageRank algorithm that a...

2017
Mehdi Allahyari Seyedamin Pouriyeh Krys Kochut Hamid Reza Arabnia

Probabilistic topic models, which aim to discover latent topics in text corpora define each document as a multinomial distributions over topics and each topic as a multinomial distributions over words. Although, humans can infer a proper label for each topic by looking at top representative words of the topic but, it is not applicable for machines. Automatic Topic Labeling techniques try to add...

2009
Duo Zhang ChengXiang Zhai Jiawei Han

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand,...

1997
Brigitte Bigi Renato De Mori Marc El-Bèze Thierry Spriet

A new statistical method for Language Modeling and spoken document classification is proposed. It is based on a mixture of topic dependent probabilities. Each topic dependent probability is in turn a mixture of n-gram probabilities and the probability of Kullback-Lieber (KL) distances between key-word unigrams and distribution obtained from the content of a cache memory. Experimental result on ...

Journal: :International Journal of Computer Applications 2016

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید