Large-Scale Hierarchical Topic Models
Abstract
In the past decade, a number of advances in topic modeling have produced sophisticated models that are capable of generating hierarchies of topics. One challenge for these models is scalability: they are incapable of working at the massive scale of millions of documents and hundreds of thousands of terms. We address this challenge with a technique that learns a hierarchy of topics by iteratively applying topic models and processing subtrees of the hierarchy in parallel. This approach has a number of scalability advantages compared to existing techniques, and shows promising results in experiments assessing runtime and human evaluations of quality. We detail extensions to this approach that may further improve hierarchical topic modeling for large-scale applications.
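As a rough illustration of the general idea of applying a flat model repeatedly and expanding independent subtrees in parallel, the following minimal sketch may help. It is not the authors' implementation: `split_docs` is a hypothetical toy splitter (the most frequent terms act as anchors) standing in for a real topic model, and the names `expand`, `build_hierarchy`, `k`, `max_depth`, and `min_docs` are assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_docs(docs, k):
    """Toy stand-in for a flat topic model (an assumption, not the paper's model).

    The k most frequent terms become anchors; each document joins the anchor
    it mentions most, and that anchor term is removed before recursing.
    """
    counts = Counter(t for d in docs for t in d)
    # Sort by (-count, term) so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    anchors = [t for t, _ in ranked[:k]]
    groups = {a: [] for a in anchors}
    for d in docs:
        c = Counter(d)
        best = max(anchors, key=lambda a: c[a]) if anchors else None
        if best is not None and c[best] > 0:
            groups[best].append([t for t in d if t != best])
    return groups


def expand(node, k, min_docs):
    """Split one node's documents and return the newly created child nodes."""
    if len(node["docs"]) < min_docs:
        return []
    children = [
        {"label": anchor, "docs": members, "children": []}
        for anchor, members in split_docs(node["docs"], k).items()
        if members
    ]
    node["children"] = children
    return children


def build_hierarchy(docs, k=2, max_depth=2, min_docs=2, workers=4):
    """Grow the tree level by level; nodes on one level are expanded in parallel."""
    root = {"label": "root", "docs": docs, "children": []}
    frontier = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            # Subtrees on the same level share no state, so they can run concurrently.
            results = pool.map(lambda n: expand(n, k, min_docs), frontier)
            frontier = [child for kids in results for child in kids]
            if not frontier:
                break
    return root
```

Expanding level by level, rather than recursing inside worker threads, keeps the thread pool from waiting on its own tasks. In a real setting the splitter would be replaced by an actual topic model, and a process pool or a cluster scheduler would likely take the place of the thread pool.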
Related resources
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Analysis of Hierarchical Bayesian Models for Large Space Time Data of the Housing Prices in Tehran
Housing prices are correlated with their location in different neighborhoods, so the data are spatially correlated. Prices also vary from month to month, so they are correlated in time as well. Spatio-temporal models are used to analyze this type of data. An important purpose of reviewing this type of data is to fit a suitable model for the spatial-temporal an...
Hierarchical Relational Models for Document Networks
We develop the relational topic model (RTM), a hierarchical model of both network structure and node attributes. We focus on document networks, where the attributes of each document are its words, i.e., discrete observations taken from a fixed vocabulary. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be use...
Biomass Modeling of Larch (Larix spp.) Plantations in China Based on the Mixed Model, Dummy Variable Model, and Bayesian Hierarchical Model
With the development of national-scale forest biomass monitoring work, accurate estimation of forest biomass on a large scale is becoming an important research topic in forestry. In this study, the stem wood, branches, stem bark, needles, roots and total biomass models for larch were developed at the regional level, using a general allometric equation, a dummy variable model, a mixed effects mo...
Hierarchical Bayesian Models for Applications in Information Retrieval
We present a simple hierarchical Bayesian approach to modeling collections of texts and other large-scale data collections. For text collections, we posit that a document is generated by choosing a random set of multinomial probabilities for a set of possible “topics,” and then repeatedly generating words by sampling from the topic mixture. This model is intractable for exact probabilistic ...
Publication date: 2012