Search results for: latent semantic analysis

Number of results: 2942209

2003
Riccardo Serafin Barbara Di Eugenio Michael Glass

This paper presents our experiments in applying Latent Semantic Analysis (LSA) to dialogue act classification. We employ both LSA proper and LSA augmented in two ways. We report results on DIAG, our own corpus of tutoring dialogues, and on the CallHome Spanish corpus. Our work has the theoretical goal of assessing whether LSA, an approach based only on raw text, can be improved by using additio...

2007
Nicolas Béchet Mathieu Roche Jacques Chauché

Latent Semantic Analysis (LSA) is used in many research fields, with several applications in classification. We propose to improve LSA with additional semantic information derived from syntactic knowledge.

2005
Masato Hagiwara Yasuhiro Ogawa Katsuhiko Toyama

When acquiring synonyms from large corpora, it is important to deal not only with surface information such as the context of the words but also with their latent semantics. This paper describes how to use a latent semantic model, PLSI, to acquire synonyms automatically from large corpora. PLSI has been shown to achieve a better performance than conventional methods such as tf·idf and LSI, making i...

2012
Ning Li Fuzhen Zhuang Qing He Zhongzhi Shi

PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. Due to the increasing prevalence of large datasets, there is a need to improve the scalability of computation in PLSA. In this paper, we propose a parallel PLSA algorithm called PPLSA to accommodate large corpus collections in the MapReduce framework. Our solution efficiently d...

2009
Leonhard Hennig

We consider the problem of query-focused multidocument summarization, where a summary containing the information most relevant to a user’s information need is produced from a set of topic-related documents. We propose a new method based on probabilistic latent semantic analysis, which allows us to represent sentences and queries as probability distributions over latent topics. Our approach comb...

2008
Anastasia Krithara Massih-Reza Amini Jean-Michel Renders Cyril Goutte

This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its labeling errors. These probabilities are then taken into account in the estimation of the new model parameters before the...

2007
Bhaskar Mehta

Recommender systems have been steadily gaining popularity and have been deployed by several service providers. Large-scale deployment has, however, highlighted one of the design problems of recommender systems: lack of interoperability. Users today often use multiple electronic systems offering recommendations, which cannot learn from one another. The result is that the end user often has to pr...

2011
Timothy J. Hazen

This work presents techniques for automatically summarizing the topical content of an audio corpus. Probabilistic latent semantic analysis (PLSA) is used to learn a set of latent topics in an unsupervised fashion. These latent topics are ranked by their relative importance in the corpus and a summary of each topic is generated from signature words that aptly describe the content of that topic. ...

2006
Lin-Shan Lee Sheng-yi Kong Yi-Cheng Pan Yi-Sheng Fu Yu-tsun Huang Chien-Chih Wang

Multimedia archives are very difficult to display on a screen, and very difficult to retrieve and browse. It is therefore important to develop technologies to summarize the entire archives in the network content to help the user with browsing and retrieval. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least some of the above issues: (1) Autom...

2002
Thorsten Brants Francine Chen Ayman Farahat

We adapt algorithms for document topic analysis, consisting of segmentation and topic identification, to Arabic. By doing so, we outline the requirements for Arabic language resources that facilitate building, training, and fine-tuning systems that perform these tasks. Our segmentation and topic identification algorithm is based on Probabilistic Latent Semantic Analysis. First results ...
