latent semantic analysis

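Probabilistic Latent Semantic Analysis (PLSA) untuk Klasifikasi Dokumen Teks Berbahasa Indonesia (Probabilistic Latent Semantic Analysis (PLSA) for Classification of Indonesian-Language Text Documents) [In Indonesian]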

Journal: CoRR 2015

Derwin Suhartono

Abstract: One of the tasks in managing documents is finding the essence of a document. Topic modeling is a technique developed to produce a representation of a document in the form of keywords drawn from it. These keywords are then used in indexing and searching, so that documents can be retrieved according to the user's needs. In ...


Workflow Activity Monitoring Using Dynamics of Pair-Wise Qualitative Spatial Relations

2012

Ardhendu Behera, Anthony G. Cohn, David C. Hogg

We present a method for real-time monitoring of workflows in a constrained environment. The monitoring system should not only be able to recognise the current step but also provide instructions about the possible next steps in an ongoing workflow. In this paper, we address this issue by using a robust approach (HMM-pLSA) which relies on a Hidden Markov Model (HMM) and a generative model such as...


Maximum-likelihood adaptation of semi-continuous HMMs by latent variable decomposition of state distributions

2004

Antoine Raux, Rita Singh

Compared to fully-continuous HMMs, semi-continuous HMMs are more compact in size, require less data to train well, and result in comparable recognition performance with much faster decoding speeds. Nevertheless, the use of semi-continuous HMMs in large vocabulary speech recognition systems has declined considerably in recent years. A significant factor that has contributed to this is that systems t...


On Automatic Annotation of Images with Latent Space Models

2003

Florent Monay, Daniel Gatica-Perez

Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text an...


Using latent semantic analysis to assess reader strategies

Journal: Behavior Research Methods, Instruments, & Computers 2002


Multi-view learning via probabilistic latent semantic analysis

Journal: Inf. Sci. 2012

Fuzhen Zhuang, George Karypis, Xia Ning, Qing He, Zhongzhi Shi

Multi-view learning has attracted a vast amount of interest in the past decades, with numerous real-world applications in web page analysis, bioinformatics, image processing, and so on. Unlike most previous works, which follow the idea of co-training, in this paper we propose a new generative model for Multi-view Learning via Probabilistic Latent Semantic Analysis, called MVPLSA. In this model, we jointl...


Topic Modeling over Short Texts by Incorporating Word Embeddings

2017

Jipeng Qiang, Ping Chen, Tong Wang, Xindong Wu

Inferring topics from the overwhelming amount of short texts has become a critical but challenging task for many content analysis applications, such as content characterization, user interest profiling, and emerging topic detection. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...


Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions

2013

Xiaoming Lu, Lei Xie, Cheung-Chi Leung, Bin Ma, Haizhou Li

We present an efficient approach for broadcast news story segmentation using a manifold learning algorithm on latent topic distributions. The latent topic distribution estimated by Latent Dirichlet Allocation (LDA) is used to represent each text block. We employ Laplacian Eigenmaps (LE) to project the latent topic distributions into low-dimensional semantic representations while preserving the ...


主題語言模型於大詞彙連續語音辨識之研究 (On the Use of Topic Models for Large-Vocabulary Continuous Speech Recognition) [In Chinese]

2009

Kuan-Yu Chen, Berlin Chen

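This paper studies language models that use topic information. When a language model is used in large-vocabulary continuous speech recognition, its main task is to predict the likelihood of the next candidate word from the already-decoded history word sequence. Conventional N-gram language models are constrained by their large number of parameters, so they can capture only short-range word-adjacency information and cannot take into account the semantic information of the complete history word sequence. Over the past decade or so, many researchers have therefore proposed a variety of topic models, including Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), which model the relationship between documents and words, and the Word Topic Model (WTM), which models the relationship between word pseudo-documents and words. These models mainly describe the relationship between documents and words, or between word pseudo-documents and words, through a set of latent topic probability distributions...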


How semantic is Latent Semantic Analysis?

2007

Tonio Wandmacher

Over the past ten years, latent semantic analysis (LSA) has been used in many NLP approaches, sometimes with remarkable success. However, its capacity to express semantic similarity has not really been investigated systematically. That is the aim of this work, in which LSA is applied to a corpus of general-language texts (newspaper ...
