latent semantic analysis

Biomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges

2014

Andreas Holzinger, Johannes Schantl, Miriam Schroettner, Christin Seifert, Karin M. Verspoor

Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text which has been entered in a non-standardized format, consequently posing a lot of challenges to processing of such data. For the clinical doctor the written text in the medical findings is still the basis for decision making – neither images nor multimedia data. However...

Variational Extensions to EM and Multinomial PCA

2002

Wray L. Buntine

Several authors in recent years have proposed discrete analogues to principal component analysis intended to handle discrete or positive only data, for instance suited to analyzing sets of documents. Methods include non-negative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational ext...

Practical issues in developing semantic frameworks for the analysis of verbal fluency data: A Norwegian data case study

2015

Mark Rosenstein, Peter W. Foltz, Anja Vaskinn, Brita Elvevåg

Background: Verbal fluency tasks, which require producing as many words as possible in response to a cue within a fixed time, are widely used within clinical neuropsychology and in neuropsychological research. Although semantic word lists can be elicited, typically only the number of words related to the cue is interpreted, thus ignoring any structure in the word sequences. Automated language techniques can pro...

Exploiting Probabilistic Latent Information for the Construction of Community Web Directories

2005

Dimitrios Pierrakos, Georgios Paliouras

This paper improves a recently-presented approach to Web Personalization, named Community Web Directories, which applies personalization techniques to Web Directories. The Web directory is viewed as a concept hierarchy and personalization is realized by constructing user community models on the basis of usage data collected by the proxy servers of an Internet Service Provider. The user communit...

Simplex Decompositions using SVD and PLSA

2012

Madhusudana V. S. Shashanka, Michael Giering

Probabilistic Latent Semantic Analysis (PLSA) is a popular technique to analyze non-negative data where multinomial distributions underlying every data vector are expressed as linear combinations of a set of basis distributions. These learned basis distributions that characterize the dataset lie on the standard simplex and themselves represent corners of a simplex within which all data approxim...
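The decomposition this abstract describes, with each document's word distribution written as a mixture of basis distributions, can be sketched with the standard EM updates for PLSA. This is a minimal illustration only, not the simplex method of the paper; the function name `plsa`, its parameters, and the toy count matrix are all hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (documents x words) count matrix.

    Returns (doc_topic, topic_word). Every row of both matrices is a
    multinomial distribution, i.e. a point on the standard simplex.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised so that each row sums to 1.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (docs, topics, words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: redistribute the observed counts over the topics.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
    return doc_topic, topic_word

# Toy data (assumed): 4 documents over a 4-word vocabulary.
counts = np.array([[4, 3, 0, 0],
                   [5, 2, 0, 1],
                   [0, 0, 3, 4],
                   [0, 1, 2, 5]], dtype=float)
doc_topic, topic_word = plsa(counts, n_topics=2)
# Each row of approx is a convex combination of the rows of topic_word.
approx = doc_topic @ topic_word
```

Because the rows of `doc_topic` are non-negative and sum to one, every row of `approx` lies inside the simplex whose corners are the rows of `topic_word`, which is the geometric picture the abstract refers to.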

Finding Answers to Definition Questions

2012

Han Ren, Donghong Ji, Jing Wan, Chong Teng

Current research on Question Answering concerns questions more complex than factoid ones. Although definition questions have been investigated in many studies, acquiring accurate answers remains a core problem for definition QA. While some systems use web knowledge bases to improve answer acquisition, we propose an approach that leverages them in an effective way. After summarizing defi...

Integration of PLSA into Probabilistic CLIR Model - Yokohama National University at NTCIR4 CLIR

2004

Tetsu Muramatsu, Tatsunori Mori

In this paper, we propose a method of Cross-Language Information Retrieval based on an integration of a probabilistic CLIR model and Probabilistic Latent Semantic Analysis (PLSA). PLSA is adopted to extract the information of translation probability from a parallel corpus. The information is utilized in a probabilistic CLIR model. Although the probabilistic CLIR model with PLSA is quite effectiv...

A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation

2002

Yuseop Kim, Jeong Ho Chang, Byoung-Tak Zhang

We present a comparative evaluation of two data-driven models used in translation selection of English-Korean machine translation. Latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA) are applied for the purpose of implementation of data-driven models in particular. These models are able to represent complex semantic structures of given contexts, like text passages. G...
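As a rough illustration of how LSA represents a context such as a text passage, the sketch below projects documents into a low-rank space with a truncated SVD and compares them by cosine similarity. It is not the translation-selection model evaluated in the paper; the helper names `lsa_embed` and `cosine` and the toy term-document matrix are assumptions.

```python
import numpy as np

def lsa_embed(term_doc, rank):
    """Return one rank-dimensional latent vector per document.

    term_doc has terms as rows and documents as columns.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates are the columns of diag(s_k) @ vt_k.
    return (s[:rank, None] * vt[:rank]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data (assumed): documents 0 and 1 share terms 0 and 1,
# documents 2 and 3 share terms 2 and 3.
term_doc = np.array([[2, 1, 0, 0],
                     [1, 2, 0, 0],
                     [0, 0, 1, 2],
                     [0, 0, 2, 1]], dtype=float)
docs = lsa_embed(term_doc, rank=2)
same_group = cosine(docs[0], docs[1])
different_group = cosine(docs[0], docs[2])
```

In this toy setup, documents that use the same vocabulary end up pointing in the same latent direction, while documents with disjoint vocabulary are orthogonal in the latent space.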

Effect of Pronoun Resolution on Document Similarity

2010

Atul Kumar Sudip Sanyal Thomas Hofmann Tuomo Kakkonen Niko Myller Jari Timonen Erkki Sutinen

This paper presents a novel effect of Pronoun Resolution on measurement of document similarity. In this paper we have studied the effect of pronoun resolution within the framework of the Vector Space Model and Probabilistic Latent Semantic Analysis. For this purpose we have developed a Benchmark Corpus consisting of documents whose similarity scores have been given by human beings. We measured ...

Automatic Essay Grading With Probabilistic Latent Semantic Analysis

2005

Tuomo Kakkonen, Niko Myller, Jari Timonen, Erkki Sutinen

Probabilistic Latent Semantic Analysis (PLSA) is an information retrieval technique proposed to address the problems found in Latent Semantic Analysis (LSA). We have applied both LSA and PLSA in our system for grading essays written in Finnish, called Automatic Essay Assessor (AEA). We report the results comparing PLSA and LSA with three essay sets from various subjects. The methods were found ...
