Integration of PLSA into Probabilistic CLIR Model - Yokohama National University at NTCIR4 CLIR
Authors
Abstract
In this paper, we propose a method of Cross-Language Information Retrieval (CLIR) based on an integration of a probabilistic CLIR model and Probabilistic Latent Semantic Analysis (PLSA). PLSA is adopted to extract translation probability information from a parallel corpus, and this information is then used in the probabilistic CLIR model. Although the probabilistic CLIR model with PLSA is quite effective, its processing takes a very long time. We therefore introduce an approximation method based on a two-phased retrieval model in order to reduce the computational cost. Using the model, we submitted runs for Japanese-to-English bilingual retrieval in the CLIR task of NTCIR4.
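The abstract does not say how the two phases are defined, so the following is only a generic sketch of two-phased retrieval, not the authors' method. It assumes that an inexpensive scorer first selects the top k candidates and that a costly scorer then re-ranks only those candidates. The names `two_phase_retrieve`, `overlap`, and `overlap_ratio` are hypothetical, and the toy scorers merely stand in for a cheap model and an expensive PLSA-based model.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def overlap(query: str, text: str) -> float:
    # Toy "cheap" scorer (assumption): number of distinct shared words.
    return float(len(set(query.split()) & set(text.split())))


def overlap_ratio(query: str, text: str) -> float:
    # Toy "costly" scorer (assumption): shared words normalized by document length.
    words = text.split()
    return overlap(query, text) / len(words) if words else 0.0


def two_phase_retrieve(
    query: str,
    docs: Dict[str, str],
    cheap_score: Scorer,
    costly_score: Scorer,
    k: int,
) -> List[Tuple[str, float]]:
    # Phase 1: rank every document with the cheap scorer and keep the top k.
    candidates = sorted(
        docs, key=lambda d: cheap_score(query, docs[d]), reverse=True
    )[:k]
    # Phase 2: run the costly scorer on the k candidates only, then re-rank.
    rescored = [(d, costly_score(query, docs[d])) for d in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "cross language retrieval model",
    "d2": "language model",
    "d3": "cooking recipes",
}
result = two_phase_retrieve("language model", docs, overlap, overlap_ratio, k=2)
print(result)
```

The costly scorer is called k times rather than once per document in the collection, which is where the saving comes from; the trade-off is that a relevant document missed in the first phase cannot be recovered in the second.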
Similar resources
University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion
Pseudo-relevance feedback, while useful in monolingual applications for refining and enriching short user queries, proves even more important in cross-language information retrieval (CLIR). For CLIR, query expansion before and after translation can provide an opportunity to recover from translation gaps, reduce ambiguity, and enhance recall. Furthermore, for CLIR in unsegmented Asian languages, ...
Ricoh in the NTCIR4 CLIR Tasks
This paper describes Ricoh’s participation in the NTCIR-4 CLIR tasks. We used the same approach as we took at the NTCIR-3 IR tasks for Japanese. We applied our system using a Traditional/Simplified Chinese converter and n-gram indexing for the Chinese IR task. The results show that our simple approach for Chinese IR can provide information retrieval for both Traditional and Simplified Chinese.
NTCIR-6 CLIR Experiments at Osaka Kyoiku University - Term Expansion Using Online Dictionaries and Weighting Score by Term Variety
This paper describes experimental results for the J-J subtask of NTCIR-6 CLIR. We expanded query terms using online dictionaries on the Web. This was effective for some topics whose average precision was low. A probabilistic model was employed for scoring, and we also modified the score by multiplying it by the number of varieties of query terms. In most cases this works well. Query term reduction should b...
Implicit ambiguity resolution using incremental clustering in cross-language information retrieval
This paper presents a method to implicitly resolve ambiguities using dynamic incremental clustering in cross-language information retrieval (CLIR), such as Korean-to-English and Japanese-to-English CLIR. The main objective of this paper is to show that document clusters can effectively resolve the ambiguities, which increase tremendously in translated queries, as well as take into account the context of all...
A Probabilistic Translation Method for Dictionary-based Cross-lingual Information Retrieval in Agglutinative Languages
Translation ambiguity, out-of-vocabulary words, and missing translations in bilingual dictionaries make dictionary-based Cross-language Information Retrieval (CLIR) a challenging task. Moreover, in agglutinative languages that do not have reliable stemmers, the absence of various lexical formations from bilingual dictionaries degrades CLIR performance. This paper aims to introduce a probabilistic tr...
Journal title:
Volume, issue
Pages -
Publication date 2004