Using Concept base and Wikipedia for Cross-Lingual Link Discovery
Abstract
This paper describes our method for Cross-Lingual Link Discovery (CLLD). We used the English-Japanese document collections of the CLLD subtask of NTCIR-9. In our method, topics are translated using Wikipedia, which is written in multiple languages: for each topic written in the source language, the corresponding page written in the target language is retrieved, and the target-language topic is built from the Wikipedia concept part of that page. Cross-language links are retrieved with a TF-IDF model. We use nouns, noun phrases, and adjectives to build a concept base. The results retrieved by the TF-IDF model are then re-ranked. Both the TF-IDF model and the concept base are built from the outline part of the target-language Wikipedia pages extracted from the Wikipedia page collection. The Crosslink Evaluation Tool of the NTCIR-9 Crosslink task is used for performance evaluation.
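The TF-IDF retrieval step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed IDF formula, the cosine scoring, the function names (`tfidf_vectors`, `cosine`, `rank`), and the toy romanized tokens are all assumptions made for illustration. The abstract does not specify how the concept base re-ranks results, so that step is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors for pre-tokenized documents (a list of token lists)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; the paper's exact weighting is not given).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, docs, titles):
    """Rank candidate pages by TF-IDF cosine similarity to the query tokens."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection are ignored.
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scored = [(cosine(query, vec), title) for vec, title in zip(vectors, titles)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical target-language pages, already reduced to content-word tokens.
titles = ["Tokyo Station", "Mount Fuji", "Kyoto Temple"]
docs = [
    ["tokyo", "station", "railway", "tokyo"],
    ["mount", "fuji", "volcano"],
    ["kyoto", "temple", "station"],
]
ranked = rank(["tokyo", "station"], docs, titles)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

In a setting like the one described, the query tokens would come from the translated topic and the documents from the outline parts of target-language Wikipedia pages; a re-ranking stage would then reorder the list returned by `rank`.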
Similar resources
An evaluation framework for cross-lingual link discovery
Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case wit...
Automated Cross-lingual Link Discovery in Wikipedia
At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-l...
Using Explicit Semantic Analysis for Cross-Lingual Link Discovery
This paper explores how to automatically generate cross-language links between resources in large document collections. The paper presents new methods for Cross-Lingual Link Discovery (CLLD) based on Explicit Semantic Analysis (ESA). The methods are applicable to any multilingual document collection. In this report, we present their comparative study on the Wikipedia corpus and provide new insi...
NTCIR-10 CrossLink-2 Task: A Link Mining Strategy
At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the cross-lingual linking method achieved promising results.
KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia Using Explicit Semantic Analysis
This paper describes the methods used in the submission of the Knowledge Media Institute (KMI), The Open University, to the NTCIR-9 Cross-Lingual Link Discovery (CLLD) task entitled CrossLink. KMI submitted four runs for link discovery from English to Chinese; however, the developed methods, which utilise Explicit Semantic Analysis (ESA), are also applicable to other language combinations. Three of ...
Publication date: 2011