Using Links to Classify Wikipedia Pages
نویسندگان
چکیده
This paper contains a description of experiments for the 2008 INEX XML-mining track. Our goal for the XML-mining track is to explore whether we can use link information to improve classification accuracy. Our approach is to propagate category probabilities over linked pages. We find that using link information leads to marginal improvements over a baseline that uses a Naive Bayes model. For the initially misclassified pages, link information is either not available or contains too much noise.
منابع مشابه
Classifying Wikipedia Articles into NE's Using SVM's with Threshold Adjustment
In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a ...
متن کاملEntity Disambiguation with Web Links
Entity disambiguation with Wikipedia relies on structured information from redirect pages, article text, inter-article links, and categories. We explore whether web links can replace a curated encyclopaedia, obtaining entity prior, name, context, and coherence models from a corpus of web pages with links to Wikipedia. Experiments compare web link models to Wikipedia models on well-known CoNLL a...
متن کاملDiscovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation
Wikipedia pages typically contain inter-language links to the corresponding pages in other languages. These links, however, are often incomplete. This paper describes a set of experiments in which the viability of discovering such missing inter-language links for ambiguous nouns by means of a cross-lingual Word Sense Disambiguation approach is investigated. The input for the inter-language link...
متن کاملFinding Entities in Wikipedia Using Links and Categories
In this paper we describe our participation in the INEX Entity Ranking track. We explored the relations between Wikipedia pages, categories and links. Our approach is to exploit both category and link information. Category information is used by calculating distances between document categories and target categories. Link information is used for relevance propagation and in the form of a docume...
متن کاملCross-lingual Dutch to English alignment using EuroWordNet and Dutch Wikipedia
This paper describes a system for linking the thesaurus of the Netherlands Institute for Sound and Vision to English WordNet and dbpedia. We used EuroWordNet, a multilingual wordnet, and Dutch Wikipedia as intermediaries for the two alignments. EuroWordNet covers most of the subject terms in the thesaurus, but the organization of the cross-lingual links makes selection of the most appropriate E...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008