Search results for: wikipedia mining

Number of results: 92181

2010
Julian Szymanski

The paper concerns the problem of automatically creating a category system for a set of documents connected by references. The presented approach has been evaluated on the Polish Wikipedia, where two graphs, the Wikipedia category graph and the article graph, have been analyzed. The links between Wikipedia articles have been used to create a new category graph with weighted edges. We compare the created ca...

2011
F. Gediz Aksit

Wikipedia, despite having a very small budget, has been among the top ten most visited websites for over half a decade. This visibility has also generated the problem of ill-intentioned people modifying Wikipedia in a destructive manner. VandalSense is an experimental tool programmed by F. Gediz Aksit to automatically identify vandalism on Wikipedia through the use of machine learning and text mining...

2014
Michele Filannino, Goran Nenadic

Discovering temporal information is key to organising knowledge, and the task of extracting and representing temporal information from texts has therefore received increasing interest. In this paper we focus on the discovery of temporal footprints from encyclopaedic descriptions. Temporal footprints are time-line periods associated with the existence of specific concepts. Our approac...

2015
Silvana Hartmann, György Szarvas

Collecting the specialized vocabulary of a particular domain (terminology) is an important initial step in creating formalized domain knowledge representations (ontologies). Terminology Extraction (TE) aims to automate this process by collecting the relevant domain vocabulary from existing lexical resources or collections of domain texts. In this chapter, the authors address the extrac...

2010
Peter Nabende

This paper describes the use of a pair Hidden Markov Model (pair HMM) system for mining transliteration pairs from noisy Wikipedia data. A pair HMM variant that uses nine transition parameters and emission parameters associated with single-character mappings between source and target language alphabets is identified and used to estimate transliteration similarity. The system resulted in a pre...

2011
Mike Chen, Razvan Bunescu

The dynamic and continuously growing category structure of Wikipedia has been used in numerous ontology extraction methods. We present a dataset of category subgraphs automatically extracted from Wikipedia and manually annotated for is-a and instance-of relations, in order to enable a more comprehensive evaluation of taxonomy mining approaches. We also show how the new dataset can be used w...

Journal: Informatica (Slovenia), 2007
Abhijit Bhole, Blaz Fortuna, Marko Grobelnik, Dunja Mladenic

This paper presents an approach to mining information relating people, places, organizations and events extracted from Wikipedia and linking them on a time scale. The approach consists of two phases: (1) identifying relevant pages by categorizing the articles as containing people, places or organizations; (2) generating a timeline linking named entities and extracting events and their time frames. We...

2015
Xiaojie Liu, Jian-Yun Nie

Concepts are often used in Medical Information Retrieval. In any concept-based method one has to extract concepts from texts (queries or documents). MetaMap is often used for this task. However, if the query is issued by a layperson, it may not contain the appropriate concept expressions and MetaMap will fail to extract the correct concepts. In this situation we need to explore other resources t...

Journal: CoRR, 2014
Kalpit V. Desai, Roopesh Ranjan

The Wikimedia Foundation has recently observed that newly joining editors on Wikipedia are increasingly failing to integrate into the Wikipedia editors’ community, i.e. the community is becoming increasingly hard to penetrate [1]. To sustain healthy growth of the community, the Wikimedia Foundation aims to quantitatively understand the factors that determine editing behavior, and explain ...

2012
Andias Wira-Alam, Brigitte Mathiak

In this paper, we discuss aspects of mining links and text snippets from Wikipedia as a new knowledge base. Current knowledge bases, e.g. DBPedia [1], cover mainly the structured part of Wikipedia, but not the content as a whole. Acting as a complement, we focus on extracting information from the text of the articles. We extract a database of the hyperlinks between Wikipedia articles and pop...

[Chart: number of search results per year]