wikipedia

Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus

Journal: Inf. Process. Manage. 2017

Mohamad Mehdi, Chitu Okoli, Mostafa Mesgari, Finn Årup Nielsen, Arto Lanamäki

Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge base that has been continuously exploited by researchers in a wide variety of domains. This article systematically reviews the scholarly studies that have used Wikipedia as a data source, and investigates the means by which Wikipedia has been employed in three main computer science research areas: information...

Wikipedia Tools for Google Spreadsheets

2016

Thomas Steiner

In this paper, we introduce the Wikipedia Tools for Google Spreadsheets. Google Spreadsheets is part of a free, Web-based software office suite offered by Google within its Google Docs service. It allows users to create and edit spreadsheets online, while collaborating with other users in real time. Wikipedia is a free-access, free-content Internet encyclopedia, whose content and data are availabl...

Adding High-Precision Links to Wikipedia

2014

Thanapon Noraset, Chandra Bhagavatula, Doug Downey

Wikipedia’s link structure is a valuable resource for natural language processing tasks, but only a fraction of the concepts mentioned in each article are annotated with hyperlinks. In this paper, we study how to augment Wikipedia with additional high-precision links. We present 3W, a system that identifies concept mentions in Wikipedia text, and links each mention to its referent page. 3W leve...

Hedera: Scalable Indexing, Exploring Entities in Wikipedia Revision History

2014

Tuan A. Tran, Tu Ngoc Nguyen

Much of the Semantic Web work that relies on Wikipedia as its main source of knowledge operates on static snapshots of the dataset. The full history of Wikipedia revisions, while containing much more useful information, is still difficult to access due to its exceptional volume. To enable further research on this collection, we developed a tool, named Hedera, that efficiently extracts semantic infor...

Time evolution of Wikipedia network ranking

Journal: CoRR 2013

Young-Ho Eom, Klaus M. Frahm, András A. Benczúr, Dima Shepelyansky

We study the time evolution of ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003–2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to have stabilized for 2007–2011. Special emphasis is placed on the ranking of Wikipedia personalities ...

Augmenting Wikipedia with Named Entity Tags

2008

Wisam Dakka, Silviu Cucerzan

Wikipedia is the largest organized knowledge repository on the Web, increasingly employed by natural language processing and search tools. In this paper, we investigate the task of labeling Wikipedia pages with standard named entity tags, which can be used further by a range of information extraction and language processing tools. To train the classifiers, we manually annotated a small set of W...

Focused Access to Wikipedia

2006

Börkur Sigurbjörnsson, Jaap Kamps, Maarten de Rijke

Wikipedia is a “free” online encyclopedia. It contains millions of entries in many languages and is growing at a fast pace. Due to its volume, search engines play an important role in giving access to the information in Wikipedia. The “free” availability of the collection makes it an attractive corpus for information retrieval experiments. In this paper we describe the evaluation of a search en...

Wikipedia as an Academic Reference: Faculty and Student Viewpoints

2010

Johnny Snyder

Wikis are becoming popular with business and academia as a way to harvest, archive, and manage knowledge. One of the most popular and well-known wikis is Wikipedia, the online encyclopedia started by Jimmy Wales and Larry Sanger in 2001. Since its inception, much has been written (both pro and con) about Wikipedia; however, Wikipedia is one of the most popular sites on the Internet today. As it...

Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents

Journal: Comput. J. 2012

Gerasimos Spanakis, Georgios Siolas, Andreas Stafylopatis

In this paper, we propose a novel method for conceptual hierarchical clustering of documents using knowledge extracted from Wikipedia. The proposed method overcomes the disadvantages of classic bag-of-words models by exploiting Wikipedia's textual content and link structure. A robust and compact document representation is built in real time using the Wikipedia application programmer’s ...

Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

2012

Sungchul Kim, Kristina Toutanova, Hwanjo Yu

In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences. The combination is achieved using a novel semi-CRF model for forei...
