Search results for: wikipedia mining

Number of results: 92181

2008
Torsten Zesch Christof Müller Iryna Gurevych

Recently, collaboratively constructed resources such as Wikipedia and Wiktionary have been discovered as valuable lexical semantic knowledge bases with a high potential in diverse Natural Language Processing (NLP) tasks. Collaborative knowledge bases however significantly differ from traditional linguistic knowledge bases in various respects, and this constitutes both an asset and an impediment...

2015
Abhay Prakash Manoj Kumar Chinnakotla Dhaval Patel Puneet Garg

Trivia is any fact about an entity which is interesting due to its unusualness, uniqueness, unexpectedness or weirdness. In this paper, we propose a novel approach for mining entity trivia from their Wikipedia pages. Given an entity, our system extracts relevant sentences from its Wikipedia page and produces a list of sentences ranked based on their interestingness as trivia. At the heart of ou...

2013
Johannes Daxenberger Iryna Gurevych

In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machin...

2009
Kun Yu Junichi Tsujii

The method of mining comparable corpora and the strategy of dictionary extraction are two essential elements of bilingual dictionary extraction from comparable corpora. This paper first proposes a method that uses the inter-language links in Wikipedia to build comparable corpora. The large scale of Wikipedia ensures the quantity of collected comparable corpora. Besides, because the inter-language...

2013
Hassan Mostafa

The rapid growth in the number of documents available to end users from around the world has led to a greatly-increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. In this work, a novel technique is proposed, to automatically construct a background knowledge structure ...

2010
Yiping Zhou Lan Nie Omid Rouhani-Kalleh Flavian Vasile Scott Gaffney

Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines the features using a machine learning approach with a...

2008
Shun-Feng Su Hiroshi Ueda Harumi Murakami Shoji Tatsumi

We proposed a method that suggests subject headings based on user queries when a pattern-matching algorithm fails to locate subject searches for Online Public Access Catalogs (OPAC). We combined information obtained from Wikipedia, Amazon, and Google for query expansion. Our method has two main advantages: (1) availability for any library without customizing OPACs, and (2) ability to suggest su...

2008
Tien Tran Sangeetha Kutty Richi Nayak

This paper reports on the experiments and results of a clustering approach used in the INEX 2008 Document Mining Challenge. The clustering approach utilizes both the structure and the content information of the XML documents in the Wikipedia collection. The content of the XML documents is measured using the latent semantic kernel (LSK). A well-known problem with the construction of latent seman...

2013

As one of the popular social media platforms that many people have turned to in recent years, the collaborative encyclopedia Wikipedia provides information from a more "Neutral Point of View" than others. In support of this core principle, considerable effort has been put into collaborative contribution and editing. The trajectories of how such collaboration appears by revisions are valuable for group dynamics and socia...

Journal: CoRR 2016
Antti Puurula

The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a n...
