Wikipedia mining

Mining Interesting Trivia for Entities from Wikipedia

Journal: CoRR 2015

Abhay Prakash

Trivia is any fact about an entity that is interesting because of any of the following characteristics: unusualness, uniqueness, unexpectedness, or weirdness. Such interesting facts appear in "Did You Know?" sections in many places. Although trivia are facts of little importance, we present their use for user engagement. Such fun facts generally spark intrigue a...


Web Corpus Mining By Instance Of Wikipedia

2006

Rüdiger Gleim, Alexander Mehler, Matthias Dehmer

In this paper we present an approach to structure learning for web documents, with the goal of web genre tagging in web corpus linguistics. A central outcome of the paper is that purely structure-oriented approaches to web document classification provide an information gain which may be utilized in combined approaches of web content and structu...


Boot-Strapping Language Identifiers for Short Colloquial Postings

2013

Moisés Goldszmidt, Marc Najork, Stelios Paparizos

There is tremendous interest in mining the abundant user-generated content on the web. Many analysis techniques are language dependent and rely on accurate language identification as a building block. Even though there is already research on language identification, it has focused on very ‘clean’ editorially managed corpora, on a limited number of languages, and on relatively large-sized documents....


Performance Tuning Of J48 Algorithm For Prediction Of Soil Fertility

Journal: CoRR 2012

Jay Gholap

Data mining involves the systematic analysis of large data sets, and data mining of agricultural soil datasets is an exciting and modern research area. The productive capacity of a soil depends on soil fertility. Achieving and maintaining appropriate levels of soil fertility is of utmost importance if agricultural land is to remain capable of nourishing crop production. In this research, Steps f...


Named Entity Relation Mining using Wikipedia

2008

Adrian Iftene, Alexandra Balahur

Discovering relations among Named Entities (NEs) from large corpora is both a challenging and a useful task in the domain of Natural Language Processing, with applications in Information Retrieval (IR), Summarization (SUM), Question Answering (QA) and Textual Entailment (TE). The work we present resulted from the attempt to solve practical issues we were confronted with while building sys...


Knowledge Mining Wikipedia: An Ontological Approach

Journal: IJKSS 2011

Herbert Lee, Keith Chan, Eric Tsui

The organization of information in the knowledge economy has become a primary business process in many enterprises. The better information is organized and stored, the easier it can be retrieved, so that the most relevant information will always be available. Ontology is a versatile technology for organizing information; however, the main obstacle that prevents ontology from prevailing is the diffic...


Relation Extraction from Wikipedia Using Subtree Mining

2007

Dat P. T. Nguyen, Yutaka Matsuo, Mitsuru Ishizuka

The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia’s English articles, which in turn can help intelligent systems satisfy users’ information needs. Our ...


Automatic Wikibook Prototyping via Mining Wikipedia

Journal: IJCLCLP 2008

Jen-Liang Chou, Shih-Hung Wu

Wikipedia is the world’s largest collaboratively edited source of encyclopedic knowledge. Wikibook is a sub-project of Wikipedia that is intended to create a book that can be edited by various contributors, similar to how Wikipedia is composed and edited. Editing a book, however, requires more effort than editing separate articles. Therefore, methods of quickly prototyping a book are a new resea...


Subtree Mining for Relation Extraction from Wikipedia

2007

Dat P. T. Nguyen, Yutaka Matsuo, Mitsuru Ishizuka

In this study, we address the problem of extracting relations between entities from Wikipedia’s English articles. Our proposed method first anchors the appearance of entities in Wikipedia’s articles using neither a Named Entity Recognizer (NER) nor a coreference resolution tool. It then classifies the relationships between entity pairs using SVM with features extracted from the web structure and sub...


An open-source toolkit for mining Wikipedia

Journal: Artif. Intell. 2013

David N. Milne, Ian H. Witten

The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wikipedia Miner toolkit: an open-source collection of code that allows researchers and developers to...
