Search results for: wikipedia mining

Number of results: 92181

Journal: CoRR 2015
Abhay Prakash

Trivia is any fact about an entity that is interesting because of one or more of the following characteristics: unusualness, uniqueness, unexpectedness, or weirdness. Such facts appear in "Did You Know?" sections in many places. Although trivia are facts of little importance, we present their use for user engagement. Such fun facts generally spark intrigue a...

2006
Rüdiger Gleim Alexander Mehler Matthias Dehmer

In this paper we present an approach to structure learning for web documents. The aim is to move toward webgenre tagging in web corpus linguistics. A central outcome of the paper is that purely structure-oriented approaches to web document classification provide an information gain which may be utilized in combined approaches of web content and structu...

2013
Moisés Goldszmidt Marc Najork Stelios Paparizos

There is tremendous interest in mining the abundant user-generated content on the web. Many analysis techniques are language dependent and rely on accurate language identification as a building block. Even though there is already research on language identification, it has focused on very ‘clean’ editorially managed corpora, on a limited number of languages, and on relatively large-sized documents....

Journal: CoRR 2012
Jay Gholap

Data mining involves the systematic analysis of large data sets, and data mining in agricultural soil datasets is an exciting and modern research area. The productive capacity of a soil depends on soil fertility. Achieving and maintaining appropriate levels of soil fertility is of utmost importance if agricultural land is to remain capable of nourishing crop production. In this research, Steps f...

2008
Adrian Iftene Alexandra Balahur

Discovering relations among Named Entities (NEs) from large corpora is both a challenging and a useful task in Natural Language Processing, with applications in Information Retrieval (IR), Summarization (SUM), Question Answering (QA) and Textual Entailment (TE). The work we present resulted from the attempt to solve practical issues we were confronted with while building sys...

Journal: IJKSS 2011
Herbert Lee Keith Chan Eric Tsui

The organization of information in the knowledge economy has become a primary business process in many enterprises. The better information is organized and stored, the more easily it can be retrieved, so that the most relevant information will always be available. Ontology is a versatile technology for organizing information; however, the main obstacle that prevents ontology from prevailing is the diffic...

2007
Dat P. T. Nguyen Yutaka Matsuo Mitsuru Ishizuka

The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia’s English articles, which in turn can serve for intelligent systems to satisfy users’ information needs. Our ...

Journal: IJCLCLP 2008
Jen-Liang Chou Shih-Hung Wu

Wikipedia is the world’s largest collaboratively edited source of encyclopedic knowledge. Wikibook is a sub-project of Wikipedia that is intended to create a book that can be edited by various contributors, similar to how Wikipedia is composed and edited. Editing a book, however, requires more effort than editing separate articles. Therefore, methods of quickly prototyping a book are a new resea...

2007
Dat P. T. Nguyen Yutaka Matsuo Mitsuru Ishizuka

In this study, we address the problem of extracting relations between entities from Wikipedia’s English articles. Our proposed method first anchors the appearance of entities in Wikipedia’s articles using neither a Named Entity Recognizer (NER) nor a coreference resolution tool. It then classifies the relationships between entity pairs using SVM with features extracted from the web structure and sub...

Journal: Artif. Intell. 2013
David N. Milne Ian H. Witten

The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wikipedia Miner toolkit: an open-source collection of code that allows researchers and developers to...
