Information Extraction from Wikipedia Using Pattern Learning
Author
Abstract
In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources, such as Wikipedia, for building semantic databases that serve upcoming Semantic Web technologies. We demonstrate two kinds of approach: a verb frame-based approach that uses deep natural language processing techniques together with extraction patterns developed by human knowledge experts, and machine learning methods that use shallow linguistic processing. We also propose a method for learning verb frame-based extraction patterns automatically from labeled data. We show that such labeled training data can be produced with only minimal human effort by exploiting existing semantic resources and the special characteristics of Wikipedia. Custom solutions for named entity recognition are also possible in this scenario. Finally, we present an evaluation and comparison of the different approaches on several different relations.
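The abstract states that labeled training data can be produced with minimal human effort from existing semantic resources and the structure of Wikipedia. Below is a minimal sketch of one way such automatic labeling could work, in the style of distant supervision; it is an illustration, not the authors' method. It assumes that the semantic resource is a list of (subject, relation, object) facts, that every sentence of an article is about the entity named by the page title (so a sentence starting with "He" needs no coreference resolution), and that an exact substring match of the object value marks a positive example. `Fact`, `split_sentences`, and `label_sentences` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One (subject, relation, object) triple from an existing semantic resource (assumed format)."""
    subject: str   # assumed to equal a Wikipedia page title
    relation: str  # hypothetical relation name, e.g. "birthplace"
    obj: str       # object value, e.g. a place name


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentences(title: str, text: str, facts: list[Fact]) -> list[tuple[str, str, str]]:
    """Return (sentence, relation, label) tuples for the article named `title`.

    Assumption: every sentence in the article is about the title entity, so only
    the object needs to be found in the sentence. Exact, case-sensitive substring
    matching is used, which yields noisy labels in practice.
    """
    known = [f for f in facts if f.subject == title]
    examples = []
    for sentence in split_sentences(text):
        for fact in known:
            label = "positive" if fact.obj in sentence else "negative"
            examples.append((sentence, fact.relation, label))
    return examples
```

For example, with `facts = [Fact("Albert Einstein", "birthplace", "Ulm")]` and the article text `"Albert Einstein was a theoretical physicist. He was born in Ulm in 1879."`, the second sentence is labeled `positive` for `birthplace` and the first `negative`. Treating every non-matching sentence as a negative is a deliberate simplification; a real system would need to handle aliases and partial matches.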
Journal: Acta Cybern.
Volume 19, Issue
Pages: -
Publication date: 2010