web information extraction

Logic Programs for Intelligent Web Search

1999

Bernd Thomas

We present a general framework for information extraction from web pages based on a special wrapper language, called token-templates. By using token-templates in conjunction with logic programs we are able to reason about web page contents, search and collect facts and derive new facts from various web pages. We give a formal definition for the semantics of logic programs extended by token-temp...

Quantum Model for Information Retrieval in Web 2.0

Journal: International Journal of Information Science and Management

Mohammad Bagher Negahban, Department of Information Sciences, Shahid Bahonar University of Kerman; Ali Reza Sepehri, Department of Physics, Shahid Bahonar University of Kerman

The study deals with the possibility of information loss in Web 2.0 due to the interaction between the overload and the real information. Using the Gottesman and Preskill method, this investigation proposes a mechanism to calculate the amount of information transformation in Web 2.0. In this proposal, there are three different Hilbert spaces that belong to the degrees of freedom of outside, ins...

Information extraction for enhanced access to disease outbreak reports

Journal: Journal of Biomedical Informatics, 2002

Ralph Grishman Silja Huttunen Roman Yangarber

Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing...

The XMediaBox: Sensemaking through the Use of Knowledge Lenses

2009

Aba-Sah Dadzie José Iria Daniela Petrelli Lei Xia

Sensemaking is the process of analysing complex situations in order to make informed decisions. Semantic Web technology can be effectively used to create new sensemaking systems that focus on concepts and knowledge instead of documents. We demonstrate how this is achieved using information extraction to acquire knowledge and create a semantic repository that can then be semantically searched. A...

Combining Multiple Sources of Evidence in Web Information Extraction

2008

Martin Labský Vojtech Svátek

Extraction of meaningful content from collections of web pages with unknown structure is a challenging task, which can only be successfully accomplished by exploiting multiple heterogeneous resources. In the Ex information extraction tool, so-called extraction ontologies are used by human designers to specify the domain semantics, to manually provide extraction evidence, as well as to define ex...

Toward Tomorrow’s Semantic Web—An Approach Based on Information Extraction Ontologies

2005

David W. Embley

This position paper proffers the use of information-extraction ontologies as an approach to semantic understanding for the semantic web. From this perspective, it also issues challenges to the machine learning community to offer solutions for specific problems to aid in semantic understanding.

Discovering Company Descriptions on the Web by Multiway Analysis

2003

Vojtech Svátek Petr Berka Martin Kavalec Jirí Kosek Vladimír Vávra

We investigate the possibility of web information discovery and extraction by means of a modular architecture in which independent knowledge-based modules separately analyse the multiple forms of information presentation, such as free text, structured text, URLs and hyperlinks. First experiments in discovering a relatively easy target, general company descriptions, suggest that web information ca...

Meta-learning beyond classification: A framework for information extraction from the Web

2003

Georgios Sigletos Georgios Paliouras Constantine D. Spyropoulos Takis Stamatopoulos

This paper proposes a meta-learning framework in the context of information extraction from the Web. The proposed framework relies on learning a meta-level classifier, based on the output of base-level information extraction systems. Such systems are typically trained to recognize relevant information within documents, i.e., streams of lexical units, which differs significantly from the task of...

Building Intelligent Systems for Mining Information Extraction Rules from Web Pages by Using Domain Knowledge

2001

Heekyoung Seo Jaeyoung Yang Joongmin Choi

Previous research on automatic information extraction has had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a meth...

Trinity: Unsupervised Web Data Extraction Using Ternary Trees

2015

Nitin Shivale

The Internet presents a huge collection of useful information, so extracting information from web documents has become a research area, for which web data extractors are used. This technique works on two or more web documents generated by the same server-side template, learns a regular expression that models the template, and then uses it to extract data from similar documents. The technique introdu...
