Building intelligent Web applications using lightweight wrappers

Authors

  • Arnaud Sahuguet
  • Fabien Azavant

Abstract

The Web so far has been incredibly successful at delivering information to human users. So successful, in fact, that there is now an urgent need to go beyond a browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats like XML; (3) visual tools to make the engineering of wrappers faster and easier.
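
The abstract describes a wrapper as two steps: extraction rules that pull structure out of an HTML page, followed by a declarative mapping of that structure to a format such as XML. The sketch below illustrates that two-step idea in Python using only the standard library. It is not W4F's extraction language or API: the class `TableRowExtractor`, the function `wrap`, the sample page, and the assumption that the data sits in `<td>` cells of an HTML table are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowExtractor(HTMLParser):
    """Hypothetical extraction step: collect <td> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # extracted rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the open <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells (e.g. <th> header rows)
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def wrap(html, record_tag, field_names):
    """Hypothetical wrapper: run the extraction step, then map each row to XML."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element(record_tag + "s")
    for row in parser.rows:
        record = ET.SubElement(root, record_tag)
        # Declarative mapping: the i-th cell becomes the i-th named field.
        for name, value in zip(field_names, row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample page used only to exercise the sketch.
    page = (
        "<table><tr><th>Title</th><th>Year</th></tr>"
        "<tr><td>W4F</td><td>2001</td></tr></table>"
    )
    print(wrap(page, "paper", ["title", "year"]))
```

Run on the sample page, the sketch prints `<papers><paper><title>W4F</title><year>2001</year></paper></papers>`. Keeping extraction (`TableRowExtractor`) separate from the mapping (`field_names`) means the output schema can change without touching the parsing code, which is the kind of separation the abstract attributes to W4F; a real wrapper would also need to cope with nested tables, malformed markup, and page changes.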

Similar resources

Reconfigurable Web Wrapper Agents

directly access the data. Web wrappers, however, must automate Web browsing sessions to extract data from the target Web pages so other applications can process that data. Each Web site has its own set of links, layout templates, and syntax. You could, in a brute-force solution, program a wrapper for each browsing session. However, such wrappers are sensitive to Web site changes and become diff...

First International Workshop on Lightweight Integration on the Web (ComposableWeb’09)

Automatic navigating and gathering information from Deep Web sites requires the use of Web Wrappers in order to simulate human interaction with Web sites. Web Wrappers have some drawbacks: their implementations are specific to the accessed site and also their source code needs a constant maintenance in order to support new changes on Web site. In this work we propose an annotation model for Dee...

Between Now and the Semantic Web

The Information Source Adapter Platform (or ISA Platform, for short) is a set of practical enabling technologies for developing intelligent information systems. The platform provides an infrastructure for building lightweight interfaces to existing resources that can be composed automatically using means-ends analysis. The ISA Platform enables developers to rapidly create new and innovative app...

Web Wrapper Specification Using Compound Filter Learning

Information available on the Internet is made to be read by humans, not to be processed by machines. To automatically access this information, there is a need for intelligent services that convert HTML documents into more suitable formats like XML. This can be achieved through generation of Web wrappers, programs designed to process pages of a given Web site. To generate such Web wrappers, an e...

Building Intelligent Systems for Mining Information Extraction Rules from Web Pages by Using Domain Knowledge

Previous researches on automatic information extraction experienced difficulties in acquiring and representing useful domain knowledge and in coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a meth...

Journal:
  • Data Knowl. Eng.

Volume 36

Published: 2001