Search results for: web wrapper generation

Number of results: 567401

2013
Valter Crescenzi Paolo Merialdo Disheng Qiu

We present solutions based on crowdsourcing platforms to support large-scale production of accurate wrappers around data-intensive websites. Our approach is based on supervised wrapper induction algorithms which delegate the burden of generating the training data to the workers of a crowdsourcing platform. Workers are paid for answering simple membership queries chosen by the system. We present t...

2011
Marcel Baláž

A system-on-chip is an integrated circuit comprising numerous functional cores, which can be of various types. Testing such a diverse circuit is a very complex problem. Test access to digital cores is ensured by core wrapper architectures. The paper presents two novel contributions to core test wrappers: (1) a set of optimization techniques for the parallel interface to provide faster test applica...

2006
Julien Carme Michal Ceresna Max Goebel

Information available on the Internet is made to be read by humans, not to be processed by machines. To automatically access this information, there is a need for intelligent services that convert HTML documents into more suitable formats like XML. This can be achieved through generation of Web wrappers, programs designed to process pages of a given Web site. To generate such Web wrappers, an e...

2012
Tim Furche Giovanni Grasso Christian Schallhart Andrew Jon Sellers Antonino Rullo

Web wrappers access databases hidden in the deep web by first interacting with web sites (e.g., filling in forms or clicking buttons) and then extracting the relevant data from the result pages thus unearthed. Though the (semi-)automatic induction and maintenance of such wrappers has been extensively studied, the efficient execution and optimization of wrappers has seen far less attention. We demonstrat...

2007
Jagoda Walny Denilson Barbosa

A wealth of data on the World Wide Web is hidden behind web form query interfaces and cannot be found through regular search engines. Querying across multiple such sources is a tedious and error-prone process; it involves manually filling in many related, but different, web forms. SemaForm automates this process by correlating web form labels to entries in a domain ontology through the use of a...

Journal: :PVLDB 2015
Disheng Qiu Luciano Barbosa Xin Dong Yanyan Shen Divesh Srivastava

The web is a rich resource of structured data. There has been an increasing interest in using web structured data for many applications such as data integration, web search and question answering. In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them. Since product specifications exist in multiple product sites, our ...

2001
Valentin Razmov Daniel R. Simon

Vulnerabilities in distributed applications are being uncovered and exploited faster than software engineers can patch the security holes. All too often these weaknesses result from implicit assumptions made by an application about its inputs. One approach to defending against their exploitation is to interpose a filter between the input source and the application that verifies that the applica...

Journal: :Knowl.-Based Syst. 2014
Dánel Sánchez Tarragó Chris Cornelis Rafael Bello Francisco Herrera

Web index recommendation systems are designed to help internet users with suggestions for finding relevant information. One way to develop such systems is using the multi-instance learning (MIL) approach: a generalization of the traditional supervised learning where each example is a labeled bag that is composed of unlabeled instances, and the task is to predict the labels of unseen bags. This ...

2000
Wolfgang May Georg Lausen Georges Koehler

The goal of information extraction from the Web is to provide an integrated view on data from autonomous, heterogeneous information sources. The main problem with current wrapper-mediator approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an impedance mismatch between the wrapper and mediator level. Additionally, most approaches nowadays a...

1999
Sebastian Pulkowski

Literature search and delivery in the World Wide Web is a rapidly expanding market. Up to now the search is mostly cost-free. But in the future we expect the appearance of more and more providers charging for their services. The main problems are finding the right provider and extracting the information. In this paper we present a system for intelligent information search and extraction from mu...
