Populating the Semantic Web

Authors

  • Kristina Lerman
  • Cenk Gazen
  • Steven Minton
  • Craig Knoblock
Abstract

The vision of the Semantic Web is that a vast store of online information “meaningful to computers will unleash a revolution of new possibilities”. Unfortunately, the vast majority of information on the Web is formatted to be easily read by human users, not computer applications. To make the vision of the Semantic Web a reality, tools for automatically annotating Web content with semantic labels will be required. We describe the ADEL system, which automatically extracts records from Web sites and semantically labels the fields. The system exploits similarities in the layout of Web pages to learn the grammar that generated these pages. It then uses this grammar to extract structured records from these Web pages. The ADEL system also exploits the fact that sites in the same domain provide the same or similar data. By collecting labeled examples of data during the training stage, we are able to learn structural descriptions of data fields and later use these descriptions to semantically label new data fields. We show that in a used-car shopping domain, ADEL achieves a precision of 64% and a recall of 89% on extracting and labeling data columns.
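
The labeling idea in the abstract (learn structural descriptions of fields from labeled examples, then use them to label new data columns) can be illustrated with a minimal Python sketch. This is not ADEL's actual algorithm: the token-type patterns, the function names `token_pattern`, `learn_descriptions`, and `label_column`, the `min_score` threshold, and the sample values are all assumptions made for illustration.

```python
import re
from collections import Counter


def token_pattern(value):
    """Map a field value to a coarse sequence of token types (hypothetical scheme)."""
    types = []
    for tok in re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", value):
        if tok.isdigit():
            types.append(f"NUM{len(tok)}")      # digits, keyed by length
        elif tok.isalpha():
            types.append("CAP" if tok[0].isupper() else "LOW")
        else:
            types.append(tok)                   # punctuation kept literally
    return tuple(types)


def learn_descriptions(labeled_examples):
    """Build, per label, a count of token patterns seen in training values."""
    return {
        label: Counter(token_pattern(v) for v in values)
        for label, values in labeled_examples.items()
    }


def label_column(column, descriptions, min_score=0.5):
    """Return (label, score) for the label whose patterns cover most values.

    The score is the fraction of column values whose pattern was seen
    for that label during training; below min_score, no label is assigned.
    """
    best_label, best_score = None, 0.0
    for label, patterns in descriptions.items():
        hits = sum(1 for v in column if token_pattern(v) in patterns)
        score = hits / len(column) if column else 0.0
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return None, best_score
    return best_label, best_score


# Invented training data for a used-car domain (illustrative only).
training = {
    "price": ["$12,500", "$8,995"],
    "year": ["1999", "2003"],
    "make": ["Honda", "Toyota"],
}
descriptions = learn_descriptions(training)
label, score = label_column(["$9,800", "$15,250", "Call"], descriptions)
print(label, round(score, 2))
```

Matching on coarse token types rather than exact strings lets unseen values such as a new price still match a learned description, while a stray value like "Call" only lowers the column's score instead of blocking the label.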

Similar resources

A Semantic Portal for Fund Finding in the EU: Semantic Upgrade, Integration and Publication of Heterogeneous Legacy Data

FundFinder is a Semantic Web portal that allows searching for and navigating through information about funding opportunities. This application has been created following a set of techniques and using a set of tools for the upgrade of legacy content to the Semantic Web, including databases and semistructured documents. This process consists in extracting and populating knowledge from heterogeneo...

OntoMiner: Bootstrapping and Populating Ontologies from Domain Specific Web Sites

HTML documents, which are designed primarily for human consumption. The presence of such legacy documents makes embracing the Semantic Web vision difficult. Thus, we need scalable solutions to automatically transform legacy HTML to Semantic Web documents. Recent work describes algorithms that automatically annotate HTML documents with semantic labels. Unfortunately, constructing the domain on...

Populating Knowledge Based Decision Support Systems

Knowledge-based decision support systems (KBDSS) support business and organizational decision-making activities on the basis of the knowledge available about the domain in question. One of the main problems with knowledge bases is that their construction is a time-consuming task. A number of methodologies have been proposed in the context of the Semantic Web to assist in the development ...

Semantically Mapping the Web

The millions of web pages populating the internet seem to be unstructured and chaotic, but there are implicit semantic relations between them. In this paper we propose to make explicit the underlying semantic structure of the internet, by measuring joint keyword occurrences in web pages, around our notion of “Semantic Contexts”. As a result, we can draw a “map” of semantic clusters which can be...

AHP Techniques for Trust Evaluation in Semantic Web

The increasing reliance on information gathered from the web and other internet technologies raises the issue of trust. With the development of the Semantic Web, one major difficulty is that, by its very nature, the Semantic Web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each resource. Each user knows the trustworthiness of ...

A Framework for Populating Ontological Models from Semi-structured Web Documents

The Web is the largest repository of information that has ever existed. This information is presented in a human-friendly format using HTML, which complicates the consumption of this information by automatic processes. Solutions to this problem are the Semantic Web and Web Services, but the lack of such services in the majority of web sites has increased the interest in information extraction wh...

Publication date: 2004