Integrating Unnormalised Semi-structured Data Sources

Authors

  • Sasivimol Kittivoravitkul
  • Peter McBrien
Abstract

From Proc. CAiSE05, LNCS 3520, pages 460-474. © Springer-Verlag 2005.

Semi-structured data sources, such as XML, HTML or CSV files, present special problems when performing data integration. In addition to handling the hierarchical structure of semi-structured data, data integration must deal with its redundancy: the same fact may be repeated in a data source, but should map to a single fact in a global integrated schema. We term semi-structured data containing such redundancy an unnormalised data source, and we define a normal form for semi-structured data that may be used when defining global schemas. We introduce special functions to relate object identifiers used in the global data model to object identifiers in unnormalised data sources, and demonstrate how to use these functions in query processing, update processing and integration of these data sources.
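The redundancy described in the abstract can be illustrated with a small sketch. This is not the paper's normal form or its identifier functions, which are not given on this page. The XML layout, the element names (staff, dept, dname, site), the choice of dname as the identifying key, and the integrate helper are all assumptions made for illustration. The sketch shows a department fact repeated under several staff elements being mapped to one global identifier, with a record of which source occurrences that identifier covers.

```python
import xml.etree.ElementTree as ET

# Hypothetical unnormalised source: the "Computing" department fact is
# repeated under every staff member who belongs to it.
SOURCE = """
<staff_list>
  <staff><name>Ann</name><dept><dname>Computing</dname><site>South</site></dept></staff>
  <staff><name>Bob</name><dept><dname>Computing</dname><site>South</site></dept></staff>
  <staff><name>Cy</name><dept><dname>Maths</dname><site>North</site></dept></staff>
</staff_list>
"""


def integrate(xml_text):
    """Map repeated <dept> elements to one global identifier each.

    Assumption: <dname> identifies a department. Returns two dicts:
    key -> global identifier, and global identifier -> positions of the
    source <dept> elements (in document order) that it stands for.
    """
    root = ET.fromstring(xml_text)
    global_ids = {}
    occurrences = {}
    for position, dept in enumerate(root.iter("dept")):
        key = dept.findtext("dname")
        # Allocate a new identifier only the first time a key is seen.
        gid = global_ids.setdefault(key, f"dept:{len(global_ids) + 1}")
        occurrences.setdefault(gid, []).append(position)
    return global_ids, occurrences


if __name__ == "__main__":
    ids, occ = integrate(SOURCE)
    print(ids)
    print(occ)
```

The second dictionary plays the role of a lookup from a global identifier back to its source occurrences, which is the kind of relationship a query or update against the global schema would need in order to reach every repeated copy in the source.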

Similar resources

Creating and Querying an Integrated Ontology for Molecular and Phenotypic Cereals Data

In this paper we describe the development of an ontology of molecular and phenotypic cereals data, realized by integrating existing public web databases with the database developed by the research group of the CEREALAB project. This integration is obtained using the MOMIS system (Mediator envirOnment for Multiple Information Sources), a mediator based data integration system developed by the Da...

Integrating Structured Metadata with Relational Affinity Propagation

Structured and semi-structured data describing entities, taxonomies and ontologies appears in many domains. There is a huge interest in integrating structured information from multiple sources; however integrating structured data to infer complex common structures is a difficult task because the integration must aggregate similar structures while avoiding structural inconsistencies that may app...

A Structure-Based Approach to Querying Semi-Structured Data

Several researchers have considered integrating multiple unstructured, semi-structured, and structured data sources by modeling all sources as edge labeled graphs. Data in this model is self-describing and dynamically typed, and captures both schema and data information. The labels are arbitrary atomic values, such as strings, integers, reals, etc., and the integrated data graph is stored in a ...

Integrating Xml Sources into a Data Warehouse Environment

A data warehousing system is a collection of technologies and tools which enables knowledge workers to acquire, integrate and flexibly analyze information from different sources aimed at improving the knowledge assets of the enterprise. The importance of integrating XML data in data warehousing environments is becoming increasingly high as more organizations view the web as an integral part of ...

Publication date: 2005