A UML Based Approach for Modeling ETL Processes in Data Warehouses
Abstract
Data warehouses (DWs) are complex computer systems whose main goal is to facilitate the decision-making process of knowledge workers. ETL (Extraction-Transformation-Loading) processes are responsible for extracting data from heterogeneous operational data sources, transforming it (conversion, cleaning, normalization, etc.), and loading it into DWs. ETL processes are a key component of DWs because incorrect or misleading data leads to wrong business decisions; a correct design of these processes at the early stages of a DW project is therefore necessary to improve data quality. However, little research has dealt with the modeling of ETL processes. In this paper, we present an approach, based on the Unified Modeling Language (UML), for the conceptual modeling of ETL processes together with the conceptual schema of the target DW in an integrated manner. We provide mechanisms for the easy and quick specification of the common operations defined in these ETL processes, such as the integration of different data sources, the transformation between source and target attributes, and the generation of surrogate keys. Moreover, our approach allows the designer to track and document entire ETL processes comprehensively, which greatly facilitates their maintenance. Further advantages of our proposal are the use of the UML (standardization, ease of use, and functionality) and the seamless integration of the design of the ETL processes with the DW conceptual schema. Finally, we show how to apply our integrated approach with a well-known modeling tool, Rational Rose.
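To make the common operations named in the abstract concrete, the following is a minimal Python sketch, not the paper's UML notation or method. It integrates two sources, transforms a source attribute into a target attribute, and generates surrogate keys. All names here (SurrogateKeyGenerator, run_etl, the "crm" and "billing" sources, the customer_sk column) are hypothetical and chosen only for illustration.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns a stable integer surrogate key to each distinct natural key.

    Hypothetical helper for illustration; not part of the paper's approach.
    """

    def __init__(self):
        self._next = count(1)   # surrogate keys start at 1
        self._keys = {}         # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the key if this natural key was seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


def run_etl(sources, keygen):
    """Integrate rows from several sources into one list of target rows."""
    target = []
    for source_name, rows in sources.items():      # extraction + integration
        for row in rows:
            # The natural key combines the source name and the source id,
            # so equal ids from different sources stay distinct.
            natural_key = (source_name, row["id"])
            target.append({
                "customer_sk": keygen.key_for(natural_key),  # surrogate key
                "name": row["name"].strip().title(),         # transformation
                "source": source_name,
            })
    return target                                   # rows ready for loading
```

A usage example under the same assumptions: two sources reuse the id 7, and the second source contains the same customer twice, so that customer receives one surrogate key.

```python
sources = {
    "crm": [{"id": 7, "name": " alice smith "}],
    "billing": [{"id": 7, "name": "BOB JONES"}, {"id": 7, "name": "bob jones"}],
}
rows = run_etl(sources, SurrogateKeyGenerator())
```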
Related resources
Applying UML for Modeling the Physical Design of Data Warehouses
In previous work, we have shown how to use unified modeling language (UML) as the primary representation mechanism to model conceptual design, logical design, modeling of extraction, transformation, loading (ETL) processes, and defining online analytical processing (OLAP) requirements of data warehouses (DW). Continuing our work on using UML throughout the DW development lifecycle, in this chap...
Modeling the physical design of data warehouses from a UML specification
A Data Warehouse (DW) is a complex information system mainly used to support strategy decisions. During the last few years, several approaches have been proposed to model different aspects of a DW. However, few efforts have been dedicated to the modeling of the physical design (i.e. the physical structures that will host data together with their corresponding implementations) of a DW from the e...
A visual language-based system for extraction-transformation-loading development
Data warehouse loading and refreshment is typically performed by means of complex software processes called extraction–transformation–loading (ETL). In this paper, we propose a system based on a suite of visual languages for mastering several aspects of the ETL development process, turning it into a visual programming task. The approach can be easily generalized and applied to other data integr...
Physical Modeling of Data Warehouses Using UML Component and Deployment Diagrams: Design and Implementation Issues
Several approaches have been proposed to model different aspects of a Data Warehouse (DW) during recent years, such as the modeling of a DW at the conceptual and logical level, the design of the ETL (Extraction, Transformation, Loading) processes, the derivation of the DW models from the enterprise data models, and customization of a DW schema. At the end of the design, a DW has to be deployed ...
A Framework for ETL Systems Development
There are many commercial Extract-Transform-Load (ETL) tools, most of which do not offer an integrated platform for modeling processes and extending functionality. This drawback complicates the customization and integration with other applications, and consequently, many companies adopt internal development of their ETL systems. A possible solution is to create a framework to provide ex...
Publication date: 2003