Search results for: data cleaning

Number of results: 2424654

Journal: IEEE Data Eng. Bull. 2000
Erhard Rahm Hong Hai Do

We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool suppo...

2015
Yang Bao Shi Wei Deng Wang Qun Lin

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework and, for data with attribute dependency relations, designs several violation-discovery algorithms expressed as formal formulas, which can obtain inconsiste...

2007
Melanie Herschel Ioana Manolescu

Data cleaning is the process of correcting anomalies in a data source that may, for instance, be due to typographical errors or duplicate representations of an entity. It is a crucial task in customer relationship management, data mining, and data integration. With the growing amount of XML data, approaches to effectively and efficiently clean XML are needed, an issue not addressed by existing ...

Journal: CoRR 2017
Chang Ge Ihab F. Ilyas Xi He Ashwin Machanavajjhala

Data cleaning is the process of detecting and repairing inaccurate or corrupt records in the data. Data cleaning is inherently human-driven, and state-of-the-art systems assume cleaning experts can access the data to tune the cleaning process. However, in sensitive datasets, like electronic medical records, privacy constraints disallow unfettered access to the data. To address this challenge, we...

2008
J. Jebamalar Tamilselvi V. Saravanan

Data cleaning is the process of identifying and removing errors in the data warehouse. Data cleaning is very important in the data mining process. Most organizations need quality data. The quality of the data needs to be improved in the data warehouse before the mining process. The framework available for data cleaning offers the fundamental services for data cleaning s...

2008
Taoxin Peng

Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. To deal with this challenge, a set of methods and tools has been developed. However, there are still at least two questions that need to be answered: How to improve the efficiency while performing data cleaning? How to improve the degree of automation when perfor...

2011
Louardi BRADJI Mahmoud BOUFAIDA

A high-quality data warehouse is key to making smart strategic decisions. Data cleaning is a program that deals with the quality problems of data extracted from operational sources before they are loaded into the data warehouse. As data cleaning can introduce errors and some data require manual cleaning, there is a need for open user involvement in data cleaning for data warehouse...

Journal: IEEE Data Eng. Bull. 2011
Arvind Arasu Surajit Chaudhuri Zhimin Chen Kris Ganjam Raghav Kaushik Vivek R. Narasayya

We present a domain-independent platform for data cleaning developed as part of the Data Cleaning project at Microsoft Research. Our platform consists of a set of core primitives and design tools that allow a programmer to develop sophisticated data cleaning solutions with minimal programming effort. Our primitives are designed to allow rich domain- and application-specific customizations and ca...
