Search results for: data cleaning

Number of results: 2424654

Journal: PVLDB 2015
Nataliya Prokoshyna, Jaroslaw Szlichta, Fei Chiang, Renée J. Miller, Divesh Srivastava

Quantitative data cleaning relies on the use of statistical methods to identify and repair data quality problems while logical data cleaning tackles the same problems using various forms of logical reasoning over declarative dependencies. Each of these approaches has its strengths: the logical approach is able to capture subtle data quality problems using sophisticated dependencies, while the q...

Journal: IJIQ 2007
Katherine G. Herbert, Jason Tsong-Li Wang

As databases become more pervasive through the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of this data, each of these problems is non-trivial and can cause many problems for the database. For biological data to be corrected and standardise...

2001
Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, Cristian-Augustin Saita

2009
Andrea Esuli, Fabrizio Sebastiani

In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing the effectiveness of the resulting classifiers while minimizing the required amount of training effort. Training data cleaning (TDC) consists in devising ranking functions that sort the original training examples in terms of how...

Journal: PVLDB 2008
Reynold Cheng, Jinchuan Chen, Xike Xie

Uncertain or imprecise data are pervasive in applications like location-based services, sensor monitoring, and data collection and integration. For these applications, probabilistic databases can be used to store uncertain data, and querying facilities are provided to yield answers with statistical confidence. Given that a limited amount of resources is available to “clean” the database (e.g., ...

2001
Vijayshankar Raman, Joseph M. Hellerstein

Cleaning data of errors in structure and content is important for data warehousing and integration. Current solutions for data cleaning involve many iterations of data “auditing” to find errors, and long-running transformations to fix them. Users need to endure long waits, and often write complex transformation scripts. We present Potter’s Wheel, an interactive data cleaning system that tightly...

Journal: PVLDB 2013
Amr Ebaid, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin

We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. ...

Journal: JACIII 2010
Piyasak Jeatrakul, Kevin Kok Wai Wong, Lance Chun Che Fung

In most classification problems, sometimes in order to achieve better results, data cleaning is used as a preprocessing technique. The purpose of data cleaning is to remove noise, inconsistent data and errors in the training data. This should enable the use of a better and representative data set to develop a reliable classification model. In most classification models, unclean data could somet...

Journal: Journal of Digital Convergence 2014
