Adaptive Approximate Record Matching

author

  • Ramin Rahnamoun Computer Engineering Department, Azad University-Tehran Central Branch, Tehran, Iran.
Abstract:

Typographical data-entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of approximate record matching is to find the multiple records that belong to one entity. This paper proposes an algorithm for approximate record matching that adapts automatically to input error patterns. In the field-matching phase, the edit distance method is used, customized for Persian-language problems such as the similarity of Persian characters and common typographical errors in Persian. In the record-matching phase, the importance of each field is set by a coefficient assigned to that field. Because typographical error patterns change over time, these coefficients must change dynamically; for this reason, a Genetic Algorithm (GA) is used for supervised learning of the coefficient values. Simulation results show that the algorithm compares favorably with other methods (such as decision trees).
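The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the table of similar characters, the substitution cost of 0.5, the length normalization, and all function names are assumptions made for illustration, and the coefficients are supplied by hand here, whereas the paper learns them with a GA.

```python
from typing import Dict

# Hypothetical table of visually similar characters (assumption, not from the paper).
SIMILAR_PAIRS = frozenset({
    frozenset({"\u064a", "\u06cc"}),  # Arabic yeh vs. Persian yeh
    frozenset({"\u0643", "\u06a9"}),  # Arabic kaf vs. Persian keheh
})


def substitution_cost(a: str, b: str, similar_cost: float = 0.5) -> float:
    """Cost of replacing a with b; similar characters are cheaper (assumed value)."""
    if a == b:
        return 0.0
    if frozenset({a, b}) in SIMILAR_PAIRS:
        return similar_cost
    return 1.0


def weighted_edit_distance(s: str, t: str, similar_cost: float = 0.5) -> float:
    """Edit distance with unit insert/delete costs and weighted substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                                     # delete a
                curr[j - 1] + 1.0,                                 # insert b
                prev[j - 1] + substitution_cost(a, b, similar_cost),  # substitute
            ))
        prev = curr
    return prev[-1]


def field_similarity(s: str, t: str) -> float:
    """Field-matching phase: map the distance to a similarity in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t) / longest


def record_similarity(r1: Dict[str, str], r2: Dict[str, str],
                      coefficients: Dict[str, float]) -> float:
    """Record-matching phase: coefficient-weighted mean of field similarities."""
    total = sum(coefficients.values())
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    score = sum(w * field_similarity(r1[f], r2[f]) for f, w in coefficients.items())
    return score / total
```

A pair of records would then be declared a match when `record_similarity` exceeds some threshold; in an adaptive setting, both the threshold and the coefficients would be tuned on labeled record pairs rather than fixed by hand.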

similar resources

Automating the approximate record-matching process

Data quality has many dimensions, one of which is accuracy. Accuracy is usually compromised by errors accidentally or intentionally introduced into a database system. These errors result in inconsistent, incomplete, or erroneous data elements. For example, a small variation in the representation of a data object produces a unique instantiation of the object being represented. In order to improve ...

Random databases with approximate record matching

In many database applications in telecommunication, environmental and health sciences, bioinformatics, physics, and econometrics, real-world data are uncertain and subject to errors. These data are processed, transmitted, and stored in large databases. We consider stochastic modelling for databases with uncertain data and for some basic database operations (for example, join, selection) with e...

CLUEMAKER: A LANGUAGE FOR APPROXIMATE RECORD MATCHING (Practice-Oriented)

We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a matching algor...

CLUEMAKER: A LANGUAGE FOR APPROXIMATE RECORD MATCHING (Complete Paper)

We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a machine-learni...

Record Matching in Digital

When data stores grow large, data quality, cleaning, and integrity become issues. The commercial sector spends a massive amount of time and energy canonicalizing customer and product records as their lists of products and consumers expand. An Accenture study in 2006 found that a high-tech equipment manufacturer saved $6 million per year by removing redundant customer records used in customer ma...

Journal title

volume 03, issue 01

pages 23-27

publication date 2014-10-01
