CLUEMAKER: A LANGUAGE FOR APPROXIMATE RECORD MATCHING (Complete Paper)

Authors

  • Martin Buechi
  • Andrew Borthwick
  • Adam Winkel
  • Arthur Goldberg
Abstract

We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict a match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a machine-learning technique to compute a match probability. ClueMaker is based on Java and is compiled to Java source or bytecode. Therefore, ClueMaker is easily accessible to many programmers, allows the integration of any Java class, runs on virtually any platform, supports Unicode, and is more easily accepted by IT departments that try to minimize the number of distinct languages in use. ChoiceMaker Technologies has used ClueMaker successfully over the past two years in a variety of approximate record matching tasks.
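The idea of clues feeding a match probability can be illustrated with a minimal sketch in plain Java. This is not ClueMaker syntax, which the abstract does not show. The `Person` schema, the clue names, the weights, and the bias are all hypothetical, and the logistic combination merely stands in for the unspecified machine-learning technique; a real system would learn such weights from labeled record pairs.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class ClueSketch {
    /** A minimal record with two attributes (hypothetical schema). */
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** A clue: a named predicate over a record pair, plus an illustrative weight. */
    static final class Clue {
        final String name;
        final BiPredicate<Person, Person> fires;
        final double weight; // assumed value, not learned from data

        Clue(String name, BiPredicate<Person, Person> fires, double weight) {
            this.name = name;
            this.fires = fires;
            this.weight = weight;
        }
    }

    // Two hypothetical clues that predict "match" when an attribute is identical
    // (ignoring case) in both records.
    static final List<Clue> CLUES = List.of(
        new Clue("sameFirstName", (a, b) -> a.firstName.equalsIgnoreCase(b.firstName), 1.5),
        new Clue("sameLastName", (a, b) -> a.lastName.equalsIgnoreCase(b.lastName), 2.0));

    // Assumed bias: with no clue firing, the pair is considered unlikely to match.
    static final double BIAS = -2.0;

    /** Combines the clues that fire into a probability with a logistic function. */
    static double matchProbability(Person a, Person b) {
        double score = BIAS;
        for (Clue clue : CLUES) {
            if (clue.fires.test(a, b)) {
                score += clue.weight;
            }
        }
        return 1.0 / (1.0 + Math.exp(-score));
    }

    public static void main(String[] args) {
        Person p = new Person("Martin", "Smith");
        Person q = new Person("martin", "Smith");
        Person r = new Person("Anna", "Jones");
        System.out.println("p vs q: " + matchProbability(p, q)); // both clues fire
        System.out.println("p vs r: " + matchProbability(p, r)); // no clue fires
    }
}
```

Keeping each clue as a small, independent predicate mirrors the abstract's description: clues only report evidence about a record pair, and a separate step turns the clue values into a probability.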

Similar resources

CLUEMAKER: A LANGUAGE FOR APPROXIMATE RECORD MATCHING (Practice-Oriented)

We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a matching algor...

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

The ChoiceMaker 2 Record Matching System

This paper describes the key features of an innovative record matching system called ChoiceMaker 2 developed by ChoiceMaker Technologies (CMT). We begin with an overview of the stages that a record matching system goes through to find an incoming “query record” in a database. We then consider the stages one by one: We sketch out our patent-pending process for identifying possible matches to the...

Complete pivoting strategy for the $IUL$ preconditioner obtained from Backward Factored APproximate INVerse process

In this paper, we use a complete pivoting strategy to compute the IUL preconditioner obtained as the by-product of the Backward Factored APproximate INVerse process. This pivoting is based on the complete pivoting strategy of the Backward IJK version of Gaussian Elimination process. There is a parameter $\alpha$ to control the complete pivoting process. We have studied the effect of dif...

Partial Matchmaking using Approximate Subsumption

Description Logics, and in particular the web ontology language OWL, have been proposed as an appropriate basis for computing matches between structured objects for the sake of information integration and service discovery. A drawback of the direct use of subsumption as a matching criterion is the inability to compute partial matches and qualify the degree of mismatch. In this paper, we describe ...

Publication date: 2003