CLUEMAKER : A LANGUAGE FOR APPROXIMATE RECORD MATCHING ( Complete Paper )
نویسندگان
چکیده
We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a machine-learning technique to compute a match probability. ClueMaker is based on Java and is compiled to Java source or byte code. Therefore, ClueMaker is easily accessible to many programmers, allows the integration of any Java class, runs on virtually any platform, supports UNICODE, and is more easily accepted by IT departments who try to minimize the number of distinct languages in use. ChoiceMaker Technologies has used ClueMaker successfully over the past two years in a variety of approximate record matching tasks.
منابع مشابه
CLUEMAKER : A LANGUAGE FOR APPROXIMATE RECORD MATCHING ( Practice - Oriented )
We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a matching algor...
متن کاملAdaptive Approximate Record Matching
Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
متن کاملThe ChoiceMaker 2 Record Matching System
This paper describes the key features of an innovative record matching system called ChoiceMaker 2 developed by ChoiceMaker Technologies (CMT). We begin with an overview of the stages that a record matching system goes through to find an incoming “query record” in a database. We then consider the stages one by one: We sketch out our patent-pending process for identifying possible matches to the...
متن کاملComplete pivoting strategy for the $IUL$ preconditioner obtained from Backward Factored APproximate INVerse process
In this paper, we use a complete pivoting strategy to compute the IUL preconditioner obtained as the by-product of the Backward Factored APproximate INVerse process. This pivoting is based on the complete pivoting strategy of the Backward IJK version of Gaussian Elimination process. There is a parameter $alpha$ to control the complete pivoting process. We have studied the effect of dif...
متن کاملPartial Matchmaking using Approximate Subsumption
Description Logics, and in particular the web ontology language OWL has been proposed as an appropriate basis for computing matches between structured objects for the sake of information integration and service discovery. A drawback of the direct use of subsumption as a matching criterion is the inability to compute partial matches and qualify the degree of mismatch. In this paper, we describe ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003