record matching

ClueMaker: A Language for Approximate Record Matching (Practice-Oriented)

2003

Martin Buechi, Andrew Borthwick, Adam Winkel, Arthur Goldberg

We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a matching algor...
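The first-name example above can be sketched in plain Python. This is not ClueMaker syntax, which the abstract does not show; the record layout, the attribute names `first_name` and `birth_year`, and the helper `clue_values` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a record is a flat mapping from attribute name to string value.
Record = Dict[str, str]
# A clue inspects two records and either fires (True) or does not (False).
Clue = Callable[[Record, Record], bool]


def same_first_name(a: Record, b: Record) -> bool:
    """Fire when both records carry the same non-empty first name."""
    first_a = a.get("first_name", "").strip().lower()
    first_b = b.get("first_name", "").strip().lower()
    return first_a != "" and first_a == first_b


def same_birth_year(a: Record, b: Record) -> bool:
    """Fire when both records carry the same non-empty birth year."""
    year_a = a.get("birth_year", "").strip()
    year_b = b.get("birth_year", "").strip()
    return year_a != "" and year_a == year_b


def clue_values(a: Record, b: Record, clues: List[Clue]) -> List[int]:
    """Evaluate every clue on the pair; the 0/1 vector can feed a matcher."""
    return [int(clue(a, b)) for clue in clues]
```

A downstream matching algorithm would then take the vector returned by `clue_values` as its input features for the record pair.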


Query Matching in a BitTorrent-Based P2P Database System

2010

John Colquhoun

In our previous work, we introduced the Wigan Peer-to-Peer database server, which is based on the popular BitTorrent file-sharing protocol. In Wigan, users (peers) cache the results of queries they receive and make these available to future users. A central component, known as the Tracker, keeps a record of which users have submitted which queries and uses this record to provide a new user subm...


Hierarchical Bayesian Record Linkage Theory

2005

Michael D. Larsen

In record linkage, or exact file matching, one compares two or more files on a single population for purposes of unduplication or production of an enhanced, merged database. Record linkage has many applications, including in population enumeration efforts, to create databases for epidemiological investigations, and to improve survey sample frames. Latent class and mixture models have been used ...


Equal percent bias reduction and variance proportionate modifying properties with mean–covariance preserving matching

2012

Yannis G. Yatracos

Mean-preserving and covariance-preserving matchings are introduced that can be obtained with conditional, randomized matching on sub-populations of a large control group. Under moment conditions it is shown that these matchings are, respectively, equal percent bias reducing (EPBR) and variance proportionate modifying (PM) for linear functions of the covariates and their standardizations. The resu...


A Decision Tree Based Record Linkage for Recommendation Systems

2015

Ms. N. S. Sheth, A. R. Deshpande

Record linkage merges all the records relating to the same entity from multiple datasets, at the entity level. It is the initial data preparation phase for most database projects. Traditionally, one-to-one data linkage is performed among entities of the same type that share a common unique identifier. The proposed one-to-many and/or many-to-many record linkage method is able to link the entities ...


A Survey of Probabilistic Record Matching Models, Techniques and Tools

2008

Federico Maggi

Probabilistic record linkage concerns the use of stochastic decision models to solve the problem of record linkage (also known as record matching). Data quality has become a key aspect in many institutions, and the demand for novel, effective techniques is increasing. Record linkage in general has been studied in the last three decades and a solid probabilistic decision framework has been propose...


Improving EM Algorithm Estimates for Record Linkage Parameters

2002

William E. Yancey

The EM algorithm can be used to estimate conditional probabilities for matching field patterns for the Fellegi-Sunter model for record linkage. The algorithm is based on a latent class model for the record pairs where one of the classes is the set of true matches. If the number of true match pairs in the data set is too small, then the EM algorithm cannot detect the correct latent class. We con...
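The baseline procedure described above can be sketched as a two-class latent class EM over binary agreement patterns. This is the textbook form under a conditional independence assumption, not the improved estimator the paper goes on to propose; the function name, starting values, and clamping constant are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def em_fellegi_sunter(
    patterns: Sequence[Sequence[int]],
    n_iter: int = 50,
    p: float = 0.1,        # assumed starting share of true matches
    m_init: float = 0.9,   # assumed starting P(field agrees | match)
    u_init: float = 0.1,   # assumed starting P(field agrees | non-match)
    eps: float = 1e-6,     # keeps probabilities away from 0 and 1
) -> Tuple[float, List[float], List[float]]:
    """Estimate p, m and u from 0/1 agreement patterns of record pairs."""
    n, k = len(patterns), len(patterns[0])
    m, u = [m_init] * k, [u_init] * k

    def clamp(x: float) -> float:
        return min(max(x, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        g = []
        for gamma in patterns:
            pm, pu = p, 1.0 - p
            for j, agree in enumerate(gamma):
                pm *= m[j] if agree else 1.0 - m[j]
                pu *= u[j] if agree else 1.0 - u[j]
            g.append(pm / (pm + pu))

        # M-step: re-estimate parameters from the weighted pairs.
        total = sum(g)
        match_w = max(total, eps)
        non_w = max(n - total, eps)
        m = [clamp(sum(gi * x[j] for gi, x in zip(g, patterns)) / match_w)
             for j in range(k)]
        u = [clamp(sum((1.0 - gi) * x[j] for gi, x in zip(g, patterns)) / non_w)
             for j in range(k)]
        p = clamp(total / n)
    return p, m, u
```

When true matches are very rare in `patterns`, the class that EM labels as "match" need not correspond to the real matches, which is the failure mode the abstract points to.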


Improving Probabilistic Record Linkage Using Statistical Prediction Models

Journal: International Statistical Review, 2022

Summary: Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit, based on a common set of matching variables. Matching variables, however, can appear with errors and variations, and the challenge is to link units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilist...
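For orientation, the classic Fellegi-Sunter score for a record pair is commonly computed as a sum of log-likelihood ratios over the matching variables. The sketch below shows that standard form only; the function name and the example m- and u-probabilities are assumptions, and nothing here reflects the prediction-model extension the paper investigates.

```python
import math
from typing import Sequence


def match_weight(
    agreement: Sequence[int],   # 1 if the matching variable agrees, else 0
    m: Sequence[float],         # P(agree | same unit), one per variable
    u: Sequence[float],         # P(agree | different units), one per variable
) -> float:
    """Sum of per-variable log2 likelihood ratios for one record pair."""
    weight = 0.0
    for agree, mj, uj in zip(agreement, m, u):
        if agree:
            weight += math.log2(mj / uj)
        else:
            weight += math.log2((1.0 - mj) / (1.0 - uj))
    return weight
```

Pairs with a high weight are declared links and pairs with a low weight non-links; the thresholds in between are a design choice that this sketch leaves open.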


A SNOMED supported ontological vector model for subclinical disorder detection using EHR similarity

Journal: Eng. Appl. of AI, 2011

Lawrence Wing-Chi Chan, Y. Liu, Chi-Ren Shyu, Iris F. F. Benzie

Electronic Health Records (EHR) form a valuable resource in the healthcare enterprise because clinical evidence can be provided to identify potential complications and support decisions on early intervention. Simple string matching, the common search algorithm, is not able to map a query to the similar health records in the database with respect to the medical concepts. A novel ontological vect...


Privacy Preserving Record Matching Using Automated Semi-trusted Broker

2015

Ibrahim Lazrig, Tarik Moataz, Indrajit Ray, Indrakshi Ray, Toan Ong, Michael G. Kahn, Frédéric Cuppens, Nora Cuppens-Boulahia

In this paper, we present a novel scheme that allows multiple data publishers, which continuously generate new data and periodically update existing data, to share sensitive individual records with multiple data subscribers while protecting the privacy of their clients. An example of such sharing is that of health care providers sharing patients’ records with clinical researchers. Traditionally, ...
