Probabilistic Record Linkage for Genealogical Research

Authors

  • John Lawson
  • Ryan Yamagata
Abstract

The slowest and most tedious task in genealogical research is searching civil or church records for information about an individual, yet it is an essential step. By searching multiple sources such as census records, wills, deeds, and birth and death records, we can compile a more complete set of information about an individual, and potentially that individual's pedigree. When records are stored electronically, modern methods of probabilistic record linkage can link all the information on an individual from various sources in seconds, rather than requiring days or weeks of arduous searching by a genealogist. Researchers in England, in Canada, and at the U.S. Census Bureau developed the theory of probabilistic record linkage to aid in constructing pedigrees of individuals from vital records in order to track hereditary diseases. However, probabilistic record linkage has yet to be widely applied to most of the sources used in common genealogical research. This paper summarizes the results of two Master's projects in the Department of Statistics at Brigham Young University. We describe the approach to probabilistic record linkage used by the Family History Department of The Church of Jesus Christ of Latter-day Saints in TempleReady, and demonstrate its application to genealogical research using a set of civil and church records of Quakers in Perquimans and Pasquotank Counties, North Carolina. The results of our study are very promising: probabilistic record linkage has the potential to dramatically increase the productivity of genealogical researchers. Although complete automation of genealogical research is still some way off, probabilistic record linkage could revolutionize the way research is done. This paper reports on work in progress, describing what has been done to date and outlining some of the many tasks yet to be addressed.

Similar resources

Genealogical Record Linkage: Features for Automated Person Matching

This paper provides a high-level overview of how automatic person matching (genealogical record linkage) algorithms can be developed, and then provides a detailed explanation of many of the features used by FamilySearch in doing person matching. Empirical results show a dramatic improvement in accuracy by using these features trained with neural networks, when compared to traditional probabilis...

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...

PROBABILISTIC METHODOLOGY FOR RECORD LINKAGE DETERMINING ROBUSTNESS OF WEIGHTS

Over time, the world population has developed a desire to research their ancestral lineage. Many resources have been identified to aid an individual in genealogical research. In the United States, one of the greatest resources for researching genealogy is census records. Census records allow a genealogical researcher to track individuals over time, broadening the scope of information one can ac...

Reconstructing historical populations from genealogical data: an overview of methods used for aggregating data from GEDCOM files

The GEDCOM file format is by far the most widely used means of exchanging genealogical data and extensive collections of these files are available online. There is a huge potential benefit for historians and other academics who are able to make use of the data contained in available GEDCOM files, as these effectively represent hundreds of thousands of hours of crowdsourced work and a considerab...

Utilizing Stacking for Feature Reduction in Graph-Based Genealogical Record Linkage

Genealogy research is centered on collecting records about an individual from various sources and combining the information to gain a larger historical perspective about that individual, commonly in the form of a pedigree. Data extraction, the internet, and other technological advancements have made large amounts of digital genealogical data more accessible. Discovering the relevancy of a digit...

Publication date: 2014