Applications of Approximate Word Matching in Information Retrieval

Authors

  • James C. French
  • Allison L. Powell
  • Eric Schulman
Abstract

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. Approximate string matching has traditionally been used to help with this problem. In this paper we introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms.

1 Introduction

There are increasingly more online databases in the current climate of electronic publishing. The challenge is to integrate them into coherent digital libraries that let users have unimpeded access to accurate information. As the pace of electronic publication accelerates, there will be increasing reliance on automated techniques to aid information providers as they seek to reach this goal. For a number of years there has been an increasing emphasis on data quality in online databases [10]. In this paper we look at techniques to aid in detecting variant forms of strings in bibliographic databases. This is called authority work [2], and results in the creation of authority files that maintain the correspondence between all of the allowable forms for strings in a particular bibliographic field, for example author or journal name. Another problem arises when bibliographic databases are integrated. The different component databases might use different authority conventions, and users familiar with one

Similar resources

Applications of Approximate Word Matching in Information Retrieval

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more difficult. Spelling variants, misspellings, and transliteration differe...

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches

This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. Approximate string-matching algorithms have pleasing theoretical features as well as direct applications, especially in comp...
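
The k-mismatches problem stated in this abstract can be illustrated with a naive sequential scan. This is only a minimal sketch of the problem definition, not the parallel algorithm the paper proposes; the function name `k_mismatch_positions` and its parameters are hypothetical and chosen for illustration.

```python
from typing import List


def k_mismatch_positions(text: str, pattern: str, k: int) -> List[int]:
    """Return every start index where `pattern` aligns with a factor of
    `text` with at most `k` mismatching characters (Hamming distance <= k).

    Naive O(n * m) scan; illustrative only.
    """
    n, m = len(text), len(pattern)
    positions = []
    # Try every alignment of the pattern against the text.
    for start in range(n - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            positions.append(start)
    return positions


if __name__ == "__main__":
    # Alignments of "abd" in "abcabd" with at most one mismatch.
    print(k_mismatch_positions("abcabd", "abd", 1))
```

With `k = 0` the function reduces to exact matching; raising `k` admits alignments that differ from the query in up to `k` positions.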

Fast Convolutions and Their Applications in Approximate String Matching

We develop a method for performing boolean convolutions efficiently in the word RAM model of computation, with a word size of w = Ω(log n) bits, where n is the input size. The technique is applied to approximate string matching under Hamming distance. The obtained algorithms are the fastest known. In particular, we reduce the complexity of the Amir et al. [1] algorithm for k-mismatches from O(n √...

The matching interdiction problem in dendrimers

The purpose of the matching interdiction problem in a weighted graph is to find two vertices such that the weight of the maximum matching in the graph without these vertices is minimized. An approximate solution for this problem has been presented. In this paper, we consider dendrimers as graphs such that the weights of edges are the bond lengths. We obtain the maximum matching in some types of...

Publication date: 1997