Sublinear Approximate String Matching
Abstract
This paper addresses approximate string matching and shows how Chang and Lawler [CL94] built a new sublinear-time algorithm from previously known ideas. The problem is to find all locations in a text of length $n$ over a $b$-letter alphabet where a pattern of length $m$ occurs with at most $k$ differences (substitutions, insertions, deletions). The algorithm runs in $O((n/m)\, k \log_b m)$ expected time when the text is random and $k$ is bounded by the threshold $m/(\log_b m + O(1))$. In particular, when $k = o(m/\log_b m)$, the expected running time is $o(n)$.
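To make the problem statement concrete, the following is a minimal sketch of the classical column-wise dynamic program for matching with at most k differences. It is a quadratic-time baseline that takes O(mn) time, not the sublinear algorithm of Chang and Lawler; the function name `approx_match_ends` and its interface are assumptions made for illustration only.

```python
def approx_match_ends(pattern, text, k):
    """Return 0-based end positions in `text` where `pattern` matches
    some substring with at most k differences (substitutions,
    insertions, deletions).

    Illustrative O(m*n) baseline; NOT the Chang-Lawler algorithm.
    """
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # value of row i-1 in the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # row i in the previous column
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra character in the text
                col[i - 1] + 1,    # pattern character left unmatched
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_ends("abc", "xxabcxx", 1))
```

Every text character is examined here, which is the cost that a sublinear expected-time method avoids on random text when k is below the stated threshold.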
Similar resources
Approximate Pattern Matching with Samples
In this paper we simplify the algorithm by Chang and Lawler for the approximate string matching problem by adopting the concept of sampling. We have a more general analysis of expected time with the simplified algorithm for the one-dimensional case under a non-uniform probability distribution, and we show that our method can easily be generalized to the two-dimensional approximate pattern match...
Approximate String Matching using Backtracking over Suffix Arrays
We describe a simple backtracking algorithm that finds approximate matches of a pattern in a large indexed text. This algorithm theoretically takes sublinear time in the length of the text. We prove a lemma that helps us to prune a significant number of branches of search in practice. We show an implementation of a variant of this algorithm and that is used to find similar regions between seque...
Improved Two-Way Bit-parallel Search
New bit-parallel algorithms for exact and approximate string matching are introduced. TSO is a two-way Shift-Or algorithm, TSA is a two-way Shift-And algorithm, and TSAdd is a two-way Shift-Add algorithm. Tuned Shift-Add is a minimalist improvement to the original Shift-Add algorithm. TSO and TSA are for exact string matching, while TSAdd and tuned Shift-Add are for approximate string matching ...
Approximate String Matching with Ordered q-Grams
Approximate string matching with k differences is considered. Filtration of the text is a widely adopted technique to reduce the text area processed by dynamic programming. We present sublinear filtration algorithms based on the locations of q-grams in the pattern. Samples of q-grams are drawn from the text at fixed periods, and only if consecutive samples appear in the pattern approximately in...
All-Against-All Sequence
In this paper we present an algorithm which attempts to align pairs of subsequences from a database of DNA sequences. The algorithm simulates the classical dynamic programming alignment algorithm over a digital index of the database. The running time of the algorithm is subquadratic on average with respect to the database size. A similar algorithm solves the approximate string matching problem ...
Publication date: 2004