Computing the edit distance of a regular language

Author

  • Stavros Konstantinidis
Abstract

The edit distance (or Levenshtein distance) between two words is the smallest number of substitutions, insertions, and deletions of symbols that can be used to transform one of the words into the other. In this paper we consider the problem of computing the edit distance of a regular language (also known as constraint system), that is, the set of words accepted by a given finite automaton. This quantity is the smallest edit distance between any pair of distinct words of the language. We show that the problem is of polynomial time complexity. We distinguish two cases depending on whether the given automaton is deterministic or nondeterministic. In the latter case the time complexity is higher. Incidentally, we also obtain an upper bound on the edit distance of a regular language in terms of the automaton accepting the language.
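To make the two definitions above concrete, here is a minimal sketch in Python. It is not the paper's polynomial-time automaton algorithm: it assumes the language is given as a small finite set of words and simply takes the minimum Levenshtein distance over all pairs of distinct words. The function names `levenshtein` and `language_edit_distance` are illustrative choices, not names from the paper.

```python
from itertools import combinations


def levenshtein(u, v):
    """Edit distance between words u and v (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the prefix of u read so far and v[:j].
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # turning u[:i] into the empty word takes i deletions
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from u
                curr[j - 1] + 1,         # insert b into u
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


def language_edit_distance(words):
    """Smallest edit distance between any two distinct words of a FINITE language.

    Brute force over all pairs; an illustration of the definition only.
    """
    distinct = set(words)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct words")
    return min(levenshtein(u, v) for u, v in combinations(distinct, 2))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(language_edit_distance({"0000", "0011", "1100", "1111"}))
```

For an infinite regular language the pairwise enumeration is not available, which is why the paper works directly on the accepting automaton instead.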


Similar resources

An efficient algorithm for computing the edit distance of a regular language via input-altering transducers

We revisit the problem of computing the edit distance of a regular language given via an NFA. This problem relates to the inherent maximal error-detecting capability of the language in question. We present an efficient algorithm for solving this problem which executes in time O(rnd), where r is the cardinality of the alphabet involved, n is the number of transitions in the given NFA, and d is t...


Edit Distance for Pushdown Automata

The edit distance between two words w1, w2 is the minimal number of word operations (letter insertions, deletions, and substitutions) necessary to transform w1 to w2. The edit distance generalizes to languages L1,L2, where the edit distance is the minimal number k such that for every word from L1 there exists a word in L2 with edit distance at most k. We study the edit distance computation prob...


Top-Down Tree Edit-Distance of Regular Tree Languages

We study the edit-distance of regular tree languages. The edit-distance is a metric for measuring the similarity or dissimilarity between two objects, and a regular tree language is a set of trees accepted by a finite-state tree automaton or described by a regular tree grammar. Given two regular tree languages L and R, we define the edit-distance d(L,R) between L and R to be the minimum edit-dis...


Prefix Distance Between Regular Languages

The prefix distance between two words x and y is defined as the number of symbol occurrences in the words that do not belong to the longest common prefix of x and y. We show how to model the prefix distance using weighted transducers. We use the weighted transducers to compute the prefix distance between two regular languages by a transducer-based approach originally used by Mohri for an algori...
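The word-level definition in the first sentence above can be sketched directly in Python. This covers only the distance between two words, not the weighted-transducer construction for languages that the snippet goes on to mention; the function name `prefix_distance` is an illustrative choice.

```python
def prefix_distance(x, y):
    """Number of symbol occurrences in x and y outside their longest common prefix."""
    common = 0
    for a, b in zip(x, y):
        if a != b:
            break
        common += 1
    # Each word contributes the symbols that follow the common prefix.
    return (len(x) - common) + (len(y) - common)


if __name__ == "__main__":
    print(prefix_distance("abcde", "abxy"))
```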


Property and Equivalence Testing on Strings

We investigate property testing and related questions, where instead of the usual Hamming and edit distances between input strings, we consider the more relaxed edit distance with moves. Using a statistical embedding of words which has similarities with the Parikh mapping, we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size, and...



Journal:
  • Inf. Comput.

Volume 205  Issue 

Pages  -

Publication date 2007