Algorithms on Strings

In this section, we restrict the approximate pattern matching to the search for all the occurrences of a string x of length m in a string y of length n with at most k mismatches ( k ? N, k < m ? n). We recall from Chapter 7 that the Hamming distance between two strings u and ? of the same length is the number of mismatches between u and ?, and is defined by
The problem can then be expressed as the search for all the positions j = 0 , 1 , , n ? m on y that satisfy the inequality Ham( x, y[ j .. j + m ? 1]) ? k.
A natural solution to this problem consists in using an automaton that recognizes the language A*{ w : Ham( x, w) ? k}. This extends the method developed in Chapter 2. To do this, we can consider the nondeterministic automaton defined as follows:
each state is a pair ( ?, i) where ? is the level of the state and i is its depth, with 0 ? ? ? k, ?1 ? i ? m ? 1, and ? ? i + 1,
the initial state is (0 ,