From Algorithms on Strings

In this chapter, we address the problem of searching for a pattern in a text when the pattern represents a finite set of strings. We present solutions based on the utilization of automata. Note first that the utilization of an automaton as solution of the problem is quite natural: given a finite language X ? A*, locating all the occurrences of strings belonging to X in a text y ? A* amounts to determine all the prefixes of y that ends with a string of X ; this amounts to recognize the language A* X ; and as A* X is a regular language, this can be realized by an automaton. We additionally note that such solutions particularly suit to cases where a pattern has to be located in data that have to be processed in an online way: data flow analysis, downloading, virus detection, etc.

The utilization of an automaton for locating a pattern has already been discussed in Section 1.5. We complete here the subject by specifying how to obtain the deterministic automata mentioned at the beginning of this section. Complexities of the methods exposed at the end of Section 1.5 and that are valid for nondeterministic automata are also compared with those presented in this chapter.

The plan is decomposed as follows. We exhibit a type of deterministic and complete automata recognizing the language A* X. We consider two reduced implementations of this type of automata.

Copyright Maxime Crochemore, Christophe Hancart, Thierry Lecroq 2007 under license agreement with Books24x7

Products & Services
Data Mining Software
Data mining software is used to sort large amounts of data and identify or mine relevant information. Applications use advanced search capabilities and statistical algorithms to identify patterns and correlations in a large database, data warehouse, or corpus.
Voice Recognition Software
Voice recognition software is designed to recognize spoken words and convert them into digital data, computer commands, or text.
Valve Boxes and Curb Boxes
Valve boxes and curb boxes are underground devices that protect valves within water, wastewater and gas utility lines. They also permit access for valve operation.
Web Browsers
Web browsers are software applications that allow users interact with objects (e.g. text, images, videos) by retrieving, presenting, and traversing the information on a web page downloaded from the World Wide Web or a local area network.
Bar Graph Arrays
Bar graph arrays are light emitting diodes (LEDs) that are used for solid state metering in applications such as audio monitoring, telecommunications, and instrumentation.

Topics of Interest

In this chapter, we consider the problem of searching for all the occurrences of a fixed string in a text. The methods described here are based on combinatorial properties. They apply when the string...

8.3 Approximate Pattern Matching with Mismatches In this section, we restrict the approximate pattern matching to the search for all the occurrences of a string x of length m in a string y of length...

Overview In this chapter, we present data structures for storing the suffixes of a text. These structures are conceived for providing a direct and fast access to the factors of the text. They allow...

Overview In this chapter, we are interested in the approximate search for fixed strings. Several notions of approximation on strings are considered: jokers, differences, and mismatches. A joker is...

8.2 Approximate Pattern Matching with Differences In this section, we consider the approximate pattern matching with differences: locating all the factors of y that are at a given maximal distance k...

Product Announcements
Visualization Sciences Group, Inc. (VSG)
Infolytica Corporation
CST - Computer Simulation Technology