Boyer-Moore String Search

With non-periodic search patterns, the Boyer-Moore algorithm's time complexity for finding all matches is linear in m+n, where m and n are the lengths of the pattern and text, respectively. Worst case performance on periodic patterns is quadratic. Charras and Lecroq's Exact String Matching Algorithms discusses variants with linear time complexity even for periodic patterns.

DNA can be considered a string of characters taken from the four letter alphabet of nucleotides: A (Adenine), C (Cytosine), G (Guanine), and T (Thymine). The Boyer-Moore algorithm's efficiency suffers when used with such a small alphabet. Sustik and Moore developed a variation that gives very good performance on small alphabets, uses a small table, and avoids state machines.

Valid CSS 2.1!