Smith Algorithm. Experiments with a very fast substring search algorithm, SMITH P.D., Software - Practice & Experience 21(10), 1991, pp. 1065-1074. Adviser: R. C. T. Lee. Speaker: C. W. Cheng. National Chi Nan University.

Problem Definition

Input: a text string T with length n and a pattern string P with length m.

Output: all occurrences of P in T.

Definition
• Ts: the character of the text T that is aligned with the first character of the pattern P.
• Pl: the first character of the pattern P that is aligned with the text T.
• Tj: the character at the jth position of the text T.
• Pi: the character at the ith position of the pattern P.
• Pf: the last character of the pattern P.
• n: the length of T.
• m: the length of P.
Rule 2-2: 1-Suffix Rule (A Special Version of Rule 2)
• Consider the 1-suffix x. We may apply Rule 2-2 now.
Introduction
• The Smith algorithm shifts the window by the maximum of the Horspool shift function and the Quick Search shift function.
• It uses Rule 2-2: the 1-Suffix Rule.
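The two shift functions named above can be sketched as small lookup tables. This is a minimal sketch, not code from the paper: the function names hp_bc and qs_bc are assumptions that mirror the hpBC and qsBC labels used in the example slides, and characters absent from a table are assumed to take the default shifts m (Horspool) and m+1 (Quick Search).

```python
def hp_bc(pattern):
    """Horspool shift table: for each character in pattern[:-1], the distance
    from its rightmost occurrence to the last position of the pattern."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):            # the last character is excluded
        table[pattern[i]] = m - 1 - i  # later (rightmost) occurrences overwrite
    return table                       # missing characters shift by m


def qs_bc(pattern):
    """Quick Search shift table: for each character in the pattern, the distance
    from its rightmost occurrence to the position just after the pattern."""
    m = len(pattern)
    table = {}
    for i in range(m):
        table[pattern[i]] = m - i
    return table                       # missing characters shift by m + 1


def smith_shift(pattern, last_char, next_char):
    """Shift taken by the Smith algorithm: the larger of the two shifts.
    last_char is the last character of the window, next_char the text
    character immediately after the window."""
    m = len(pattern)
    return max(hp_bc(pattern).get(last_char, m),
               qs_bc(pattern).get(next_char, m + 1))
```

For P=CAGAGAG this gives a Horspool shift of 1 for A and 2 for G, and a Quick Search shift of 2 for A and 1 for G; T does not occur in P, so it takes the defaults 7 and 8.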
Smith Algorithm
• This algorithm is almost the same as the Quick Search Algorithm, except that the last character of the window is also considered.

If the last character of the window gives a larger shift than the Quick Search Algorithm, that shift is used; otherwise the Quick Search shift is used.
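The search loop described above might look like the following. It is a minimal sketch under stated assumptions: the name smith_search is hypothetical, positions are 0-based, the window is compared with a plain slice comparison (the slides do not say in which order characters are compared), and the Quick Search shift is treated as 1 when no text character follows the window.

```python
def smith_search(text, pattern):
    """Return the 0-based start positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Horspool table: rightmost occurrence in pattern[:-1], default shift m.
    hp = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    # Quick Search table: rightmost occurrence in pattern, default shift m + 1.
    qs = {c: m - i for i, c in enumerate(pattern)}

    matches = []
    j = 0                                  # start of the current window
    while j <= n - m:
        if text[j:j + m] == pattern:
            matches.append(j)
        hp_shift = hp.get(text[j + m - 1], m)
        # No character follows the last window; any shift then ends the loop.
        qs_shift = qs.get(text[j + m], m + 1) if j + m < n else 1
        j += max(hp_shift, qs_shift)       # take the better of the two shifts
    return matches
```

Because each of the two shifts is safe on its own, taking the larger one never skips an occurrence.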

Example
• Text string T=GCGCAGAGAGTAGAGAGTACG
• Pattern string P=CAGAGAG
• Step 1: the window is GCGCAGA (positions 1-7 of T): mismatch. hpBC[A]=1, qsBC[G]=1, shift=1
• Step 2: the window is CGCAGAG (positions 2-8 of T): mismatch. hpBC[G]=2, qsBC[A]=2, shift=2
• Step 3: the window is CAGAGAG (positions 4-10 of T): exact match. hpBC[G]=2, qsBC[T]=8, shift=8
• Step 4: the window is AGAGAGT (positions 12-18 of T): mismatch. hpBC[T]=7, qsBC[A]=2, shift=7
• The window would now start at position 19, where fewer than m characters of T remain, so the search ends.
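The shifts in this example can be re-derived mechanically. The sketch below is an illustration, not part of the original slides: smith_trace is a hypothetical helper that records, for each window, its 0-based start, whether it matched, and the hpBC, qsBC and chosen shift values.

```python
def smith_trace(text, pattern):
    """Return one tuple (start, matched, hp_shift, qs_shift, shift) per window."""
    n, m = len(text), len(pattern)
    hp = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}  # default m
    qs = {c: m - i for i, c in enumerate(pattern)}           # default m + 1

    steps = []
    j = 0
    while j <= n - m:
        matched = text[j:j + m] == pattern
        hp_shift = hp.get(text[j + m - 1], m)
        # Assumption: shift of 1 when no character follows the window.
        qs_shift = qs.get(text[j + m], m + 1) if j + m < n else 1
        shift = max(hp_shift, qs_shift)
        steps.append((j, matched, hp_shift, qs_shift, shift))
        j += shift
    return steps


for step in smith_trace("GCGCAGAGAGTAGAGAGTACG", "CAGAGAG"):
    print(step)
```

With the text and pattern of the example, the trace has four windows starting at 0-based positions 0, 1, 3 and 11, with shifts 1, 2, 8 and 7.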

Time complexity
• The preprocessing phase takes O(m+σ) time and O(σ) space, where σ is the size of the alphabet.
• The searching phase takes O(mn) time.
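The O(mn) bound is reached, for instance, when both T and P consist of a single repeated character: every window matches in full and both shift functions return 1. The sketch below counts character comparisons to illustrate this. The name count_comparisons and the left-to-right comparison order are assumptions made for illustration only, and the pattern is assumed to be non-empty and no longer than the text.

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by a Smith-style search."""
    n, m = len(text), len(pattern)
    hp = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}  # default m
    qs = {c: m - i for i, c in enumerate(pattern)}           # default m + 1

    comparisons = 0
    j = 0
    while j <= n - m:
        # Compare the window left to right, stopping at the first mismatch.
        for i in range(m):
            comparisons += 1
            if text[j + i] != pattern[i]:
                break
        hp_shift = hp.get(text[j + m - 1], m)
        qs_shift = qs.get(text[j + m], m + 1) if j + m < n else 1
        j += max(hp_shift, qs_shift)
    return comparisons


# Worst case: n = 20, m = 5, every one of the n - m + 1 = 16 windows
# is compared in full, giving m * (n - m + 1) comparisons.
print(count_comparisons("a" * 20, "a" * 5))  # 80
```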
