Exact Matching

Exact Matching. Charles Yan, 2008.


  1. Exact Matching Charles Yan 2008

  2. Naïve Method
     Input: P: pattern; T: text
     Output: Occurrences of P in T
     Algorithm Naive:
     • Align P with the left end of T
     • Compare from left to right until a mismatch or an occurrence of P is found
     • Shift P one place to the right
     Time: O(n*m)
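
The three steps of the naïve method can be sketched in Python as below. This is only an illustration: the function name `naive_find`, the 0-based result positions, and the choice to return no hits for an empty pattern are assumptions, not part of the slides.

```python
def naive_find(pattern, text):
    """Return the 0-based start positions of every occurrence of pattern in text."""
    len_p, len_t = len(pattern), len(text)
    hits = []
    if len_p == 0:
        return hits  # assumption: an empty pattern reports nothing
    # Align P with the left end of T, then shift one place to the right each time.
    for start in range(len_t - len_p + 1):
        j = 0
        # Compare from left to right until a mismatch or the end of P.
        while j < len_p and text[start + j] == pattern[j]:
            j += 1
        if j == len_p:
            hits.append(start)
    return hits


print(naive_find("aab", "aabaabcaxaabaabcy"))  # [0, 3, 9, 12]
```

Each alignment may compare up to |P| characters and there are up to |T| alignments, which is where the quadratic bound comes from.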

  3. Speeding Up The Naïve Algorithm
     • Shift P by more than one place at a time
     • Skip comparisons that have already been made

  4. Preprocessing
     • Goal: To gather the information needed for speeding up the algorithm
     • Definitions:
       • substring, prefix, suffix, proper prefix, proper suffix
       • Z_i: For i>1, the length of the longest substring of S that starts at i and matches a prefix of S
       • Z-box: For any position i>1 where Z_i>0, the Z-box at i starts at i and ends at i+Z_i-1
       • r_i: For every i>1, r_i is the right-most endpoint of the Z-boxes that begin at or before i
       • l_i: For every i>1, l_i is the left endpoint of the Z-box that ends at r_i

  5. Preprocessing
     i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
     S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
     Z:    0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
     ri:   0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
     li:   0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
     (Figure: the Z-boxes of S drawn over the string.)
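
The example table can be reproduced directly from the definitions, without any of the speed-ups that come later. The sketch below is a quadratic-time check, not the Z-algorithm itself. The helper names `z_by_definition` and `r_and_l` are made up for illustration, and the lists are 1-indexed (index 0 unused) to mirror the slides.

```python
def z_by_definition(s):
    """Z values straight from the definition; 1-indexed, z[0] unused, z[1] left at 0."""
    n = len(s)
    z = [0] * (n + 1)
    for i in range(2, n + 1):
        length = 0
        # Position i+length of S (1-indexed) is s[i+length-1]; prefix position 1+length is s[length].
        while i + length <= n and s[i + length - 1] == s[length]:
            length += 1
        z[i] = length
    return z


def r_and_l(z):
    """r[i]: right-most endpoint of Z-boxes beginning at or before i; l[i]: left end of that box."""
    n = len(z) - 1
    r = [0] * (n + 1)
    l = [0] * (n + 1)
    for i in range(2, n + 1):
        r[i], l[i] = r[i - 1], l[i - 1]
        if z[i] > 0 and i + z[i] - 1 > r[i]:
            r[i], l[i] = i + z[i] - 1, i
    return r, l


z = z_by_definition("aabaabcaxaabaabcy")
r, l = r_and_l(z)
print(z[1:])  # [0, 1, 0, 3, 1, 0, 0, 1, 0, 7, 1, 0, 3, 1, 0, 0, 0]
print(r[1:])  # [0, 2, 2, 6, 6, 6, 6, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16]
print(l[1:])  # [0, 2, 2, 4, 4, 4, 4, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10]
```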

  6. Z-Algorithm
     Goal: To calculate Z_i for an input string S in linear time
     • Starting from k=2, calculate Z_2, r_2 and l_2
     • For k=3; k<=n; k++ (n = |S|): in iteration k, calculate Z_k, r_k and l_k based on Z_j, r_j and l_j for j=2,…,k-1
     • For iteration k, the algorithm only needs r_(k-1) and l_(k-1). Thus, there is no need to keep all r_i and l_i. We use r and l to denote r_(k-1) and l_(k-1)

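  7. Z-Algorithm
     In iteration k:
     (I) if k<=r
     Position k lies inside the Z-box α = S[l..r], which matches the prefix α' = S[l'..r'].
     k'=k-l+1; r'=r-l+1; α=α'; β=β', where β = S[k..r] and β' = S[k'..r']
     (Figure: S with l', k', r' marked in the prefix α' and l, k, r marked in the Z-box α.)
     i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
     S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y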

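  8. Z-Algorithm
     A) If |γ'|<|β'|, that is, Z_k' < r-k+1, then Z_k = Z_k'
     γ' is the Z-box at k', γ'' is the prefix it matches, and γ is the copy of γ' that starts at k: γ=γ'=γ''.
     The character x after γ'' differs from the character y after γ' (x≠y). Because γ' and y both lie inside β', the same y follows γ inside β, so the match at k stops after |γ| characters.
     (Figure: α', α, β', β with γ'', γ', γ and the characters x, y marked.)
     Example: k=13, l=10, r=16, so k'=4 and Z_4=3 < r-k+1=4; hence Z_13=Z_4=3.
     i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
     S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
     Z:    0  1  0  3  1  0  0  1  0  7  1  0  3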

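  9. Z-Algorithm
     B) If |γ'|>|β'|, that is, Z_k' > r-k+1, then Z_k = |β|, i.e., r-k+1
     β'' is the prefix of length |β|: β=β'=β''; γ'=γ''.
     The character x follows both β'' and β' (γ' extends past β'), while y follows β; x≠y (because α is a Z-box). The match at k therefore stops at r.
     (Figure: α', α, β'', β', β with γ'', γ' and the characters x, y marked.)
     Example: k=13, l=10, r=14, so k'=4 and Z_4=3 > r-k+1=2; hence Z_13=2.
     i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
     S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  c  d
     Z:    0  1  0  3  1  0  0  1  0  5  1  0  2  1  0  0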

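  10. Z-Algorithm
      C) If |γ'|=|β'|, that is, Z_k' = r-k+1, then Z_k >= |β|, i.e., at least r-k+1
      β=β'=β''; γ=γ'=γ''. Let z, x and y be the characters that follow β'', β' and β.
      • x≠y (because α is a Z-box)
      • z≠x (because γ' is a Z-box)
      • Whether z equals y is unknown
      Compare S[r+1,…] with S[|β|+1,…] until a mismatch occurs. Update Z_k, r, and l
      (Figure: α', α, β'', β', β with γ'', γ', γ and the characters z, x, y marked.)
      Example: k=13, l=10, r=14, so k'=4 and Z_4=2 = r-k+1=2; comparing S[15,…] with S[3,…] gives one more match (b=b) and then a mismatch (d≠a), so Z_13=3, r=15, l=13.
      i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
      S:    a  a  b  a  a  e  c  a  x  a  a  b  a  a  b  d
      Z:    0  1  0  2  1  0  0  1  0  5  1  0  3  1  0  0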

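  11. Z-Algorithm
      (II) if k>r
      Compare the characters starting at k with those starting at 1 until a mismatch occurs; Z_k is the number of matches. Update r and l if necessary (if Z_k>0, then l=k and r=k+Z_k-1)
      (Figure: position k lies to the right of the Z-box from l to r.)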

  12. Z-Algorithm
      Input: Pattern P (the string S, n = |S|)
      Output: Z_i
      Z Algorithm:
        Calculate Z_2, r_2 and l_2 explicitly by comparisons. r=r_2 and l=l_2
        for k=3; k<=n; k++
          if k<=r
            if Z_(k-l+1) < r-k+1, then Z_k = Z_(k-l+1)
            else if Z_(k-l+1) > r-k+1, then Z_k = r-k+1
            else compare the characters starting at r+1 with those starting at |β|+1 = r-k+2. Update Z_k, r and l if necessary
          else
            compare the characters starting at k with those starting at 1. Update Z_k, r and l if necessary
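
One way to turn this pseudocode into running code is sketched below, keeping the same case split (k>r, and the three sub-cases when k<=r). It is an illustrative sketch rather than the author's implementation: the name `z_algorithm` is an assumption, and the result is 1-indexed with index 0 unused and Z_1 left at 0, as in the example tables. The loop starts at k=2 because the k>r branch already covers the explicit computation of Z_2.

```python
def z_algorithm(s):
    """Compute Z values in linear time; returns a 1-indexed list (z[0] unused, z[1] = 0)."""
    n = len(s)
    z = [0] * (n + 1)
    l = r = 0  # current Z-box [l, r]; r = 0 means no box found yet
    for k in range(2, n + 1):
        if k > r:
            # Case II: compare characters starting at k with those starting at 1.
            q = 0
            while k + q <= n and s[k + q - 1] == s[q]:
                q += 1
            z[k] = q
            if q > 0:
                l, r = k, k + q - 1
        else:
            k_prime = k - l + 1
            beta_len = r - k + 1  # length of S[k..r]
            if z[k_prime] < beta_len:
                z[k] = z[k_prime]          # case A
            elif z[k_prime] > beta_len:
                z[k] = beta_len            # case B
            else:
                # Case C: compare S[r+1..] with S[r-k+2..] until a mismatch.
                q = r + 1                  # 1-indexed position being compared
                while q <= n and s[q - 1] == s[q - k]:
                    q += 1
                z[k] = q - k
                l, r = k, q - 1
    return z


print(z_algorithm("aabaabcaxaabaabcy")[1:])
# [0, 1, 0, 3, 1, 0, 0, 1, 0, 7, 1, 0, 3, 1, 0, 0, 0]
```

Only the current `l` and `r` are kept between iterations, matching the remark that the individual r_i and l_i never need to be stored.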

  13. Preprocessing
      i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
      S:    a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
      Z:    0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
      r:    0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
      l:    0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10

  14. Z-Algorithm: Time complexity
      • #mismatches <= number of iterations, n (each iteration stops at its first mismatch)
      • #matches:
        • Let q be the number of matches at iteration k; then r increases by at least q
        • r<=n
        • Thus total #matches <= n
      • T = O(#matches + #mismatches + #iterations) = O(n)
      i:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
      S:     a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
      Z:     0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
      r:     0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
      l:     0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
      #m:    0  1  0  3  0  0  0  1  0  7  0  0  0  0  0  0  0
      #mis:  0  1  1  1  0  0  1  1  1  1  0  0  0  0  0  0  1

  15. Simplest Linear Time Exact Matching Algorithm
      Input: Pattern P, Text T
      Output: Occurrences of P in T
      Algorithm Simplest:
        S=P$T, where $ is a character that does not appear in P or T
        For i=2; i<=|S|; i++
          Calculate Z_i
          If Z_i=|P|, then report that there is an occurrence of P in T starting at position i-|P|-1 of T
      Time: T = O(|P|+|T|+1) = O(n+m)
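
A minimal Python sketch of this matcher follows. The names `z_values` and `find_occurrences`, the `sep` parameter, and the `ValueError` raised when the separator occurs in the inputs are all assumptions made for the example. Positions are reported 1-indexed in T, following the i-|P|-1 formula.

```python
def z_values(s):
    """Linear-time Z values, 1-indexed (z[0] unused, z[1] = 0)."""
    n = len(s)
    z = [0] * (n + 1)
    l = r = 0
    for k in range(2, n + 1):
        if k <= r and z[k - l + 1] != r - k + 1:
            # Inside the current Z-box and the answer is already known.
            z[k] = min(z[k - l + 1], r - k + 1)
            continue
        # Otherwise compare explicitly, starting at k or just past r.
        q = max(k, r + 1)
        while q <= n and s[q - 1] == s[q - k]:
            q += 1
        z[k] = q - k
        if z[k] > 0:
            l, r = k, q - 1
    return z


def find_occurrences(pattern, text, sep="$"):
    """Return 1-indexed start positions in text of every occurrence of pattern."""
    if len(sep) != 1 or sep in pattern or sep in text:
        raise ValueError("sep must be a single character absent from pattern and text")
    if not pattern:
        return []  # assumption: an empty pattern reports nothing
    s = pattern + sep + text
    z = z_values(s)
    p = len(pattern)
    # Position i in S corresponds to position i - |P| - 1 in T.
    return [i - p - 1 for i in range(2, len(s) + 1) if z[i] == p]


print(find_occurrences("aab", "aabaabcaxaabaabcy"))  # [1, 4, 10, 13]
```

Because the separator never matches a pattern character, no Z value can exceed |P|, so testing for equality with |P| is enough.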

  16. Simplest Linear Time Exact Matching Algorithm
      • Takes only O(n) extra space
      • Alphabet-independent linear time
      (Figure: S = P$T with l', k', r' marked in α', β' to the left of $ and l, k, r marked in α, β to the right of $.)