1 / 13

String Matching II

String Matching II. Algorithm : Design & Analysis [22]. In the last class …. Simple String Matching KMP Flowchart Construction KMP Scan. String Matching II. Boyer-Moore ’ s heuristics Skipping unnecessary comparison Combining fail match knowledge into jump Boyer-Moore Algorithm

etan
Download Presentation

String Matching II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. String Matching II Algorithm : Design & Analysis [22]

  2. In the last class… • Simple String Matching • KMP Flowchart Construction • KMP Scan

  3. String Matching II • Boyer-Moore’s heuristics • Skipping unnecessary comparison • Combining fail match knowledge into jump • Boyer-Moore Algorithm • Approximate String Matching

  4. Skipping Charactersin String Matching must must must must must must must must must must must must If you wish to understand others you must … Checking the characters in P, in reverse order The copy of the P begins at t38. Matching is achieved in 18 comparisons

  5. Distance of Jumping Forward • With the knowledge of P, the distance of jumping forward for the pointer of T is determined by the character itself, independent of the location in T. p1 … A … A … pm p1 … A … A … pk … pm  t1 …… tj=A …… …… tn new j current j Rightmost ‘A’ charJump[‘A’] = m-k

  6. Computing the Jump: Algorithm Input: Pattern string P; m, the length of P; alphabet size alpha=|| Output: Array charJump, indexed 0,…, alpha-1, storing the jumping offsets for each char in alphabet. void computeJumps(char[ ] P, int m, int alpha, int[ ] charJump char ch; int k; for (ch=0; ch<alpha; ch++) charJump[ch]=m; //For all char no in P, jump by m for (k=1; km; k++) charJump[pk]=m-k; (||+m) The increasing order of k ensure that for duplicating symbols in P, the jump is computed according to the rightmost

  7. Partially Matched Substring matched suffix P: b a t s a n d c a t s T: …… d a t s …… New j Move only 1 char Current j charJump[‘d’]=4 Remember the matched suffix, we can get a better jump P: b a t s a n d c a t s T: …… d a t s …… New j Move 7 chars

  8. Forward to Match the Suffix p1 …… pkpk+1 …… pm Matched suffix  …… Dismatch t1 …… tjtj+1 …… …… tn Substring same as the matched suffix occurs in P p1 …… prpr+1 …… pr+m-k …… pm slide[k] p1 …… pkpk+1 …… pm t1 …… tjtj+1 …… …… tn matchJump[k] New j Old j

  9. Partial Match for the Suffix p1 …… pkpk+1 …… pm Matched suffix  …… Dismatch t1 …… tjtj+1 …… …… tn No entire substring same as the matched suffix occurs in P p1 …… pq …… pm May be empty slide[k] p1 …… pkpk+1 …… pm t1 …… tjtj+1 …… …… tn matchJump[k] New j Old j

  10. p1 …… prpr+1 …… pr+m-k …… pm slide[k] p1 …… pkpk+1 …… pm t1 …… tjtj+1 …… …… tn matchJump[k] New j Old j matchjump and slide • slide[k]: the distance P slides forward after dismatch at pk, with m-k chars matched to the right • matchjump[k]: the distance j, the pointer of P, jumps, that is: • matchjump[k]=slide[k]+m-k • Let r(r<k) be the largest index, such that pr+1 starts a largest substring matching the matched suffix of P, and prpk,then slide[k]=k-r • If the r not found, the longest prefix of P, of length q, matching the matched suffix of P will be lined up. Then slide[k]=m-q.

  11. Computing matchJump: Example Direction of computing P = “ w o w w o w ” w o w w o w w o w w o w matchJump[6]=1  t1 …… tj …… Matched is empty w o w w o w w o w w o w matchJump[5]=3  t1 …… tj w …… Matched is 1

  12. Computing matchJump: Example Direction of computing P = “ w o w w o w ” w o w w o w w o w w o w matchJump[4]=7 Not lined up  t1 …… tj o w …… Matched is 2 w o w w o w w o w w o w matchJump[3]=6  t1 …… tj w o w …… Matched is 3

  13. Computing matchJump: Example Direction of computing P = “ w o w w o w ” w o w w o w slide[2]=3 w o w w o w matchJump[2]=7  t1 …… tj w w o w …… Matched is 4 w o w w o w w o w w o w matchJump[1]=8  t1 …… tj o w w o w …… Matched is 5

More Related