Searching a String with the Boyer-Moore Algorithm

Presentation Transcript


  1. Searching a String with the Boyer-Moore Algorithm. Shana Rose Negin, December 14, 2000

  2. Boyer-Moore String Search • How does it work? • Examples • Complexity • Acknowledgements

  3. How Does it Work? • Pattern moves left to right. • Comparisons are done right to left. • Uses two heuristics: • Bad Character • Good Suffix • Each heuristic comes into play when a mismatch occurs. Each one proposes how far the pattern can safely move forward without skipping over a possible match, and the algorithm takes the larger of the two proposals.

  4. Pattern Moves Left to Right. Text: Several hours later, Cindy. Pattern: indy. [Figure: the pattern shown under the text at three alignments, labeled Start, Middle, and End.]

  5. Comparisons are done right to left. Text: Several hours later, Cindy. Pattern: indy. [Figure: four panels of the same alignment, labeled First Comparison, Second Comparison, Third Comparison, and Fourth Comparison.]
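A minimal Python sketch of one right-to-left check, using the slide's text and pattern. The function name `compare_right_to_left` and the 0-based shift `s` are assumptions made for illustration; they do not come from the slides.

```python
from typing import Tuple


def compare_right_to_left(text: str, pattern: str, s: int) -> Tuple[int, bool]:
    """Check pattern against text[s : s + len(pattern)], last character first.

    Returns (number of character comparisons made, whether the window matched).
    """
    comparisons = 0
    for j in range(len(pattern) - 1, -1, -1):  # rightmost pattern index first
        comparisons += 1
        if pattern[j] != text[s + j]:
            return comparisons, False  # stop at the first mismatch
    return comparisons, True


text = "Several hours later, Cindy"
print(compare_right_to_left(text, "indy", 0))   # (1, False): 'y' vs 'e' fails at once
print(compare_right_to_left(text, "indy", 22))  # (4, True): all four characters match
```

Starting from the right means a single comparison is often enough to reject an alignment.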

  6. Three Parts to the Bad Character Heuristic 1. When the comparison gives a mismatch, the bad-character heuristic proposes moving the pattern to the right by an amount so that the bad character from the string will match the rightmost occurrence of the bad character in the pattern. 2. If the bad character doesn’t occur in the pattern, then the pattern may be moved completely past the bad character. 3. If the rightmost occurrence of the bad character is to the right of the current bad character position, then this heuristic makes no proposal.

  7. Bad Character Heuristic 1. When the comparison gives a mismatch, the bad-character heuristic proposes moving the pattern to the right by an amount so that the bad character from the string will match the rightmost occurrence of the bad character in the pattern. Text: You’ve got a funny face, man. Pattern: cite Text: You’ve got a funny face,_man. Shift: cite Shifted two characters to match up the c’s.

  8. Bad Character Heuristic 2. If the bad character doesn’t occur in the pattern, then the pattern may be moved completely past the bad character. Text: You’ve got a funny face, man. Pattern: poor Text: You’ve got a funny face, man. Shift: poor Shifted four characters because there was no match.

  9. Bad Character Heuristic 3. If the rightmost occurrence of the bad character is to the right of the current bad character position, then this heuristic makes no proposal. Text: There are no babies here. Pattern: drab Text: There are no babies here. Shift: drab The shift proposed would be negative, so it is ignored.
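The three cases above can be sketched in a few lines of Python. The function name `bad_character_shift`, the 0-based mismatch index `j`, and the particular bad character chosen for the "poor" case are assumptions for illustration; the slides show only the resulting shifts.

```python
from typing import Optional


def bad_character_shift(pattern: str, j: int, bad_char: str) -> Optional[int]:
    """Shift proposed when pattern[j] (0-based) mismatches the text character bad_char.

    Returns None when the heuristic makes no proposal (case 3).
    """
    k = pattern.rfind(bad_char)  # rightmost occurrence in the pattern, -1 if absent
    shift = j - k                # case 2: k == -1 moves the pattern past the bad character
    return shift if shift > 0 else None


# Case 1: "cite" under "face"; 'e' matches, then 't' meets 'c' at j = 2.
print(bad_character_shift("cite", 2, "c"))  # 2
# Case 2: the bad character (assumed here to be 'u') is not in "poor"; mismatch at j = 3.
print(bad_character_shift("poor", 3, "u"))  # 4
# Case 3: "drab" with bad character 'b' at j = 1; the rightmost 'b' is to the right.
print(bad_character_shift("drab", 1, "b"))  # None
```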

  10. Good Suffix Heuristic The good-suffix heuristic proposes to move the pattern to the right by the least amount so that a group of characters in the pattern will match with the good suffix found in the text. Text: ...I wish I had_an apple instead of... Pattern: banana Text: …..I wish I had an apple instead of... Shift: banana Shift two so that the second occurrence of ‘an’ in ‘banana’ matches the characters ‘an’ in the string.
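As a rough illustration, the good-suffix shift can be computed by brute force: try each shift in turn and accept the first one where the shifted pattern agrees with the part already matched. This is a sketch of the idea only, not the linear-time construction. The name `good_suffix_shift` is hypothetical, and `j` is taken to be the 1-based position of the mismatch, so the matched suffix is everything after position `j`.

```python
def good_suffix_shift(pattern: str, j: int) -> int:
    """Smallest safe shift after a mismatch at 1-based position j.

    The suffix pattern[j:] (0-based slice) is assumed to have matched the text.
    """
    m = len(pattern)
    for shift in range(1, m + 1):
        # After shifting, pattern[k - shift] sits where pattern[k] used to be.
        # Every overlapping position inside the matched suffix must agree.
        if all(k - shift < 0 or pattern[k - shift] == pattern[k]
               for k in range(j, m)):
            return shift
    return m


# With two trailing characters of "banana" matched (mismatch at position 4),
# the pattern moves two places, lining up an earlier copy of the same characters.
print(good_suffix_shift("banana", 4))  # 2
```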

  11. EXAMPLE. Text: Im_a_grad._dad_is_glad. Pattern: grad. [Figure: successive alignments of the pattern under the text, each shift labeled Bad-character, Good-Suffix, or Match, with the comparisons numbered 1 through 12.] 12 comparisons out of 22 characters.

  12. EXAMPLE. Text: Where are you moving? What are you doing? Pattern: grad. [Figure: successive alignments of the pattern under the text, each shift labeled Bad-character, Good-Suffix, or Match.] 10 comparisons out of 41 characters. The last alignment of ‘grad’ extends past the end of the text, so it is discarded before it is counted.

  13. Applets • http://www.accessone.com/~lorre/pages/bmi.html • http://www.i.kyushu-u.ac.jp/~takeda/PM_DEMO/e.html

  14. The Algorithm: Sigma = alphabet in use; T = search string (text); P = pattern; n = length[T]; m = length[P]. Positions in T and P are counted from 1.
      L = Compute_Last_Occurrence_Function(P, m, Sigma);  // for the bad-character heuristic
      Y = Compute_Good_Suffix_Function(P, m);             // for the good-suffix heuristic
      s = 0;
      while (s <= n - m) {
          j = m;
          while (j > 0 AND P[j] == T[s+j])
              j--;
          if (j == 0) {
              print("Pattern FOUND!!! Location", s);
              s = s + Y[0];
          } else {
              s = s + max(Y[j], j - L[T[s+j]]);
          }
      }

  15. Compute_Last_Occurrence_Function. Sigma = alphabet in use; T = search string (text); P = pattern; n = length[T]; m = length[P].
      Compute_Last_Occurrence_Function(P, m, Sigma) {
          /* The array L has a field for every letter in the alphabet. When this
             function finishes, L[a] holds the position, counted from 1 at the
             start of the pattern, of the rightmost 'a'; L[b] holds the position
             of the rightmost 'b'; and so on. A letter that does not occur in
             the pattern keeps the value 0.
             EXAMPLE: pattern: jeff
             L ->  a b c d e f g h i j k
                   0 0 0 0 2 4 0 0 0 1 0 */
          for (each character a in Sigma)   // initialize all fields to 0
              L[a] = 0;
          for (j = 1; j <= m; j++)          // for every letter in the pattern,
              L[P[j]] = j;                  // record its position from the start of the pattern
          return L;
      }
      /* COMPLEXITY: O(|Sigma| + m) */
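A small Python sketch of the last-occurrence table for the slide's example pattern, jeff. Positions are counted from 1 so that 0 can mean "not in the pattern". The name `compute_last_occurrence` and the use of a dictionary in place of the array L are illustrative assumptions.

```python
from typing import Dict


def compute_last_occurrence(pattern: str, alphabet: str) -> Dict[str, int]:
    """Map each letter to the 1-based position of its rightmost occurrence (0 if absent)."""
    last = {ch: 0 for ch in alphabet}            # initialize every field to 0
    for j, ch in enumerate(pattern, start=1):    # later positions overwrite earlier ones
        last[ch] = j
    return last


table = compute_last_occurrence("jeff", "abcdefghijk")
print(table["j"], table["e"], table["f"], table["a"])  # 1 2 4 0
```

Building the table touches every alphabet letter once and every pattern character once, which is where the O(|Sigma| + m) cost comes from.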

  16. Compute_Good_Suffix_Function. Sigma = alphabet in use; T = search string (text); P = pattern; n = length[T]; m = length[P].
      Compute_Good_Suffix_Function(P, m) {
          /* First compute the prefix function of the pattern and of its
             reverse. Y[j] is the shift recommended when the mismatch occurs at
             position j, that is, after the suffix P[j+1..m] has matched. The
             function finds the next rightmost occurrence of that suffix in the
             pattern and recommends the shift that lines it up. If there is no
             other occurrence, it falls back to m - Pi[m], which is the length
             of the pattern unless a prefix of the pattern is also a suffix. */
          Pi  = Compute_Prefix_Function(P);
          P'  = Reverse(P);
          Pi' = Compute_Prefix_Function(P');
          for (j = 0; j <= m; j++)
              Y[j] = m - Pi[m];
          for (l = 1; l <= m; l++) {
              j = m - Pi'[l];
              if (Y[j] > l - Pi'[l])
                  Y[j] = l - Pi'[l];
          }
          return Y;
      }
      /* COMPLEXITY: O(m) */
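A Python sketch of the same construction, assuming the usual Knuth-Morris-Pratt prefix function for `Compute_Prefix_Function` (the slides do not spell it out). The list `gamma` plays the role of Y and is indexed by the 1-based mismatch position j, with index 0 used after a full match. Function names are illustrative.

```python
from typing import List


def compute_prefix_function(p: str) -> List[int]:
    """pi[q] = length of the longest proper prefix of p[:q] that is also its suffix."""
    m = len(p)
    pi = [0] * (m + 1)  # pi[0] is unused; q runs from 1 to m
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def compute_good_suffix(p: str) -> List[int]:
    """gamma[j] = shift to use after a mismatch at 1-based position j (j = 0: full match)."""
    m = len(p)
    pi = compute_prefix_function(p)
    pi_rev = compute_prefix_function(p[::-1])
    gamma = [m - pi[m]] * (m + 1)       # default shift for every position
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])
    return gamma


print(compute_good_suffix("banana"))  # [6, 6, 6, 2, 2, 2, 1]
```

For banana, a mismatch at positions 3, 4, or 5 yields a shift of two, and a mismatch on the very last character (nothing matched yet) yields a shift of one.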

  17. The Main Loop. Sigma = alphabet in use; T = search string (text); P = pattern; n = length[T]; m = length[P].
      while (s <= n - m) {                            // for every shift
          j = m;                                      // start at the right end of the pattern
          while (j > 0 AND P[j] == T[s+j])            // compare right to left while characters match
              j--;
          if (j == 0) {                               // if you reach the beginning of the pattern,
              print("Pattern FOUND!!! Location", s);  // you found the pattern! Tell someone
              s = s + Y[0];                           // and shift by the good-suffix amount for a full match
          } else {
              s = s + max(Y[j], j - L[T[s+j]]);       // else, choose the greater of the two heuristic results
          }
      }
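Putting the pieces together, here is a self-contained Python sketch of the whole search. It keeps the 1-based j of the pseudocode and subtracts 1 when indexing Python strings. The function names, the dictionary used for L, and the choice to return a list of shifts instead of printing are illustrative assumptions, and the prefix function is assumed to be the usual Knuth-Morris-Pratt one.

```python
from typing import Dict, List


def prefix_function(p: str) -> List[int]:
    """pi[q] = length of the longest proper prefix of p[:q] that is also its suffix."""
    pi = [0] * (len(p) + 1)
    k = 0
    for q in range(2, len(p) + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def good_suffix(p: str) -> List[int]:
    """Shift table indexed by the 1-based mismatch position (0 = full match)."""
    m = len(p)
    pi, pi_rev = prefix_function(p), prefix_function(p[::-1])
    gamma = [m - pi[m]] * (m + 1)
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])
    return gamma


def boyer_moore_search(text: str, pattern: str) -> List[int]:
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last: Dict[str, int] = {ch: j for j, ch in enumerate(pattern, start=1)}
    gamma = good_suffix(pattern)
    found = []
    s = 0
    while s <= n - m:                       # for every shift
        j = m
        while j > 0 and pattern[j - 1] == text[s + j - 1]:
            j -= 1                          # compare right to left
        if j == 0:
            found.append(s)                 # full match
            s += gamma[0]
        else:
            bad = text[s + j - 1]           # the bad character
            s += max(gamma[j], j - last.get(bad, 0))
    return found


print(boyer_moore_search("Im_a_grad._dad_is_glad", "grad"))  # [5]
```

Because every entry of the good-suffix table is at least 1, the maximum of the two proposals always moves the pattern forward, even when the bad-character proposal is zero or negative.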

  18. Complexity: O((n-m+1)m + |Sigma|) • Compute_Last_Occurrence: O(|Sigma| + m) • Compute_Good_Suffix: O(m) • Number of shifts: O(n-m+1) • Time to check each shift: O(m) • Total: (|Sigma| + m) + m + m(n-m+1) • = O(nm) worst case

  19. HOWEVER...

  20. IN PRACTICE...

  21. the algorithm takes sub-linear time

  22. Specifically, in the best case, the algorithm’s running time is O(N/M) (length of text over length of pattern)

  23. The complexity is best when the letters in the pattern don’t match the letters in the text very often. Since this is generally the case, the typical running time ends up close to the best case, O(N/M) (length of text over length of pattern).
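The gap between the worst and best case can be seen by counting comparisons on two extreme inputs. The sketch below uses a simplified search that applies only the bad-character rule and otherwise advances by one position; this is a deliberate simplification, not the full two-heuristic algorithm, and counts on other inputs may differ from the full algorithm's. The name `count_comparisons` is hypothetical.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by a bad-character-only right-to-left search."""
    n, m = len(text), len(pattern)
    last = {ch: j for j, ch in enumerate(pattern, start=1)}  # 1-based positions
    comparisons = 0
    s = 0
    while s <= n - m:
        j = m
        while j > 0:
            comparisons += 1
            if pattern[j - 1] != text[s + j - 1]:
                break
            j -= 1
        if j == 0:
            s += 1  # full match: advance by one position
        else:
            s += max(1, j - last.get(text[s + j - 1], 0))
    return comparisons


# Best case: no pattern letter appears in the text, so each alignment costs
# one comparison and the pattern jumps its full length: 20 / 4 = 5.
print(count_comparisons("x" * 20, "abcd"))  # 5
# Worst case: everything matches, so each of the 10 - 3 + 1 = 8 alignments
# costs 3 comparisons: 8 * 3 = 24.
print(count_comparisons("a" * 10, "aaa"))   # 24
```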

  24. Conclusion: The Boyer-Moore algorithm is a very good algorithm. Its worst-case running time, as derived above, is O(NM); its best-case running time is sub-linear, O(N/M). Most of the time it tends toward the best case rather than the worst case. I recommend the Boyer-Moore algorithm for searching a string. Shana Negin, 252a-as, December 14, 2000, Algorithms csc252

  25. Acknowledgements • Cormen: Chapter 34.5 • Cole, Richard: “Tight Bounds on the Complexity of the Boyer-Moore String-Matching Algorithm.” New York University • http://www.accessone.com/~lorre/pages/bmi.html • http://www.i.kyushu-u.ac.jp/~takeda/PM_DEMO/e.html
