
Boyer-Moore String Searching Algorithm



Presentation Transcript


  1. Boyer-Moore String Searching Algorithm By: Matthew Brown

  2. String-Searching Algorithms • The goal of any string-searching algorithm is to determine whether or not a match of a particular string exists within another (typically much longer) string. • Many such algorithms exist, with varying efficiencies. • String-searching algorithms are important to a number of fields, including computational biology, computer science, and mathematics.

  3. The Boyer-Moore String Search Algorithm • Developed in 1977, the B-M string search algorithm is particularly efficient and has served as a standard benchmark for string-search algorithms ever since. • Its execution time can be sub-linear, because not every character of the string being searched needs to be checked. • Generally speaking, the algorithm gets faster as the target string becomes longer.

  4. How does it work? • The B-M algorithm takes a ‘backward’ approach: the target string is aligned with the start of the check string, and the last character of the target string is compared with the corresponding character in the check string. • In the case of a match, the second-to-last character of the target string is compared with the corresponding check string character, and so on toward the start of the target string. (This step offers no gain in efficiency over the brute-force method.) • In the case of a mismatch, the algorithm computes a new alignment for the target string based on the mismatch. This is where the algorithm gains considerable efficiency.
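The right-to-left check at a single alignment can be sketched as follows. This is a minimal illustration only; the function name `mismatch_index` and its return convention are assumptions made for this example and do not come from the presentation.

```python
def mismatch_index(text, pattern, start):
    """Compare pattern with text[start:start + len(pattern)] from right to left.

    Returns the pattern index of the first mismatch found, or -1 if the
    whole pattern matches at this alignment. Assumes the alignment fits
    inside the text.
    """
    for k in range(len(pattern) - 1, -1, -1):
        if text[start + k] != pattern[k]:
            return k
    return -1


# The last character is tested first, so a mismatch there ends the check at once.
print(mismatch_index("-------x-----", "rockstar", 0))  # 7
print(mismatch_index("a rockstar", "rockstar", 2))     # -1
```

How far the target string moves after a mismatch is decided separately, by the pre-processing tables.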

  5. An example • Target string: rockstar • Check string: -------x----- • Aligning the start of each string pairs the final ‘r’ of ‘rockstar’ with ‘x’. • Since ‘x’ is not a character in ‘rockstar’, no alignment that overlaps ‘x’ can match, and the B-M algorithm skips all such alignments by moving the target string wholly past ‘x’. • This eliminates several alignments (up to 7, in this case) after a single comparison of two characters, ‘r’ against ‘x’.
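The count of 7 in this example can be checked with a few lines of Python. The check string below is a hypothetical, longer version of the one on the slide, padded on the right so that every alignment in question fits inside it.

```python
pattern = "rockstar"
# Hypothetical check string: 'x' at index 7, padded so every alignment fits.
text = "-" * 7 + "x" + "-" * 12

m = len(pattern)
x_pos = text.index("x")

# Alignment s covers text[s:s + m]; it overlaps 'x' when s <= x_pos < s + m.
overlapping = [s for s in range(len(text) - m + 1) if s <= x_pos < s + m]
skipped = [s for s in overlapping if s != 0]  # alignment 0 is the one just tested

print(overlapping)   # [0, 1, 2, 3, 4, 5, 6, 7]
print(len(skipped))  # 7
# None of these alignments can match, because 'x' does not occur in the pattern.
print(any(text[s:s + m] == pattern for s in overlapping))  # False
```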

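  6. Efficiency of the B-M Algorithm • For a target string of length M and a check string of length N, the B-M algorithm typically examines on the order of N/M characters when mismatches are found early, which is common with a large alphabet. • In the best case, only one in M characters needs to be checked. • In the worst case, when the target string does not occur in the check string, at most 3N comparisons are made, giving a complexity of O(N). When the target string does occur, the original algorithm can take O(NM) time unless a refinement such as the Galil rule is added.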
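The best-case figure of one comparison per M characters can be observed by counting comparisons. The sketch below is not the full B-M algorithm: it uses only a character-based shift table (the simplification usually attributed to Horspool), and the name `horspool_count` is an assumption made for this example. It therefore illustrates the best case only, not the 3N worst-case bound.

```python
def horspool_count(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    # Shift table: distance from the right-most occurrence of each character
    # (final character excluded) to the end of the pattern.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    comparisons = 0
    s = 0
    while s <= n - m:
        k = m - 1
        while k >= 0:  # compare right to left
            comparisons += 1
            if text[s + k] != pattern[k]:
                break
            k -= 1
        if k < 0:
            return s, comparisons
        # Shift according to the text character under the pattern's last position.
        s += shift.get(text[s + m - 1], m)
    return -1, comparisons


print(horspool_count("-" * 64, "rockstar"))               # (-1, 8)
print(horspool_count("-" * 56 + "rockstar", "rockstar"))  # (56, 15)
```

In the first call, N = 64 and M = 8, and only 64 / 8 = 8 characters are examined, since every mismatch moves the pattern forward by its full length.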

  7. Pre-processing Tables • The B-M algorithm computes 2 pre-processing tables to determine the next suitable alignment after each failed verification. • The first table (commonly called the bad-character table) gives how many positions ahead of the current position to start the next search, based on the character which caused the failed verification. • The second table (commonly called the good-suffix table) makes a similar calculation based on how many characters were matched successfully before the failed verification. • These tables are often referred to as ‘jump tables’, though this leads to some ambiguity with the more common meaning of the term in computer science, which refers to an efficient way of transferring control from one part of a program to another.

  8. Calculation of Preprocessing Tables • Table 1 • Starting at the second-to-last character of the target string, move left toward the first character. At each character, if the character is not already in the table, add it to the table. • This character’s shift value is equal to its distance from the right-most character in the string. • All other characters receive a shift value equal to the total length of the string. • Example: ‘peterpan’ would produce the following table: (character, shift) = (A, 1), (P, 2), (R, 3), (E, 4), (T, 5), (all other characters, 8)
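The Table 1 construction can be sketched in Python as below. The names `table1` and `shift_for` are hypothetical, chosen for this example; the walk starts at the second-to-last character so that the final character only receives a shift if it also occurs earlier in the string, which is what the ‘peterpan’ table implies.

```python
def table1(pattern):
    """Map each character to its distance from the right-most character.

    Only the right-most occurrence of a character counts, and the final
    character of the pattern itself is not entered.
    """
    last = len(pattern) - 1
    shifts = {}
    # Walk left from the second-to-last character; setdefault keeps the
    # first value seen, which belongs to the right-most occurrence.
    for i in range(last - 1, -1, -1):
        shifts.setdefault(pattern[i], last - i)
    return shifts


def shift_for(shifts, pattern, ch):
    """Characters missing from the table shift by the full pattern length."""
    return shifts.get(ch, len(pattern))


t = table1("peterpan")
print(t)                              # {'a': 1, 'p': 2, 'r': 3, 'e': 4, 't': 5}
print(shift_for(t, "peterpan", "x"))  # 8
```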

  9. Calculation of Preprocessing Tables • Table 2 • First, for each value of i less than the length of the target string, form the pattern consisting of the last i characters of the target string preceded by a mismatch for the character before them. • Then, determine the least number of characters by which this partial pattern must be shifted left before it matches the target string again. • Example: for ‘ANPANMAN’, the table would be (i, pattern, shift) = (0, -N, 1), (1, (-A)N, 8), (2, (-M)AN, 3), (3, (-N)MAN, 6), (4, (-A)NMAN, 6), (5, (-P)ANMAN, 6), (6, (-N)PANMAN, 6), (7, (-A)NPANMAN, 6). (Here, -X means ‘not X’.)
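The Table 2 definition can be applied directly by brute force, as in the sketch below. The name `table2` is hypothetical, and the approach simply tries every shift in turn; it is meant to reproduce the ‘ANPANMAN’ values, not to be an efficient way to build the table (practical implementations compute it in time linear in the pattern length).

```python
def table2(pattern):
    """For i = 0 .. len(pattern) - 1, return the least shift at which the
    last i characters still agree with the pattern and the character before
    them differs. Positions that fall off the left end count as agreeing."""
    m = len(pattern)
    shifts = []
    for i in range(m):
        k = m - i - 1  # index of the mismatched character
        for s in range(1, m + 1):
            suffix_ok = all(
                j - s < 0 or pattern[j - s] == pattern[j]
                for j in range(k + 1, m)
            )
            mismatch_ok = k - s < 0 or pattern[k - s] != pattern[k]
            if suffix_ok and mismatch_ok:
                shifts.append(s)
                break  # s = m always succeeds, so every i gets a value
    return shifts


print(table2("ANPANMAN"))  # [1, 8, 3, 6, 6, 6, 6, 6]
```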

  10. Comparison of String Searching Algorithm Complexities • Boyer-Moore: O(n) when the target string does not occur (the original algorithm can reach O(nm) when it does) • Naïve string search algorithm: O((n-m+1)m) • Bitap algorithm: O(mn) • Rabin-Karp string search algorithm: average O(n+m), worst case O(nm) (n = length of check string, m = length of target string)
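The (n-m+1)m figure for the naïve algorithm can be reproduced by counting comparisons on an input chosen to be unfavourable. The function name `naive_count` and the test strings are assumptions made for this illustration.

```python
def naive_count(text, pattern):
    """Brute-force search: return (index of first match or -1, comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):  # try every alignment in turn
        k = 0
        while k < m:            # compare left to right
            comparisons += 1
            if text[s + k] != pattern[k]:
                break
            k += 1
        if k == m:
            return s, comparisons
    return -1, comparisons


# Every alignment matches four characters and fails on the fifth.
text, pattern = "a" * 20, "aaaab"
n, m = len(text), len(pattern)
print(naive_count(text, pattern))  # (-1, 80)
print((n - m + 1) * m)             # 80
```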

  11. About the Creators • Robert Boyer is a Professor Emeritus of the University of Texas at Austin Computer Science Department. He received his BA and PhD in mathematics at UT Austin, and has authored and co-authored several books concerning automatic theorem-proving. • J. Strother Moore is Admiral B.R. Inman Centennial Chair in Computing Theory of the Department of Computer Sciences at UT Austin. He received his BS in mathematics from MIT in 1970, and his PhD in computational logic from the University of Edinburgh in 1973. He has authored and co-authored several books concerning automatic theorem-proving, some of them in cooperation with Robert Boyer.

  12. References • Wikipedia.org • http://www-igm.univ-mlv.fr/~lecroq/string/ • Epp, Susanna S. Discrete Mathematics with Applications. 3rd Ed., Brooks/Cole 2004.
