## Boyer-Moore Searches on Binary Texts

*Accelerating Boyer-Moore Searches on Binary Texts*. Shmuel Tomi Klein and Miri Kopel Ben-Nissan, Bar Ilan University, Israel.

**Outline:** background and motivation; the Boyer-Moore algorithm; the new binary variant; analysis; experiments; summary.

### Background and motivation

Pattern matching is an important application of automata. The classic algorithms include KMP (Knuth-Morris-Pratt), BDM (Backward DAWG Matching) and BM (Boyer-Moore).

### The Boyer-Moore algorithm

Boyer and Moore's key idea is to match backwards: the pattern is aligned with the text (for example, `pattern` under `this-is-a-sample-text---`) and compared from right to left. Suppose a suffix u of the pattern x has matched the text y, and the next pattern character a mismatches the text character b. The pattern is then shifted according to one of four cases:

1. **delta1, b does not occur in x:** shift x entirely past b.
2. **delta1, b occurs in x:** shift x so that the rightmost b in x lies under the text's b (the part of x to its right contains no b).
3. **delta2, u reoccurs in x preceded by a character c ≠ a:** shift x to align that occurrence of u with the matched u in the text.
4. **delta2, only a suffix v of u reoccurs in x (as a prefix of x):** shift x so that this prefix lies under the corresponding suffix of the matched text.

The slides illustrate this by searching for `example` in `here is a simple example`, where successive shifts are produced by delta1 and by delta2.

### Problems of binary Boyer-Moore

Applied to a bit string, such as the text `0100101101011101000100110101001` with the pattern `1101100`, BM must process the text bit by bit. Moreover, in the usual setting most of the work is done by delta1, but over a binary alphabet delta1 is useless: both bit values occur close to the right end of almost any pattern, so delta1 can only produce very small shifts.

### The need for a binary Boyer-Moore

The main motivation is compressed matching: given a compressed text E(T) and a pattern P, look for E(P) directly in E(T), rather than for P in the decompressed text D(E(T)). The suggested solution is BBBMM, Blocked Binary Boyer-Moore Matching.

### BBBMM

BBBMM processes the text in blocks of k bits. A text block Text[i] is compared with Pat[sh, j], the j-th block of a copy of the pattern shifted by sh bits, so that k bits are compared at once.

More information is available in the binary case: in ASCII text a comparison inspects one character, whereas a k-bit block of binary text (e.g., `01100010 01101010`, the ASCII codes of `b` and `j`) exposes all of its bits. BBBMM exploits this with an extended delta1 based on the text bits around the mismatch (shown on the slides across text positions i-1, i and i+1).

- Total size of the delta1 tables: [formula missing]. If this is too large, a limit value is used, which reduces the size of the delta1 tables to [formula missing].
- In the original BM, delta1 gives the increase of the text pointer; in BBBMM, delta1 gives the shift size directly. When the mismatch is not in the last block, a correction Correct[sh, j] is applied.
- BBBMM also uses a delta2 table.

### Analysis

Assuming random input, which is reasonable for compressed text:

- Expected number of comparisons until a mismatch: bit-wise [formula missing]; blocked [formula missing].
- Expected number of bits shifted after a mismatch: bit-wise M; blocked M' (the definitions of M and M' are missing).

### Experiments

- Texts: the English Bible (2.5 MB) and the World Factbook (1.5 MB), Huffman encoded, with k = 8.
- Patterns: random substrings of lengths 10 to 500.

All results are plotted against pattern length:

- *Figure: average number of comparisons between shifts (bit-wise vs. blocked).*
- *Figure: average size of shifts (bit-wise vs. blocked).*
- *Figure: average number of comparisons per 1000 bits (bit-wise, BDM, blocked).*
- *Figure: time to locate the first occurrence, in ms (bit-wise, BDM, Turbo-BDM, blocked).*

### Summary

- BBBMM is a blocked variant of BM.
- It is faster than the alternatives, with an overhead of 1-10 K.
- Possible extensions: ASCII text, and words instead of characters.
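The four mismatch cases of the Boyer-Moore algorithm can be illustrated with a short Python sketch. It is a standard textbook formulation, not the authors' code: both rules are expressed in *shift form* (how far the pattern moves), delta1 uses the rightmost occurrence of the mismatched character, and delta2 is the usual strong good-suffix table. The function names are hypothetical.

```python
def bad_character_table(pattern: str) -> dict[str, int]:
    """Rightmost index of each character in the pattern (delta1 information)."""
    return {c: i for i, c in enumerate(pattern)}


def good_suffix_table(pattern: str) -> list[int]:
    """Strong good-suffix shifts (delta2, shift form).

    shift[j + 1] is the shift after a mismatch at pattern[j];
    shift[0] is the shift after a full match.
    """
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)
    i, j = m, m + 1
    border[i] = j
    # Case 3: the matched suffix u reoccurs, preceded by a different character.
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 4: only a suffix v of u reoccurs, as a prefix of the pattern.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore(pattern: str, text: str) -> list[int]:
    """Return all start positions of pattern in text, comparing right to left."""
    m, n = len(pattern), len(text)
    last = bad_character_table(pattern)
    shift = good_suffix_table(pattern)
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # match backwards
            j -= 1
        if j < 0:
            hits.append(s)
            s += shift[0]
        else:
            # delta1 (cases 1 and 2) vs delta2 (cases 3 and 4): take the larger shift.
            s += max(shift[j + 1], j - last.get(text[s + j], -1))
    return hits
```

On the slides' example, `boyer_moore("example", "here is a simple example")` returns `[17]`. Taking the maximum of the two rules is safe because each rule on its own never skips an occurrence.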
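The claim that delta1 becomes useless on bit strings can be checked numerically. The sketch below (hypothetical helper names) computes, for each pattern position j, the largest delta1 shift that any mismatching symbol could trigger. It uses the rightmost occurrence of the symbol to the left of j, a slightly stronger variant of the classic rule, so it is if anything optimistic about delta1.

```python
def delta1_shift(pattern: str, j: int, c: str) -> int:
    """Bad-character shift (shift form) after pattern[j] mismatches text symbol c.

    Uses the rightmost occurrence of c strictly left of j; if there is none,
    the pattern moves entirely past c.
    """
    return j - pattern.rfind(c, 0, j)  # rfind returns -1 if absent -> shift j + 1


def delta1_profile(pattern: str, alphabet: str) -> list[int]:
    """Best possible delta1 shift at each position j, over all mismatching symbols."""
    return [
        max(delta1_shift(pattern, j, c) for c in alphabet if c != pattern[j])
        for j in range(len(pattern))
    ]
```

For the slides' binary pattern, `delta1_profile("1101100", "01")` is `[1, 2, 1, 1, 2, 1, 2]`: with only one possible mismatching symbol, delta1 never moves the pattern by more than 2 bits. For `example` over lowercase letters and space, the same profile is `[1, 2, 3, 4, 5, 6, 7]`.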
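Compressed matching (searching E(P) in E(T) instead of P in D(E(T))) can be sketched with a Huffman code, the encoding used in the experiments. Assumptions: the code is built from the text itself, the pattern uses only symbols that occur in the text, and Python's `str.find` stands in for the binary matcher, which a BBBMM-style search would replace. Because E(P) may also occur starting in the middle of a codeword, hits that do not start on a codeword boundary are discarded. With a prefix-free code, the remaining hits correspond exactly to occurrences of P. Function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a binary Huffman code (symbol -> codeword) for the symbols of text."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate one-symbol alphabet
        return {next(iter(freq)): "0"}
    order = count()  # tie-breaker so the dicts are never compared
    heap = [(f, next(order), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]


def compressed_match(pattern: str, text: str) -> list[int]:
    """Find pattern in text by searching E(P) inside E(T), without decoding E(T)."""
    code = huffman_code(text)
    encoded_text = "".join(code[c] for c in text)
    encoded_pat = "".join(code[c] for c in pattern)  # pattern symbols must occur in text
    # Map the bit offset where each codeword starts to the character index it encodes.
    starts, offset = {}, 0
    for idx, c in enumerate(text):
        starts[offset] = idx
        offset += len(code[c])
    hits = []
    pos = encoded_text.find(encoded_pat)
    while pos != -1:
        if pos in starts:  # drop false matches that start inside a codeword
            hits.append(starts[pos])
        pos = encoded_text.find(encoded_pat, pos + 1)
    return hits
```

For example, `compressed_match("sample", "this-is-a-sample-text---")` returns `[10]`, the character position of the occurrence.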
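The blocked comparison of BBBMM can be sketched as follows. This is a simplified illustration, not the authors' implementation, and all names are hypothetical. It assumes the text and pattern are given as `'0'`/`'1'` strings, precomputes the k shifted copies Pat[sh][j] as (value, mask) pairs, and compares k bits at a time from right to left. On a mismatch, it uses only an extended delta1: the smallest shift that keeps the pattern consistent with the whole k-bit text block. That value depends on the block's offset r = j·k − sh from the pattern start and on the block value, so it is cached lazily by (r, block) rather than stored as full (sh, j, block) tables. BBBMM's delta2, its limit value and Correct[sh, j] are not reproduced. After a full match, the pattern moves by its smallest period.

```python
def to_blocks(bits: str, k: int) -> list[int]:
    """Pack a '0'/'1' string into k-bit integers, zero-padding the last block."""
    padded = bits + "0" * (-len(bits) % k)
    return [int(padded[i:i + k], 2) for i in range(0, len(padded), k)]


def pattern_copies(p: str, k: int) -> list[list[tuple[int, int]]]:
    """Pat[sh][j] = (value, mask) of block j of the pattern preceded by sh bits.

    Mask bits are 0 where a block position is not covered by the pattern.
    """
    copies = []
    for sh in range(k):
        pad = -(sh + len(p)) % k
        value = "0" * sh + p + "0" * pad
        mask = "0" * sh + "1" * len(p) + "0" * pad
        copies.append([(int(value[i:i + k], 2), int(mask[i:i + k], 2))
                       for i in range(0, len(value), k)])
    return copies


def bbbmm_search(text_bits: str, p: str, k: int = 8) -> list[int]:
    """Blocked binary Boyer-Moore sketch: compare k bits at a time, right to left."""
    n, m = len(text_bits), len(p)
    T, Pat = to_blocks(text_bits, k), pattern_copies(p, k)
    # After a full match, the next occurrence is at least one period away.
    period = next(d for d in range(1, m + 1) if p[d:] == p[:m - d])
    cache: dict[tuple[int, int], int] = {}

    def block_delta1(r: int, block: int) -> int:
        """Smallest shift >= 1 keeping the pattern consistent with the k-bit text
        block that starts r bits after the current pattern start (r may be < 0)."""
        if (r, block) not in cache:
            bits = format(block, f"0{k}b")
            s = 1
            while not all((not 0 <= r + t - s < m) or p[r + t - s] == bits[t]
                          for t in range(k)):
                s += 1
            cache[(r, block)] = s
        return cache[(r, block)]

    hits, s = [], 0
    while s <= n - m:
        sh, b0 = s % k, s // k  # bit offset inside the block, first block index
        blocks = Pat[sh]
        j = len(blocks) - 1
        while j >= 0 and (T[b0 + j] & blocks[j][1]) == blocks[j][0]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += period
        else:
            s += block_delta1(j * k - sh, T[b0 + j])
    return hits
```

With k = 8, a mismatching block contributes a whole byte of evidence to delta1, which is the extra information the binary case offers. Setting k = 1 degenerates to a bit-wise search with a single-bit delta1.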
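The slides contrast two conventions for delta1: in the original algorithm it is the increase of the text pointer, whereas in BBBMM it is the shift size. A small sketch (hypothetical names, one character per text position; it does not model the BBBMM tables or Correct[sh, j]) shows how the two relate after a mismatch at pattern position j.

```python
def delta1_pointer(pattern: str, c: str) -> int:
    """Original Boyer-Moore delta1: how far to advance the text pointer i,
    which points at the mismatched text character c."""
    m = len(pattern)
    return m - 1 - pattern.rfind(c)  # equals m when c does not occur in the pattern


def shift_from_pointer(pattern: str, j: int, c: str) -> int:
    """Pattern shift implied by advancing the text pointer by delta1_pointer.

    At a mismatch at pattern[j], the pattern's right end lies m - 1 - j
    positions to the right of i, so that part of the pointer increase
    does not move the pattern at all.
    """
    m = len(pattern)
    return delta1_pointer(pattern, c) - (m - 1 - j)
```

For `example`, a mismatch at j = 3 against `x` advances the pointer by 5 but shifts the pattern by only 2. Against `l`, the implied shift is -2, which is why delta1 is always combined with a positive shift such as delta2.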
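The formulas for the expected number of comparisons until a mismatch are missing from the Analysis slides. A simple geometric argument gives idealized values that show why blocking helps. This is an illustrative derivation under stated assumptions, not the authors' expressions: it assumes i.i.d. uniform text bits, a pattern of m bits, and that all compared blocks are full k-bit blocks. Each bit comparison succeeds with probability 1/2 and each k-bit block comparison with probability 2^{-k}, so the number of comparisons C is (truncated) geometric:

$$
\Pr[C_{\text{bit}} \ge i] = 2^{-(i-1)}, \qquad
\mathbb{E}[C_{\text{bit}}] = \sum_{i=1}^{m} 2^{-(i-1)} = 2\left(1 - 2^{-m}\right) \approx 2,
$$

$$
\mathbb{E}[C_{\text{block}}] \approx \sum_{i \ge 1} 2^{-k(i-1)} = \frac{1}{1 - 2^{-k}}, \qquad
k = 8:\ \frac{256}{255} \approx 1.004 .
$$

In practice, the rightmost block of a shifted pattern copy usually holds fewer than k pattern bits, so it matches with a higher probability, and the blocked value is somewhat larger than this bound.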