Improved Two-Way Bit-parallel Search

Improved Two-Way Bit-parallel Search. Branislav Durian, Tamanna Chhabra, Sukhpal Singh Ghuman, Tommi Hirvola, Hannu Peltola, Jorma Tarhio. String matching can be classified into exact string matching and approximate string matching (k mismatches, k errors).


Presentation Transcript


  1. Improved Two-Way Bit-parallel Search • Branislav Durian, Tamanna Chhabra, Sukhpal Singh Ghuman, Tommi Hirvola, Hannu Peltola, Jorma Tarhio

  2. String matching • String matching can be classified into: • Exact string matching • Approximate string matching (k mismatches, k errors)

  3. Our Approach • New sublinear variations of the Shift-Or, Shift-And, and Shift-Add algorithms, which apply bit-parallelism. • Key idea: a two-way loop over j in which text characters t_{i−j} and t_{i+j} are handled together. • The next alignment starts at t_{i+m}. • (Figure: text positions t_{i−j}, t_i, t_{i+j}, with spans labelled m.)

  4. Bit-parallelism • Takes advantage of the internal parallelism of bit operations inside a computer word. • Many values in a single word are updated with a single operation. • Many operations of an algorithm can therefore be performed faster.

  5. Previous algorithms: BNDM and its variants • BNDM (Backward Nondeterministic DAWG Matching) is the bit-parallel simulation of an earlier algorithm called BDM (Backward DAWG Matching). • BDM scans the alignment window from right to left and skips characters using a suffix automaton.
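For reference, the right-to-left window scan can be sketched as below. This is a minimal textbook-style BNDM in Python, not code from the slides; the function name `bndm_search` and the 0-based reporting of start positions are assumptions made for illustration.

```python
def bndm_search(pattern, text):
    """Return 0-based start positions of pattern in text (BNDM sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    all_ones = (1 << m) - 1
    top_bit = 1 << (m - 1)
    # B[c] has bit (m-1-idx) set when pattern[idx] == c.
    B = {}
    for idx, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << (m - 1 - idx))
    out = []
    pos = 0
    while pos <= n - m:
        j = m          # number of window characters still unread
        last = m       # shift to apply after this window
        D = all_ones
        while D != 0:
            # Read the window from right to left.
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & top_bit:
                if j > 0:
                    last = j          # a pattern prefix was seen: remember it
                else:
                    out.append(pos)   # whole window matched
            D = (D << 1) & all_ones
        pos += last
    return out
```

For example, `bndm_search("abcab", "xabcabcabx")` reports both overlapping occurrences of the pattern.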

  6. Previous algorithms: Shift-Or and its variants • The Shift-Or algorithm was the first string matching algorithm applying bit-parallelism. • Operands in the algorithm are bit-vectors, and the essential bit-vector containing the state of the automaton is called the state vector. • The state vector is updated with the bit-shift and OR operations.
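The state-vector update can be illustrated with a short Python sketch of the classical one-way Shift-Or. The name `shift_or_search` and the dictionary-based occurrence table `B` are illustrative choices, not taken from the slides; a zero bit means "still matching".

```python
def shift_or_search(pattern, text):
    """Return 0-based start positions of pattern in text (Shift-Or sketch)."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # Occurrence vectors: bit idx of B[c] is 0 when pattern[idx] == c.
    B = {}
    for idx, ch in enumerate(pattern):
        B[ch] = B.get(ch, all_ones) & ~(1 << idx)
    accept = 1 << (m - 1)
    D = all_ones                      # state vector, no prefix matched yet
    out = []
    for pos, ch in enumerate(text):
        # One shift and one OR update all m automaton states at once.
        D = ((D << 1) | B.get(ch, all_ones)) & all_ones
        if D & accept == 0:           # highest state reached: full match
            out.append(pos - m + 1)
    return out
```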

  7. Previous algorithms: For the k-mismatches problem • Shift-Add is a bit-parallel algorithm for the k-mismatches problem. • A state vector D of m states is used to represent the state of the search.
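A simplified sketch of the Shift-Add idea in Python follows. It assumes each of the m states is a mismatch counter wide enough to hold the value m, so no overflow handling is needed; the original algorithm uses narrower counters with overflow bits, which this sketch does not reproduce. The name `shift_add_search` is hypothetical.

```python
def shift_add_search(pattern, text, k):
    """Return 0-based starts of windows with at most k mismatches."""
    m = len(pattern)
    if m == 0:
        return []
    b = m.bit_length()                # assumption: field wide enough for m
    field_mask = (1 << b) - 1
    state_mask = (1 << (m * b)) - 1
    # A 1 in field i of T[c] means pattern[i] != c.
    all_mismatch = sum(1 << (i * b) for i in range(m))
    T = {}
    for i, ch in enumerate(pattern):
        T[ch] = T.get(ch, all_mismatch) - (1 << (i * b))
    D = 0                             # state vector of m counters
    out = []
    for pos, ch in enumerate(text):
        # Shift every counter one state up, then add the new mismatches.
        D = ((D << b) + T.get(ch, all_mismatch)) & state_mask
        mismatches = (D >> ((m - 1) * b)) & field_mask
        if pos >= m - 1 and mismatches <= k:
            out.append(pos - m + 1)
    return out
```

With k = 0 the sketch degenerates to exact matching, which gives a simple sanity check against the exact algorithms.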

  8. Our Algorithms • Exact string matching • TSO (Two-way Shift-Or) • TSA (Two-way Shift-And) • Approximate string matching with k mismatches • TSAdd (Two-way Shift-Add) • Tuned Shift-Add

  9. TSO • TSO (Two-way Shift-Or) uses the same occurrence vectors B for characters as the original Shift-Or. • The outer loop traverses the text with a fixed step of m characters. At each step i, an alignment window t_{i−m+1}, …, t_{i+m−1} is inspected.

  10. Example of working in the inner loop of TSO.

  11. Example (Cont…)
      T = …x a b c a b c a b x…
      a           D = 1 0 1 1 0
      j=1   c         1 1 0 1 1
            b         0 1 1 0 1
                  D = 1 0 1 1 0
      j=2   b         0 1 1 0 1
            c         1 1 0 1 1
                  D = 1 0 1 1 0

  12. Example (Cont…)
      T = …x a b c a b c a b x…
                  D = 1 0 1 1 0
      j=3   a         1 0 1 1 0
            a         1 0 1 1 0
                  D = 1 0 1 1 0
      j=4   x         1 1 1 1 1
            b         0 1 1 1 1
                  D = 1 0 1 1 0
                  E = 0 1 0 0 1
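The two-way inner loop of the example can be sketched in Python as follows. This is a reconstruction from the slide description, not the authors' code: it assumes bit q of D is 0 while an occurrence with t_i aligned to pattern position q is still possible, that the left character is shifted left by j and the right character shifted right by j before being OR-ed in, and that characters beyond the end of the text count as mismatches. The name `tso_search` is hypothetical.

```python
def tso_search(pattern, text):
    """Return 0-based start positions of pattern in text (two-way sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    all_ones = (1 << m) - 1
    # Same occurrence vectors as Shift-Or: bit q is 0 when pattern[q] == c.
    B = {}
    for q, ch in enumerate(pattern):
        B[ch] = B.get(ch, all_ones) & ~(1 << q)
    out = []
    for i in range(m - 1, n, m):      # fixed step of m characters
        D = B.get(text[i], all_ones)
        j = 1
        while D != all_ones and j < m:
            left = B.get(text[i - j], all_ones)
            # Assumption: positions past the text end mismatch everything.
            right = B.get(text[i + j], all_ones) if i + j < n else all_ones
            D |= ((left << j) & all_ones) | (right >> j)
            j += 1
        # Every remaining zero bit q is an occurrence starting at i - q.
        for q in range(m - 1, -1, -1):
            if not ((D >> q) & 1):
                out.append(i - q)
    return out
```

On the example text, `tso_search("abcab", "xabcabcabx")` ends the first step with D = 10110 (zero bits at positions 3 and 0), matching the final D and E = 01001 shown above.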

  13. TSA •  Shift-And is a dual method of Shift-Or. • TSA applies Shift-And and is a dual method of TSO.
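The duality can be made concrete with a Python sketch in which 1 bits mark surviving alignments and the updates use AND instead of OR. As with the TSO sketch, this is an illustrative reconstruction under assumptions (bit q of D refers to the alignment where t_i meets pattern position q; positions shifted in from outside the pattern are filled with ones), and the name `tsa_search` is hypothetical.

```python
def tsa_search(pattern, text):
    """Return 0-based start positions of pattern in text (two-way AND sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    all_ones = (1 << m) - 1
    # Shift-And occurrence vectors: bit q is 1 when pattern[q] == c.
    B = {}
    for q, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << q)
    out = []
    for i in range(m - 1, n, m):      # fixed step of m characters
        D = B.get(text[i], 0)
        j = 1
        while D != 0 and j < m:
            low_fill = (1 << j) - 1                   # pattern starts after t[i-j]
            high_fill = all_ones & ~(all_ones >> j)   # pattern ends before t[i+j]
            left = ((B.get(text[i - j], 0) << j) | low_fill) & all_ones
            right_occ = B.get(text[i + j], 0) if i + j < n else 0
            right = (right_occ >> j) | high_fill
            D &= left & right
            j += 1
        # Every remaining one bit q is an occurrence starting at i - q.
        for q in range(m - 1, -1, -1):
            if (D >> q) & 1:
                out.append(i - q)
    return out
```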

  14. TSAdd for k mismatches • The two-way approach in exact matching is successful due to a simple analogy to the one-way algorithms (Shift-Or, Shift-And). • Key trick: use the overflow bits in the state vector D. • A logical AND operation is applied between the occurrence vector and the right-shifted, complemented state vector. • This idea is applied in the Two-way Shift-Add.

  15. Tuned Shift-Add • Tuned Shift-Add is a minimalist version of the Shift-Add algorithm. • If the bit-vectors fit into a computer register, the worst- and average-case complexity of the original Shift-Add algorithm is O(n). • The original Shift-Add algorithm uses an overflow vector in addition to the state vector.

  16. Analysis - TSO • TSO is linear in the worst case and sub-linear in the average case. • The outer loop of TSO is executed n/m times. In each round, the inner loop is executed at most m − 1 times. • The most trivial implementation of popcount requires O(m) time. So the total time in the worst case is O(nm/m) = O(n). • The same analysis applies to TSA.
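The worst-case bound can be written out as a short calculation. It assumes that each inner iteration reads the two characters t_{i−j} and t_{i+j}, so a step reads at most 1 + 2(m − 1) = 2m − 1 characters, and that reporting (including a trivial popcount) costs O(m) per step.

```latex
\[
T_{\text{worst}}(n, m)
  \;\le\; \left\lfloor \frac{n}{m} \right\rfloor \cdot \bigl( (2m - 1) + O(m) \bigr)
  \;=\; \frac{n}{m} \cdot O(m)
  \;=\; O(n)
\]
```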

  17. Analysis - TSAdd • The outer loop of TSAddq is executed n/m times, and in each iteration O(m) text characters are read and O(m) occurrences are reported. • Thus, the total time complexity is O(n/m) · O(m + m) = O(n) in the worst case. • In the average case TSAdd is sub-linear. This can be seen from the test results, where the search time decreases as m gets larger.

  18. Analysis - Tuned Shift-Add • The worst- and average-case complexity of the original Shift-Add algorithm is O(n). • Tuned Shift-Add is also linear.

  19. Experimental Results • In the test runs we used binary, DNA, and English texts. • The best execution times are boxed in the tables on the following slides. • The tables show that our algorithms run faster than the previous algorithms, especially for larger pattern lengths.

  20. Search time (ms) for binary data, by pattern length

  21. Search time (ms) for DNA data, by pattern length

  22. Search time (ms) for English data, by pattern length

  23. Algorithms for k mismatches • Search times (ms) for k=1

  24. Search times (ms) for k=2

  25. Search times (ms) for k=3

  26. Conclusion • The new algorithms and their tuned versions are efficient both in theory and practice. • They run in linear time in the worst case and in sublinear time in the average case.

  27. THANK YOU
