
Search Algorithms Winter Semester 2004/2005 11 Oct 2004 1st Lecture



  1. Search Algorithms, Winter Semester 2004/2005, 11 Oct 2004, 1st Lecture. Christian Schindelhauer, schindel@upb.de

  2. Contents • The many different aspects of search in computer science • For example: • Searching text • Searching the Web • Searching the DNS • Searching for the exit of a maze (labyrinth) • Searching for a man overboard • Trade-offs between time and space in search • Searching or deciding: which one is harder? • Language: English • Examinations can also be taken in German (on request)

  3. Organisation (I) • Lecture: • Monday, 11 am - 1 pm, FU 116 (Beethoven) • Exercise Classes (Übungen) • Start next week • Participation in the exercise classes is mandatory • Monday, 1 pm - 2 pm, Stefan Rührup • Wednesday, 1 pm - 2 pm, Christian Schindelhauer • Registration for Exercise Classes • Via the StudInfo-System • See web page: • http://wwwcs.upb.de/cs/ag-madh/WWW/Teaching/2004WS/SearchAlg/ • The web page can also be reached from my home page: • http://www.upb.de/cs/schindel.html • Register for the Exercise Classes as soon as possible!

  4. Organisation (II) • Material available at the web-site • Slides of the lectures in MS PowerPoint format and PDF • Lecture notes (with possible exam questions) • Exercises • Schedule of the lecture (with upcoming topics and examination dates) • Literature links • Material not available at the web-site • Solutions for the exercises • Solutions for the exam questions • Names of students registered for exercise classes or examinations

  5. Examinations • Two exams: • 1st written exam (45 minutes) • Wednesday, 8 Dec. 2004, 12 am, in F0.530 • Contents: Lectures and Exercises in October and November 2004 • 2nd oral exam (25 minutes) • In the week from 7 Feb to 11 Feb 2005 in F2.315 • Each exam covers one half of the lecture • The overall grade is the mean of both examination grades • Exercise rebate • If a student does not participate in the exercise classes: • 1 extra examination question in the first test • 1 extra hour for solving an exercise prior to the 2nd oral exam

  6. Exercises • Successful participation includes: • Registration for one of the exercise classes • Regular attendance at the exercise classes • Solving at least two exercises (one in the first half and one in the second half) • Presenting these solutions in the exercise class • Written write-ups of these solutions (submitted before the exams) • Reservations of exercises for presentation • Can be made via the StudInfo-System

  7. Chapter I: Searching Text 10 Oct 2004

  8. Search Text (Overview) • The task of string matching • Easy as pie • The naive algorithm • How would you do it? • The Rabin-Karp algorithm • Ingenious use of primes and number theory • The Knuth-Morris-Pratt algorithm • Let a (finite) automaton do the job • This is optimal • The Boyer-Moore algorithm • Bad letters allow us to jump through the text • This is even better than optimal (in practice) • Literature • Cormen, Leiserson, Rivest, “Introduction to Algorithms”, chapter 36, string matching, The MIT Press, 1989, 853-885.

  9. The task of string matching • Given • A text T of length n over a finite alphabet Σ: T[1..n] • A pattern P of length m over a finite alphabet Σ: P[1..m] • Output • All occurrences of P in T, i.e. all shifts s with T[s+1..s+m] = P[1..m] • Example: T = manamanapatipitipi (n = 18), P = pati (m = 4); P occurs in T with shift s = 8

  10. The Naive Algorithm Naive-String-Matcher(T,P) • n ← length(T) • m ← length(P) • for s ← 0 to n-m do • if P[1..m] = T[s+1..s+m] then • output “Pattern occurs with shift s” • fi • od Fact: • The naive string matcher has worst-case running time O((n-m+1)·m) • For n = 2m this is O(n^2) • The naive string matcher is not optimal, since string matching can be done in time O(m + n)
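The naive matcher above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `naive_string_matcher` is an assumption, and Python strings are 0-based, so the returned values correspond directly to the shifts s of the pseudocode. The sketch collects all valid shifts rather than stopping at the first one.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s such that text[s:s+m] equals pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n-m+1 possible shifts; each comparison costs up to m steps.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_string_matcher("manamanapatipitipi", "pati"))  # [8]
```

The nested work (n-m+1 shifts, up to m character comparisons each) is where the O((n-m+1)·m) bound comes from.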

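  11. The Rabin-Karp Algorithm • Idea: Compute • a checksum for the pattern P and • a checksum for each substring of T of length m • Example (figure): T = manamanapatipitipi, P = pati with checksum 3 • Checksums of the windows at shifts 0..14: 4 2 3 1 4 2 3 1 3 1 2 3 1 0 1 • Shift 8 is a valid hit; the other shifts with checksum 3 (2, 6 and 11) are spurious hits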

  12. The Rabin-Karp Algorithm • Computing the checksum: • Choose a prime number q • Let d = |Σ| • Example: • Σ = {0, 1, ..., 9} • Then d = 10, q = 13 • Let P = 0815 • S_4(0815) = (0·1000 + 8·100 + 1·10 + 5·1) mod 13 = 815 mod 13 = 9

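  13. How to Compute the Checksum: Horner’s rule • Compute S_m(P) = (P[1]·d^(m-1) + P[2]·d^(m-2) + ... + P[m]·d^0) mod q • by using S_m(P) = ((...((P[1]·d + P[2])·d + P[3])·d + ...)·d + P[m]) mod q • Example: • Σ = {0, 1, ..., 9} • Then d = 10, q = 13 • Let P = 0815 • S_4(0815) = (((0·10 + 8)·10 + 1)·10 + 5) mod 13 = ((8·10 + 1)·10 + 5) mod 13 = (3·10 + 5) mod 13 = 9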
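Horner's rule for the checksum can be sketched as below, assuming the digit alphabet of the example (d = 10) and q = 13. The function name `checksum` is illustrative. Reducing modulo q after every step keeps the intermediate values small.

```python
def checksum(digits: str, d: int = 10, q: int = 13) -> int:
    """Checksum of a digit string by Horner's rule, reduced mod q at each step."""
    value = 0
    for ch in digits:
        # Shift the value accumulated so far by one position, then add the next digit.
        value = (value * d + int(ch)) % q
    return value


print(checksum("0815"))  # 9, the same as 815 mod 13
```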

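  14. How to Compute the Checksums of the Text • Start with S_m(T[1..m]) • Then compute S_m(T[2..m+1]) from S_m(T[1..m]), and so on for each following window (figure: text manamanapatipitipi with the window checksums S_m(T[1..m]), S_m(T[2..m+1]), ... below it)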
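A sketch of how each window checksum can be derived from the previous one in constant time, assuming a digit text with d = 10 and q = 13 as in the earlier example. The name `window_checksums` is hypothetical. The update removes the leading digit (weighted by h = d^(m-1) mod q), shifts by one position, and appends the new trailing digit.

```python
def window_checksums(digits: str, m: int, d: int = 10, q: int = 13) -> list[int]:
    """Checksums of all length-m windows of a digit string, computed by rolling updates."""
    n = len(digits)
    if m < 1 or m > n:
        return []
    h = pow(d, m - 1, q)  # weight of the leading digit of a window
    t = 0
    for ch in digits[:m]:  # first window by Horner's rule
        t = (t * d + int(ch)) % q
    sums = [t]
    for s in range(n - m):
        # Drop digits[s], shift by one position, append digits[s + m].
        t = (d * (t - int(digits[s]) * h) + int(digits[s + m])) % q
        sums.append(t)
    return sums


print(window_checksums("0815", 2))  # [8, 3, 2] for the windows 08, 81, 15
```

Python's `%` always yields a non-negative result for a positive modulus, so the subtraction needs no extra correction here; in languages where `%` can return negative values, adding q before reducing would be one way to handle it.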

  15. The Rabin-Karp Algorithm Rabin-Karp-Matcher(T,P,d,q) • n ← length(T) • m ← length(P) • h ← d^(m-1) mod q • p ← 0 • t_0 ← 0 • for i ← 1 to m do • p ← (d·p + P[i]) mod q (checksum of the pattern P) • t_0 ← (d·t_0 + T[i]) mod q (checksum of T[1..m]) • od • for s ← 0 to n-m do • if p = t_s then (checksums match) • if P[1..m] = T[s+1..s+m] then output “Pattern occurs with shift” s fi (now test for false positive) • fi • if s < n-m then • t_(s+1) ← (d·(t_s - T[s+1]·h) + T[s+m+1]) mod q (update: checksum of T[s+2..s+m+1] from the checksum of T[s+1..s+m]) • fi • od
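The pseudocode above translates fairly directly into Python. The sketch below is one possible rendering under stated assumptions: characters are mapped to numbers with `ord`, and the defaults d = 256 and q = 101 are illustrative choices, not values from the lecture. Indices are 0-based, so the returned values are the shifts s.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return all shifts at which pattern occurs in text, using rolling checksums mod q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q  # checksum of the pattern
        t = (d * t + ord(text[i])) % q     # checksum of the first window
    shifts = []
    for s in range(n - m + 1):
        # Compare the strings only when the checksums match (rules out spurious hits).
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp_matcher("manamanapatipitipi", "pati"))  # [8]
```

Because every checksum match is verified by a direct comparison, the result stays correct for any q; the choice of q only affects how many spurious hits have to be checked.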

  16. Performance of Rabin-Karp • The worst-case running time of the Rabin-Karp algorithm is O(m·(n-m+1)) • Probabilistic analysis • The probability of a false positive hit for a random input is 1/q • The expected number of false positive hits is O(n/q) • The expected running time of Rabin-Karp is O(n + m·(v + n/q)), where v is the number of valid shifts (hits) • If we choose q ≥ m and have only a constant number of hits, then the expected running time of Rabin-Karp is O(n + m).
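To make the role of spurious hits concrete, the following sketch separates valid hits from false positives for a digit text. The function name `classify_hits` and the sample text are illustrative assumptions, and for brevity the checksums are computed directly with `int(window) % q` rather than by rolling updates.

```python
def classify_hits(digits: str, pattern: str, q: int = 13) -> tuple[list[int], list[int]]:
    """Split the shifts whose checksum matches the pattern's into valid and spurious hits."""
    m = len(pattern)
    p = int(pattern) % q
    valid, spurious = [], []
    for s in range(len(digits) - m + 1):
        window = digits[s:s + m]
        if int(window) % q == p:
            # Checksums agree; only a direct comparison tells the two cases apart.
            (valid if window == pattern else spurious).append(s)
    return valid, spurious


print(classify_hits("2359023141526739921", "31415"))  # ([6], [12])
```

Here there are 15 windows and q = 13, so roughly 15/13 ≈ 1 spurious hit would be expected under the random-input assumption, which happens to agree with the single spurious hit at shift 12. Each spurious hit costs an extra O(m) comparison, which is the m·n/q term in the expected running time.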

  17. Knuth-Morris-Pratt: The Principle (figure: the pattern mama shown at successive shifts below a text beginning manamamapatipit)

  18. Thanks for your attention • End of 1st lecture • Next lecture: Mo 18 Oct 2004, 11 am, FU 116 • Next exercise class: Mo 18 Oct 2004, 1 pm, F0.530 or We 20 Oct 2004, 1 pm, F1.110
