
Knuth-Morris-Pratt

Knuth-Morris-Pratt string matching algorithm. Ivaylo Kenov, Telerik Corporation, http://telerikacademy.com, Telerik Academy Student. Table of contents: background and idea; the “naive” approach; basic definitions; preprocessing; search algorithm; complexity; additional information.


Presentation Transcript


  1. Knuth-Morris-Pratt: String matching algorithm. Ivaylo Kenov, Telerik Corporation, http://telerikacademy.com, Telerik Academy Student

  2. Table of Contents • Background and idea • The “naive” approach • Basic definitions • Preprocessing • Search algorithm • Complexity • Additional information

  3. Background and idea What is the problem?

  4. Background and idea • The problem of string matching. • We have string text and pattern word. • Check if word occurs in text. • If so, return the position where pattern occurs. • If not, return -1.

  5. The “naive” approach New to string searching

  6. The naive approach (1) • Very obvious solution – compare element by element. • O(m*n) complexity, where m is the length of the word and n is the length of the text – not good! • Example: the string is text and the pattern is word.

  7. The naive approach (2) • Step 1: compare word[0] with text[0]. • Step 2: compare word[1] with text[1].

  8. The naive approach (3) • Step 3: compare word[2] with text[2]. • Mismatch found – shift word one index to the right and repeat!

  9. The naive approach (4) • A match will be found after three shifts to the right of the word! • Problem with the “naive” approach – too many comparisons of the same characters!
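The element-by-element steps above can be sketched as follows. This is a minimal illustration, not the code from the live demo; the class name NaiveSearch and the method name IndexOf are assumptions made for the example.

```csharp
using System;

static class NaiveSearch
{
    // Hypothetical helper: returns the index of the first occurrence
    // of word in text, or -1 if word does not occur.
    public static int IndexOf(string text, string word)
    {
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            int matched = 0;

            // Compare element by element until a mismatch or a full match.
            while (matched < word.Length && text[shift + matched] == word[matched])
            {
                matched++;
            }

            if (matched == word.Length)
            {
                return shift;
            }

            // Mismatch: shift word one index to the right and start over,
            // re-comparing characters of text that were already visited.
        }

        return -1;
    }
}
```

For example, NaiveSearch.IndexOf("abcabcabd", "abcabd") returns 3: the first attempt matches five characters before failing, and all of that work is discarded.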

  10. The “naive” approach Live demo

  11. Knuth-Morris-Pratt Without repeating!

  12. Knuth-Morris-Pratt • Linear time algorithm for string matching. • O(n) complexity for the search, after O(m) preprocessing of the pattern. • Backtracking in the text never occurs. • Already visited text characters are not revisited! • Useful with binary data and small-alphabet strings.

  13. Basic definitions Easy theory!

  14. Basic definitions (1) • Prefix – a substring with which our string starts. • Example: “abcdef” starts with “abc”. • Suffix – a substring with which our string ends. • Example: “abcdef” ends with “def”. • Proper prefix and proper suffix – a prefix or suffix whose length is less than the length of the string.

  15. Basic definitions (2) • Border – a substring that is a proper prefix and a proper suffix at the same time. • Example: “ab” is a border of “abcab”. • Width of a border – the length of the border. • The empty string “” is a proper prefix, a proper suffix and a border of any non-empty string!

  16. Basic definitions (3) • How far does the algorithm shift the pattern? • The shift distance is determined by the widest border of the matching prefix of word. • Distance = (length of the matching prefix) - (width of the widest border).
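A small brute-force sketch can make the shift formula concrete. The names Borders, WidestBorderWidth and ShiftDistance are hypothetical, and the sketch assumes a non-empty matching prefix; the real algorithm reads the border widths from a precomputed table instead of recomputing them.

```csharp
using System;

static class Borders
{
    // Width of the widest border of s: the longest proper prefix of s
    // that is also a suffix of s. Brute force, for illustration only.
    public static int WidestBorderWidth(string s)
    {
        for (int width = s.Length - 1; width > 0; width--)
        {
            // Compare the first `width` characters with the last `width` characters.
            if (string.CompareOrdinal(s, 0, s, s.Length - width, width) == 0)
            {
                return width;
            }
        }

        // Only the empty border is left.
        return 0;
    }

    // Distance = length of the matching prefix minus width of its widest border.
    // Assumes matchedPrefix is not empty.
    public static int ShiftDistance(string matchedPrefix)
    {
        return matchedPrefix.Length - WidestBorderWidth(matchedPrefix);
    }
}
```

For the matching prefix “ababa” the widest border is “aba” (width 3), so the pattern shifts by 5 - 3 = 2 positions.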

  17. Preprocessing Building every border!

  18. Preprocessing (1) • If r and s are borders of a string x and the length of r < the length of s, then r is a border of s. • A border r of x can be extended by a character a if ra is a border of xa.

  19. Preprocessing (2) • We build an array table, which contains information about border widths: table[i] is the width of the widest border of the prefix of word with length i. • When preprocessing a value, we already know the previous ones and check whether their borders can be extended. • The border of width table[i] can be extended if word[table[i]] = word[i]. • If not, the next border to check has width table[table[i]].

  20. Preprocessing (3) • Algorithm for building the table:

          // failureTable is declared outside this method (not shown on the slide);
          // it needs word.Length + 1 elements.
          void FailFunction(string word)
          {
              int index = 0;
              int borderWidth = -1;
              failureTable[index] = borderWidth;
              while (index < word.Length)
              {
                  // Fall back to shorter borders until one can be extended.
                  while (borderWidth >= 0 && word[index] != word[borderWidth])
                  {
                      borderWidth = failureTable[borderWidth];
                  }
                  index++;
                  borderWidth++;
                  failureTable[index] = borderWidth;
              }
          }

  21. Preprocessing (4) • Example for table: • For pattern “ababaa” the widths of the borders in the array table have the following values for indices 0 to 6: -1, 0, 0, 1, 2, 3, 1. For instance we have table[5] = 3, since the prefix “ababa” of length 5 has a border of width 3. • Note: the zero element is always -1.
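The example can be reproduced with a self-contained variant of the table-building routine. This is a sketch under one assumption: instead of writing to a shared failureTable, the hypothetical method KmpTable.Build allocates and returns the array.

```csharp
using System;

static class KmpTable
{
    // Returns an array of word.Length + 1 entries; entry i is the width of the
    // widest border of the prefix of word with length i, and entry 0 is -1.
    public static int[] Build(string word)
    {
        var table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[index] = borderWidth;

        while (index < word.Length)
        {
            // Try the next narrower border until one can be extended by word[index].
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            table[index] = borderWidth;
        }

        return table;
    }
}
```

Calling Console.WriteLine(string.Join(" ", KmpTable.Build("ababaa"))) prints -1 0 0 1 2 3 1, which includes table[5] = 3 for the prefix “ababa”.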

  22. Preprocessing Live demo

  23. Search algorithm Finding the word!

  24. Search algorithm (1) • The search algorithm is similar:

          // Returns the start index of occurrence number `position` (counted from 1), or -1.
          static int KMPSearch(string text, string word, int position)
          {
              int index = 0;
              int borderWidth = 0;
              int currentPosition = 1;
              while (index < text.Length)
              {
                  // On a mismatch, fall back to the next narrower border of the matched prefix.
                  while (borderWidth >= 0 && text[index] != word[borderWidth])
                  {
                      borderWidth = failureTable[borderWidth];
                  }
                  index++;
                  borderWidth++;
                  // Continues…

  25. Search algorithm (2) • Algorithm continues:

                  // Continues…
                  if (borderWidth == word.Length)
                  {
                      // A full occurrence of word ends just before index.
                      if (position == currentPosition)
                      {
                          return (index - borderWidth);
                      }
                      else
                      {
                          currentPosition++;
                      }
                      borderWidth = failureTable[borderWidth];
                  }
              }
              return -1;
          }

  26. Search algorithm (3) • How it works: • Example:
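To see the search at work on a concrete input, the two phases can be combined into one self-contained sketch. Two things differ from the slides and are assumptions of this example: the hypothetical method Kmp.FindAll collects every occurrence instead of returning the one selected by position, and an empty word is treated as having no occurrences.

```csharp
using System;
using System.Collections.Generic;

static class Kmp
{
    // Border-width table as built in the preprocessing phase.
    static int[] BuildTable(string word)
    {
        var table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[index] = borderWidth;
        while (index < word.Length)
        {
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }
            index++;
            borderWidth++;
            table[index] = borderWidth;
        }
        return table;
    }

    // Returns the start index of every occurrence of word in text, in order.
    public static List<int> FindAll(string text, string word)
    {
        var positions = new List<int>();
        if (word.Length == 0)
        {
            return positions; // assumption: an empty word has no occurrences
        }

        int[] table = BuildTable(word);
        int index = 0;
        int borderWidth = 0;
        while (index < text.Length)
        {
            // index never moves backward; only borderWidth falls back.
            while (borderWidth >= 0 && text[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }
            index++;
            borderWidth++;

            if (borderWidth == word.Length)
            {
                positions.Add(index - borderWidth);
                // Continue with the widest border so overlapping matches are found.
                borderWidth = table[borderWidth];
            }
        }
        return positions;
    }
}
```

For example, Kmp.FindAll("abababaab", "aba") returns the positions 0, 2 and 4. The occurrences overlap, and each one is found by reusing the border “a” of the previous match rather than by restarting the comparison.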

  27. Search algorithm Live demo

  28. Complexity Linear time algorithm!

  29. Complexity • The table building algorithm is O(m) where m is the length of the pattern. • The search algorithm is O(n) where n is the length of the text. • Overall complexity is therefore O(m + n), which is O(n) when the pattern is no longer than the text.
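One way to check these bounds is to count character comparisons. The sketch below is an illustration under stated assumptions: the names ComparisonCount, Naive and Kmp are hypothetical, word is assumed to be non-empty, both counters scan the whole text without stopping at the first match, and the KMP counter covers the search loop only, not the table building.

```csharp
using System;

static class ComparisonCount
{
    // Character comparisons made by the element-by-element approach over all shifts.
    public static long Naive(string text, string word)
    {
        long comparisons = 0;
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            for (int i = 0; i < word.Length; i++)
            {
                comparisons++;
                if (text[shift + i] != word[i]) break;
            }
        }
        return comparisons;
    }

    // Character comparisons made by the KMP search loop.
    public static long Kmp(string text, string word)
    {
        int[] table = BuildTable(word);
        long comparisons = 0;
        int borderWidth = 0;
        for (int index = 0; index < text.Length; index++)
        {
            while (borderWidth >= 0)
            {
                comparisons++;
                if (text[index] == word[borderWidth]) break;
                borderWidth = table[borderWidth];
            }
            borderWidth++;
            if (borderWidth == word.Length) borderWidth = table[borderWidth];
        }
        return comparisons;
    }

    static int[] BuildTable(string word)
    {
        var table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[index] = borderWidth;
        while (index < word.Length)
        {
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }
            index++;
            borderWidth++;
            table[index] = borderWidth;
        }
        return table;
    }
}
```

With a text of 1000 copies of 'a' and the word “aaaaaaaaab”, the naive count is 991 shifts times 10 comparisons, that is 9910, while the KMP search loop stays below 2n = 2000 comparisons.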

  30. Additional information • Wikipedia: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Worked_example_of_the_table-building_algorithm • Knuth-Morris-Pratt explained: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm • Examples and concept: http://wcipeg.com/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

  31. http://algoacademy.telerik.com

  32. Free Trainings @ Telerik Academy • C# Programming @ Telerik Academy • csharpfundamentals.telerik.com • Telerik Software Academy • academy.telerik.com • Telerik Academy @ Facebook • facebook.com/TelerikAcademy • Telerik Software Academy Forums • forums.academy.telerik.com
