00:00

Regular Expression Matching: Review and Conversion Algorithms

Regular expressions (RE) provide an algebraic approach to defining languages. They are built using operators like concatenation, union, and closure. The process of converting an RE to an ε-NFA involves a special form automaton with specific states. Ken Thompson's pioneering work in this area, which won him the Turing Award, laid the foundation for efficient matching algorithms. Converting an RE to an ε-NFA allows for evaluating matches in linear time complexity. The construction steps involve adding a constant number of nodes and edges, resulting in an overall size of O(m). This approach enables efficient string matching using DFAs or ε-NFAs.

reiz
Download Presentation

Regular Expression Matching: Review and Conversion Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Regular Expression Matching Algorithms Michael T. Goodrich University of California, Irvine

  2. RE’s: Review • Regular expressions are an algebraic way to describe languages. • They describe exactly the regular languages. • If E is a regular expression, then L(E) is the language it defines. • An individual character is an RE, and we define RE’s inductively using concatenation, or (union), and * closure. • Parentheses may be used wherever needed to influence the grouping of operators. • Order of precedence is * (highest), then concatenation, then | (lowest). 2

  3. Examples: RE’s • L((0|1)*101(0|1)*) = all strings of 0’s and 1’s having 101 as a substring. • L((0|1)*1(0|1)*0(0|1)*1(0|1)*) = all strings of 0’s and 1’s having 101 as a subsequence. • L(1*(1*01*01*01*)*1*) =all strings of 0’s and 1’s having a number of 0’s that is a multiple of 3. 3

  4. Converting a RE to an ε-NFA  Proof was published in 1968 by Ken Thompson.  …*  1983: Ken Thompson received the Turing Award (along with Dennis Ritchie). *He also invented Unix (w/ Dennis Ritchie) and the B language (which was a precursor to C, which was invented by Ritchie) Image from https://en.wikipedia.org/wiki/Ken_Thompson

  5. Converting a RE to an ε-NFA • Proof is an induction on the number of operators (or, concatenation, * closure) in the RE. • We always construct an automaton of a special form: No arcs from outside, no arcs leaving Start state: Only state with external predecessors “Final” state: Only state with external successors 5

  6. RE to ε-NFA: Basis • Symbol a: a • ε: ε • ∅: 6

  7. RE to ε-NFA: Induction 1 – Or For E1 ε ε ε ε For E2 For E1| E2 7

  8. RE to ε-NFA: Induction 2 – Concatenation ε For E1 For E2 For E1E2 8

  9. RE to ε-NFA: Induction 3 – Closure ε ε ε For E ε For E* 9

  10. Example  For the RE (ε|(a*b)): Image from https://en.wikipedia.org/wiki/Thompson%27s_construction

  11. Analysis  Note that each combination in the above construction adds only a constant number of nodes and edges in each step.  So, if we start with an RE with m symbols, we will construct an equivalent ε-NFA of size O(m).  Thus, we can determine whether a regular expression of size m matches a string of size n by converting the RE to an equivalent ε-NFA, converting that to an equivalent DFA, and using DFA simulation to test for a match in O(2m+ n) time.  Alternatively, we can use ε-NFA simulation to test for a regular-expression match in O(mn) time.

More Related