1 / 74

Chapter 3 Regular Expressions and Languages

Chapter 3 Regular Expressions and Languages. Giza Pyramids, Egypt. Outline. 3.1 Regular Expressions 3.2 Finite Automata & Regular Expressions 3.3 Applications of RE’s 3.4 Algebraic Laws for RE’s. 3.1 Regular Expressions. Use of Regular expressions

drea
Download Presentation

Chapter 3 Regular Expressions and Languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 3 Regular Expressions and Languages Giza Pyramids, Egypt

  2. Outline • 3.1 Regular Expressions • 3.2 Finite Automata & Regular Expressions • 3.3 Applications of RE’s • 3.4 Algebraic Laws for RE’s

  3. 3.1 Regular Expressions • Use of Regular expressions • The regular expression is a kind of generator for languages. • It offers a “declarative” way of expressing strings of symbols. • It defines all and only regular languages (a theorem).

  4. 3.1 Regular Expressions • Applications of Regular expressions • Used as commands for finding strings in Web browsers or text-formatting systems (such as UNIX grep commands)* • Used as lexical analyzer generator (such Lex or Flex) • A lexical analyzer breaks source programs into “tokens” (keywords, identifiers, signs, …) • The grep command searches files or standard input globally for lines matching a given regular expression, and prints them to the program's standard output.

  5. 3.1 Regular Expressions • Operators of Regular Expressions • Review of three operations on languages L and M: • Union --- L∪M = {x | xL or xM} • Concatenation --- LM = {xy | xL, yM} • Example --- L0 = {e}, L1 = L, L2 = LL, … • Closure (or star, or Kleene closure) --- L* = L0∪L1∪L2∪...

  6. 3.1 Regular Expressions • Example 3.1 --- (language) • f* = {e} because f0 = {e}. • If L = {0, 1}, then L0 = {e}, L1 = L, L2 = {00. 01, 10, 11}, … • If L is the set of all strings of 0’s, then it can be proved that L* is L itself (see the textbook for the proof).

  7. 3.1 Regular Expressions • 3.1.2 Building Regular Expressions • Recursive definition of a regular expression (RE)E and the language which it defines, L(E): • Basis: • Constants e and f are RE’s, defining languages {e} and f, respectively L(e) = {e}, L(f) = f. • If a is a symbol, then a is an RE, defining the language {a} L(a) = {a}. (note: ais of bold face) • A variable like L (capitalized and italic) represents any language.

  8. 3.1 Regular Expressions • 3.1.2 Building Regular Expressions • Recursive definition of an RE (cont’d): • Induction: given two RE’s E and F, then • E + F is an RE such that L(E + F) = L(E)∪L(F) (union) • EF is an RE such that L(EF) = L(E)L(F) (concatenation) • E* is an RE such that L(E*) = (L(E))*(closure) • (E) is an RE such that L((E)) = L(E) (parenthesization).

  9. 3.1 Regular Expressions • Examples (supplemental)(1/4) • RE F = 1 “expresses” the language L(1) = {1}. • RE E = 1* • Language expressed by E --- L = L(E) = L(1*) = (L(1))* = ({1})* (closure of language) = {e, 1, 11, 111, 1111, …} = {1n | n 0}

  10. 3.1 Regular Expressions • Examples (supplemental)(2/4) • RE G = 01* • Language expressed by G --- L = L(G) = L(01*) = L(0)L(1*) (concatenation) = {0}{e, 1, 11, 111, 1111, …} = {0, 01, 011, 0111, …} = {01n | n 0}

  11. 3.1 Regular Expressions • Examples (supplemental)(3/4) • RE H = 1 + 01* • Language expressed by H --- L = L(H) = L(1 + 01*) = L(1) U L(01*) = {1} U {0, 01, 011, 0111, …} = {1, 0, 01, 011, 0111, …} = {1}U{01n | n 0}

  12. 3.1 Regular Expressions • Examples (supplemental)(4/4) • RE K = e + a* • Language expressed by K --- L = L(K) = L(e + a*) = L(e) UL(a*) = {e} U {e, a, aa, aaa, …} = {e, a, aa, aaa, …} = L(a*) That is, we have the following RE equalities: e + a* = a* = a* +e

  13. 3.1 Regular Expressions • Example 3.2 • An RE defining a language of strings of alternating 0’s and 1’s (including none) is one of the two below: • (01)* + (10)* + 0(10)* + 1(01)* (0…1 1…0 0…0 1…1) • (e + 1)(01)*(e + 0) (Why? See the textbook.)

  14. 3.1 Regular Expressions • 3.1.3 Precedence of RE operators • Precedence • Highest --- * (closure) • Next--- . (concatenation) (left to right) • Last--- + (union) (left to right) • Use parentheses anywhere to resolve ambiguity

  15. 3.1 Regular Expressions • 3.1.3 Precedence of RE operators • Example 3.3: Three ways to interpret 01* + 1: • (0(1*)) + 1 by precedence above (= 01* + 1) • (01)* + 1 (another meaning) • 0(1* + 1) (a third meaning)

  16. 3.2 FA’s & RE’s • Theorems to be proved: • Every language defined by a DFA is also defined by an RE. • Every language defined by an RE is also defined by an e-NFA.

  17. 3.2 FA’s & RE’s • Relations of theorems (yellow lines are to be proved): NFA e-NFA RE DFA

  18. 3.2 FA’s & RE’s • 3.2.1 From DFA’s to RE’s • Theorem 3.4: If L = L(A) for some DFA A, then there is an RE R such that L = L(R). Proof. • Prove by constructing progressively string sets defined by a certain RE formRij(k)until the entire set of acceptable strings (i.e., language L(A)) is obtained. • Assume the states are {1, 2, ..., n} (1 is the start state).

  19. 3.2 FA’s & RE’s • Meaning of Rij(k) --- • RE Rij(k) is used to denote the set of strings w such that • Each w is the label of a path from state i to state j in DFA A; • the path has nointermediate node whose number is larger than k.

  20. 3.2 FA’s & RE’s • Meaning of Rij(k) --- • T construct Rij(k), we use induction, starting at k = 0 and stop at k = n (the largest state number). • Then, when k = n, i =1, and j specifies an accepting state, then Rij(k) defines a set of strings accepted by DFA A, with each string forming a path starting from the start state to the accepting state.

  21. 3.2 FA’s & RE’s • Meaning of Rij(k) --- Basis: • when k = 0, all state numbers  1, and so there is no intermediate state in path i to j, leading to 2 cases: (1) an arc (a transition) from i to j; (2) a path from i to i itself.

  22. 3.2 FA’s & RE’s • Meaning of Rij(k) --- Basis (cont’d): • If ij, only (1) is possible, leading to 3 cases: • no symbol for such a transition Rij(0) = f • one symbol a for the transition Rij(0) = a • multiple symbls a1, a2, ..., am for the transition, Rij(0) = a1 + a2 + ... + am

  23. qi qj a qi qj a1+…+am qi qj 3.2 FA’s & RE’s • Meaning of Rij(k)(supplemental) --- Basis (cont’d) ij: • Rij(0) = f • Rij(0) = a • Rij(0) = a1 + a2 + ... + am

  24. 3.2 FA’s & RE’s • Meaning of Rij(k) --- Basis (cont’d): • If i=j, only (2) is possible, which means there exists at least a path e from i to i itself, in addition to the 3 cases: • no symbol for such a transition Rij(0)= e • one symbol a for the transition Rij(0)= e + a • multiple symbls a1, a2, ..., am for the transition, Rij(0) = e + a1 + a2 + ... + am

  25. qi qi e+a1+…+am 3.2 FA’s & RE’s • Meaning of Rij(k)(supplemental) --- Basis (cont’d) i=j: • Rij(0) = e • Rij(0) = e + a • Rij(0) = e + a1 + a2 + ... + am e e + a qi

  26. 3.2 FA’s & RE’s Induction (to compute Rij(k)): • Suppose there is a path from i to j that goes through no state numbered higher than k. Then, two cases should be considered: • (1) the path does not go through k  Rij(k-1) • (2) the path goes through k at least once, then the path may be broken into 3 pieces: • through i to k without passing k Rik(k-1) • from k to k itself  (Rkk(k-1))* (recusive); • from k to j without passing k  Rkj(k-1).

  27. (Rkk(k-1))* … circulating zero or more times … … i j k Rkj(k-1) Rik(k-1) 3.2 FA’s & RE’s • Illustration of paths represented by Rij(k):

  28. 3.2 FA’s & RE’s Induction (cont’d): • The three pieces are concatenated to be Rik(k-1)(Rkk(k-1))*Rkj(k-1). • Combining (1) & (2), we get the RE defining “all the paths from i to j that go through no state higher than k” as Rij(k) = Rij(k-1) + Rik(k-1)(Rkk(k-1))*Rkj(k-1).

  29. 3.2 FA’s & RE’s Induction (cont’d): • ConstructingRik(k)in the order of k until k = n for i = 1 • For each accepting state jk, we can get the union below as the result R1j1(n) +R1j2(n)+ ... +R1jm(n) where {j1, j2, ..., jm} are the set of final states, F. (End of proof of thereom)

  30. 0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Example 3.5 • Convert the following DFA into an RE. • Rij(0) may be constructed to be (details in the next page):

  31. 0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Example 3.5 (cont’d) • R11(0) = e + 1 because d(1, 1) = 1 & going back to itself • R12(0) = 0 because d(1, 0) = 2 (going out to state 2) • R21(0) = f because there is no path from state 2 to 1 • R22(0) = (e + 0 + 1) because d(2, 0) = 2 & d(2, 1) = 2 & going back to itself

  32. 0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Example 3.5 (cont’d) • We can then compute all Rij(k) for k=1 & k=2. • However, we may alternatively compute onlynecessary terms of Rij(k) backward from the final states, to save time.

  33. 3.2 FA’s & RE’s • Example 3.5 (cont’d) • There is only one final state 2, so only have to compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1). • Only have to compute R12(1) and R22(1), without computing R21(1) and R11(1). • To compute each of these terms, we need some RE equalities to simplify intermediate results.

  34. 3.2 FA’s & RE’s • Some equalities (R is an RE): • fR=Rf=f (f=annihilator for concatenation) • f + R = R + f = R (f=identity for union) • eR = Re = R (e = identity for concatenation) • (e + a)* = a* =(a + e)* • (e + a)a* = (ea* + aa*) = a* + a+ = a* a*(e + a) = (a*e + a*a) = a* + a+ = a* (all provable by easy deduction)

  35. 3.2 FA’s & RE’s • To compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1) • R12(1) = R12(0) + R11(0)(R11(0))*R12(0) = 0 + (e + 1)(e + 1)*0 (by substitutions) = 0 + (e + 1)1*0 (by 4. in last page) = 0 + 1*0 (by 5.) = (e + 1*)0 (by distributive law) = 1*0 (by 4.)

  36. 3.2 FA’s & RE’s • To compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1) • R22(1) = R22(0) + R21(0)(R11(0))*R12(0) = (e+ 0 + 1) + f(e + 1)*0 (by substitutions) = (e+ 0 + 1) + f (by 1.) = e+ 0 + 1 (by 2.)

  37. 3.2 FA’s & RE’s • To compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1) • Finally, R12(2) = 1*0 +1*0(e + 0 + 1)*(e + 0 + 1) (by subst.) = 1*0 +1*0(0 + 1)*(e + 0 + 1) (by 4.) = 1*0 +1*0(0 + 1)* (by 6.) =1*0(e + (0 + 1)*) (by distributive law) = 1*0(0 + 1)*(by 4.)

  38. 0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Check the correctness of the final result R12(2)= 1*0(0 + 1)* correct (by looking at the diagram directly)! • The above method also works for NFA ande-NFA.

  39. R11 R11+ Q1S*P1 q2 q1 q2 q1 S . . . Q1 P1 . . . . . . . . . s Fig. 3.8 (partial) Fig. 3.7 (partial) 3.2 FA’s & RE’s • 3.2.2 Converting DFA’s to RE’s by Eliminating States --- another way • Step 1 – regard symbols on arcs as RE’s • Step 2 – conduct the following conversion • Step 3 – collect RE’s for all the final states (for a complete diagram of this, see textbook)

  40. U R S Fig. 3.9 q q0 start T 3.2 FA’s & RE’s • Details of Step 3: (1) For each final state q, eliminate all states as above except the start state q0. (2) If qq0, then a 2-state automaton is left as follows: Corresponding RE is (R+SU*T)*SU* (provable by the first method)

  41. R q0 Fig. 3.10 start 3.2 FA’s & RE’s (3) If q = q0, then perform one more state elimination to eliminate q, leaving only the start state q0 as follows (see an example in the next page): The corresponding RE is R*. (4)Collect the result for each final state derived as above to get the final result.

  42. V X Y q q0 Z R11+ Q1S*P1 =X+YV*Z R11=X q0 q0 q0 q0 S= V . . . . . . . . . Q1=Y P1=Z . . . start s Fig. 3.7 (partial) Fig. 3.8 (partial) 3.2 FA’s & RE’s • An example of Case (3) in the last page (supplemental) • Regard q0 as two separate states, q as s, and apply Figs. 3.7 & 3.8 to eliminate q1 as follows:

  43. R=X + YV*Z start q0 3.2 FA’s & RE’s • An example of Case (3) in the last page (supplemental) (cont’d) • Use the result R11 + Q1S*P1 = X + YV*Z as R in Fig. 3.10 like the following: • And the final result is R* = (X + YV*Z)*. • This will be used in your homework.

  44. 0, 1 1 0 start 1 U R 2 S q2 q1 start T 3.2 FA’s & RE’s • Example 3.5 revisited • Use the derivation for 2-state automaton described previously directly to be (R+SU*T)*SU* = (1 + 0(0 + 1)*f)*0(0 + 1)* = 1* 0(0 + 1)*correct!

  45. 0, 1 1 0, 1 0, 1 start A 0 + 1 B C D C B D 1 0 + 1 0 + 1 start A 3.2 FA’s & RE’s • Example 3.6 • Step 1: regard all symbols on the arcs as RE’s, we get

  46. 0 + 1 0 + 1 C B D R11 R11+ Q1S *P1 q1 q2 q2 q1 S . . . . . . . . . Q1 P1 . . . s 3.2 FA’s & RE’s • Example 3.6 • Step 2: to remove B, use the following conversion we get s = B, q1 = A, q2 = C, S = f, Q1 = 1, P1 = 0 + 1, R11 = f, so R11 + Q1S*P1 = f + 1f*(0 + 1) = 1e(0 + 1) = 1(0 + 1) 0 + 1 1 start A

  47. 0 + 1 0 + 1 1(0 + 1) start A D C R11 R11+ Q1S *P1 q1 q2 q2 q1 S . . . . . . . . . Q1 P1 . . . s 3.2 FA’s & RE’s • Example 3.6 (cont’d) • For final state D, we have to remove C further, resulting in s = C, q1 = A, q2 = D, S = f, Q1 =1(0 + 1), P1 =0 + 1, R11= f, so R11 + Q1S*P1 = f + 1(0 + 1)f*(0 + 1) = 1(0 + 1)(0 + 1)

  48. 0 + 1 1(0 + 1)(0 + 1) start A U R D S q2 q1 start T 3.2 FA’s & RE’s • Example 3.6 (cont’d) • By the following conversion, we get • R = (0 + 1), q1 =A, q2 =D, S = 1(0 + 1)(0 + 1),T = f, U= f so (R+SU*T)*SU*= (0+1+1(0 + 1)f*f)*(1(0 + 1)(0 + 1)) f* = (0 + 1)*1(0 + 1)(0 + 1)

  49. 0 + 1 0 + 1 1(0 + 1) start A C D R11 R11+ Q1S *P1 q1 q2 q2 q1 S . . . . . . . . . Q1 P1 . . . s 3.2 FA’s & RE’s • Example 3.6 (cont’d) • For the other final state C, starting from the following diagram We have to eliminate D by the following diagram

  50. 0 + 1 U R C 1(0 + 1) start A S q2 q1 start T 3.2 FA’s & RE’s • Example 3.6 (cont’d) • Since D has no successor (and C before it is a final state), deleting D has no effect to the other parts, resulting in the following diagram. And by the following conversion, we get (R+SU*T)*SU*= (0 + 1 + 1(0 + 1)f*f)*(1(0 + 1)) f* = (0 + 1)*1(0 + 1)

More Related