Regular Expressions

Midterm exam: 10-8-14 • Review for midterm: 10-6-14 • Regular Expressions: definitions; equivalence to finite automata

RE’s: Introduction • Regular expressions are an algebraic way to describe regular languages. • If RE is a regular expression, then L(RE) denotes the language it defines. • RE’s and their languages are defined recursively.

Introduction 2 • The recursive description of the languages defined by RE’s involves 3 basic operations on languages: union, concatenation, and closure. • The union of L and M is the set of all strings that are in L, in M, or in both • {001,10,111} ∪ {ε,001} = {ε,10,001,111}

Introduction 3 • Concatenation of L and M is sometimes denoted by a “dot” (L.M) • Most often it is denoted simply LM • LM is the set of all strings that can be formed by concatenating any string in L with any string in M • {001,10,111}.{ε,001} = {001,10,111,001001,10001,111001} • Note: left-right order is preserved
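The two operations can be tried out directly on small finite languages. The following is a minimal Python sketch, assuming a language is represented as a set of strings and ε as the empty string ""; the helper names `union` and `concat` are illustrative, not from the text.

```python
def union(L, M):
    """Strings in L, in M, or in both."""
    return L | M


def concat(L, M):
    """Every string of L followed by every string of M (left-right order kept)."""
    return {x + y for x in L for y in M}


L = {"001", "10", "111"}
M = {"", "001"}                # {ε, 001}
print(sorted(union(L, M)))     # ['', '001', '10', '111']
print(sorted(concat(L, M)))    # ['001', '001001', '10', '10001', '111', '111001']
```

Calling `concat(M, L)` instead produces strings such as 00110 that are not in LM, which is the left-right order noted above.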

Introduction 4 • Closure (denoted L*) is the set of strings obtained by taking any number of strings from L, possibly with repeats, and concatenating all of them. • $L^* = \bigcup_{k \ge 0} L^k$ • Union of all powers of L (including zero) • For all languages L, L* contains ε • Why? Because $L^0 = \{\varepsilon\}$ (the concatenation of zero strings) is always part of the union

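Introduction 5 • $L^0 = \{\varepsilon\}$ and $L^1 = L$ • $L^k$ (k > 1) is the concatenation of k copies of L • If L = {0,11}, $L^2$ = {0,11}{0,11} = {00,011,110,1111} • L(∅) is the empty language (no strings) • L(∅)* = {ε}: a rare example of a finite closure (the only other language with a finite closure is {ε}, since {ε}* = {ε})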
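Powers and closure can be sketched the same way, again assuming finite languages are Python sets of strings with ε as "". Because L* is usually infinite, the hypothetical `closure` helper below only collects the strings of L* whose length is at most `max_len`.

```python
def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}


def power(L, k):
    """L^k: k copies of L concatenated; L^0 = {ε}."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result


def closure(L, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}              # L^0 = {ε} is always in L*
    frontier = {""}
    while frontier:            # append one more string of L until nothing new fits
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result


print(sorted(power({"0", "11"}, 2)))     # ['00', '011', '110', '1111']
print(sorted(closure({"0", "11"}, 3)))   # ['', '0', '00', '000', '011', '11', '110']
print(closure(set(), 5))                 # {''}: the closure of ∅ is {ε}
```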

Building regular expressions • Like all algebras, RE’s are made up of constants and variables connected by operators. • Parentheses are used to group terms

Elementary components of RE’s • Basis 1: any symbol a is a RE, and L(a) = {a} • L(a) is the language containing one string, of length 1 • Generalizable to strings of any length (a string such as 011 is the concatenation of its symbols) • Basis 2: ε is a RE, and L(ε) = {ε} • L(ε) consists of the empty string only • Basis 3: ∅ is a RE, and L(∅) = ∅ • L(∅) has no strings

Recursive Definition of RE’s • Induction 1 (union): If E1 and E2 are RE’s, then E1+E2 is a RE, and L(E1+E2) = L(E1) ∪ L(E2) • Induction 2 (concatenation): If E1 and E2 are RE’s, then E1E2 is a RE, and L(E1E2) = L(E1)L(E2).

Recursive Definition of RE’s 2 • Induction 3 (closure, or “Kleene closure”, named for Stephen Kleene, originator of the * operation): If E is a RE, then E* is a RE, and L(E*) = (L(E))*, or simply L(E)*

Precedence of Operators • Parentheses used as needed to influence the grouping of operators. • If E is a RE then (E) is a RE defining the same language as E; L((E))=L(E) • Order of precedence is * (highest), then concatenation, then + (lowest).

Examples: RE’s and L(RE) • L(01) = {01}. • L(01+0) = {01, 0}. • L(0(1+0)) = {0}{0,1}={00, 01}. • Note order of precedence of operators. • L(0*) = {ε, 0, 00, 000,… }. • L(01*) = all strings consisting of a 0 followed by any number of 1’s • L((01)*) = all strings consisting of zero or more occurrences of 01
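The recursive definition maps directly onto a recursive function over a syntax tree. The following is a minimal Python sketch, assuming a hypothetical AST with one class per basis or induction clause; since L(E) can be infinite, `lang(e, n)` returns only the strings of length at most n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:      # Basis 1: a symbol a, L(a) = {a}
    a: str


@dataclass(frozen=True)
class Eps:      # Basis 2: ε, L(ε) = {ε}
    pass


@dataclass(frozen=True)
class Empty:    # Basis 3: ∅, L(∅) = ∅
    pass


@dataclass(frozen=True)
class Union:    # Induction 1: L(E1+E2) = L(E1) ∪ L(E2)
    e1: object
    e2: object


@dataclass(frozen=True)
class Cat:      # Induction 2: L(E1E2) = L(E1)L(E2)
    e1: object
    e2: object


@dataclass(frozen=True)
class Star:     # Induction 3: L(E*) = L(E)*
    e: object


def lang(e, n):
    """All strings of L(e) with length <= n."""
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Union):
        return lang(e.e1, n) | lang(e.e2, n)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.e1, n) for y in lang(e.e2, n) if len(x + y) <= n}
    if isinstance(e, Star):
        base = lang(e.e, n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not an RE: {e!r}")


print(sorted(lang(Cat(Sym("0"), Union(Sym("1"), Sym("0"))), 5)))   # ['00', '01']
print(sorted(lang(Cat(Sym("0"), Star(Sym("1"))), 3)))              # ['0', '01', '011']
print(sorted(lang(Star(Cat(Sym("0"), Sym("1"))), 4)))              # ['', '01', '0101']
```

The tree makes grouping explicit, so precedence is decided when the tree is built (0(1+0) becomes `Cat(Sym("0"), Union(...))`), not inside `lang`.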

L = all strings of alternating 0’s and 1’s • L((01)*) is the case that begins with 0 and ends with 1 (plus ε) • 3 other cases: L((10)*), L(0(10)*), and L(1(01)*) • L is the union of the 4 cases • L = L(RE) • where RE = (01)*+(10)*+0(10)*+1(01)*

L = all strings of alternating 0’s and 1’s • Concatenation method: (ε+1)(01)*(ε+0) • The distributive law gives the 4 cases: • (01)*(ε+0) = (01)* + (01)*0 • (ε+1)((01)* + (01)*0) = (01)* + (01)*0 + 1(01)* + 1(01)*0 • (01)* begins with 0 and ends with 1 • (01)*0 begins with 0 and ends with 0 • 1(01)* begins with 1 and ends with 1 • 1(01)*0 begins with 1 and ends with 0
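The claim that both forms describe the same language can be spot-checked by brute force. The following is a minimal Python sketch, assuming the slides' RE's are transcribed into Python `re` syntax (`|` for +, `x?` for ε+x, `(?:...)` for grouping); `alternating` is a hypothetical helper that states the definition of L directly. A check up to a fixed length is evidence, not a proof.

```python
import itertools
import re

FOUR_CASES = r"(?:01)*|(?:10)*|0(?:10)*|1(?:01)*"   # (01)*+(10)*+0(10)*+1(01)*
FACTORED = r"1?(?:01)*0?"                            # (ε+1)(01)*(ε+0)


def alternating(w):
    """True if no two adjacent symbols of w are equal."""
    return all(a != b for a, b in zip(w, w[1:]))


for n in range(9):                                   # all binary strings of length 0..8
    for w in map("".join, itertools.product("01", repeat=n)):
        assert (re.fullmatch(FOUR_CASES, w) is not None) == alternating(w)
        assert (re.fullmatch(FACTORED, w) is not None) == alternating(w)
print("both RE's agree with the definition up to length 8")
```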

Application of precedence • * (highest) operates on the smallest sequence of symbols to its left that is a legal RE • Example: in 01*, closure applies to 1 only • After grouping all *’s with their operands, group all concatenations with their operands (0 with 1* in the example) • Finally, group unions (+) with their operands (as in 1+01*)

Associative laws • Concatenation is associative. • 0(12) = (01)2 • Union is associative. • (a+b)+c = a+(b+c)

Examples: • E = 01*+1 = (0(1*))+1: L(E) = {1} plus all strings consisting of a 0 followed by any number of 1’s • E = (01)*+1: L(E) = {1} plus all strings that repeat 01 zero or more times • E = 0(1*+1): L(E) = all strings consisting of a 0 followed by any number of 1’s, the same as L(01*) • Note: 1* and (1*+1) define the same language, since L(1) is already contained in L(1*)
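Python's `re` module uses the same precedence (star, then concatenation, then alternation), so it can be used to check these groupings. This is a sketch under the assumption that the slides' + is written as `|` and explicit grouping as `(?:...)`; in `re` syntax `+` means "one or more", not union. The helper `matches` is hypothetical.

```python
import itertools
import re


def matches(pattern, w):
    """True if the whole string w belongs to the language of pattern."""
    return re.fullmatch(pattern, w) is not None


E1 = r"01*|1"        # slide: 01*+1, grouped as (0(1*))+1
E2 = r"(?:01)*|1"    # slide: (01)*+1
E3 = r"0(?:1*|1)"    # slide: 0(1*+1)

samples = ["", "0", "1", "0111", "0101"]
print([w for w in samples if matches(E1, w)])   # ['0', '1', '0111']
print([w for w in samples if matches(E2, w)])   # ['', '1', '0101']

# 0(1*+1) and 01* agree on every binary string of length 0..6
print(all(matches(E3, w) == matches(r"01*", w)
          for n in range(7)
          for w in map("".join, itertools.product("01", repeat=n))))   # True
```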

Equivalence of RE’s and FA’s • Will show that for every RE, there is an FA that defines the same language. • Sufficient to show for ε-NFA’s. • Will show that for every FA, there is a RE that defines the same language. • Sufficient to show for DFA’s.

DFA-to-RE • Rename the states of the DFA 1, 2, …, n. • Construct RE’s from the labels of restricted sets of paths called k-paths. • A k-path is a path between specified states that goes through no intermediate state numbered higher than k. • Endpoints of k-paths are not restricted; they can be any pair of states or the same state (i.e., a loop)

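Example: k-Paths [Figure: DFA with states 1, 2, 3 over {0, 1}: 1 goes to 2 on 0 and to 3 on 1; 2 goes to 3 on 0 and to 1 on 1; 3 goes to 1 on 0 and to 2 on 1] • 0-paths from 2 to 3 • no intermediate states • RE from labels (only one in this case) = 0 • 1-paths from 2 to 3 • direct (0) and around the outside through state 1 (11) • RE for labels = 0+11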

Example: k-Paths [Figure: the same three-state DFA] • 2-paths from 2 to 3: • RE from labels = (10)*0+1(01)*1 • (10)* and (01)* allow for zero or more round trips between states 2 and 1 before going to 3 • 3-paths from 2 to 3: • no restrictions, since k = n = 3

Formal development: DFA to RE [Figure: the same three-state DFA] • Let $R_{ij}^{(k)}$ be the RE for the set of labels of k-paths from state i to state j. • Basis: k = 0. $R_{ij}^{(0)}$ = sum of labels on arcs from i to j; ∅ if there is no such arc; add ε if i = j • Examples: • $R_{11}^{(0)} = \emptyset + \varepsilon = \varepsilon$ • $R_{12}^{(0)} = 0$ • $R_{13}^{(0)} = 1$ • $R_{21}^{(0)} = 1$

Induction: relate k to k-1 • A k-path from i to j either: • Never goes through state k: $R_{ij}^{(k-1)}$, or • Goes through k one or more times: it goes from i to k the first time, then zero or more times from k to k, then from k to j, and none of these pieces goes through k in between • $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\bigl(R_{kk}^{(k-1)}\bigr)^* R_{kj}^{(k-1)}$
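As a worked check of the induction, the 1-path and 2-path results for the example DFA follow from the formula. The basis values ($R_{23}^{(0)} = 0$, $R_{21}^{(0)} = 1$, $R_{11}^{(0)} = R_{22}^{(0)} = \varepsilon$, $R_{12}^{(0)} = 0$, $R_{13}^{(0)} = 1$) are read off the example DFA. The last line uses $(\varepsilon+10)(\varepsilon+10)^* = (10)^*$, the fact that $(10)^*$ already contains ε, and $(10)^k 1 = 1(01)^k$.

```latex
\begin{aligned}
R_{23}^{(1)} &= R_{23}^{(0)} + R_{21}^{(0)}\bigl(R_{11}^{(0)}\bigr)^* R_{13}^{(0)}
             = 0 + 1\,\varepsilon^*\,1 = 0 + 11 \\
R_{22}^{(1)} &= R_{22}^{(0)} + R_{21}^{(0)}\bigl(R_{11}^{(0)}\bigr)^* R_{12}^{(0)}
             = \varepsilon + 1\,\varepsilon^*\,0 = \varepsilon + 10 \\
R_{23}^{(2)} &= R_{23}^{(1)} + R_{22}^{(1)}\bigl(R_{22}^{(1)}\bigr)^* R_{23}^{(1)}
             = (0 + 11) + (\varepsilon + 10)(\varepsilon + 10)^*(0 + 11) \\
             &= (10)^*(0 + 11) = (10)^*0 + (10)^*11 = (10)^*0 + 1(01)^*1
\end{aligned}
```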

[Figure: Illustration of induction. A path from i to j either stays among states < k, or goes from i to k, from k to k several times, and from k to j, each piece using only states < k in between.]

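Final Step • $R_{ij}^{(n)}$ is the RE with the same language as the DFA when: • n is the number of states in the DFA • i is the start state • j is the only final state • If there are several final states, the RE for the DFA is the sum of $R_{ij}^{(n)}$ over all final states j.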

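Example of formalism: start = 2, accept = 3, n = 3 [Figure: the three-state DFA of the k-path examples] • Using $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\bigl(R_{kk}^{(k-1)}\bigr)^* R_{kj}^{(k-1)}$: • $R_{23}^{(3)} = R_{23}^{(2)} + R_{23}^{(2)}\bigl(R_{33}^{(2)}\bigr)^* R_{33}^{(2)} = R_{23}^{(2)}\bigl(R_{33}^{(2)}\bigr)^*$ • $R_{23}^{(2)} = (10)^*0 + 1(01)^*1$ (from the 2-path example) • $R_{33}^{(2)} = \varepsilon + 0(01)^*(1+00) + 1(10)^*(0+11)$ • Since $(\varepsilon + E)^* = E^*$, the ε term can be dropped inside the closure: • $R_{23}^{(3)} = [(10)^*0 + 1(01)^*1][0(01)^*(1+00) + 1(10)^*(0+11)]^*$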
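The k-path induction translates almost line for line into code (it is often called Kleene's construction). The following is a minimal Python sketch, assuming the example DFA's transitions as read off the k-path examples and writing RE's in Python `re` syntax (`|` for +, `(?:...)` for grouping, None for ∅, "" for ε); the names `kleene`, `union`, `concat`, `star`, and `accepts` are hypothetical. The helpers apply E+∅ = E, ∅E = E∅ = ∅, and εE = E while building each $R_{ij}^{(k)}$, and the final RE is compared with the DFA on every binary string of length at most 6.

```python
import itertools
import re

# Hypothetical encoding of the example DFA: DELTA[(state, symbol)] = next state.
DELTA = {(1, "0"): 2, (1, "1"): 3,
         (2, "0"): 3, (2, "1"): 1,
         (3, "0"): 1, (3, "1"): 2}
N = 3
EMPTY = None   # stands for the RE ∅; the Python string "" stands for ε


def union(a, b):
    """E + ∅ = E; otherwise a grouped alternation."""
    if a is EMPTY:
        return b
    if b is EMPTY:
        return a
    return f"(?:{a}|{b})"


def concat(a, b):
    """∅E = E∅ = ∅; since ε is "", εE = E comes for free."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return a + b


def star(a):
    """∅* = ε* = ε; otherwise a grouped closure."""
    if a is EMPTY or a == "":
        return ""
    return f"(?:{a})*"


def kleene(delta, n, kmax=None):
    """Return {(i, j): R_ij^(kmax)} as Python regex strings (kmax defaults to n)."""
    kmax = n if kmax is None else kmax
    states = range(1, n + 1)
    # Basis (k = 0): sum of labels on arcs i -> j, plus ε when i == j.
    R = {}
    for i in states:
        for j in states:
            r = EMPTY
            for (p, a), q in delta.items():
                if p == i and q == j:
                    r = union(r, a)
            R[(i, j)] = union(r, "") if i == j else r
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in range(1, kmax + 1):
        R = {(i, j): union(R[(i, j)],
                           concat(concat(R[(i, k)], star(R[(k, k)])), R[(k, j)]))
             for i in states for j in states}
    return R


def accepts(delta, start, finals, w):
    """Simulate the DFA on w."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals


print(kleene(DELTA, N, 1)[(2, 3)])   # (?:0|11), i.e. 0+11 as on the slide
regex = kleene(DELTA, N)[(2, 3)]     # R_23^(3): start state 2, accepting state 3
agree = all((re.fullmatch(regex, w) is not None) == accepts(DELTA, 2, {3}, w)
            for n in range(7)
            for w in map("".join, itertools.product("01", repeat=n)))
print(agree)                          # True
```

The generated expression is correct but much longer than the hand-simplified one on the slide, which is one reason the method can be time consuming.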

Useful RE identities in evaluating $R_{ij}^{(k)}$ • Identity for union: E + ∅ = E • Annihilator for concatenation: ∅E = E∅ = ∅ • Identity for concatenation: εE = Eε = E • E is any RE

Equivalence of RE’s and FA’s We have shown by construction that for any DFA there exists a RE that defines the language the DFA accepts. The method always works but may be time consuming, since about n³ RE’s must be constructed for an n-state DFA. An alternate method: “eliminating states”

DFA to RE by Eliminating States Basic principle: after state s is eliminated, the RE’s on the remaining arcs must account for every path that used to pass through s, so that the automaton accepts the same language as before. This requirement is met by considering each state qi that is a predecessor of s and each state pj that is a successor of s

DFA to RE by Eliminating States 2 Let Qi be the RE for labels on the arc from predecessor qi to the eliminated state s. Let Pj be the RE for labels on the arc from s to successor pj. Let S be the RE for labels on a loop on s. Let Rij be the RE for labels on an existing direct arc from qi to pj. Then the RE for paths from qi to pj without s is $R_{ij} + Q_i S^* P_j$. Some parts may not be present; a missing part is ∅, so a missing loop gives S* = ε and a missing direct arc drops the Rij term.

DFA to RE by Eliminating States 3 Example from exercise 3.2.1, page 107 [Figure: DFA with start state q1 (loop labeled 1), accepting state q3 (loop labeled 0), and arcs q1 to q2 on 0, q2 to q1 on 1, q2 to q3 on 0, q3 to q2 on 1] • 4 predecessor-successor combinations involve state q2: • Arcs q1 to q2 and q2 to q3, both labeled 0 • Arcs q3 to q2 and q2 to q1, both labeled 1 • Arc q1 to q2 labeled 0 and arc q2 to q1 labeled 1 • Arc q3 to q2 labeled 1 and arc q2 to q3 labeled 0 • Apply $R_{ij} + Q_i S^* P_j$ to each combination; some parts may be ∅

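DFA to RE by Eliminating States 4 • q2 has no loop (S* = ε), so in all 4 cases the standard form reduces to $R_{ij} + Q_i P_j$ • There is no direct arc between q1 and q3, so those 2 cases give just $Q_i P_j$: 00 from q1 to q3 and 11 from q3 to q1 • In the other 2 cases pj = qi, so the new arcs become loops, combined with the existing loops: 1+01 on q1 and 0+10 on q3 [Figure: the DFA before and after eliminating q2: q1 with loop 1+01, arc q1 to q3 labeled 00, arc q3 to q1 labeled 11, q3 with loop 0+10]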

DFA to RE by Eliminating States 5 To find the RE equivalent to the DFA, first eliminate every state that is neither the start state nor accepting. Then, for each accepting state qk, also eliminate the other accepting states, leaving only the start state and qk; let L(REk) be the language of strings accepted by qk. The RE equivalent to the DFA is the sum over k of REk (the union of all L(REk)).

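For each accepting state qk, the state-elimination process results in a generic one-state automaton (if q0 = qk) or a generic two-state automaton • Generic one-state: start state 1 is accepting and has loop R; REk = R* • Generic two-state: start state 1 with loop R, arc S from 1 to accepting state 2, loop U on 2, and arc T from 2 back to 1; REk = (R+SU*T)*SU* • Actual values of R, S, T, and U are problem specific, and some may be ∅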

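DFA to RE by Eliminating States 7 Standard form applied to exercise 3.2.1(e) [Figure: after eliminating q2: start q1 with loop 1+01, arc q1 to q3 labeled 00, arc q3 to q1 labeled 11, accepting q3 with loop 0+10; in the generic form, q1 is state 1 and q3 is state 2] • R = 1+01, S = 00, T = 11, U = 0+10 • REk = (R+SU*T)*SU* • RE = [(1+01)+00(0+10)*11]*00(0+10)*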
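The elimination step and the generic two-state form can be checked mechanically. The following is a minimal Python sketch, assuming the exercise DFA's transitions as read off the slides (start q1, accepting q3) and writing RE's in Python `re` syntax (`|` for +, `(?:...)` for grouping, None for ∅); `eliminate`, `union`, and `accepts` are hypothetical names. It eliminates q2 with $R_{ij} + Q_i S^* P_j$, plugs the resulting R, S, T, U into (R+SU*T)*SU*, and compares the result with the DFA on all binary strings of length at most 8.

```python
import itertools
import re

# Hypothetical encoding of the exercise DFA: DELTA[(state, symbol)] = next state.
DELTA = {("q1", "0"): "q2", ("q1", "1"): "q1",
         ("q2", "0"): "q3", ("q2", "1"): "q1",
         ("q3", "0"): "q3", ("q3", "1"): "q2"}


def union(a, b):
    """RE union; None stands for ∅, so ∅ + E = E."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"


def eliminate(arcs, s):
    """Remove state s; arcs maps (p, q) to the RE labeling the arc p -> q."""
    loop = arcs.get((s, s))
    s_star = "" if loop is None else f"(?:{loop})*"      # S*, or ε if s has no loop
    preds = [p for (p, q) in arcs if q == s and p != s]
    succs = [q for (p, q) in arcs if p == s and q != s]
    new = {pq: r for pq, r in arcs.items() if s not in pq}
    for qi in preds:
        for pj in succs:
            # R_ij + Q_i S* P_j (R_ij is None when there is no direct arc)
            new[(qi, pj)] = union(new.get((qi, pj)), arcs[(qi, s)] + s_star + arcs[(s, pj)])
    return new


def accepts(w, state="q1"):
    """Simulate the DFA from q1; accept if it ends in q3."""
    for a in w:
        state = DELTA[(state, a)]
    return state == "q3"


arcs = {}
for (p, a), q in DELTA.items():
    arcs[(p, q)] = union(arcs.get((p, q)), a)
arcs = eliminate(arcs, "q2")
R, S = arcs[("q1", "q1")], arcs[("q1", "q3")]
T, U = arcs[("q3", "q1")], arcs[("q3", "q3")]
print(R, S, T, U)                                    # (?:1|01) 00 11 (?:0|10)
regex = f"(?:{R}|{S}(?:{U})*{T})*{S}(?:{U})*"        # (R+SU*T)*SU*
agree = all((re.fullmatch(regex, w) is not None) == accepts(w)
            for n in range(9)
            for w in map("".join, itertools.product("01", repeat=n)))
print(agree)                                         # True
```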

Equivalence of RE’s and Automata To complete the proof of equivalence, we show by construction that for every RE there is an automaton that accepts the language the RE defines. It is sufficient to construct an ε-NFA with the following restrictions: • exactly one accepting state • no arcs into the start state • no arcs out of the accepting state

Converting a RE to an ε-NFA • Formal statement: if L(RE) is the language defined by RE, then there exists an ε-NFA, denoted eRE, such that L(eRE) = L(RE) • Proof is by constructive induction on the number of operators (+, concatenation, *) in the RE. • Basis: for L(RE) = {a} and {ε}, eRE consists of a single arc from the start state to the accepting state, labeled a and ε, respectively • Same for L(RE) = ∅, except there is no arc

Induction hypothesis (IH): assume the theorem is true for subexpressions E1 and E2 of RE [Figure: ε-NFA for E1 and ε-NFA for E2] • Use these ε-NFA’s to build eRE such that L(eRE) = L(RE) • It is sufficient to show how these ε-NFA’s are used to build ε-NFA’s for E1+E2, E1E2, and E1* • eRE is built by linking these intermediate ε-NFA’s in the order given by the operators in RE

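RE to ε-NFA: Induction 1 – Union [Figure: ε-NFA for E1+E2. A new start state has ε-arcs to the start states of the automata for E1 and E2; their accepting states have ε-arcs to a new accepting state.]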

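RE to ε-NFA: Induction 2 – Concatenation [Figure: ε-NFA for E1E2. An ε-arc joins the accepting state of the automaton for E1 to the start state of the automaton for E2; the start state of E1's automaton is the new start, and the accepting state of E2's automaton is the new accepting state.]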

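RE to ε-NFA: Induction 3 – Closure [Figure: ε-NFA for E1*. A new start state and a new accepting state; ε-arcs from the new start to E1's start and to the new accepting state, from E1's accepting state back to E1's start, and from E1's accepting state to the new accepting state.]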
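The basis and the three inductive constructions can be sketched in Python. This is a minimal sketch, assuming RE's are encoded as nested tuples (a hypothetical encoding: ("sym", a), ("eps",), ("empty",), ("+", E1, E2), (".", E1, E2), ("*", E1)) and ε-arcs are stored with label None. Each `build` call returns the start and accepting states of an automaton that keeps the restrictions listed earlier: one accepting state, no arcs into the start state, and none out of the accepting state.

```python
class ENFA:
    """An ε-NFA built by the inductive construction; arcs labeled None are ε-arcs."""

    def __init__(self):
        self.arcs = []        # (from_state, label, to_state)
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count - 1

    def build(self, e):
        """Return (start, accept) for an ε-NFA whose language is L(e)."""
        op = e[0]
        if op in ("sym", "eps", "empty"):          # basis
            s, f = self.new_state(), self.new_state()
            if op == "sym":
                self.arcs.append((s, e[1], f))     # one arc labeled a
            elif op == "eps":
                self.arcs.append((s, None, f))     # one arc labeled ε
            return s, f                            # ∅: no arc at all
        if op == "+":                              # induction 1: union
            s1, f1 = self.build(e[1])
            s2, f2 = self.build(e[2])
            s, f = self.new_state(), self.new_state()
            self.arcs += [(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)]
            return s, f
        if op == ".":                              # induction 2: concatenation
            s1, f1 = self.build(e[1])
            s2, f2 = self.build(e[2])
            self.arcs.append((f1, None, s2))
            return s1, f2
        if op == "*":                              # induction 3: closure
            s1, f1 = self.build(e[1])
            s, f = self.new_state(), self.new_state()
            self.arcs += [(s, None, s1), (s, None, f), (f1, None, s1), (f1, None, f)]
            return s, f
        raise ValueError(f"unknown operator {op!r}")

    def eclose(self, states):
        """All states reachable from `states` using ε-arcs only."""
        seen, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for x, label, y in self.arcs:
                if x == p and label is None and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    def accepts(self, start, accept, w):
        """Run the ε-NFA on w, tracking the set of current states."""
        current = self.eclose({start})
        for a in w:
            current = self.eclose({y for x, label, y in self.arcs
                                   if x in current and label == a})
        return accept in current


nfa = ENFA()
start, accept = nfa.build((".", ("sym", "0"), ("*", ("sym", "1"))))   # 01*
print([w for w in ["", "0", "011", "10"] if nfa.accepts(start, accept, w)])   # ['0', '011']
```

Building ("*", ("empty",)) gives an automaton that accepts only the empty string, matching ∅* = ε.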

Review • Regular expressions (RE) and finite automata (DFA, NFA, ε-NFA) are equivalent in their ability to define “regular languages” • Proof of equivalence involves construction • Some constructions are trivial (DFA->NFA and NFA->ε-NFA) • RE<-->FA is not trivial in either direction • DFA->RE is the most challenging • 2 methods: k-paths and elimination of states

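K-paths method: relate k to k-1 • A k-path from i to j either: • never goes through state k, or • goes through k one or more times. • $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\bigl(R_{kk}^{(k-1)}\bigr)^* R_{kj}^{(k-1)}$ • Construction: for i = start and j = accepting, build up $R_{ij}^{(k)}$ for k = 0, …, n, where n = number of states of the DFA; each level k uses the RE’s for all pairs of states at level k-1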

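DFA to RE by Eliminating States [Figure: the exercise DFA with states q1, q2, q3, and the two-state result after eliminating q2, with arcs labeled 1+01, 00, 11, and 0+10] • Reduce to the start state and accepting state qk • Substitute the arc RE’s into the generic form REk = (R+SU*T)*SU* • Repeat for all accepting states • Form the union of all L(REk)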

Algebraic Laws of RE’s • Commutative law of union: L+M = M+L • Associative law of union: (L+M)+N = L+(M+N) • Idempotence of union: L+L = L∪L = L • Associative law of concatenation: (LM)N = L(MN) • Concatenation does not commute: LM ≠ ML in general

Algebraic Laws for RE’s 2 • Concatenation distributes over union, but left and right laws must be stated separately because concatenation is not commutative • Left distributive law: L(M+N) = LM+LN • Right distributive law: (M+N)L = ML+NL • Identities and annihilators • R+∅ = R • εR = Rε = R • ∅R = R∅ = ∅

Algebraic Laws for RE’s 3 • Laws on closure • (L*)* = L* • ∅* = ε • ε* = ε

Testing algebraic laws by simple examples • (L*)*=L* for any regular language L • More obvious from simple example • (a*)*=a* • Most useful in disproving laws • If test on example is false, then law cannot be true in general

Testing Algebraic Laws - 2 • Example of testing in text p. 121 • L+ML ?= (L+M)L • Try L = a, M = b • {a}+{b}{a} ?= ({a}+{b}){a} • {a}∪{ba} ?= {aa}∪{ba} • Not true: the right side contains aa, but the left side does not
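Testing a proposed law on a concrete example is easy to automate. The following is a minimal Python sketch, assuming one-symbol languages {a}, {b}, {c} represented as finite sets of strings; because L* is infinite, the hypothetical `star` helper truncates it to strings of length at most n, so the closure check is only evidence, not a proof.

```python
def concat(L, M):
    """LM for finite languages L and M."""
    return {x + y for x in L for y in M}


def star(L, n):
    """The strings of L* with length <= n (a finite approximation of L*)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w for w in concat(frontier, L) if len(w) <= n} - result
        result |= frontier
    return result


L, M, N = {"a"}, {"b"}, {"c"}

# The text's example: L+ML ?= (L+M)L
lhs = L | concat(M, L)                      # {a, ba}
rhs = concat(L | M, L)                      # {aa, ba}
print(lhs == rhs, sorted(rhs - lhs))        # False ['aa']

print(concat(L, M | N) == concat(L, M) | concat(L, N))   # True: left distributive law
print(concat(L, M) == concat(M, L))                      # False: {ab} vs {ba}
print(star(star(L, 4), 4) == star(L, 4))                 # True: (a*)* = a* up to length 4
```

Only the failing checks are conclusive: they show that L+ML = (L+M)L and LM = ML cannot be laws.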

CptS 317 Fall 2014 Assignment 6, due 10-17-14 • Exercise 3.2.1 (a) and (b), text p. 107 • Exercise 3.2.4 (a), text p. 108 • Show all steps
