
Regular Expressions


Midterm exam 10-8-14

Review for Midterm 10-6-14

Regular Expressions

Definitions

Equivalence to Finite Automata

- Regular expressions are an algebraic way to describe regular languages.
- If RE is a regular expression, then L(RE) denotes the language it defines.
- RE’s and their languages are defined recursively.

- Recursive description of languages derived from RE’s involves 3 basic operations between languages: union, concatenation, and closure.
- Union of L and M is set of all strings either in L or in M or in both
- {001,10,111} ∪ {ε,001} = {ε,10,001,111}

- Concatenation of L and M is sometimes denoted by “dot” (L.M)
- Most often denoted by simply LM
- LM is the set of all strings that can be formed by concatenating any string in L with any string in M
- {001,10,111}.{ε,001} = {001,10,111,001001,10001,111001}
- Note: left-right order preserved

- Closure (denoted L*) is the set of strings obtained by taking any number of strings from L, possibly with repeats, and concatenating all of them.
- L* = ∪_{k≥0} L^k
- Union of all powers of L (including the zeroth power)

- For all languages, L* contains ε
- Why? L^0 = {ε} for every language L

- L^1 = L
- L^k (k>1) is the concatenation of k copies of L
- If L={0,11}, L^2 = {0,11}{0,11} = {00,011,110,1111}
- L(∅) is the empty language (no strings)
- L(∅)* = {ε}, a rare example of a finite closure
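
The three language operations above can be tried out on finite sets of strings. The following is a minimal sketch, assuming Python sets of strings stand in for languages and the empty string `""` stands in for ε; the helper names `concat`, `power`, and `closure_up_to` are illustrative and not from the slides. Since L* is usually infinite, only the union of the first few powers is computed.

```python
from itertools import product

def concat(L, M):
    """Concatenation LM: every string of L followed by every string of M."""
    return {x + y for x, y in product(L, M)}

def power(L, k):
    """L^k: concatenation of k copies of L; L^0 is {ε}, written {""} here."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result

def closure_up_to(L, max_k):
    """Finite approximation of L*: the union of L^0 through L^max_k."""
    out = set()
    for k in range(max_k + 1):
        out |= power(L, k)
    return out

L = {"001", "10", "111"}
M = {"", "001"}
print(sorted(L | M))                   # union
print(sorted(concat(L, M)))            # concatenation
print(sorted(power({"0", "11"}, 2)))   # second power of {0, 11}
print(closure_up_to(set(), 5))         # closure of the empty language
```

The last line illustrates why ∅* is finite: every power of ∅ beyond the zeroth is empty, so only ε remains.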

- Like all algebras, RE’s are made up of constants and variables connected by operators.
- Parentheses used to group terms

- Basis 1: any symbol a is a RE, and L(RE)={a}
- L(RE) is the language containing one string of length 1.
- Generalizable to strings of any length

- Basis 2: ε is a RE, and L(RE) = {ε}
- L(RE) consists of empty string only

- Basis 3: ∅ is a RE, and L(RE) = ∅
- L(RE) has no strings

- Induction 1 (union): If E1 and E2 are RE’s, then E1+E2 is a RE, and L(E1+E2) = L(E1) ∪ L(E2).
- Induction 2 (concatenation): If E1 and E2 are RE’s, then E1E2 is a RE, and L(E1E2) = L(E1)L(E2).
- Induction 3 (closure, or “Kleene closure”, named for the originator of the * operation): If E is a RE, then E* is a RE, and L(E*) = (L(E))*, or simply L(E)*

- Parentheses used as needed to influence the grouping of operators.
- If E is a RE then (E) is a RE defining the same language as E; L((E))=L(E)
- Order of precedence is * (highest), then concatenation, then + (lowest).

- L(01) = {01}.
- L(01+0) = {01, 0}.
- L(0(1+0)) = {0}{0,1}={00, 01}.
- Note order of precedence of operators.

- L(0*) = {ε, 0, 00, 000,… }.
- L(01*) = all strings consisting of a 0 followed by any number of 1’s
- L((01)*) = all strings consisting of zero or more occurrences of 01
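
The recursive definition of L(RE) translates almost line for line into a recursive function. The sketch below is an illustration only: the nested-tuple encoding of expressions and the name `lang` are assumptions made here, and because closures are infinite the function returns only the strings of L(E) up to a chosen length.

```python
def lang(e, max_len):
    """Strings of L(e) with length <= max_len.

    e is a nested tuple: ("sym", "0"), ("eps",), ("empty",),
    ("+", e1, e2), (".", e1, e2), or ("*", e1).
    """
    op = e[0]
    if op == "sym":                      # Basis 1: a single symbol
        return {e[1]} if len(e[1]) <= max_len else set()
    if op == "eps":                      # Basis 2: ε
        return {""}
    if op == "empty":                    # Basis 3: ∅
        return set()
    if op == "+":                        # Induction 1: union
        return lang(e[1], max_len) | lang(e[2], max_len)
    if op == ".":                        # Induction 2: concatenation
        left, right = lang(e[1], max_len), lang(e[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if op == "*":                        # Induction 3: closure
        base = lang(e[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                  # stop when no new strings appear
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator: {op!r}")

zero, one = ("sym", "0"), ("sym", "1")
print(lang(("+", (".", zero, one), zero), 3))   # 01+0
print(lang((".", zero, ("+", one, zero)), 3))   # 0(1+0)
print(sorted(lang((".", zero, ("*", one)), 3))) # 01*
print(sorted(lang(("*", (".", zero, one)), 4))) # (01)*
```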

- Example: L = all strings of alternating 0’s and 1’s
- L((01)*) is the case that begins with 0 and ends with 1.
- 3 other cases: L((10)*), L(0(10)*), and L(1(01)*)
- L is the union of the 4 cases
- L = L(RE)
- Where RE = (01)*+(10)*+0(10)*+1(01)*

- Concatenation method: (ε+1)(01)*(ε+0)
- Distributive law gives the 4 cases
- (01)*(ε+0) = (01)* + (01)*0
- (ε+1)((01)* + (01)*0) =
- (01)* + (01)*0 + 1(01)* + 1(01)*0

- (01)* begins 0 ends 1
- (01)*0 begins 0 ends 0
- 1(01)* begins 1 ends 1
- 1(01)*0 begins 1 ends 0
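
Both expressions for the alternating strings can be checked by brute force. This is a sketch under the assumption that Python's `re` module is an acceptable stand-in for textbook RE's: union `+` is written `|`, and `ε+1` is written `1?`. The function names are illustrative.

```python
import re
from itertools import product

# (ε+1)(01)*(ε+0) in Python syntax
COMPACT = re.compile(r"1?(01)*0?")
# (01)*+(10)*+0(10)*+1(01)* in Python syntax
FOUR_CASES = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")

def alternates(s):
    """True if no two adjacent symbols of s are equal."""
    return all(a != b for a, b in zip(s, s[1:]))

def agree_up_to(max_len):
    """Compare both patterns with the direct test on every binary string."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            expected = alternates(s)
            if (COMPACT.fullmatch(s) is not None) != expected:
                return False
            if (FOUR_CASES.fullmatch(s) is not None) != expected:
                return False
    return True

print(agree_up_to(10))
```

A check like this is evidence for strings up to the tested length, not a proof; the distributive-law derivation above is what establishes the equality in general.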

- * (highest) operates on the smallest sequence of symbols to its left that is a legal RE
- Example: in 01*, closure applies to 1 only

- After grouping all *’s with their operands, group all concatenations with their operands (0 with 1* in the example)
- Finally, group unions (+) with their operands (as in 1+01*)

- Concatenation is associative.
- 0(12) = (01)2

- Union is associative.
- (a+b)+c = a+(b+c)

- E=01*+1=(0(1*))+1: L(E)={1} plus all strings with 0 followed by any number of 1’s
- E=(01)*+1: L(E)={1} plus all strings repeating 01 zero or more times
- E=0(1*+1): L(E)=all strings beginning with 0 followed by any number of 1’s
- Note: 1* and (1*+1) define the same language
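
The effect of grouping can be seen by testing a few strings against each of the three expressions. The sketch assumes Python's `re` syntax, which follows the same precedence order (closure, then concatenation, then alternation) but writes union as `|` instead of `+`; the names `E1`, `E2`, `E3`, and `accepts` are chosen here for illustration.

```python
import re

E1 = r"01*|1"      # 01*+1, grouped as (0(1*))+1
E2 = r"(01)*|1"    # (01)*+1
E3 = r"0(1*|1)"    # 0(1*+1)

def accepts(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# One row per test string: membership in L(E1), L(E2), L(E3).
for s in ["", "1", "0", "011", "0101"]:
    print(repr(s), accepts(E1, s), accepts(E2, s), accepts(E3, s))
```

The string 0101 separates the first two expressions: it is in L((01)*+1) but not in L(01*+1), because in the latter the star applies to the 1 alone.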

- Will show that for every RE, there is an FA that defines the same language.
- Sufficient to show for ε-NFA’s.

- Will show that for every FA, there is a RE that defines the same language.
- Sufficient to show for DFA’s.

- Rename the states of the DFA to be 1,2,…,n.
- Construct RE’s from the labels of restricted sets of paths called k-paths.

- A k-path is a path between specified states that goes through no intermediate state numbered higher than k.
- Endpoints of k-paths are not restricted; they can be any pair of states or the same state (i.e. a loop)

[Figure: example DFA with states 1, 2, and 3 and arcs labeled 0 and 1]

- 0-paths from 2 to 3
- no intermediates
- RE from labels (only one in this case) = 0.

- 1-paths from 2 to 3
- direct (label 0) and through state 1 (label 11)
- RE for labels = 0+11


- 2-paths from 2 to 3:
- RE from labels = (10)*0+1(01)*1
- (10)* and (01)* allow for zero or more loops through 1 before going to 3

- 3-paths from 2 to 3:
- no restrictions, k=n


- Let R_ij^(k) be the RE for the set of labels of k-paths from state i to state j.
- Basis: k=0. R_ij^(0) = sum of labels on arcs from i to j; ∅ if no such arc; add ε if i=j
- Examples:
- R_11^(0) = ∅ + ε = ε
- R_12^(0) = 0
- R_13^(0) = 1
- R_21^(0) = 1

- A k-path from i to j either:
- Never goes through state k, or
- Goes through k one or more times.
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
- R_ij^(k-1): doesn’t go through k
- R_ik^(k-1): goes from i to k the first time
- (R_kk^(k-1))*: zero or more times from k to k
- R_kj^(k-1): then, from k to j

[Figure: a k-path from i to j through states numbered below k: a path to k, from k to k several times, then from k to j; or a path not going through k]

- R_ij^(n) is the RE with the same language as the DFA, where:
- n is the number of states in the DFA
- i is the start state.
- j is one of the final states. With several final states, take the union (+) of R_ij^(n) over all final states j.
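
The basis and the inductive step can be run mechanically. The following is a rough sketch, not an optimized implementation: expressions are kept as Python regex strings (`|` for `+`), `None` stands for ∅, and `""` stands for ε. The helper names and the transition table `DELTA` are assumptions; the table is reconstructed from the basis examples and k-path RE's given in these notes, with state 2 as start and state 3 as the only final state to mirror the worked example. No simplification is attempted, so the result is long, and `kpath_regex` returns `None` if the language is empty.

```python
import re
from itertools import product

def union(a, b):
    """a + b, with None standing for ∅ (identity for union)."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else f"(?:{a}|{b})"

def cat(a, b):
    """ab; ∅ is the annihilator and "" (ε) is the identity."""
    return None if a is None or b is None else a + b

def star(a):
    """a*; both ∅* and ε* equal ε."""
    return "" if not a else f"(?:{a})*"

def kpath_regex(delta, n, start, finals):
    """Regex for a DFA on states 1..n with delta[(state, symbol)] -> state."""
    # Basis k=0: ε when i=j, plus the labels of arcs from i to j.
    R = {(i, j): ("" if i == j else None)
         for i in range(1, n + 1) for j in range(1, n + 1)}
    for (i, symbol), j in delta.items():
        R[i, j] = union(R[i, j], symbol)
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in range(1, n + 1):
        R = {(i, j): union(R[i, j], cat(cat(R[i, k], star(R[k, k])), R[k, j]))
             for i in range(1, n + 1) for j in range(1, n + 1)}
    result = None
    for j in finals:
        result = union(result, R[start, j])
    return result

def dfa_accepts(delta, start, finals, s):
    state = start
    for symbol in s:
        state = delta[state, symbol]
    return state in finals

# Assumed transition table for the three-state example.
DELTA = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 3,
         (2, "1"): 1, (3, "0"): 1, (3, "1"): 2}

def agrees(max_len, start=2, finals=(3,)):
    """Compare the constructed regex with the DFA on all short strings."""
    pattern = re.compile(kpath_regex(DELTA, 3, start, finals))
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if (pattern.fullmatch(s) is not None) != dfa_accepts(DELTA, start, finals, s):
                return False
    return True

print(agrees(6))
```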


- R_23^(3) = R_23^(2) + R_23^(2) (R_33^(2))* R_33^(2) = R_23^(2) (R_33^(2))*
- R_23^(2) = (10)*0+1(01)*1 (the 2-paths from 2 to 3)
- R_33^(2) = 0(01)*(1+00) + 1(10)*(0+11)
- R_23^(3) = [(10)*0+1(01)*1][0(01)*(1+00) + 1(10)*(0+11)]*
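
The final expression can be sanity-checked against the automaton. In this sketch the transition table `DELTA` is an assumption, reconstructed from the arc labels implied by the k-path RE's in these notes, and the expression is transcribed into Python `re` syntax with `|` in place of `+`.

```python
import re
from itertools import product

# Assumed transitions of the three-state example DFA.
DELTA = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 3,
         (2, "1"): 1, (3, "0"): 1, (3, "1"): 2}

# [(10)*0+1(01)*1][0(01)*(1+00)+1(10)*(0+11)]* in Python syntax
R23_3 = re.compile(r"((10)*0|1(01)*1)(0(01)*(1|00)|1(10)*(0|11))*")

def reaches(start, target, s):
    """True if reading s from state start ends in state target."""
    state = start
    for symbol in s:
        state = DELTA[state, symbol]
    return state == target

def check(max_len):
    """The RE should match exactly the labels of paths from 2 to 3."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if (R23_3.fullmatch(s) is not None) != reaches(2, 3, s):
                return False
    return True

print(check(10))
```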

R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)

- Identity union: E + ∅ = E
- Annihilator concatenation: ∅E=E∅=∅
- E is any RE

We have shown by construction that for any DFA there exists a RE that defines the same language that the DFA accepts.

The method always works but may be time consuming, since about n^3 RE’s must be constructed for an n-state DFA.

An alternate method: “eliminating states”


Basic principle: After state s is eliminated, the RE’s on the remaining arcs must define a transition function that supports the same language as before.

Usually this requirement can be satisfied by considering the states qi that are predecessors of s and the states pj that are successors of s


Let Qi be RE for labels on arc from predecessor qi to eliminated state s

Let Pj be RE for labels on arc from eliminated state s to successor pj

Let S be RE for labels on a loop on s

Let Rij be RE for labels on the existing direct arc from qi to pj.

Then the RE for the arc from qi to pj without s is Rij + QiS*Pj.

Some parts may not be present (they are ∅)


Example from exercise 3.2.1 page 107

[Figure: DFA with states q1, q2, and q3 and arcs labeled 0 and 1]

- 4 sets of predecessor-successor combinations involving state q2
- Arcs q1 to q2 and q2 to q3 both labeled 0
- Arcs q3 to q2 and q2 to q1 both labeled 1
- Arcs q1 to q2 labeled 0 and q2 to q1 labeled 1
- Arcs q3 to q2 labeled 1 and q2 to q3 labeled 0


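In all 4 cases, the new term QiS*Pj of the standard form reduces to QiPj

No loop on q2 (S=∅, so S*=ε) and no direct arc between q1 and q3 (Rij=∅ for those pairs)

In 2 cases, pj=qi so the new arcs become loops, which are added to the loops already on q1 and q3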
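
The elimination rule Rij + QiS*Pj can be written as a small function over a table of arcs. This is a sketch only: the arc table `ARCS` is an assumption reconstructed from the arc labels listed above and the RE's that result after q2 is removed, and `union`, `wrap`, and `eliminate` are illustrative names. RE's are kept as strings in the notation of these notes, and a missing entry means ∅.

```python
def union(a, b):
    """a + b, where None stands for ∅."""
    if a is None:
        return b
    if b is None:
        return a
    return f"{a}+{b}"

def wrap(r):
    """Parenthesize a union so that concatenation binds correctly."""
    return f"({r})" if "+" in r else r

def eliminate(arcs, s):
    """Remove state s from arcs, a dict {(from_state, to_state): RE}."""
    loop = arcs.get((s, s))
    star = f"({loop})*" if loop is not None else ""   # S*, or ε if no loop
    preds = [(q, r) for (q, t), r in arcs.items() if t == s and q != s]
    succs = [(p, r) for (t, p), r in arcs.items() if t == s and p != s]
    # Keep every arc that does not touch s.
    out = {pair: r for pair, r in arcs.items() if s not in pair}
    # Add Qi S* Pj to the existing arc (if any) for each predecessor-successor pair.
    for q, Q in preds:
        for p, P in succs:
            out[q, p] = union(out.get((q, p)), f"{wrap(Q)}{star}{wrap(P)}")
    return out

# Assumed arcs of the example DFA.
ARCS = {("q1", "q1"): "1", ("q1", "q2"): "0",
        ("q2", "q1"): "1", ("q2", "q3"): "0",
        ("q3", "q2"): "1", ("q3", "q3"): "0"}

for pair, r in eliminate(ARCS, "q2").items():
    print(pair, r)
```

The helper `wrap` is deliberately crude: it adds parentheses whenever a `+` appears anywhere in the operand, which can produce redundant but harmless grouping.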

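[Figure: eliminating q2 from the DFA leaves states q1 and q3, with loop 1+01 on q1, arc 00 from q1 to q3, arc 11 from q3 to q1, and loop 0+10 on q3]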


To find the RE that is equivalent to the DFA, for each accepting state qk continue state elimination until only the “start” state and qk remain

Let REk be the resulting RE, so that L(REk) is the set of strings that take the DFA from “start” to qk

The RE equivalent to the DFA is the sum over k of REk (union of all L(REk))


For each accepting state qk, the state-elimination process will result in a generic one-state (if q0=qk ) or two-state automaton

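Generic one-state (q0=qk): loop R on the single state; REk = R*

Generic two-state: loop R on start state 1, arc S from 1 to 2, loop U on accepting state 2, arc T from 2 to 1; REk = (R+SU*T)*SU*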

Actual values of R,S,T, and U are problem specific and some may be ∅

Standard form applied to exercise 3.2.4(e)

R=1+01, S=00, T=11, U=0+10

RE=[(1+01)+00(0+10)*11]*00(0+10)*
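
Substituting into the generic two-state form is mechanical, so it can be scripted and the result compared with the original automaton. In this sketch the transition table `DELTA` is an assumption reconstructed from the arc labels and the values of R, S, T, and U in these notes, with q1 as start state and q3 as the accepting state; `two_state_re` is an illustrative helper that emits Python `re` syntax (`|` for `+`).

```python
import re
from itertools import product

def two_state_re(R, S, T, U):
    """Generic two-state form (R+SU*T)*SU*, with each part grouped."""
    return f"(?:(?:{R})|(?:{S})(?:{U})*(?:{T}))*(?:{S})(?:{U})*"

# Assumed transitions of the DFA before q2 is eliminated.
DELTA = {("q1", "0"): "q2", ("q1", "1"): "q1",
         ("q2", "0"): "q3", ("q2", "1"): "q1",
         ("q3", "0"): "q3", ("q3", "1"): "q2"}

def dfa_accepts(s, start="q1", accepting="q3"):
    state = start
    for symbol in s:
        state = DELTA[state, symbol]
    return state == accepting

PATTERN = re.compile(two_state_re(R="1|01", S="00", T="11", U="0|10"))

def agrees(max_len):
    """Compare the substituted RE with the DFA on all short strings."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if (PATTERN.fullmatch(s) is not None) != dfa_accepts(s):
                return False
    return True

print(agrees(10))
```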


To complete the proof of equivalence, we show by construction that for every RE, there is an automaton that accepts the same language that the RE defines.

It is sufficient to construct an ε-NFA with the following restrictions:

One accepting state

No arcs into “start” state

No arcs out of accepting state


- Formal statement: if L(RE) is the language defined by RE, then there exists an ε-NFA, denoted by eRE, such that L(eRE)=L(RE)
- Proof is by constructive induction on the number of operators (+, concatenation, *) in the RE.
- Basis: For L(RE)={a} and {ε}, eRE consists of a single arc between “start” and accepting states labeled by a and ε, respectively
- Same for L(RE)=∅ except no arc

- Induction: assume ε-NFA’s exist for the subexpressions E1 and E2
- Use these ε-NFA’s to build eRE such that L(eRE)=L(RE)
- Sufficient to show how these ε-NFA’s are used to build ε-NFA’s for E1+E2, E1E2, and E1*
- eRE is built by linking these intermediate ε-NFA’s in the order of the operations in RE

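[Figure: ε-NFA for E1+E2. A new start state has ε arcs to the start states of the ε-NFA’s for E1 and E2; their accepting states have ε arcs to a new accepting state.]

[Figure: ε-NFA for E1E2. An ε arc joins the accepting state of the ε-NFA for E1 to the start state of the ε-NFA for E2.]

[Figure: ε-NFA for E1*. A new start state has ε arcs to the start state of the ε-NFA for E1 and to a new accepting state; the accepting state of E1 has ε arcs back to its start state and to the new accepting state.]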
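
The basis and the three inductive constructions can be sketched as functions that each return an ε-NFA with one start state and one accepting state. This is an illustrative sketch, assuming the usual textbook wiring of ε arcs; the class `NFA`, the builder names, and the list-of-arcs representation are choices made here, and `None` is used as the label of an ε arc. Each fragment must be built fresh, since reusing one fragment object twice would share its states.

```python
from itertools import count

_ids = count()  # source of fresh state numbers

class NFA:
    """ε-NFA fragment: one start state, one accepting state, a list of arcs."""
    def __init__(self, start, accept, arcs):
        self.start, self.accept, self.arcs = start, accept, arcs

def symbol(a):                      # basis: single arc labeled a
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])

def epsilon():                      # basis: single arc labeled ε
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, None, f)])

def empty():                        # basis: ∅, no arc at all
    return NFA(next(_ids), next(_ids), [])

def union(m1, m2):                  # E1+E2
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m1.start), (s, None, m2.start),
             (m1.accept, None, f), (m2.accept, None, f)]
    return NFA(s, f, m1.arcs + m2.arcs + extra)

def concat(m1, m2):                 # E1E2
    return NFA(m1.start, m2.accept,
               m1.arcs + m2.arcs + [(m1.accept, None, m2.start)])

def star(m):                        # E1*
    s, f = next(_ids), next(_ids)
    extra = [(s, None, m.start), (m.accept, None, f),
             (s, None, f), (m.accept, None, m.start)]
    return NFA(s, f, m.arcs + extra)

def eclose(m, states):
    """All states reachable from states using ε arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in m.arcs:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, s):
    current = eclose(m, {m.start})
    for ch in s:
        moved = {dst for src, label, dst in m.arcs
                 if src in current and label == ch}
        current = eclose(m, moved)
    return m.accept in current

# (01)*+1, built bottom-up in the order of its operators
m = union(star(concat(symbol("0"), symbol("1"))), symbol("1"))
print([w for w in ["", "1", "01", "0101", "0", "011", "11"] if accepts(m, w)])
```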

Review

- Regular expressions (RE) and finite automata (DFA, NFA, ε-NFA) are equivalent in their ability to define “regular languages”
- Proof of equivalence involves construction
- Some constructions are trivial (DFA->NFA and NFA->ε-NFA)
- RE<-->FA not trivial in either direction
- DFA->RE most challenging
- 2 methods: k-paths and elimination of states

k-paths method: relate k to k-1

A k-path from i to j either:

Never goes through state k, or

Goes through k one or more times.

R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)

Construction: for i=start and j=accepting, build up R_ij^(k) for k=0…n, where n=number of states of the DFA

DFA to RE by Eliminating States

Reduce to start and accepting state k

Substitute arc RE’s into generic form

REk = (R+SU*T)*SU*

Repeat for all accepting states

Form union of all L(REk)


Commutative law of union: L+M=M+L

Associative law of union: (L+M)+N=L+(M+N)

Idempotence of union: L+L = L ∪ L = L

Associative law of concatenation: (LM)N=L(MN)

Concatenation does not commute: in general, LM ≠ ML


- Concatenation distributes over union, but separate left and right laws are needed because concatenation is not commutative
- Left distributive law: L(M+N)=LM+LN
- Right distributive law: (M+N)L=ML+NL

- Identities and annihilators
- R+∅ = R
- εR = Rε = R
- ∅R = R∅ = ∅

- Laws on closure
- (L*)*=L*
- ∅* = ε
- ε* = ε

- (L*)*=L* for any regular language L
- More obvious from simple example
- (a*)*=a*

- Testing a proposed law on a concrete example is most useful in disproving laws
- If the test on an example fails, then the law cannot be true in general

- Example of testing in text p121
- L+ML ?= (L+M)L
- Try L=a, M=b
- {a}+{b}{a} ?= ({a}+{b}){a}
- {a} ∪ {ba} ?= {aa} ∪ {ba}
- Not true: the left side does not contain aa
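
The same test takes a few lines with finite sets of strings. This is a minimal sketch: Python sets stand in for the languages, `concat` is an illustrative helper, and the choice L={a}, M={b} follows the example above.

```python
def concat(L, M):
    """Concatenation of two finite languages."""
    return {x + y for x in L for y in M}

L, M = {"a"}, {"b"}
left = L | concat(M, L)      # L + ML
right = concat(L | M, L)     # (L + M)L
print(sorted(left), sorted(right), left == right)
```

One counterexample is enough to reject the proposed law; agreement on an example would not have been enough to accept it.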

CptS 317 Fall 2014

Assignment 6, Due 10-17-14

Exercise 3.2.1 (a) and (b), Text p107

Exercise 3.2.4 (a) Text p108

Show all steps