
Deterministic Finite Automata Machine

More Examples of DFAs

Simple examples

0*1* vs 0^n 1^n

Mod Counting Examples

Contains Substring (NFA vs DFA)

Any Finite Language

Integer mod 7

Calculator

Syntax Processor

Loop Invariants

Constant Memory

Code with Simple Loop

Deterministic Finite Automata

Path Through Graph

DFA "knowing" and distinguishable strings

Nondeterministic Finite Automata

DFA vs TM

(Extended) Regular Expressions

Parsing with Regular Expressions

Jeff Edmonds

York University

ECS 2001

Lecture 2

• The input arrives as a stream, one character at a time. It cannot be reread.

• Eg:

• simple iterative algorithms

• simple mechanical or electronic devices like elevators and calculators

• simple processes like the job queue of an operating system

• simple patterns within strings of characters.

• simple languages of strings

{01,100,01110,… }

l=40,

x=3,

y=2

• The input arrives as a stream, one character at a time. It cannot be reread.

• What do you have to remember so that in the end you can answer the question?

• The memory is bounded!

Loop Invariant:

Many different equivalent models:

• Loop Invariant

• What is remembered at top of loop

• Code with simple loop

• Deterministic Finite Automata (DFA)

• Focuses on transitions between states

• Path through DFA

• Each path represents a string

• Nondeterministic Finite Automata (NFA)

• Get help from a fairy godmother.

• Regular Expressions

• Representing string patterns

• Extended Regular Expressions

• Also allow set intersection and set complement

Compilably Equivalent

(a*b  a*bb) 0*1*

(a*b  a*bb) 0*1*

Given a string, find the longest block of 1’s

00101111001100011111000011001

Alg reads the digits one at a time and remembers enough about what has been read so that it does not need to reread anything.

• Largest block so far.

Given a string, find the longest block of 1’s

00101111001100011 has been read; the next characters to arrive are 1, 1, 1.

When it has read this much, what does it remember?

• Largest block so far.

• Size of current block.

Read the next character & re-determine the largest block so far & the size of the current block.

A loop invariant is an assertion that must be true about the state of the data structure every time the algorithm is at the top of the loop.
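The longest-block example above can be sketched as a simple loop. This is a minimal illustration, not code from the slides; the function and variable names are assumptions. The two variables are exactly what the loop invariant says is remembered at the top of the loop.

```python
def longest_block_of_ones(stream):
    """Read the characters once, left to right, and never reread them."""
    largest = 0   # loop invariant: largest block of 1's in the prefix read so far
    current = 0   # loop invariant: size of the block of 1's ending at the last character read
    for c in stream:
        if c == "1":
            current += 1
            largest = max(largest, current)
        else:
            current = 0
    return largest


print(longest_block_of_ones("00101111001100011111000011001"))  # 5
```

Note that both counters can grow as large as the input, so this loop reads its input only once but does not use constant memory.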

The input consists of an array of objects.

[Figure: the prefix read so far is labelled "Solution"; the unread rest of the input is labelled "Extra".]

We have read in the first i objects.

We will pretend that this prefix is the entire input.

We have a solution for this prefix.

• Do not worry about the entire computation.

Next: we read in the (i+1)st object.

We will pretend that this larger prefix is the entire input.

We extend the solution we have to one for this larger prefix.

• By induction, the computation will always keep the loop invariant true!

Exit: in the end, we have read in the entire input.

The LI gives us that we have a solution for this entire input.

[Figure: a trip to school, with distance markers at 0 km, 75 km and 79 km and an Exit sign.]

Loop Invariant

Insertion Sort

The input consists of an array of integers: 52,23,88,31,25,30,98,62,14,79

We have read in the first i objects (here i = 4: 52,23,88,31).

We will pretend that this prefix is the entire input.

We have a solution for this prefix: 23,31,52,88

We read in the (i+1)st object (here 25).

We will pretend that this larger prefix is the entire input.

We extend the solution we have to one for this larger prefix: 23,25,31,52,88
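The insertion sort slides can be sketched as follows. This is an illustrative version, with names chosen here rather than taken from the slides; the `assert` at the top of the loop states the loop invariant directly.

```python
def insertion_sort(items):
    """Extend a sorted solution for the prefix read so far, one object at a time."""
    solution = []                          # solution for the empty prefix
    for i, x in enumerate(items):
        # Loop invariant: `solution` is the sorted version of the first i objects.
        assert solution == sorted(items[:i])
        pos = len(solution)
        while pos > 0 and solution[pos - 1] > x:
            pos -= 1                       # walk left past everything larger than x
        solution.insert(pos, x)            # extend the solution to the larger prefix
    return solution


data = [52, 23, 88, 31, 25, 30, 98, 62, 14, 79]
print(insertion_sort(data[:4]))  # [23, 31, 52, 88]
print(insertion_sort(data[:5]))  # [23, 25, 31, 52, 88]
```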

If you are describing an alg to a friend, is this what you say?

Do you think about an algorithm as a sequence of actions?

Do you explain it by saying: “Do this. Do that. Do this”?

Do you get lost about where the computation is and where it is going?

What if there are many paths through many ifs and loops?

How do you know it works for every path on every input?

A Sequence of Actions

Max( a,b,c )
“preCond: Input has 3 numbers.”
m = a
if( b>m ) m = b endif
if( c>m ) m = c endif
return(m)
“postCond: return max in {a,b,c}”

At least tell me what the algorithm is supposed to do.

• Preconditions: Any assumptions that must be true about the input instance.

• Postconditions: The statement of what must be true when the algorithm/program returns.

How can you possibly understand this algorithm without knowing what is true when the computation is here? Tell me!

vs A Sequence of Assertions

“preCond: Input has 3 numbers.”
“assert: m is max in {a}”
“assert: m is max in {a,b}”
“assert: m is max in {a,b,c}”
“postCond: return max in {a,b,c}”

It is helpful to have different ways of looking at it.

Max( a,b,c )
“preCond: Input has 3 numbers.”
m = a
“assert: m is max in {a}”
if( b>m ) m = b endif
“assert: m is max in {a,b}”
if( c>m ) m = c endif
“assert: m is max in {a,b,c}”
return(m)

“postCond: return max in {a,b,c}”
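A runnable rendering of the Max pseudocode, with the assertions between the actions written as Python `assert` statements. The name `max3` is an assumption made for this sketch.

```python
def max3(a, b, c):
    # preCond: the input has 3 numbers
    m = a
    assert m == max([a])            # assert: m is max in {a}
    if b > m:
        m = b
    assert m == max([a, b])         # assert: m is max in {a,b}
    if c > m:
        m = c
    assert m == max([a, b, c])      # assert: m is max in {a,b,c}
    return m                        # postCond: return max in {a,b,c}


print(max3(3, 9, 4))  # 9
```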

Fixed/Constant vs Arbitrary/Finite

[Figure: the loop invariant, a bounded memory holding l=40, x=3, y=2.]

• The input arrives as a stream, one character at a time. It cannot be reread.

• What do you have to remember so that in the end you can answer the question?

• The memory is bounded!

Fixed/Constant vs Arbitrary/Finite

Given a string, find the longest block of 1’s

[Figure: a bounded memory holding l=40, x=3, y=2.]

00101111001100011 has been read; the next character to arrive is 1.

When it has read this much, what does it remember?

• Largest block so far.

• Size of current block.

• How much memory is needed on an input of length n?

• A count up to n takes log n bits of memory.

• Too much. Memory can’t grow with input!

Fixed/Constant vs Arbitrary/Finite

Max( a,b,c )
“preCond: Input has 3 numbers.”
m = a
“assert: m is max in {a}”
if( b>m ) m = b endif
“assert: m is max in {a,b}”
if( c>m ) m = c endif
“assert: m is max in {a,b,c}”
return(m)
“postCond: return max in {a,b,c}”

• How much memory is needed on an input of length n?

• An index up to n takes log n bits of memory.

• Too much. Memory can’t grow with input!

Fixed/Constant vs Arbitrary/Finite

• Given the needs of the problem at hand, a programmer can give her constant memory program

• as many variables and

• as big a (finite) range for each variable

• as she wants.

• But these numbers are fixed/constant.

• If K(J,I) denotes these numbers for program J on input I,

• then this function can depend on J,

• but can’t depend on I

• (or on |I|).

• ∀ regular language L,

• ∃ constant memory program J, ∃ an integer k,

• ∀ inputs I,

• K(J,I) ≤ k

Fixed/Constant vs Arbitrary/Finite

I come up with a regular language L.

I write a constant memory program J and give it k=1,000,000,000,000,000 variables, each with range 0..1,000,000,000,000,000!

Wow. That’s not fair. With more memory, it can memorize more of the input and then can have any power to determine the answer.

I will let him use any amount of memory, but this fixed J must work for all inputs.

If he uses more memory, I will give him a bigger input I. Hee Hee Hee.

J must still solve the problem.

• ∀ regular language L,

• ∃ constant memory program J, ∃ an integer k,

• ∀ inputs I,

• K(J,I) ≤ k

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

α = 001011

Length = |α| = 6 > 3

Therefore not in language L.

α = 011

# of 1’s = 2 is even

Therefore not in language L.

α = 001

Length = |α| = 3 ≤ 3

# of 1’s = 1 is odd

Therefore in language L.

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

When it has read this much, what does it remember?

• l = length = |α| ∈ {0,1,2,3,more}

• r = parity of the # of 1’s, r ∈ {even,odd}

With the empty string read, establish the loop invariant.

Read the next character & re-determine l and r.

α = ε: l = 0, r = even

α = 0: l_t = l_(t-1) + 1 = 1, r_t = r_(t-1) = even

α = 01: l_t = l_(t-1) + 1 = 2, r_t = r_(t-1) + 1 = odd

α = 010: l_t = l_(t-1) + 1 = 3, r_t = r_(t-1) = odd

α = 0101: l_t = l_(t-1) + 1 = more, r_t = r_(t-1) + 1 = even

Exit: once the string is all read in, determine if it is in L or not.

α = 0101 ∉ L

Code with Simple Loop

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

• Loop Invariant that must always be true when at the top of the loop. It gives a picture of what will be remembered about the prefix read so far. Temporary variables need not be mentioned.

• Establish the Loop Invariant for the empty string.

• Read the next character & maintain the Loop Invariant.

• Exit: once the input string is all read in, determine if it is in L or not.
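The slide's code was an image and did not survive extraction; the following is a minimal sketch of such a simple loop for L, with variable names l and r taken from the loop invariant above and everything else assumed.

```python
def in_L(alpha):
    """Decide L = {strings over {0,1} of length at most 3 with an odd number of 1's}."""
    # Establish the loop invariant for the empty string.
    l = 0          # length of the prefix read so far: 0, 1, 2, 3 or "more"
    r = "even"     # parity of the number of 1's in the prefix read so far
    for c in alpha:                    # read the next character
        # Maintain the loop invariant.
        if l != "more":
            l = l + 1 if l < 3 else "more"
        if c == "1":
            r = "odd" if r == "even" else "even"
    # Exit: the whole input has been read.
    return l != "more" and r == "odd"


for alpha in ["001011", "011", "001", "0101"]:
    print(alpha, in_L(alpha))
```

The memory used is fixed: l takes one of five values and r one of two, whatever the length of the input.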

Deterministic Finite Automaton (DFA)

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

• Build a state node for each “state” that the loop invariant says the computation might be in, i.e. for every setting of the variables. Temp variables need not be considered.

• Give the states meaningful names.

• For each state node & for each character, have an edge to the state node to transition to: δ(q_current, c_next read) = q_next

• Establish the Loop Invariant for the empty string by specifying the start state.

• Exit: accept states are those at which the DFA accepts the input if it is in this state when the string ends.
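As a sketch of this recipe, the table below builds one state for every setting of the loop-invariant variables (l, r) and one edge per state and character. The names `STATES`, `DELTA`, `step` and `accepts` are assumptions for illustration; the slide's own diagram may merge or omit states.

```python
from itertools import product

LENGTHS = [0, 1, 2, 3, "more"]
PARITIES = ["even", "odd"]

# One state node for every setting of the variables in the loop invariant.
STATES = list(product(LENGTHS, PARITIES))          # 5 * 2 = 10 states
START = (0, "even")                                # loop invariant for the empty string
ACCEPT = {(l, r) for (l, r) in STATES if l != "more" and r == "odd"}


def step(state, c):
    """What the loop body does to the variables on reading character c."""
    l, r = state
    l_next = "more" if l in (3, "more") else l + 1
    r_next = r if c == "0" else ("odd" if r == "even" else "even")
    return (l_next, r_next)


# Transition function as a lookup table: DELTA[(q_current, c)] = q_next.
DELTA = {(q, c): step(q, c) for q in STATES for c in "01"}


def accepts(alpha):
    q = START
    for c in alpha:
        q = DELTA[(q, c)]
    return q in ACCEPT


print(accepts("010"), accepts("01010"))  # True False
```

Taking every setting of the variables gives 10 states, including (0, odd), which can never be reached. A hand-drawn DFA would normally leave such states out.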

Path Through DFA

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

α = 010

[Figure: the path from the start state along the edges labeled 0, 1, 0, ending at an accept state.]

α ∈ L

α = 01010

[Figure: the path from the start state along the edges labeled 0, 1, 0, 1, 0, ending at a reject state.]

α ∉ L

DFA “Knowing” and Distinguishable Strings.

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

When in this state, we say that the DFA “knows” that for the prefix read, length=2 and r=even.

If all prefixes ending up at this state have some common property and the DFA is in this state, then we say that the DFA “knows” this property.

The language L does not distinguish between prefixes α = 00 and β = 11, because for all futures ζ, αζ and βζ have the same answer.

Hence the DFA need not distinguish which of α and β was read.

Hence these α and β can go to the same state.

The language L does need to distinguish between prefixes α = 00 and β = 01, because for the future ζ = 0, αζ and βζ have different answers.

Hence the DFA must distinguish which of α and β was read.

Hence these α and β must go to different states.

Partition all strings into sets based on whether they are distinguished by the language L.

These sets correspond directly to the needed states of the DFA.

[Figure: the DFA with its states labeled by these sets: {0}, {1}, {00,11}, {01,10}, {100,010,001,111}, and all other strings.]
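The partition can be checked by brute force. The sketch below is an illustration under two stated assumptions: only futures ζ of length at most 3 are tried (any longer future makes every string too long for L), and only prefixes of length at most 4 are grouped (every longer prefix behaves like a length-4 one). All names are chosen here, not taken from the slides.

```python
from itertools import product


def in_L(s):
    return len(s) <= 3 and s.count("1") % 2 == 1


def strings_up_to(n):
    return ["".join(p) for k in range(n + 1) for p in product("01", repeat=k)]


FUTURES = strings_up_to(3)   # assumption: longer futures can never lead into L


def signature(prefix):
    """The answers for prefix+future, over all futures tried."""
    return tuple(in_L(prefix + z) for z in FUTURES)


# Two prefixes land in the same set iff no future distinguishes them.
classes = {}
for s in strings_up_to(4):
    classes.setdefault(signature(s), []).append(s)

for members in classes.values():
    print(members)
```

The run groups 00 with 11, 01 with 10, and 100, 010, 001, 111 together, as on the slide. It also puts the empty string in a set of its own, for 7 sets in total.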

NonDeterminism

“Nondeterministic” means that what the machine does next is not set in stone, i.e. not determined, but there is a choice.

NFAs are a natural abstraction of life.

δ(q_current, c_next read) = { q_next }

The transition function specifies a set of states, one of which will be the next state.

[Figure: a state with two outgoing edges, both labeled 0.]

Jeff talks of a Fairy Godmother helping to know which way to go.

We say God is giving you a “fair” life iff there exists a reasonable path that you could follow from your start state to a final state that you would accept.

Path Through NFA

α = 0001010

[Figure: the NFA, with a path from the start state labeled 0, 0, 0, 1, 0, 1, 0.]

We say a string α is accepted iff there is a path through the NFA labeled by the string ending at an accept state.

But there are also many rejecting paths labeled α.

That’s ok.

The language L of strings accepted by this NFA is:

It is my job to tell you just before you start reading the 0101.
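The acceptance rule "there is a path" can be simulated without a fairy godmother by tracking the set of all states that some path could be in. The NFA below is hypothetical: the slide's diagram was lost in extraction, so this sketch assumes an NFA that guesses where the substring 0101 starts and then loops in its accept state.

```python
# Hypothetical NFA (an assumption, not the slide's figure):
# state 0 loops on 0 and 1 and may guess, on a 0, that "0101" starts here.
NFA_DELTA = {
    (0, "0"): {0, 1}, (0, "1"): {0},
    (1, "1"): {2},
    (2, "0"): {3},
    (3, "1"): {4},
    (4, "0"): {4}, (4, "1"): {4},
}
START, ACCEPT = 0, {4}


def nfa_accepts(alpha):
    """Accept iff some path labeled alpha ends at an accept state."""
    current = {START}                      # every state some path could be in
    for c in alpha:
        current = set().union(*(NFA_DELTA.get((q, c), set()) for q in current))
    return bool(current & ACCEPT)


print(nfa_accepts("0001010"))  # True
print(nfa_accepts("0011"))     # False
```

Missing table entries mean that a path dies, which is how the many rejecting paths show up in the simulation.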

TM vs DFA

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

DFA can be built into hardware using and/or gates.

TM vs DFA

Model of Computation: Turing Machine vs DFA

• The current configuration is specified by

• The contents of the tape.

• Unlike the TM, the “tape” in a DFA can never be changed and is only read once.

• The current state q

• One of some fixed number

• One of which is the start state

• And some of which are accept states.

• The current location of the head.

• In a DFA the head moves right each step.

• But it does not “know” its current location.

• It only knows the value of the current cell.

• The legal operations are:

• Given the current state q

• And the value c of the cell with the head

• The DFA looks up in a preset table

• The next state q’

• i.e. the transition function δ(q,c) = q’ (the TM’s δ(q,c) = <q’,c’,direction> without the written character c’ and the direction)

TM vs DFA

• The finite state control can be thought of as being a Java object with:

• a finite length of code (& a current line)

• a finite number of variables

• each taking on a current value from a finite range

• Its periscope can look at one cell of the tape

• Each time step it decides

• How to change the values of its internal variables and its current line of code

TM vs DFA

• The finite state control can be thought of as having

• A fixed-size black board on which to write some bounded amount of information q.

• Eg, if it remembers a total of r bits, then the number of different states q is 2^r.

• Eg, if it remembers x ∈ {1,2,3,4}, y ∈ {1,2,3,4,5}, & p ∈ {even,odd}, then the number of different states q is 4×5×2 = 40.

• Give the states meaningful names like q<line=39,x=3,y=2,p=even>.

• We will always assume we are at the top of the loop, i.e. line=1.

[Figure: the black board holding l=39, x=3, y=2, p=even, with a transition from q to q’.]

TM vs DFA

• The finite state control can be thought of as having

• A fixed-size black board on which to write q.

• And learning the value c of the cell under the new head position.

• Can instantly do any computation on what it knows.

• (even solving incomputable problems)

δ(q_knows, c) = q_computed

q_knows is the state indicating what the TM currently knows.

q_computed is the state indicating what the TM computes.

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

Eg, δ(q<l=1,r=even>, 1) = q<l=2,r=odd>

DFA: Formal definition

• A deterministic finite automaton (DFA) M is defined by a 5-tuple M = (Q, Σ, δ, q0, F)

• Q: finite set of states

• Σ: finite alphabet

• δ: transition function δ: Q×Σ → Q

• q0 ∈ Q: start state

• F ⊆ Q: set of accepting states

When in state q and reading the character c, transition to state q’ = δ(q,c).
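The 5-tuple maps directly onto a small data structure. The sketch below is one possible encoding, with field names mirroring the definition; the example machine (odd number of 1's, with no length restriction) is an assumption chosen to keep the table short.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple


class DFA(NamedTuple):
    Q: FrozenSet[str]                    # finite set of states
    Sigma: FrozenSet[str]                # finite alphabet
    delta: Dict[Tuple[str, str], str]    # transition function Q x Sigma -> Q
    q0: str                              # start state
    F: FrozenSet[str]                    # set of accepting states

    def accepts(self, w: str) -> bool:
        q = self.q0
        for c in w:
            q = self.delta[(q, c)]       # q' = delta(q, c)
        return q in self.F


# Example machine: accepts strings with an odd number of 1's.
M = DFA(
    Q=frozenset({"even", "odd"}),
    Sigma=frozenset({"0", "1"}),
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    q0="even",
    F=frozenset({"odd"}),
)

print(M.accepts("0100"), M.accepts("0110"))  # True False
```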

NFA: Formal definition

• A nondeterministic finite automaton (NFA) M is defined by a 5-tuple M = (Q, Σ, δ, q0, F)

• Q: finite set of states

• Σ: finite alphabet

• δ: transition function δ: Q×Σ_ε → P(Q)

• q0 ∈ Q: start state

• F ⊆ Q: set of accepting states

When in state q and reading the character c, transition to one of the states q’ in the set δ(q,c).

Σ_ε = Σ ∪ {ε}. Edges labeled ε can be followed anytime.

I can help choose.

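Complexity Classes

• Linear: problems which have TM/Java programs that solve them in linear time.

• DFA = NFA = Regular = Extended Regular: these languages are called Regular Languages.

[Figure: nested classes; a*b* lies inside the Regular class, 0^n 1^n lies inside Linear but outside Regular.]

a*b* vs 0^n 1^n: the first is easy. For the second, the algorithm must count.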
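To make the contrast concrete, here is a sketch of both recognizers; the function names are assumptions. The first remembers one of three states, so it is a DFA in disguise. The second keeps a counter that grows with the input, which is exactly what a DFA cannot do.

```python
def in_a_star_b_star(w):
    """a*b*: constant memory, one of three states."""
    state = "a"                       # "a": reading a's, "b": reading b's, "dead"
    for c in w:
        if c not in "ab":
            state = "dead"
        elif state == "a" and c == "b":
            state = "b"
        elif state == "b" and c == "a":
            state = "dead"            # an a after a b can never be repaired
    return state != "dead"


def in_0n1n(w):
    """0^n 1^n: needs a counter, i.e. about log n bits of memory."""
    count = 0                         # (# of 0's read) - (# of 1's read)
    seen_one = False
    for c in w:
        if c == "0":
            if seen_one:
                return False          # a 0 after a 1
            count += 1
        elif c == "1":
            seen_one = True
            count -= 1
            if count < 0:
                return False          # more 1's than 0's
        else:
            return False
    return count == 0


print(in_a_star_b_star("aabbb"), in_a_star_b_star("aba"))  # True False
print(in_0n1n("000111"), in_0n1n("00111"))                 # True False
```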

Regular Expression and Extended Reg

Regular Expressions are a quick notation for defining a language of strings.

• Any finite set of finite strings is a regular expression

• Eg R = {0,01,11}, then L(R) = {0,01,11}.

• R1∪R2, R1∙R2, and R*, representing L1∪L2, L1∙L2, and L*

Extended Regular Expressions

• Also R1∩R2 and ¬R (intersection and complement)

Regular Expression and Extended Reg

Union: L1∪L2 = { α | α∈L1 or α∈L2 }

Intersection: L1∩L2 = { α | α∈L1 and α∈L2 }

Complement: ¬L = { α | α∉L }

Concatenation:

• L1∙L2 = { αβ | α∈L1 and β∈L2 }

• |L1∙L2| is likely |L1|×|L2|

• {ab,cd}∙{wx,yz} = {abwx, abyz, cdwx, cdyz}

• {0,00}∙{0,00} = {00,000,0000}

Kleene Star

• L* = { α1α2α3…αr | r ≥ 0 and each αi∈L }

• {ab,cd}* = { ε, ab, cd, abab, abcd, cdab, cdcd, ababab, … }
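For finite languages these operations can be tried out with Python sets. This is an illustrative sketch with assumed names; since L* is infinite, `star` only builds the strings made from a bounded number of pieces.

```python
def concat(L1, L2):
    """L1 . L2 = { ab | a in L1 and b in L2 }."""
    return {a + b for a in L1 for b in L2}


def star(L, max_repeats):
    """The strings of L* that use at most max_repeats pieces."""
    result = {""}          # r = 0 gives the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, L)
        result |= layer
    return result


print(sorted(concat({"ab", "cd"}, {"wx", "yz"})))
print(sorted(concat({"0", "00"}, {"0", "00"})))   # only 3 strings, not 2 * 2 = 4
print(sorted(star({"ab", "cd"}, 2), key=lambda s: (len(s), s)))
```

The second example shows why |L1∙L2| is only "likely" |L1|×|L2|: 0∙00 and 00∙0 give the same string.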

Regular Expression and Extended Reg

• Unix ‘grep’ command: Global Regular Expression and Print

• Lexical Analyzer Generators (part of compilers)

• Both use regular expression to DFA conversion

Extended Regular Expression

• {0,1} = character 0 or character 1

• {0,1}* = a string consisting of zero or more characters, each of which is character 0 or character 1

• {0,1}^≤3 = a string consisting of at most 3 characters, each of which is character 0 or character 1

• 0* = a string consisting of zero or more 0’s

• 10*10* = a string consisting of a 1 followed by zero or more 0’s, followed by a 1, followed by zero or more 0’s = a 0/1 string with two ones starting with a one

• 0*10* (10*10*)* = a 0/1 string with an odd number of 1’s

• {0,1}^≤3 ∩ 0*10* (10*10*)* = L

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

Regular Expression

• {0,1}^≤3 ∩ 0*10* (10*10*)* = L

L = {α ∈ {0,1}* | α has length at most three and the number of 1's is odd}.

How would you express this language without ∩?

L = {1,10,01,100,010,001,111}

Ok, but in general how would you build a regular expression from the intersection of two others?

No idea. But an earlier page says that there is a way to compile from one to the other.

The advantage of having many models for the same thing is that some things are easier in one, and then the theory automatically converts between them.
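The claim that the intersection equals the seven listed strings can be checked mechanically. The sketch below uses Python's `re` module for 0*10*(10*10*)* and enumerates {0,1}^≤3 directly; Python regular expressions have no intersection operator, so the intersection is done with a set comprehension. The variable names are assumptions.

```python
import re
from itertools import product

ODD_ONES = re.compile(r"0*10*(10*10*)*")     # an odd number of 1's

# All 0/1 strings of length at most 3.
at_most_three = ["".join(p) for k in range(4) for p in product("01", repeat=k)]

# Intersection: keep the short strings that the regular expression matches in full.
L = {s for s in at_most_three if ODD_ONES.fullmatch(s)}

print(sorted(L, key=lambda s: (len(s), s)))
```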

Parsing with Regular Expressions

0*10* (10*10*)*

L = {α ∈ {0,1}* | the number of 1's is odd}.

00010011001000101010001010000

Yes, this regular expression represents every string with the property and no string without it, but more…

Regular Expressions can also be used to “parse” a string in order to understand the string’s structure better.

Partition the string just before the 2nd, 4th, 6th, … one.

The first block has the form 0*10*, with the first * putting out 3 zeroes and the second putting out 2.

The remaining 4 blocks are produced by having ()* be 4.

Each has the form 10*10*.

The first such block has the form 10*10*, with the first * putting out 0 zeroes and the second putting out 2.

0* (10*)* = {0,1}*

What language does this regular expression represent and how does it break up a string?

All strings are represented, hence {0,1}* and 0* (10*)* produce the same strings, but they parse them differently.
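Both ways of breaking up the example string can be reproduced with Python's `re` module. This is a sketch with assumed variable names; it relies on Python's greedy matching, which for these patterns happens to produce the partition described above.

```python
import re

s = "00010011001000101010001010000"

# Parse with 0*10* (10*10*)*: one leading block, then blocks holding two 1's each.
first = re.match(r"0*10*", s).group()
blocks = re.findall(r"10*10*", s[len(first):])
print(first, blocks)

# Parse with 0* (10*)*: the leading zeroes, then one block per 1.
leading = re.match(r"0*", s).group()
pieces = re.findall(r"10*", s[len(leading):])
print(leading, pieces)
```

The first parse exposes the parity structure (one block, then pairs of 1's). The second only shows where each 1 sits.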

Parsing with Regular Expressions

Build a Parse Tree

0*10* (10*10*)* ∪ {a,b}*

• Choose one object from the set for union.

• Choose one object from each side for concatenation.

• Choose the number of repeats r ≥ 0 for star.

[Figure: the parse tree of 00010011001000101010001010000, whose leaves are the blocks 000 1 00 | 1 1 00 | 1 000 1 0 | 1 0 1 000 | 1 0 1 0000.]

More Examples of DFAs

Fixed/Constant vs Arbitrary/Finite

I make a DFA M with k=1,000,000,000,000,000 states.

Wow. That’s not fair. With more states, it can count higher.

I will let him use any number of states, but this fixed M must work for all inputs.

If he uses more states, I will give him a bigger input I.

M must still solve the problem.

• ∃ DFA M, ∃ an integer k,

• ∀ inputs I,

• K(M,I) ≤ k

More Examples of DFAs

• DFA can’t count arbitrarily high. Hence a typical thing to do is to count mod some integer.

• Three ways of thinking about mod.

• mod(85,3) is a function that divides 85 by 3 giving 28 with a remainder of 1 and hence outputs 1. Note the answer is in {0,1,2}.

• We count 0,1,2,0,1,2,0,1,2,0,1,2, …

• In the mod 3 world, -5=-2=1=4=7

• Hence we could equivalently say

• (8×10)+5 = 85 = 1 (mod 3)

• (8×10)+5 = (2×1)+2 = 4 = 1 (mod 3)

• (8×10)+5 = (-1×1)+2 = 1 (mod 3)

• The final answer is typically in the range {0,1,2}, but it does not have to be.
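The three views of mod can be tried directly in Python, whose `%` operator returns a result in {0,1,2} for a positive modulus even when the left operand is negative. The variable names below are assumptions.

```python
# View 1: mod as a function returning the remainder.
print(85 % 3)                                        # 1

# View 3: everything can be reduced mod 3 before or after the arithmetic.
direct = (8 * 10 + 5) % 3
reduced_first = ((8 % 3) * (10 % 3) + (5 % 3)) % 3   # (2*1)+2 = 4 = 1 (mod 3)
with_negative = ((-1) * 1 + 2) % 3                   # 8 = -1 (mod 3)
print(direct, reduced_first, with_negative)          # 1 1 1

# In the mod 3 world, -5 = -2 = 1 = 4 = 7.
residues = [x % 3 for x in (-5, -2, 1, 4, 7)]
print(residues)                                      # [1, 1, 1, 1, 1]
```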

More Examples of DFAs

L = {ω ∈ {0,1}∗ | [2 × (# of 0’s in ω) − (# of 1’s in ω)] mod 5 = 0}

Eg, ω = 10011010011001001 has 2 × (# of 0’s in ω) − (# of 1’s in ω) = 2 × 9 − 8 = 10, and 10 mod 5 = 0. Hence ω ∈ L.

When it has read this much, what does it remember?

• r = [2 × (# of 0’s in ω) − (# of 1’s in ω)] mod 5

If the next character is a 0, increase r by 2 (mod 5).

If the next character is a 1, decrease r by 1 (mod 5).

Eg, ω = 100111 has 2 × 2 − 4 = 0, so ω ∈ L.

[Figure: the five-state DFA for L and its path on ω = 100111.]

Regular expression?

No idea. But I know it is possible.
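A sketch of this five-state DFA, where the state is the remembered value r ∈ {0,1,2,3,4}. The function names are assumptions.

```python
def step(r, c):
    """delta(r, c): a 0 increases r by 2, a 1 decreases r by 1, all mod 5."""
    return (r + 2) % 5 if c == "0" else (r - 1) % 5


def accepts(w):
    r = 0                      # empty string: 2*0 - 0 = 0
    for c in w:
        r = step(r, c)
    return r == 0              # the only accept state is r = 0


print(accepts("10011010011001001"))  # True: 2*9 - 8 = 10
print(accepts("100111"))             # True: 2*2 - 4 = 0
print(accepts("0"))                  # False: 2*1 - 0 = 2
```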

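More Examples of DFAs

α = 0001010

[Figure: the NFA from before, with an accepting path from the start state labeled 0, 0, 0, 1, 0, 1, 0.]

α ∈ L

We say a string α is accepted iff there is a path through the NFA labeled by the string ending at an accept state.

But we want a Deterministic FA.

α = 001001010

[Figure: a DFA with edges labeled 0, 1 and 0,1, and the path it follows from the start state on α.]

α ∈ L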
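The DFA diagram did not survive extraction. Assuming the language is "contains the substring 0101" (suggested by the outline entry "Contains Substring (NFA vs DFA)" and by the reference to "the 0101"), a deterministic machine can remember how much of 0101 the most recent characters match. The table below is a sketch under that assumption, not the slide's figure.

```python
# State k means: the longest suffix of the input read so far that is a
# prefix of "0101" has length k. State 4 means "0101" has been seen.
DELTA = {
    0: {"0": 1, "1": 0},
    1: {"0": 1, "1": 2},
    2: {"0": 3, "1": 0},
    3: {"0": 1, "1": 4},
    4: {"0": 4, "1": 4},     # once found, stay in the accept state on 0,1
}
START, ACCEPT = 0, {4}


def accepts(alpha):
    q = START
    for c in alpha:
        q = DELTA[q][c]
    return q in ACCEPT


print(accepts("0001010"), accepts("001001010"), accepts("0011"))  # True True False
```

Unlike the NFA, no guessing is needed: each state records just enough about the prefix read to know what the next character means.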

More Examples of DFAs

Hint: Pooh can compute anything from what’s on his black board.

Output: f(I) if |I| ≤ 6

Input: I = b100101b

• Leap into the middle of the algorithm.

• If you have read the prefix α = 100, what do you want to remember on the black board?

• The entire prefix read so far is on the board.

• What state should you be in?

• q100.

• δ(q100, 1) = ?

• You are in state q100 and the next character is a 1.

• Hence the input prefix you have read is 100.

• Hence after reading the 1, the prefix read will be 1001.

• Hence the next state should be q1001.

• δ(q100, 1) = q1001

For each string α from alphabet ∑ of length at most 6 there is a state qα.

For each character c in this alphabet, what should the next state (and actions) be when reading this character?

• ∀α ∈ ∑^<6, ∀c ∈ ∑, δ(qα, c) = qαc

More Examples of DFAs

Output: f(I) if |I| ≤ 6

Input: I = b100101b

• ∀α with |α| < 6, ∀c, δ(qα, c) = qαc

• ∀α with |α| = 6, ∀c, δ(qα, c) = qtoo long

[Figure: a tree of states q, q0, q1, q00, q01, q10, q11, q000, …, q111, …, q100101, in which each state qα has an edge labeled c to qαc, and the deepest states lead on 0,1 to qtoo long.]

• ∀α with |α| ≤ 6, δ(qα, b) = ?

• Unlike with a TM, the input doesn’t end in a blank.

• Instead a DFA makes a state an accept state if, when halting there, it should accept.

• Again this is done with table lookup:

• ∀α with |α| ≤ 6, if f(α) says that α ∈ L, then qα is an accept state.
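The "remember the whole prefix" construction works for any finite language. The sketch below builds the table for a given finite set of strings; the function names, the tuple layout, and the use of the longest string's length as the cutoff (instead of the slide's fixed 6) are assumptions.

```python
from itertools import product

TOO_LONG = "too long"


def build_finite_language_dfa(language, alphabet="01"):
    """One state per prefix up to the longest string, plus a 'too long' state."""
    max_len = max((len(w) for w in language), default=0)
    prefixes = ["".join(p) for k in range(max_len + 1)
                for p in product(alphabet, repeat=k)]
    delta = {}
    for alpha in prefixes:
        for c in alphabet:
            # delta(q_alpha, c) = q_(alpha c), or the 'too long' state at the cutoff
            delta[(alpha, c)] = alpha + c if len(alpha) < max_len else TOO_LONG
    for c in alphabet:
        delta[(TOO_LONG, c)] = TOO_LONG
    accept = set(language)        # table lookup: q_alpha accepts iff alpha is in L
    return delta, "", accept      # the start state is the empty prefix


def accepts(dfa, w):
    delta, q, accept = dfa
    for c in w:
        q = delta[(q, c)]
    return q in accept


dfa = build_finite_language_dfa({"1", "10", "01", "100", "010", "001", "111"})
print(accepts(dfa, "010"), accepts(dfa, "0101"))  # True False
```

The number of states is exponential in the cutoff length, but it is still a fixed number that does not depend on the input.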

More Examples of DFAs

20569 mod 7 = 3

205694 mod 7

= (20569×10 + 4) mod 7

= ((20569 mod 7)×10 + 4) mod 7

= (3×10 + 4) mod 7

= 34 mod 7 = 6
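The derivation above says that only the remainder so far needs to be remembered when the next digit arrives, which gives a 7-state machine over the digits. A minimal sketch, with an assumed function name:

```python
def mod7(digits):
    """Read a decimal number one digit at a time, remembering only r = prefix mod 7."""
    r = 0
    for d in digits:
        r = (r * 10 + int(d)) % 7     # delta(r, d) = (10*r + d) mod 7
    return r


print(mod7("20569"))    # 3
print(mod7("205694"))   # 6
```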

[Figure: animation frames for adding integers, with the digits drawn as * and a + sign.]

More Examples of DFAs

a:b means if you read an a then output a b and follow this edge to the next state

More Examples of DFAs

When in state “not in a comment”

• If you read a, b, or *, then copy it to the output.

• If you read /*, then give no output (ε = empty string) and go to state “in a comment”.

• If you read /a, /b, /*, or /“ then copy it (delayed) to the output.

More Examples of DFAs

When in state “in a comment”

• If you read a, b, *, “, or / then ignore it.

• If you read */, then give no output (ε = empty string) and go to state “not in a comment”.

• If you read *a, *b, /“, then ignore it.

More Examples of DFAs

When in state “not in a comment”

• If you read “, then copy it to the output and go to state “in quote”.

• If you then read another ”, then leave this state.

More Examples of DFAs

When in state “in quote”

• Copy everything to the output.
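The comment-removing machine described on these slides can be sketched as a small transducer. The state names, the two helper states ("slash" and "star", which remember a pending / or *), and the use of the ASCII double quote in place of the slides' curly quotes are assumptions made for this illustration.

```python
def strip_comments(text):
    out = []
    state = "code"          # "code", "slash", "comment", "star" or "quote"
    for c in text:
        if state == "code":                 # not in a comment
            if c == "/":
                state = "slash"             # delay the output: may start a comment
            elif c == '"':
                out.append(c)
                state = "quote"
            else:
                out.append(c)
        elif state == "slash":              # read "/" while not in a comment
            if c == "*":
                state = "comment"           # "/*" read: no output
            elif c == "/":
                out.append("/")             # first "/" was ordinary, second is pending
            elif c == '"':
                out.append('/"')
                state = "quote"
            else:
                out.append("/" + c)         # copy it (delayed) to the output
                state = "code"
        elif state == "comment":            # in a comment: ignore everything
            if c == "*":
                state = "star"
        elif state == "star":               # read "*" while in a comment
            if c == "/":
                state = "code"              # "*/" read: the comment ends
            elif c != "*":
                state = "comment"
        else:                               # in quote: copy everything
            out.append(c)
            if c == '"':
                state = "code"
    if state == "slash":
        out.append("/")                     # flush the delayed "/"
    return "".join(out)


print(strip_comments('ab/*ba*/a "/*b*/" a/b'))  # aba "/*b*/" a/b
```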

More Examples of DFAs

Impossible

• If you do not print it as you go along, then remember what to print when the line ends.

• If you do print it as you go along, then you can’t unprint it when you read the */.