
CSC 3130: Automata theory and formal languages


Fall 2008

The Chinese University of Hong Kong

Normal forms and parsing

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

- Given a grammar
- How can we know if a string x is in its language?
- If so, can we reconstruct a parse tree for x?

S → 0S1 | 1S0S1 | T

T → S | ε

- Maybe we can try all possible derivations:

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

derivation tree (each level of indentation is one derivation step):

S
    0S1
        00S11
        01S0S11
        0T1
    1S0S1
        10S10S1
        ...
    T
        S

when do we stop?

- How do we know when to stop?

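- Idea: Stop derivation when length exceeds |x|
- Not right because of ε-productions
- We might want to eliminate ε-productions too

S → 0S1 | 1S0S1 | T

T → S | ε

x = 01011

S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011

lengths of the intermediate strings: 1, 3, 7, 6, 5 (the derivation grows past |x| = 5 before shrinking back to x)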

- Loops among the variables (S→T→S) might make us go forever
- We might want to eliminate such loops

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

- A unit production is a production of the form A1 → A2, where A1 and A2 are both variables
- Example

grammar:

S → 0S1 | 1S0S1 | T

T → S | R | ε

R → 0SR

unit productions:

S → T, T → S, T → R

- If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1
- Example

S → 0S1 | 1S0S1 | T

T → S | R | ε

R → 0SR

becomes

S → 0S1 | 1S0S1 | R | ε

R → 0SR

remaining unit production: S → R

T is replaced by S in the {S, T} cycle

- For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α
- Example

S → 0S1 | 1S0S1 | R | ε

R → 0SR

becomes

S → 0S1 | 1S0S1 | 0SR | ε

R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
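The unit-production removal above can be sketched in Python. This is a minimal sketch, not the slides' exact procedure: instead of first collapsing cycles and then replacing chains, it computes for each variable the set of variables reachable through unit productions and copies their non-unit productions. The representation (a dict from variable to a set of tuples, with `()` standing for ε) and the function name `eliminate_unit_productions` are assumptions made for illustration.

```python
def eliminate_unit_productions(grammar):
    """grammar: dict mapping each variable to a set of bodies (tuples of symbols)."""
    variables = set(grammar)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    result = {}
    for a in grammar:
        # Collect every variable reachable from `a` through unit productions only.
        reachable, stack = {a}, [a]
        while stack:
            v = stack.pop()
            for rhs in grammar[v]:
                if is_unit(rhs) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    stack.append(rhs[0])
        # `a` inherits every non-unit production of the variables it can reach.
        result[a] = {rhs for v in reachable for rhs in grammar[v] if not is_unit(rhs)}
    return result


grammar = {
    "S": {("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)},
    "T": {("S",), ("R",), ()},  # () stands for ε
    "R": {("0", "S", "R")},
}
new_grammar = eliminate_unit_productions(grammar)
```

On the example grammar, `new_grammar["S"]` holds the bodies 0S1, 1S0S1, 0SR and ε, as in the slides. Unlike the slides' version, this sketch keeps T as a (now unreachable) copy of S rather than merging it into S.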

- A variable N is nullable if there is a derivation N ⇒* ε
- How to remove ε-productions (except from S):

- Find all nullable variables N1, ..., Nk
- For i = 1 to k
  - For every production of the form A → αNiβ, add another production A → αβ
  - If Ni → ε is a production, remove it
- If S is nullable, add the special production S → ε

- Find the nullable variables

grammar:

S → ACD

A → a

B → ε

C → ED | ε

D → BC | b

E → b

nullable variables: B, C, D

- Find all nullable variables N1, ..., Nk

- To find nullable variables, we work backwards
- First, mark all variables A such that A → ε as nullable
- Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable

S → ACD

A → a

B → ε

C → ED | ε

D → BC | b

E → b

nullable variables: B, C, D

- For i = 1 to k
  - For every production of the form A → αNiβ, add another production A → αβ
  - If Ni → ε is a production, remove it

productions added for the grammar above, in order:

D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
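Both steps can be sketched in Python. This is an illustrative sketch under assumptions: the grammar is a dict from variable to a set of tuples with `()` standing for ε, and the function names are made up here. Instead of the loop "For i = 1 to k", `eliminate_epsilon` drops every subset of nullable symbols from each body in one pass, which should yield the same productions on this example.

```python
from itertools import product


def nullable_variables(grammar):
    """Work backwards: mark A once some body of A consists only of marked variables."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            # all(...) over the empty body () is True, so A → ε marks A immediately.
            if a not in nullable and any(all(s in nullable for s in rhs) for rhs in bodies):
                nullable.add(a)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start="S"):
    nullable = nullable_variables(grammar)
    result = {a: set() for a in grammar}
    for a, bodies in grammar.items():
        for rhs in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is always kept.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for pick in product(*choices):
                new_rhs = tuple(sym for part in pick for sym in part)
                if new_rhs:  # never emit an ε-production here
                    result[a].add(new_rhs)
    if start in nullable:
        result[start].add(())  # the special production S → ε
    return result


grammar = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}
nullable = nullable_variables(grammar)
no_eps = eliminate_epsilon(grammar)
```

Here `nullable` comes out as B, C, D, and `no_eps` contains S → ACD | AD | AC | A, C → ED | E and D → BC | C | B | b. B is left with no productions at all.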

- After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles
- Exception: S → ε
- We will not use this rule at all, except to check if ε ∈ L

- Note
- ε-productions must be eliminated before unit productions

S → 0S1 | 1S0S1 | T

T → S | ε

eliminate unit, ε-productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

x = 00111

derivation tree (each level of indentation is one derivation step):

S
    01, 101
    0S1
        0011, 01011
        00S11
            only strings of length ≥ 6
        strings of length ≥ 6
    10S1
        10011, strings of length ≥ 6
    1S01
        10101, strings of length ≥ 6
    1S0S1
        only strings of length ≥ 6

- We can now use the following algorithm to check if a string x is in the language of G

- Eliminate all ε-productions and unit productions
- If x = ε and S → ε, accept; else delete S → ε
- Let X := S
- While some new production P can be applied to X
  - Apply P to X
  - If X = x, accept
  - If |X| > |x|, backtrack
- If no more productions can be applied to X, reject
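The search above can be sketched in Python. This is a minimal sketch, assuming the grammar has already had its ε-productions and unit productions eliminated. Two details are choices made here and not part of the slides: productions are applied only to the leftmost variable, and a `seen` set avoids revisiting the same sentential form. The name `derives` and the dict-of-tuples representation are hypothetical.

```python
def derives(grammar, start, x):
    """Assumes no unit productions and no ε-productions except possibly start → ε."""
    target = tuple(x)
    if not target:
        return () in grammar[start]  # S → ε is used only to decide the empty string
    seen = set()
    stack = [(start,)]  # sentential forms still to be expanded
    while stack:
        form = stack.pop()
        if form == target:
            return True
        # Position of the leftmost variable, or None if the form is all terminals.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            continue  # a terminal string different from x: backtrack
        for rhs in grammar[form[pos]]:
            if not rhs:
                continue  # skip S → ε
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Backtrack as soon as the form is longer than x.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return False


# S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
grammar = {"S": {(), tuple("01"), tuple("101"), tuple("0S1"),
                 tuple("10S1"), tuple("1S01"), tuple("1S0S1")}}
in_language = derives(grammar, "S", "01011")
not_in_language = derives(grammar, "S", "00111")
```

With this grammar the sketch accepts 01011 and rejects 00111, since 00111 is not among the length-5 strings the grammar derives (01011, 10011, 10101).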

- Previous algorithm can be very slow if x is long
- There is a faster algorithm, but it requires that we do some more transformations on the grammar

G = CFG of the Java programming language

x = code for a 200-line Java program

algorithm might take about 10^200 steps!

- A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type A → BC or A → a
- Conversion to Chomsky Normal Form is easy:

A → BcDE

replace terminals with new variables:

A → BCDE

C → c

break up sequences with new variables:

A → BX1

X1 → CX2

X2 → DE

C → c
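The two conversion steps can be sketched in Python. This is a rough sketch, assuming ε-productions and unit productions are already gone and that the fresh names do not clash with existing variables. The naming scheme (`T_c` for the variable deriving terminal c, `X1`, `X2`, … for the split variables) and the function name `to_cnf` are hypothetical.

```python
def to_cnf(grammar):
    """Assumes ε-productions and unit productions were already eliminated."""
    result = {a: set() for a in grammar}
    terminal_var = {}  # terminal -> fresh variable deriving exactly that terminal
    counter = 0

    def var_for(terminal):
        if terminal not in terminal_var:
            name = f"T_{terminal}"  # hypothetical naming scheme
            terminal_var[terminal] = name
            result[name] = {(terminal,)}
        return terminal_var[terminal]

    for a, bodies in grammar.items():
        for rhs in bodies:
            if len(rhs) <= 1:
                result[a].add(rhs)  # A → a (or S → ε) is already in the right form
                continue
            # Step 1: replace terminals with new variables.
            symbols = [s if s in grammar else var_for(s) for s in rhs]
            # Step 2: break up long sequences with new variables.
            head = a
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result.setdefault(head, set()).add((symbols[0], fresh))
                head, symbols = fresh, symbols[1:]
            result.setdefault(head, set()).add(tuple(symbols))
    return result


# A → BcDE, with B, D, E given placeholder productions so they count as variables.
grammar = {
    "A": {("B", "c", "D", "E")},
    "B": {("b",)},
    "D": {("d",)},
    "E": {("e",)},
}
cnf = to_cnf(grammar)
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, matching the slide's result up to the name of the variable that stands for c.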

- Convert this CFG into Chomsky Normal Form:

S → ε | ADDA

A → a

C → c

D → bCb

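S → AB | BC

A → BA | a

B → CC | b

C → AB | a

x = baaba

Idea: We generate each substring of x bottom up

table (the row just above the string covers substrings of length 1, the top row covers all of x; the entry in column s of the row for length b lists the variables that derive the length-b substring starting at position s; "–" marks an empty cell):

SAC
–     SAC
–     B     B
SA    B     SC    SA
B     AC    AC    B     AC
b     a     a     b     a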

Tracing back the derivations, we obtain the parse tree

Input: Grammar G in CNF, string x = x1…xk

- For i = 1 to k
  - If there is a production A → xi, put A in table cell ii
- For b = 2 to k
  - For s = 1 to k - b + 1
    - Set t = s + b - 1
    - For j = s to t - 1
      - If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st

table cells (b is the substring length, s the start, t the end, j the split point):

1k
…     …
12    23    …
11    22    …     kk
x1    x2    …     xk

Cell ij remembers all possible derivations of substring xi…xj
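The table-filling algorithm can be sketched in Python. This is a minimal sketch, assuming a non-empty input string and a grammar in CNF stored as a dict from variable to a set of tuples. Cell (s, t) covers x_s…x_t inclusive, so a substring of length b starting at s ends at t = s + b - 1 and splits into x_s…x_j and x_{j+1}…x_t. The helper names `cyk_table`, `parse_tree` and `leaves` are made up for this example.

```python
def cyk_table(grammar, x):
    """table[(i, j)] = set of variables that derive x_i…x_j (1-indexed, inclusive)."""
    k = len(x)
    table = {}
    for i in range(1, k + 1):
        table[(i, i)] = {a for a, bodies in grammar.items() if (x[i - 1],) in bodies}
    for b in range(2, k + 1):          # b = substring length
        for s in range(1, k - b + 2):  # s = start position
            t = s + b - 1              # t = end position
            table[(s, t)] = {
                a
                for j in range(s, t)   # split into x_s…x_j and x_{j+1}…x_t
                for a, bodies in grammar.items()
                for rhs in bodies
                if len(rhs) == 2 and rhs[0] in table[(s, j)] and rhs[1] in table[(j + 1, t)]
            }
    return table


def parse_tree(grammar, table, x, a, s, t):
    """Trace back one derivation of x_s…x_t from variable a (a must be in cell st)."""
    if s == t:
        return (a, x[s - 1])
    for j in range(s, t):
        for rhs in grammar[a]:
            if len(rhs) == 2 and rhs[0] in table[(s, j)] and rhs[1] in table[(j + 1, t)]:
                return (a,
                        parse_tree(grammar, table, x, rhs[0], s, j),
                        parse_tree(grammar, table, x, rhs[1], j + 1, t))
    raise ValueError(f"{a} does not derive x[{s}..{t}]")


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree built by parse_tree."""
    return tree[1] if len(tree) == 2 else leaves(tree[1]) + leaves(tree[2])


grammar = {
    "S": {("A", "B"), ("B", "C")},
    "A": {("B", "A"), ("a",)},
    "B": {("C", "C"), ("b",)},
    "C": {("A", "B"), ("a",)},
}
x = "baaba"
table = cyk_table(grammar, x)
accepted = "S" in table[(1, len(x))]
tree = parse_tree(grammar, table, x, "S", 1, len(x))
```

For x = baaba the top cell `table[(1, 5)]` contains S, A and C, so the string is accepted. `parse_tree` returns one of possibly several parse trees as nested tuples, and which one it returns may vary between runs because the bodies are stored in sets.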