# CSC 3130: Automata theory and formal languages


Fall 2008. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. Normal forms and parsing. Andrej Bogdanov, http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Testing membership and parsing: given a grammar, how can we know if a string x is in its language?


#### Presentation Transcript

Fall 2008

The Chinese University of Hong Kong

CSC 3130: Automata theory and formal languages

Normal forms and parsing

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

### Testing membership and parsing

• Given a grammar, how can we know if a string x is in its language?

• If so, can we reconstruct a parse tree for x?

S → 0S1 | 1S0S1 | T

T → S | ε

### First attempt

• Maybe we can try all possible derivations:

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

Derivations from S, level by level:

• S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1

• S ⇒ 1S0S1 ⇒ 10S10S1, ...

• S ⇒ T ⇒ S

When do we stop?

### Problems

• How do we know when to stop?

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

• S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1

• S ⇒ 1S0S1 ⇒ 10S10S1, ...

### Problems

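• Idea: Stop derivation when length exceeds |x|

• Not right because of ε-productions

• We might want to eliminate ε-productions too

S → 0S1 | 1S0S1 | T

T → S | ε

x = 01011

S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011

The lengths along this derivation are 1, 3, 7, 6, 5: the intermediate string grows to length 7 > |x| = 5 before it shrinks back. (Each erasure of S abbreviates the two steps S ⇒ T ⇒ ε.)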

### Problems

• Loops among the variables (S→T→S) might make us go forever

• We might want to eliminate such loops

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

### Unit productions

• A unit production is a production of the form A1 → A2, where A1 and A2 are both variables

• Example

grammar:

S → 0S1 | 1S0S1 | T

T → S | R | ε

R → 0SR

unit productions:

S → T, T → S, T → R

### Removal of unit productions

• If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace every occurrence of A2, ..., Ak with A1

• Example

S → 0S1 | 1S0S1 | T

T → S | R | ε

R → 0SR

becomes

S → 0S1 | 1S0S1

S → R | ε

R → 0SR

T is replaced by S in the {S, T} cycle

### Removal of unit productions

• For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α

• Example

S → 0S1 | 1S0S1 | R | ε

R → 0SR

becomes

S → 0S1 | 1S0S1 | 0SR | ε

R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
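The two steps above (collapse cycles, then shortcut chains) can also be phrased as one closure computation: every variable inherits the non-unit productions of all variables it reaches through unit productions. The Python sketch below uses that alternative formulation rather than the slide's exact procedure. It assumes a hypothetical encoding in which every symbol is one character, variables are the dictionary keys, and the empty string stands for ε.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without unit productions.

    grammar maps each variable to a set of bodies (strings); "" stands for ε.
    This encoding is an assumption made for the sketch.
    """
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body in variables

    result = {}
    for a in grammar:
        # Collect every variable reachable from `a` through unit productions only.
        reachable, stack = {a}, [a]
        while stack:
            v = stack.pop()
            for body in grammar[v]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # `a` inherits every non-unit body of the variables it can reach.
        result[a] = {
            body for v in reachable for body in grammar[v] if not is_unit(body)
        }
    return result


grammar = {
    "S": {"0S1", "1S0S1", "T"},
    "T": {"S", "R", ""},
    "R": {"0SR"},
}
new_grammar = remove_unit_productions(grammar)
print(sorted(new_grammar["S"]))
```

In this formulation T is not merged into S. It keeps its own copy of the same productions, but it can no longer be reached from S, so the productions for S and R agree with the example.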

### Removal of ε-productions

• A variable N is nullable if there is a derivation N ⇒* ε

• How to remove ε-productions (except from S)

• Find all nullable variables N1, ..., Nk

• For i = 1 to k

  • For every production of the form A → αNiβ, add another production A → αβ

  • If Ni → ε is a production, remove it

• If S is nullable, add the special production S → ε

### Example

• Find the nullable variables

grammar:

S → ACD

A → a

B → ε

C → ED | ε

D → BC | b

E → b

nullable variables: B, C, D

### Finding nullable variables

• To find nullable variables, we work backwards

• First, mark all variables A s.t. A → ε as nullable

• Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
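The marking procedure is a fixed-point iteration: keep scanning the productions until no new variable gets marked. A minimal Python sketch follows, run on the grammar from the preceding example. The encoding is an assumption made for illustration: one character per symbol, variables as dictionary keys, and the empty string for ε.

```python
def nullable_variables(grammar):
    """Return the set of variables that derive ε.

    grammar maps each variable to a set of bodies (strings); "" stands for ε.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # all() over an empty body is True, so A → ε is handled by the
            # same test as A → A1…Ak with every Ai already marked.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


grammar = {
    "S": {"ACD"},
    "A": {"a"},
    "B": {""},
    "C": {"ED", ""},
    "D": {"BC", "b"},
    "E": {"b"},
}
found = nullable_variables(grammar)
print(sorted(found))
```

Terminals are never added to the marked set, so any body containing a terminal fails the test automatically.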

### Eliminating ε-productions

grammar:

S → ACD

A → a

B → ε

C → ED | ε

D → BC | b

E → b

nullable variables: B, C, D

• For i = 1 to k

  • For every production of the form A → αNiβ, add another production A → αβ

  • If Ni → ε is a production, remove it

added productions:

• for B: D → C (and B → ε is removed)

• for C: S → AD, D → B, D → ε (and C → ε is removed)

• for D: S → AC, S → A, C → E (and D → ε is removed)
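A common way to implement this step, sketched below in Python, is to process each production once and emit every variant obtained by dropping some subset of its nullable symbols, discarding the empty result. This is a reformulation of the loop above, not the slide's exact procedure, though on this example it produces the same grammar. The encoding (one character per symbol, the empty string for ε) and the parameter names are assumptions made for the sketch.

```python
from itertools import product


def eliminate_epsilon(grammar, nullable, start="S"):
    """Remove ε-productions, keeping start → ε only if start is nullable.

    grammar maps each variable to a set of bodies (strings); "" stands for ε.
    nullable is the set of nullable variables, computed beforehand.
    """
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            choices = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for pick in product(*choices):
                candidate = "".join(pick)
                if candidate:  # never emit an ε-production here
                    new_bodies.add(candidate)
        result[head] = new_bodies
    if start in nullable:
        result[start].add("")  # the special production S → ε
    return result


grammar = {
    "S": {"ACD"},
    "A": {"a"},
    "B": {""},
    "C": {"ED", ""},
    "D": {"BC", "b"},
    "E": {"b"},
}
result = eliminate_epsilon(grammar, nullable={"B", "C", "D"})
for head in sorted(result):
    print(head, "→", sorted(result[head]))
```

After the transformation B has no productions left, so any body that still mentions B (here D → BC and D → B) can never produce a terminal string.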

### Recap

• After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn’t shrink in length and doesn’t go into cycles

• Exception: S → ε

  • We will not use this rule at all, except to check if ε ∈ L

• Note

  • ε-productions must be eliminated before unit productions

### Example: testing membership

S → 0S1 | 1S0S1 | T

T → S | ε

After eliminating unit productions and ε-productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

x = 00111

Derivations from S:

• S ⇒ 01, 101

• S ⇒ 0S1 ⇒ 0011, 01011, 00S11, strings of length ≥ 6

• 00S11 ⇒ only strings of length ≥ 6

• S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6

• S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6

• S ⇒ 1S0S1 ⇒ only strings of length ≥ 6

No branch produces 00111, so x is not in the language.

### Algorithm 1 for testing membership

• We can now use the following algorithm to check if a string x is in the language of G

• Eliminate all ε-productions and unit productions

• If x = ε and S → ε is a production, accept; else delete S → ε

• Let X := S

• While some new production P can be applied to X

  • Apply P to X

  • If X = x, accept

  • If |X| > |x|, backtrack

• If no more productions can be applied to X, reject
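The backtracking search can be sketched in Python as a depth-first exploration of the strings derivable from S, pruned by length. The sketch assumes the grammar has already been cleaned of unit productions and ε-productions (apart from S → ε), and it uses an illustrative encoding: one character per symbol, the empty string for ε. Two choices here are mine and not from the slides: only leftmost derivations are explored, and a `seen` set avoids revisiting a string.

```python
def derives(grammar, x, start="S"):
    """Decide whether x is in the language of grammar by bounded search.

    Assumes no unit productions and no ε-productions except possibly start → ε.
    grammar maps each variable to a set of bodies (strings); "" stands for ε.
    """
    if x == "":
        return "" in grammar[start]
    seen = {start}
    stack = [start]
    while stack:
        form = stack.pop()
        if form == x:
            return True
        # Position of the leftmost variable, if any remains.
        pos = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if pos is None:
            continue  # a terminal string different from x: dead end
        for body in grammar[form[pos]]:
            if body == "":
                continue  # S → ε is only used for the x = ε check above
            new_form = form[:pos] + body + form[pos + 1:]
            # Lengths never shrink, so anything longer than x is hopeless.
            if len(new_form) <= len(x) and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return False


grammar = {"S": {"", "01", "101", "0S1", "10S1", "1S01", "1S0S1"}}
for word in ["00111", "01011", "10011", ""]:
    print(repr(word), derives(grammar, word))
```

The search terminates because only finitely many strings have length at most |x|, and each is visited at most once.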

### Practical limitations of Algorithm 1

• Previous algorithm can be very slow if x is long

• There is a faster algorithm, but it requires that we do some more transformations on the grammar

G = CFG of the Java programming language

x = code for a 200-line Java program

algorithm might take about 10^200 steps!

### Chomsky Normal Form

• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type A → BC or A → a

• Conversion to Chomsky Normal Form is easy:

A → BcDE

replace terminals with new variables:

A → BCDE

C → c

break up sequences with new variables:

A → BX1

X1 → CX2

X2 → DE

C → c
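The two conversion steps can be sketched in Python as follows. The sketch assumes ε-productions and unit productions have already been eliminated, represents a production as a `(head, body)` pair with the body a tuple of symbol names, and invents fresh names of the form `T_c` and `X1, X2, …`. That naming scheme is hypothetical and would need a collision check in a real grammar.

```python
def to_cnf(productions, variables):
    """Convert productions to Chomsky Normal Form.

    productions: list of (head, body) pairs, body a tuple of symbols.
    variables: the set of variable names; every other symbol is a terminal.
    Assumes ε-productions and unit productions were removed beforehand.
    """
    variables = set(variables)
    terminal_var = {}  # terminal -> new variable standing for it
    out = []
    counter = 0

    def var_for(terminal):
        if terminal not in terminal_var:
            name = "T_" + terminal  # hypothetical naming scheme
            terminal_var[terminal] = name
            out.append((name, (terminal,)))
        return terminal_var[terminal]

    for head, body in productions:
        if len(body) == 1:
            out.append((head, body))  # already of the form A → a
            continue
        # Step 1: replace terminals inside longer bodies with new variables.
        body = tuple(sym if sym in variables else var_for(sym) for sym in body)
        # Step 2: break up sequences longer than two with new variables.
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"
            out.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        out.append((head, body))
    return out


cnf = to_cnf([("A", ("B", "c", "D", "E"))], variables={"A", "B", "D", "E"})
for head, body in cnf:
    print(head, "→", " ".join(body))
```

On the slide's example this yields the same shape as above, with `T_c` playing the role of the new variable C.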

### Exercise

• Convert this CFG into Chomsky Normal Form:

A → a

C → c

D → bCb

### Algorithm 2 for testing membership

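S → AB | BC

A → BA | a

B → CC | b

C → AB | a

x = baaba

Idea: We generate each substring of x bottom up

Each cell lists the variables that derive the substring of the given length starting at the given position:

| substring length | start 1 | start 2 | start 3 | start 4 | start 5 |
|---|---|---|---|---|---|
| 5 | S, A, C | | | | |
| 4 | ∅ | S, A, C | | | |
| 3 | ∅ | B | B | | |
| 2 | S, A | B | S, C | S, A | |
| 1 | B | A, C | A, C | B | A, C |
| x | b | a | a | b | a |

The top cell contains S, so x is in the language.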

### Parse tree reconstruction

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

x = baaba

Tracing back the derivations, we obtain the parse tree
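One way to make the tracing-back concrete is to store, for every variable placed in a cell, the split point and the pair of variables that justified it. The Python sketch below does this and then rebuilds one parse tree recursively. The data layout (`binary` and `unary` lists, nested tuples for trees) is an assumption made for illustration. When several derivations exist, only the first one found is kept, so the tree may differ from the one drawn on the slide.

```python
def cyk_parse(binary, unary, x, start="S"):
    """Return one parse tree for x as nested tuples, or None if x is not derivable.

    binary: list of (A, B, C) for productions A → BC.
    unary:  list of (A, a) for productions A → a.
    """
    k = len(x)
    if k == 0:
        return None
    # back[(s, t)][A] records how A derives x[s..t] (0-based, inclusive).
    back = {}
    for i, ch in enumerate(x):
        back[(i, i)] = {a: ch for a, term in unary if term == ch}
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = {}
            for j in range(s, t):
                for a, b, c in binary:
                    if b in back[(s, j)] and c in back[(j + 1, t)]:
                        cell.setdefault(a, (j, b, c))  # keep the first witness
            back[(s, t)] = cell

    def build(a, s, t):
        entry = back[(s, t)][a]
        if s == t:
            return (a, entry)  # leaf: variable and its terminal
        j, b, c = entry
        return (a, build(b, s, j), build(c, j + 1, t))

    if start not in back[(0, k - 1)]:
        return None
    return build(start, 0, k - 1)


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree from cyk_parse."""
    if len(tree) == 2:
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


binary = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]
unary = [("A", "a"), ("B", "b"), ("C", "a")]
tree = cyk_parse(binary, unary, "baaba")
print(tree)
```

Reading the leaves of the returned tree from left to right gives back x, which is a quick sanity check on the reconstruction.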

### Cocke-Younger-Kasami algorithm

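Input: Grammar G in CNF, string x = x1…xk

• For i = 1 to k: if there is a production A → xi, put A in table cell ii

• For b = 2 to k, for s = 1 to k – b + 1: set t = s + b – 1; then for j = s to t – 1: if there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st

Table layout (figure): cells 11, 22, …, kk form the bottom row above x1 x2 … xk, cells 12, 23, … the next row, and cell 1k sits at the top. Row b holds the substrings of length b; s, j and t mark the start, split point and end of a substring.

Cell ij remembers all variables that derive the substring xi…xj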
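A minimal Python sketch of the table-filling loop follows. It takes cell st to cover xs…xt inclusive, so a substring of length b starting at s ends at t = s + b − 1 and the split point j runs from s to t − 1. The `binary` and `unary` lists are an assumed encoding of the CNF productions, and the grammar is the one from the baaba example.

```python
def cyk_table(binary, unary, x):
    """Fill the CYK table for x.

    binary: list of (A, B, C) for productions A → BC.
    unary:  list of (A, a) for productions A → a.
    Returns a dict mapping (s, t), 1-based and inclusive, to the set of
    variables that derive x_s…x_t.
    """
    k = len(x)
    cell = {}
    for i in range(1, k + 1):
        cell[(i, i)] = {a for a, term in unary if term == x[i - 1]}
    for b in range(2, k + 1):              # b = substring length
        for s in range(1, k - b + 2):      # s = start position
            t = s + b - 1                  # t = end position
            cell[(s, t)] = set()
            for j in range(s, t):          # split into x_s…x_j and x_{j+1}…x_t
                for a, left, right in binary:
                    if left in cell[(s, j)] and right in cell[(j + 1, t)]:
                        cell[(s, t)].add(a)
    return cell


binary = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]
unary = [("A", "a"), ("B", "b"), ("C", "a")]
table = cyk_table(binary, unary, "baaba")
print("S" in table[(1, 5)])
```

The three nested loops over b, s and j give a running time cubic in |x| (times the number of productions), which is far better than enumerating derivations.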