

Fall 2008

The Chinese University of Hong Kong

CSC 3130: Automata theory and formal languages

Normal forms and parsing

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130



Testing membership and parsing

  • Given a grammar

S → 0S1 | 1S0S1 | T
T → S | ε

  • How can we know if a string x is in its language?

  • If so, can we reconstruct a parse tree for x?



First attempt

  • Maybe we can try all possible derivations:

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...
S ⇒ T ⇒ S ⇒ ...

When do we stop?



Problems

  • How do we know when to stop?

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...

When do we stop?



Problems

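  • Idea: Stop a derivation when its length exceeds |x|

  • This is not right because of ε-productions: a sentential form can grow longer than x and later shrink back to x

  • We might want to eliminate ε-productions too

S → 0S1 | 1S0S1 | T
T → S | ε

x = 01011

S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011

lengths: 1, 3, 7, 6, 5 (length 7 exceeds |x| = 5, yet the derivation still reaches x)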
5



Problems

  • Loops among the variables (S→T→S) might make us go forever

  • We might want to eliminate such loops

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111



Unit productions

  • A unit production is a production of the form A1 → A2, where A1 and A2 are both variables

  • Example

grammar:

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions: S → T, T → S, T → R
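To make the transformations on the next slides concrete, here is a small Python sketch. It assumes a representation of our own choosing (not from the slides): a grammar is a dict mapping each variable to a list of bodies, each body a tuple of symbols, with the empty tuple standing for ε; any symbol that is not a key is a terminal. The helper name `unit_productions` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: variable -> list of bodies; () stands for ε.
Grammar = Dict[str, List[Tuple[str, ...]]]

def unit_productions(g: Grammar) -> Set[Tuple[str, str]]:
    """Return every pair (A1, A2) such that A1 -> A2 is a production of g."""
    return {
        (head, body[0])
        for head, bodies in g.items()
        for body in bodies
        if len(body) == 1 and body[0] in g  # a single symbol that is a variable
    }

g: Grammar = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
print(sorted(unit_productions(g)))  # [('S', 'T'), ('T', 'R'), ('T', 'S')]
```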



Removal of unit productions

  • If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace every occurrence of A2, ..., Ak with A1

  • Example

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

becomes

S → 0S1 | 1S0S1 | R | ε
R → 0SR

T is replaced by S in the {S, T} cycle
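A sketch of the cycle-collapsing step, using the same assumed dict-of-tuples representation. The function `collapse_cycle` is hypothetical and takes the cycle as an explicit list; finding the cycles themselves (for example via strongly connected components of the unit-production graph) is not shown.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def collapse_cycle(g: Grammar, cycle: List[str]) -> Grammar:
    """Merge the variables of a unit cycle A1 -> A2 -> ... -> Ak -> A1 into A1."""
    rep = cycle[0]
    rename = {v: rep for v in cycle}
    out: Grammar = {}
    for head, bodies in g.items():
        new_head = rename.get(head, head)
        merged = out.setdefault(new_head, [])
        for body in bodies:
            new_body = tuple(rename.get(s, s) for s in body)
            # Skip A1 -> A1 (what is left of the cycle) and duplicate bodies.
            if new_body != (new_head,) and new_body not in merged:
                merged.append(new_body)
    return out

g: Grammar = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
out = collapse_cycle(g, ["S", "T"])
print(out["S"])  # [('0', 'S', '1'), ('1', 'S', '0', 'S', '1'), ('R',), ()]
```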



Removal of unit productions

  • For the other unit productions, replace every chain A1 → A2 → ... → Ak → α, where α is not a single variable, by the productions A1 → α, ..., Ak → α

  • Example

S → 0S1 | 1S0S1 | R | ε
R → 0SR

becomes

S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
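A sketch of the chain-replacement step under the same assumed representation (`remove_unit_productions` is a hypothetical name). Rather than listing chains explicitly, it computes for each variable the set of variables reachable through unit productions and copies their non-unit bodies, which amounts to the same replacement.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def is_unit(body: Tuple[str, ...], g: Grammar) -> bool:
    return len(body) == 1 and body[0] in g

def remove_unit_productions(g: Grammar) -> Grammar:
    """Replace each chain A1 -> ... -> Ak -> alpha by A1 -> alpha, ..., Ak -> alpha."""
    out: Grammar = {}
    for a in g:
        # All variables reachable from a through unit productions (a included).
        reach: Set[str] = {a}
        stack = [a]
        while stack:
            v = stack.pop()
            for body in g[v]:
                if is_unit(body, g) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        # a inherits every non-unit body of every variable it reaches.
        out[a] = []
        for v in g:  # grammar order keeps the output deterministic
            if v in reach:
                for body in g[v]:
                    if not is_unit(body, g) and body not in out[a]:
                        out[a].append(body)
    return out

g: Grammar = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("R",), ()],
    "R": [("0", "S", "R")],
}
out = remove_unit_productions(g)
print(out["S"])  # [('0', 'S', '1'), ('1', 'S', '0', 'S', '1'), (), ('0', 'S', 'R')]
```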



Removal of -productions

  • A variable N is nullable if there is a derivation

  • How to remove -productions (except from S)

*

N

  • Find all nullable variables N1, ..., Nk

  • For i = 1 to k

  • For every production of the form A → Ni,

  • add another production A → 

  • If Ni →  is a production, remove it

  • If S is nullable, add the special productionS → 



Example

  • Find the nullable variables

grammar:

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D



Finding nullable variables

  • To find nullable variables, we work backwards

    • First, mark as nullable every variable A such that A → ε is a production

    • Then, as long as there is a production of the form A → A1… Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
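The marking procedure is a fixed-point computation. A minimal Python sketch under the same assumed representation (`nullable_variables` is a hypothetical name); it repeats passes until nothing new is marked, which reaches the same set as marking the A → ε variables first and then working backwards.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def nullable_variables(g: Grammar) -> Set[str]:
    """Mark A if some body of A consists only of already-marked variables."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head in nullable:
                continue
            # all() over an empty body is True, which covers A -> ε.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

g: Grammar = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
print(sorted(nullable_variables(g)))  # ['B', 'C', 'D']
```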



Eliminating ε-productions

grammar:

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

  • For i = 1 to k

    • For every production of the form A → αNiβ, add another production A → αβ

    • If Ni → ε is a production, remove it

new productions: S → AD, S → AC, S → A, C → E, D → C, D → B (D → ε is also generated along the way, then removed in the step for D)
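A sketch of ε-elimination under the same assumed representation. Instead of the loop over N1, ..., Nk, it uses the common equivalent formulation in which every occurrence of a nullable variable in a body may independently be kept or dropped, and empty bodies are discarded; on the example grammar it yields exactly the new productions listed above. `eliminate_epsilon` is a hypothetical name, and the nullable computation is repeated so the block stands alone.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def nullable_variables(g: Grammar) -> Set[str]:
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable

def eliminate_epsilon(g: Grammar, start: str) -> Grammar:
    nullable = nullable_variables(g)
    out: Grammar = {}
    for head, bodies in g.items():
        out[head] = []
        for body in bodies:
            # A nullable occurrence may be kept or dropped; other symbols are kept.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                new_body = tuple(sym for part in pick for sym in part)
                if new_body and new_body not in out[head]:  # drop A -> ε
                    out[head].append(new_body)
    if start in nullable:
        out[start].append(())  # the special production S -> ε
    return out

g: Grammar = {
    "S": [("A", "C", "D")], "A": [("a",)], "B": [()],
    "C": [("E", "D"), ()], "D": [("B", "C"), ("b",)], "E": [("b",)],
}
out = eliminate_epsilon(g, "S")
print(out["S"])  # [('A', 'C', 'D'), ('A', 'C'), ('A', 'D'), ('A',)]
```

After this step B has no productions left; removing such useless variables is a separate cleanup not covered here.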



Recap

  • After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles

  • Exception: S → ε

    • We will not use this rule at all, except to check if ε ∈ L

  • Note

    • ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (e.g. D → C in the example above)



Example: testing membership

S → 0S1 | 1S0S1 | T
T → S | ε

eliminate unit and ε-productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

x = 00111

S ⇒ 01, 101
S ⇒ 0S1 ⇒ 0011, 01011, strings of length ≥ 6
S ⇒ 0S1 ⇒ 00S11 ⇒ only strings of length ≥ 6
S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6

No branch produces 00111, so x is not in the language.



Algorithm 1 for testing membership

  • We can now use the following algorithm to check if a string x is in the language of G:

    • Eliminate all ε-productions and unit productions

    • If x = ε and S → ε is a production, accept; else delete S → ε

    • Let X := S

    • While some new production P can be applied to X

      • Apply P to X

      • If X = x, accept

      • If |X| > |x|, backtrack

    • If no more productions can be applied to X, reject
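A minimal backtracking sketch of Algorithm 1 in Python, assuming the grammar has already been cleaned of ε-productions (other than S → ε) and unit productions, in the same assumed dict-of-tuples representation. As a simplification it only expands the leftmost variable; this loses nothing, since any derivation can be reordered into a leftmost one. `accepts` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def accepts(g: Grammar, start: str, x: str) -> bool:
    """Backtracking search over leftmost derivations (assumes no unit/ε rules)."""
    if x == "":
        return () in g[start]  # the only use of S -> ε
    target = tuple(x)
    rules = {a: [b for b in bodies if b] for a, bodies in g.items()}  # delete S -> ε

    def search(form: Tuple[str, ...]) -> bool:
        if len(form) > len(target):
            return False  # |X| > |x|: forms never shrink, so backtrack
        # Position of the leftmost variable, or None if X is all terminals.
        i = next((k for k, s in enumerate(form) if s in rules), None)
        if i is None:
            return form == target  # accept iff X = x
        return any(search(form[:i] + body + form[i + 1:]) for body in rules[form[i]])

    return search((start,))

# The cleaned grammar from the membership example above.
G: Grammar = {"S": [tuple(b) for b in ["", "01", "101", "0S1", "10S1", "1S01", "1S0S1"]]}
print(accepts(G, "S", "01011"), accepts(G, "S", "00111"))  # True False
```

Termination relies on the cleanup: with no ε- or unit productions, every step either lengthens the sentential form or turns a variable into a terminal, so each branch is cut off after a bounded number of steps.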



Practical limitations of Algorithm 1

  • The previous algorithm can be very slow if x is long: the number of derivations it may have to explore can grow exponentially with |x|

  • There is a faster algorithm, but it requires that we do some more transformations on the grammar

G = CFG of the Java programming language

x = code for a 200-line Java program

the algorithm might take about 10^200 steps!



Chomsky Normal Form

  • A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

A → a   or   A → BC

where a is a terminal and B, C are variables

  • Conversion to Chomsky Normal Form is easy (starting from a grammar with no ε-productions and no unit productions):

A → BcDE

replace terminals with new variables:

A → BCDE
C → c

break up sequences with new variables:

A → BX1
X1 → CX2
X2 → DE
C → c
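A sketch of the two conversion steps, assuming the input already has no ε-productions (other than S → ε) and no unit productions, in the same assumed representation. The names `T_c` for the variable replacing terminal c and `X1`, `X2`, ... for the new sequence variables are our own and assumed not to clash with existing variables (the slide simply reuses C for c); `to_cnf` is a hypothetical name.

```python
from itertools import count
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def to_cnf(g: Grammar) -> Grammar:
    """Convert a grammar with no ε- or unit productions (except S -> ε) to CNF."""
    out: Grammar = {head: [] for head in g}
    term_var: Dict[str, str] = {}  # terminal c -> new variable T_c (hypothetical naming)
    fresh = count(1)               # numbers the new variables X1, X2, ...

    def var_for(c: str) -> str:
        if c not in term_var:
            term_var[c] = f"T_{c}"
            out[term_var[c]] = [(c,)]  # T_c -> c
        return term_var[c]

    for head, bodies in g.items():
        for body in bodies:
            if len(body) <= 1:  # A -> a (or S -> ε) is already allowed
                out[head].append(body)
                continue
            # Step 1: replace terminals with new variables.
            syms = [s if s in g else var_for(s) for s in body]
            # Step 2: break up sequences, A -> B1 X1, X1 -> B2 X2, ...
            lhs = head
            while len(syms) > 2:
                x = f"X{next(fresh)}"
                out[lhs].append((syms[0], x))
                out[x] = []
                lhs, syms = x, syms[1:]
            out[lhs].append(tuple(syms))
    return out

g: Grammar = {"A": [("B", "c", "D", "E")], "B": [("b",)], "D": [("d",)], "E": [("e",)]}
out = to_cnf(g)
print(out["A"], out["X1"], out["X2"])  # [('B', 'X1')] [('T_c', 'X2')] [('D', 'E')]
```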



Exercise

  • Convert this CFG into Chomsky Normal Form:

S → ε | ADDA
A → a
C → c
D → bCb



Algorithm 2 for testing membership

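S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Table (the row for length b, column s, lists the variables that generate the substring of length b starting at position s; ∅ marks an empty cell):

length 5:  SAC
length 4:  ∅     SAC
length 3:  ∅     B     B
length 2:  SA    B     SC    SA
length 1:  B     AC    AC    B     AC
           b     a     a     b     a

Idea: we work bottom up, computing for each substring of x, from the shortest to the longest, the set of variables that generate it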



Parse tree reconstruction

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

[Figure: the same table for x = baaba, with the entries used to derive S traced back down to the letters of x]

Tracing back the derivations, we obtain the parse tree
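One way to trace back the derivations is to store, for each variable placed in a cell, a back-pointer recording which production and split produced it. A Python sketch under the same assumed representation (`parse_tree` and `tree_yield` are hypothetical names; positions are 0-indexed). When several derivations exist it simply keeps the first one found.

```python
from typing import Dict, List, Optional, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def parse_tree(g: Grammar, start: str, x: str) -> Optional[tuple]:
    """Return (A, a) leaves / (A, left, right) nodes, or None if x is not in L."""
    k = len(x)
    # back[(s, t, A)]: a terminal for A -> a, or (j, B, C) for A -> BC split after j.
    back: Dict[Tuple[int, int, str], object] = {}
    for i, a in enumerate(x):
        for head, bodies in g.items():
            if (a,) in bodies:
                back[(i, i, head)] = a
    for b in range(2, k + 1):
        for s in range(k - b + 1):
            t = s + b - 1
            for j in range(s, t):
                for head, bodies in g.items():
                    for body in bodies:
                        if (len(body) == 2 and (s, t, head) not in back
                                and (s, j, body[0]) in back
                                and (j + 1, t, body[1]) in back):
                            back[(s, t, head)] = (j, body[0], body[1])
    if (0, k - 1, start) not in back:
        return None

    def build(s: int, t: int, a: str) -> tuple:
        entry = back[(s, t, a)]
        if isinstance(entry, str):
            return (a, entry)  # leaf: A -> terminal
        j, left, right = entry
        return (a, build(s, j, left), build(j + 1, t, right))

    return build(0, k - 1, start)

def tree_yield(tree: tuple) -> str:
    """Read the leaves of a tree from left to right."""
    if isinstance(tree[1], str):
        return tree[1]
    return tree_yield(tree[1]) + tree_yield(tree[2])

G: Grammar = {
    "S": [("A", "B"), ("B", "C")], "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)], "C": [("A", "B"), ("a",)],
}
tree = parse_tree(G, "S", "baaba")
print(tree_yield(tree))  # baaba
```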



Cocke-Younger-Kasami algorithm

Input: Grammar G in CNF, string x = x1…xk

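  • For i = 1 to k
      If there is a production A → xi, put A in table cell ii

  • For b = 2 to k
      For s = 1 to k - b + 1
        Set t = s + b - 1
        For j = s to t - 1
          If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st

[Figure: triangular table above x1 x2 … xk, with cells 11, 22, …, kk in the bottom row and cell 1k at the top; b is the substring length, and cell st is filled from cells sj and (j+1)t]

Cell ij records every variable that can derive the substring xi…xj; x is in the language exactly when S is in cell 1k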
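A direct Python transcription of the table fill, with the same assumed grammar representation. Positions are 0-indexed, so the slide's cell st corresponds to `table[s - 1][t - 1]` here; `cyk_table` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

def cyk_table(g: Grammar, x: str) -> List[List[Set[str]]]:
    """table[s][t] holds every variable that derives x[s..t] (inclusive)."""
    k = len(x)
    table: List[List[Set[str]]] = [[set() for _ in range(k)] for _ in range(k)]
    for i, a in enumerate(x):  # substrings of length 1
        for head, bodies in g.items():
            if (a,) in bodies:
                table[i][i].add(head)
    for b in range(2, k + 1):  # substring length
        for s in range(k - b + 1):  # start position
            t = s + b - 1  # end position
            for j in range(s, t):  # split x[s..j] | x[j+1..t]
                for head, bodies in g.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in table[s][j]
                                and body[1] in table[j + 1][t]):
                            table[s][t].add(head)
    return table

G: Grammar = {
    "S": [("A", "B"), ("B", "C")], "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)], "C": [("A", "B"), ("a",)],
}
x = "baaba"
table = cyk_table(G, x)
print(sorted(table[0][len(x) - 1]))  # ['A', 'C', 'S'], so x is in the language
```

The four nested loops make the running time roughly k³ times the number of productions, in contrast to the exponential search of Algorithm 1.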

