
Fall 2008

The Chinese University of Hong Kong

CSC 3130: Automata theory and formal languages

Normal forms and parsing

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

Testing membership and parsing
  • Given a grammar
  • How can we know if a string x is in its language?
  • If so, can we reconstruct a parse tree for x?

S → 0S1 | 1S0S1 | T

T → S | ε

First attempt
  • Maybe we can try all possible derivations:

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...
S ⇒ T ⇒ S ⇒ ...

When do we stop?

Problems
  • How do we know when to stop?

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...

When do we stop?

Problems
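  • Idea: Stop the derivation when the length of the sentential form exceeds |x|
  • Not right because of ε-productions: a sentential form can grow longer than x and later shrink back to x
  • We might want to eliminate ε-productions too

S → 0S1 | 1S0S1 | T

T → S | ε

x = 01011

S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011

Lengths: 1, 3, 7, 6, 5. The third sentential form is longer than |x| = 5, yet the derivation still reaches x.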

Problems
  • Loops among the variables (S→T→S) might make us go forever
  • We might want to eliminate such loops

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

Unit productions
  • A unit production is a production of the form A1 → A2, where A1 and A2 are both variables
  • Example

grammar:

S → 0S1 | 1S0S1 | T

T → S | R | ε

R → 0SR

unit productions: S → T, T → S, T → R

Removal of unit productions
  • If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace A2, ..., Ak everywhere by A1
  • Example

S → 0S1 | 1S0S1 | T

T → S | R | ε

R → 0SR

becomes

S → 0S1 | 1S0S1 | R | ε

R → 0SR

T is replaced by S in the {S, T} cycle

Removal of unit productions
  • For other unit productions, replace every chain A1 → A2 → ... → Ak → α, where α is not a single variable, by the productions A1 → α, ..., Ak → α
  • Example

S → 0S1 | 1S0S1 | R | ε

R → 0SR

becomes

S → 0S1 | 1S0S1 | 0SR | ε

R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
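Both removal rules can be sketched in Python. This is a minimal sketch under stated assumptions: a grammar is a dict from variable names to lists of bodies (tuples of symbols, with () for ε), and a symbol counts as a variable exactly when it is a key of the dict. Instead of collapsing cycles first, it uses the common textbook "unit pair" closure, a swapped-in technique: each variable inherits the non-unit bodies of every variable it reaches through unit productions, which handles chains and cycles in one pass. Unlike the slide, it keeps T rather than merging it into S; T simply becomes unreachable from S. The function names are hypothetical.

```python
# Assumed representation: variable -> list of bodies; a body is a tuple of
# symbols, () is ε, and a symbol is a variable iff it is a key of the dict.
Grammar = dict[str, list[tuple[str, ...]]]


def is_unit(body: tuple[str, ...], grammar: Grammar) -> bool:
    """A unit production A1 → A2 has a body consisting of one variable."""
    return len(body) == 1 and body[0] in grammar


def remove_unit_productions(grammar: Grammar) -> Grammar:
    result: Grammar = {}
    for a in grammar:
        # Every variable reachable from a through unit productions (a included);
        # this covers chains A1 → … → Ak as well as cycles.
        reachable = {a}
        stack = [a]
        while stack:
            b = stack.pop()
            for body in grammar[b]:
                if is_unit(body, grammar) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # a inherits every non-unit body of every reachable variable.
        bodies: list[tuple[str, ...]] = []
        for b in grammar:  # grammar order keeps the output deterministic
            if b in reachable:
                for body in grammar[b]:
                    if not is_unit(body, grammar) and body not in bodies:
                        bodies.append(body)
        result[a] = bodies
    return result


EXAMPLE: Grammar = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}

for var, bodies in remove_unit_productions(EXAMPLE).items():
    print(var, "→", " | ".join("".join(body) or "ε" for body in bodies))
```

On the example grammar this prints S → 0S1 | 1S0S1 | ε | 0SR and R → 0SR, matching the slide up to the order of alternatives, plus the now-unreachable T.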

Removal of -productions
  • A variable N is nullable if there is a derivation
  • How to remove -productions (except from S)

*

N

  • Find all nullable variables N1, ..., Nk
  • For i = 1 to k
  • For every production of the form A → Ni,
  • add another production A → 
  • If Ni →  is a production, remove it
  • If S is nullable, add the special productionS → 
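For a single production, the loop above essentially adds every body obtained by deleting some subset of the nullable occurrences. The sketch below uses that subset formulation, a common textbook variant rather than the slide's exact loop, with the same assumed representation (bodies are tuples of symbols, () is ε); the helper name expand_body is hypothetical.

```python
from itertools import product


def expand_body(body: tuple[str, ...], nullable: set[str]) -> set[tuple[str, ...]]:
    """All bodies obtained from `body` by deleting any subset of its nullable
    occurrences (hypothetical helper; () stands for ε)."""
    # Each symbol can always be kept; a nullable variable can also be dropped.
    choices = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
    return {tuple(s for part in picked for s in part) for picked in product(*choices)}


# S → ACD with nullable variables B, C, D gives S → ACD | AD | AC | A.
print(sorted(expand_body(("A", "C", "D"), {"B", "C", "D"})))
# prints [('A',), ('A', 'C'), ('A', 'C', 'D'), ('A', 'D')]
```

A body that becomes empty, such as the () obtained from D → BC, corresponds to an ε-production that the loop removes again; only S → ε may survive.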

Example
  • Find the nullable variables

grammar:

S → ACD

A → a

B → ε

C → ED | ε

D → BC | b

E → b

nullable variables: B, C, D

  • Find all nullable variables N1, ..., Nk

Finding nullable variables
  • To find nullable variables, we work backwards
    • First, mark all variables A such that A → ε is a production as nullable
    • Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
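The backward marking procedure is a fixed-point computation. A minimal Python sketch, assuming bodies are tuples of symbols with () for ε; the name nullable_variables is hypothetical:

```python
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(grammar: Grammar) -> set[str]:
    """Start from the variables with A → ε, then keep marking A whenever some
    production A → A1…Ak has all of A1, …, Ak already marked."""
    nullable = {a for a, bodies in grammar.items() if () in bodies}
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            if a in nullable:
                continue
            # A body containing a terminal can never be all-nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(a)
                changed = True
    return nullable


EXAMPLE: Grammar = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(sorted(nullable_variables(EXAMPLE)))  # prints ['B', 'C', 'D']
```

The loop stops as soon as a full pass marks nothing new, so it runs at most once per variable plus one final pass.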

Eliminating ε-productions

S → ACD

A → a

B → ε

C → ED | ε

D → BC | b

E → b

nullable variables: B, C, D

i = 1 (B): add D → C; remove B → ε
i = 2 (C): add S → AD, D → B, D → ε; remove C → ε
i = 3 (D): add S → AC, S → A, C → E; remove D → ε

  • For i = 1 to k
    • For every production of the form A → αNiβ, add another production A → αβ
    • If Ni → ε is a production, remove it
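The loop can be sketched literally and run on this example. Assumptions: bodies are tuples of symbols with () for ε; the nullable variables are passed in the order B, C, D used above; within round i, productions added during that round are scanned too, so bodies with repeated occurrences of Ni are covered. The name eliminate_epsilon is hypothetical.

```python
# Same assumed representation: bodies are tuples of symbols, () is ε.
Grammar = dict[str, list[tuple[str, ...]]]


def eliminate_epsilon(grammar: Grammar, nullable: list[str], start: str = "S") -> Grammar:
    """Follow the slide's loop over N1, ..., Nk (hypothetical helper)."""
    g = {a: list(bodies) for a, bodies in grammar.items()}  # work on a copy
    for n in nullable:
        # For every production A → αNβ, add A → αβ. Bodies added in this
        # round are scanned too, so repeated occurrences of n are handled.
        work = [(a, body) for a, bodies in g.items() for body in bodies]
        while work:
            a, body = work.pop()
            for pos, sym in enumerate(body):
                if sym == n:
                    shorter = body[:pos] + body[pos + 1:]
                    if shorter not in g[a]:
                        g[a].append(shorter)
                        work.append((a, shorter))
        # If n → ε is a production, remove it.
        if () in g[n]:
            g[n].remove(())
    # If S is nullable, add the special production S → ε.
    if start in nullable and () not in g[start]:
        g[start].append(())
    return g


EXAMPLE: Grammar = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

for var, bodies in eliminate_epsilon(EXAMPLE, ["B", "C", "D"]).items():
    print(var, "→", " | ".join("".join(body) or "ε" for body in bodies))
```

The result has S → ACD | AD | AC | A, C → ED | E and D → BC | b | C | B (up to order), with B left without productions, so D → BC and D → B can no longer finish a derivation.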
Recap
  • After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles
  • Exception: S → ε
    • We will not use this rule at all, except to check if ε ∈ L
  • Note
    • ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (for example, D → C and S → A in the previous example)

Example: testing membership

S → 0S1 | 1S0S1 | T

T → S | ε

x = 00111

eliminate unit, ε-productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

S ⇒ 01, 101
S ⇒ 0S1 ⇒ 0011, 01011; 00S11 and the remaining choices give only strings of length ≥ 6
S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6

No branch produces x = 00111, so x is not in the language.

Algorithm 1 for testing membership
  • We can now use the following algorithm to check if a string x is in the language of G:
    • Eliminate all ε-productions and unit productions
    • If x = ε and S → ε, accept; else delete S → ε
    • Let X := S
    • While some new production P can be applied to X
      • Apply P to X
      • If X = x, accept
      • If |X| > |x|, backtrack
    • If no more productions can be applied to X, reject
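Algorithm 1 can be sketched as a depth-first search over sentential forms. Assumptions: every symbol is a single character, the keys of the dict are the variables, "" stands for ε, and the ε- and unit productions have already been eliminated. The sketch always expands the leftmost variable, a refinement that loses nothing because every derivable string has a leftmost derivation. The name derives is hypothetical.

```python
Grammar = dict[str, list[str]]  # single-character symbols; keys are variables


def derives(grammar: Grammar, x: str, start: str = "S") -> bool:
    """Search over sentential forms, backtracking once a form is longer than x.
    Assumes ε- and unit productions are already gone (except S → ε)."""
    if x == "":
        return "" in grammar[start]  # the special rule S → ε
    rules = {a: [b for b in bodies if b != ""] for a, bodies in grammar.items()}

    def search(form: str) -> bool:
        if form == x:
            return True
        if len(form) > len(x):
            return False  # backtrack: forms can no longer shrink
        # Expand only the leftmost variable.
        for i, sym in enumerate(form):
            if sym in rules:
                return any(search(form[:i] + body + form[i + 1:]) for body in rules[sym])
        return False  # only terminals, but not equal to x

    return search(start)


# The grammar from the membership example, after elimination.
G = {"S": ["", "01", "101", "0S1", "10S1", "1S01", "1S0S1"]}

print(derives(G, "00111"))  # prints False
print(derives(G, "01011"))  # prints True
```

The search terminates because, without ε- and unit productions, each step either lengthens the form or turns a variable into a terminal, and forms longer than x are cut off. The number of branches can still grow exponentially with |x|.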

Practical limitations of Algorithm 1
  • The previous algorithm can be very slow if x is long
  • There is a faster algorithm, but it requires that we do some more transformations on the grammar

G = CFG of the Java programming language

x = code for a 200-line Java program

algorithm might take about 10^200 steps!

Chomsky Normal Form
  • A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type A → a or A → BC, where a is a terminal and A, B, C are variables
  • Conversion to Chomsky Normal Form is easy once ε-productions and unit productions have been eliminated:

A → BcDE

replace terminals (in bodies of length ≥ 2) with new variables:

A → BCDE

C → c

break up sequences with new variables:

A → BX1

X1 → CX2

X2 → DE

C → c
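The two conversion steps can be sketched as follows, assuming ε-productions and unit productions are already gone and bodies are tuples of symbols (a symbol is a variable iff it is a key of the dict). The fresh names T_c (for a terminal c) and X1, X2, … are hypothetical; the slide writes C → c, but a fresh name avoids clashing with a variable C the grammar may already have. A real implementation would also check that the fresh names are unused.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def to_cnf(grammar: Grammar) -> Grammar:
    """Convert a grammar with no ε- or unit productions (except S → ε) to CNF."""
    out: Grammar = {a: [] for a in grammar}
    counter = 0

    def var_for(sym: str) -> str:
        # Step 1: a terminal c inside a long body is replaced by T_c, with T_c → c.
        if sym in grammar:
            return sym
        name = "T_" + sym
        if name not in out:
            out[name] = [(sym,)]
        return name

    for a, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:  # A → a (or S → ε) is already allowed
                out[a].append(body)
                continue
            syms = [var_for(s) for s in body]
            # Step 2: break A → B1 B2 … Bn into A → B1 X1, X1 → B2 X2, ...
            head = a
            while len(syms) > 2:
                counter += 1
                new = f"X{counter}"
                out[head].append((syms[0], new))
                out[new] = []
                head, syms = new, syms[1:]
            out[head].append(tuple(syms))
    return out


G = {"A": [("B", "c", "D", "E")], "B": [("b",)], "D": [("d",)], "E": [("e",)]}
for var, bodies in to_cnf(G).items():
    print(var, "→", " | ".join(" ".join(b) for b in bodies))
```

For A → BcDE this yields A → B X1, X1 → T_c X2, X2 → D E and T_c → c, the same shape as the slide with T_c in the role of C.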

Exercise
  • Convert this CFG into Chomsky Normal Form:

S → ε | ADDA

A → a

C → c

D → bCb

Algorithm 2 for testing membership

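S → AB | BC

A → BA | a

B → CC | b

C → AB | a

x = baaba

Table (the row for length b lists, for each start position, the variables that derive the substring of that length starting there; ∅ marks an empty cell):

length 5:  {S, A, C}
length 4:  ∅ | {S, A, C}
length 3:  ∅ | {B} | {B}
length 2:  {S, A} | {B} | {S, C} | {S, A}
length 1:  {B} | {A, C} | {A, C} | {B} | {A, C}
x:         b | a | a | b | a

Idea: We generate each substring of x bottom up

S appears in the top cell, so x = baaba is in the language.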
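As a worked check of one entry, the cell for the substring aaba (positions 2 to 5) combines the three ways of splitting it. The left factor is the cell of the first part and the right factor the cell of the second part, read off the table: a → {A, C}, aba → {B}, aa → {B}, ba → {S, A}, aab → {B}.

```latex
\begin{aligned}
a \mid aba:&\quad \{A, C\}\cdot\{B\} = \{AB, CB\}; && S \to AB,\ C \to AB \;\Rightarrow\; \{S, C\}\\
aa \mid ba:&\quad \{B\}\cdot\{S, A\} = \{BS, BA\}; && A \to BA \;\Rightarrow\; \{A\}\\
aab \mid a:&\quad \{B\}\cdot\{A, C\} = \{BA, BC\}; && A \to BA,\ S \to BC \;\Rightarrow\; \{S, A\}\\
\text{cell}(aaba) =&\ \{S, C\} \cup \{A\} \cup \{S, A\} = \{S, A, C\}
\end{aligned}
```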

Parse tree reconstruction

(Figure: the same table for x = baaba, with the derivation traced back from S in the top cell down to the individual symbols.)

S → AB | BC

A → BA | a

B → CC | b

C → AB | a

x = baaba

Tracing back the derivations, we obtain the parse tree
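Tracing back works if each cell also stores, for every variable, one split point and one production that put it there. A minimal sketch of this back-pointer idea in Python, assuming 1-based inclusive cells as in the table; the function name parse and the nested-tuple tree format are hypothetical. When a variable can be derived in several ways, it keeps the first one found (smallest split point, then grammar order), so it returns one parse tree, not all of them.

```python
from typing import Optional

# CNF grammar as (head, body) pairs; a body is (terminal,) or (B, C).
RULES = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]


def parse(x: str, start: str = "S") -> Optional[tuple]:
    if not x:
        return None  # ε is decided by the special rule S → ε instead
    k = len(x)
    # back[(s, t)][A] records one way A derives x_s…x_t.
    back: dict = {}
    for i in range(1, k + 1):
        back[(i, i)] = {a: (x[i - 1],) for a, body in RULES if body == (x[i - 1],)}
    for b in range(2, k + 1):          # substring length
        for s in range(1, k - b + 2):  # start position
            t = s + b - 1
            cell: dict = {}
            for j in range(s, t):      # split x_s…x_j | x_{j+1}…x_t
                for a, body in RULES:
                    if (len(body) == 2 and a not in cell
                            and body[0] in back[(s, j)]
                            and body[1] in back[(j + 1, t)]):
                        cell[a] = (j, body[0], body[1])
            back[(s, t)] = cell
    if start not in back[(1, k)]:
        return None

    def build(a: str, s: int, t: int) -> tuple:
        entry = back[(s, t)][a]
        if s == t:
            return (a, entry[0])       # leaf: A → terminal
        j, left, right = entry
        return (a, build(left, s, j), build(right, j + 1, t))

    return build(start, 1, k)


print(parse("baaba"))
# prints ('S', ('B', 'b'), ('C', ('A', 'a'), ('B', ('C', ('A', 'a'), ('B', 'b')), ('C', 'a'))))
```

Reading the leaves left to right gives back b a a b a, so the tree is a derivation of x.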

Cocke-Younger-Kasami algorithm

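Input: Grammar G in CNF, string x = x1…xk

  • For i = 1 to k
    • If there is a production A → xi, put A in table cell ii
  • For b = 2 to k
    • For s = 1 to k − b + 1
      • Set t = s + b − 1
      • For j = s to t − 1
        • If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st

(Figure: triangular table over x1 x2 … xk, with cells 11, 22, …, kk on the bottom row, cells 12, 23, … above them and cell 1k at the top; cell st lies b rows up and is filled from cells sj and (j+1)t.)

Cell ij remembers all variables that can derive the substring xi…xj; x is in the language exactly when S is in cell 1k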
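The pseudocode translates almost line for line into Python. A minimal sketch, hard-coding the grammar from the baaba example; cyk_table and cyk_accepts are hypothetical names, and cells are keyed by (i, j) with 1-based, inclusive positions as on the slide.

```python
# CNF grammar: head -> list of bodies; a body is (terminal,) or (B, C).
GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}


def cyk_table(x: str) -> dict[tuple[int, int], set[str]]:
    """cell[(i, j)] = set of variables that derive x_i…x_j."""
    k = len(x)
    cell: dict[tuple[int, int], set[str]] = {}
    for i in range(1, k + 1):
        cell[(i, i)] = {a for a, bodies in GRAMMAR.items() if (x[i - 1],) in bodies}
    for b in range(2, k + 1):              # length of the substring
        for s in range(1, k - b + 2):      # s = 1 … k − b + 1
            t = s + b - 1
            cell[(s, t)] = set()
            for j in range(s, t):          # j = s … t − 1
                for a, bodies in GRAMMAR.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in cell[(s, j)]
                                and body[1] in cell[(j + 1, t)]):
                            cell[(s, t)].add(a)
    return cell


def cyk_accepts(x: str, start: str = "S") -> bool:
    # The empty string is decided by the special rule S → ε, not by the table.
    return bool(x) and start in cyk_table(x)[(1, len(x))]


print(cyk_accepts("baaba"))                # prints True
print(sorted(cyk_table("baaba")[(2, 5)]))  # prints ['A', 'C', 'S']
```

Each of the O(k^2) cells tries O(k) split points against every production, so the running time is O(k^3) times the size of the grammar, polynomial rather than exponential in |x|.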
