1 / 73

# - PowerPoint PPT Presentation

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Review top-down parsing Recursive descent LL(1) parsing Start bottom-up parsing LR(k) SLR(k) LALR(k). Top-down parsing.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about '' - kovit

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Mayer Goldberg and Roman Manevich

Ben-Gurion University

Today
• Review top-down parsing
• Recursive descent
• LL(1) parsing
• Start bottom-up parsing
• LR(k)
• SLR(k)
• LALR(k)
Top-down parsing
• Parser repeatedly faces the following problem
• Given program P starting with terminal t
• Non-terminal N with list of possible production rules: N α1 … N αk
• Predict which rule can should be used to derive P
Recursive descent parsing
• Define a function for every nonterminal
• Every function work as follows
• Find applicable production rule
• Terminal function checks match with next input token
• Nonterminal function calls (recursively) other functions
• If there are several applicable productions for a nonterminal, use lookahead
Boolean expressions example

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

not ( not true or false )

production to apply known from next token

E

E

E =>

notE =>

not ( E OP E ) =>

not ( not E OP E ) =>

not ( not LIT OP E ) =>

not ( not true OP E ) =>

not ( not true or E ) =>

not ( not true or LIT ) =>

not ( not true or false )

not

(

E

OP

E

)

not

LIT

or

LIT

true

false

Flavors of top-down parsers
• Manually constructed
• Recursive descent (previous lecture, review now)
• Generated (this lecture)
• Based on pushdown automata
• Does not use recursion
Recursive descent parsing
• Define a function for every nonterminal
• Every function work as follows
• Find applicable production rule
• Terminal function checks match with next input token
• Nonterminal function calls (recursively) other functions
• If there are several applicable productions for a nonterminal, use lookahead
Matching tokens

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

match(token t) {

if (current == t)

current = next_token()

else

error

}

Variable current holds the current input token

Functions for nonterminals

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

E() {

if (current  {TRUE, FALSE}) // E LIT

LIT();

else if (current == LPAREN) // E ( E OP E )

match(LPAREN); E(); OP(); E(); match(RPAREN);

else if (current == NOT) // E not E

match(NOT); E();

else

error;

}

LIT() {

if (current == TRUE) match(TRUE);

else if (current == FALSE) match(FALSE);

else error;

}

Implementation via recursion

E() {

if (current  {TRUE, FALSE}) LIT();

else if (current == LPAREN) match(LPARENT); E(); OP(); E(); match(RPAREN);

else if (current == NOT) match(NOT); E();

else error;

}

E → LIT

| ( E OP E )

| not E

LIT → true

| false

OP → and

| or

| xor

LIT() {

if (current == TRUE) match(TRUE);

else if (current == FALSE) match(FALSE);

else error;

}

OP() {

if (current == AND) match(AND);

else if (current == OR) match(OR);

else if (current == XOR) match(XOR);

else error;

}

How is prediction done?

p. 189

• For simplicity, let’s assume no null production rules
• See book for general case
• Find out the token that can appear first in a rule – FIRST sets
FIRST sets
• For every production rule Aα
• Every token that can appear as first in α under some derivation for α
• In our Boolean expressions example
• FIRST( LIT ) = { true, false }
• FIRST( ( E OP E ) ) = { ‘(‘ }
• FIRST( not E ) = { not }
• No intersection between FIRST sets => can always pick a single rule
• If the FIRST sets intersect, may need longer lookahead
• LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens
• LL(1) is an important and useful class
Computing FIRST sets

Assume no null productions A 

Initially, for all nonterminalsA, setFIRST(A) = { t | Atω for some ω }

Repeat the following until no changes occur:for each nonterminal A for each production A Bω set FIRST(A) = FIRST(A) ∪ FIRST(B)

This is known as fixed-point computation

FIRST sets computation example

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

1. Initialization

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

2. Iterate 1

STMT if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

2. Iterate 2

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPRTERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

2. Iterate 3 – fixed-point

STMT if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

LL(k) grammars
• A grammar is in the class LL(K) when it can be derived via:
• Top-down derivation
• Scanning the input from left to right (L)
• Producing the leftmost derivation (L)
• With lookahead of k tokens (k)
• For every two productions Aα and Aβ we have FIRST(α) ∩ FIRST(β) = {}(and FIRST(A) ∩ FOLLOW(A) = {} for null productions)
• A language is said to be LL(k) when it has an LL(k) grammar
• What can we do if grammar is not LL(k)?
LL(k) parsing via pushdown automata
• Pushdown automaton uses
• Prediction stack
• Input stream
• Transition table
• nonterminals x tokens -> production alternative
• Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t
LL(k) parsing via pushdown automata
• Two possible moves
• Prediction
• When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error
• Match
• When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t
• If (t ≠ T) syntax error
• Parsing terminates when prediction stack is empty
• If input is empty at that point, success. Otherwise, syntax error
Model of non-recursivepredictive parser

Predictive Parsing program

Stack

Output

Parsing Table

Example transition table

(1) E → LIT

(2) E → ( E OP E )

(3) E → not E

(4) LIT → true

(5) LIT → false

(6) OP → and

(7) OP → or

(8) OP → xor

Which rule should be used

Input tokens

Nonterminals

Running parser example

aacbb\$

A  aAb | c

Illegal input example

abcbb\$

A  aAb | c

Using top-down parsing approach
• Compute parsing table
• If table is conflict-free then we have an LL(k) parser
• If table contains conflicts investigate
• If grammar is ambiguous try to disambiguate
• Try using left-factoring/substitution/left-recursion elimination to remove conflicts
Marking “end-of-file”

Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’  S \$where \$ is not part of the set of tokens

To parse an input P with G’ we change it into P\$

Simplifies top-down parsing with null productions and LR parsing

Bottom-up parsing
• No problem with left recursion
• Widely used in practice
• Shift-reduce parsing: LR(k), SLR, LALR
• All follow the same pushdown-based algorithm
• Read input left-to-right producing rightmost derivation
• Differ on type of “LR Items”
• Parser generator CUP implements LALR
Some terminology
• The opposite of derivation is called reduction
• Let Aα be a production rule
• Let βAµ be a sentence
• Replace left-hand side of rule in sentence:βAµ=> βαµ
• A handle is a substring that is reduced during a series of steps in a rightmost derivation
Rightmost derivation example

1 + (2) + (3)

E  E + (E)

E i

E + (2) + (3)

Each non-leaf node represents a handle

E + (E) + (3)

Rightmost derivation in reverse

E + (3)

E

E + (E)

E

E

E

E

E

1

+

(

2

)

+

3

(

)

LR item

To be matched

Input

N  αβ

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

LR items

N  αβ

Shift Item

N  αβ

Reduce Item

Example

Z  exprEOF

expr  term | expr+ term

term  ID | (expr)

Z  E \$

E  T | E + T

T  i | ( E )

(just shorthand of the grammar on the top)

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

i

+

i

\$

Z  E \$

Why do we need these additional LR items?

Where do they come from?

What do they mean?

E  T

E  E + T

T  i

T  (E )

-closure

{ Z  E \$,

Z  E \$

E  T | E + T

T  i | ( E )

E  T,

-closure({Z  E \$}) =

E  E + T,

T  i ,

T  ( E ) }

Given a set S of LR(0) items

If P  αNβ is in S

then for each rule N  in the grammarS must also contain N 

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

i

+

i

\$

Remember position from which we’re trying to reduce

Items denote possible future handles

Z  E \$

E  T

E  E + T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

i

+

i

\$

Match items with current token

Z  E \$

T  i

Reduce item!

E  T

E  E + T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

T

+

i

\$

i

Z  E \$

Reduce item!

E  T

E  T

E  E + T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

+

i

\$

T

i

Z  E \$

Reduce item!

E  T

E  T

E  E + T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

+

i

\$

T

i

Z  E \$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

+

i

\$

T

i

Z  E \$

Z  E\$

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

+

T

\$

i

T

i

Z  E \$

Z  E\$

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

+

T

\$

T

i

i

Reduce item!

Z  E \$

Z  E\$

E  E+T

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

\$

E

+

T

T

i

i

Z  E \$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

E

\$

E

+

T

T

i

Reduce item!

i

Z  E \$

Z  E\$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

Z

E

\$

E

+

T

Reduce item!

i

T

Z  E \$

i

Z  E\$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Computing item sets
• Initial set
• Z is in the start symbol
• -closure({ Zα | Zαis in the grammar } )
• Next set from a set S and the next symbol X
• step(S,X) = { NαXβ | NαXβ in the item set S}
• nextSet(S,X) = -closure(step(S,X))
LR(0) automaton example

reduce state

shift state

q6

E  T

T

T

q7

q0

T  (E)

E  T

E  E + T

T  i

T  (E)

Z  E\$

E  T

E  E + T

T  i

T  (E)

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

Z  E\$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

+

+

\$

)

q9

q2

Z  E\$

T  (E) 

T

q4

E  E + T

GOTO/ACTION tables

empty – error move

ACTION

Table

GOTO Table

LR(0) parser tables
• Two types of rows:
• Shift row – tells which state to GOTO for current token
• Reduce row – tells which rule to reduce (independent of current token)
• GOTO entries are blank
LR parser data structures

+

i

\$

input

stack

q0

i

q5

• Input – remainder of text to be processed
• Stack – sequence of pairs N, qi
• N – symbol (terminal or non-terminal)
• qi – state at which decisions are made
• Initial stack contains q0
LR(0) pushdown automaton

shift

i

+

i

\$

+

i

\$

input

input

stack

stack

q0

q0

i

q5

• Two moves: shift and reduce
• Shift move
• Remove first token from input
• Push it on the stack
• Compute next state based on GOTO table
• Push new state on the stack
LR(0) pushdown automaton

ReduceT  i

+

i

\$

+

i

\$

input

input

stack

stack

q0

i

q5

q0

T

q6

• Reduce move
• Using a rule N α
• Symbols in α and their following states are removed from stack
• New state computed based on GOTO table (using top of stack, before pushing N)
• N is pushed on the stack
• New state pushed on top of N
GOTO/ACTION table

Z  E \$

E  T

E  E + T

T  i

T ( E )

Warning: numbers mean different things!

rn = reduce using rule number n

sm = shift to state m

GOTO/ACTION table

top is on the right

Z  E \$

E  T

E  E + T

T  i

T ( E )

Are we done?

Can make a transition diagram for any grammar

Can make a GOTO table for every grammar

Cannot make a deterministic ACTION table for every grammar

LR(0) conflicts

T

q0

Z  E\$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

T  i[E]

E

Shift/reduce conflict

Z  E \$

E  T

E  E + T

T  i

T  ( E )

T  i[E]

LR(0) conflicts

T

q0

Z  E\$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

V  i

E

reduce/reduce conflict

Z  E \$

E  T

E  E + T

T  i

V  iT  ( E )

LR(0) conflicts
• Any grammar with an -rule cannot be LR(0)
• Inherent shift/reduce conflict
• A – reduce item
• PαAβ – shift item
• A can always be predicted from P αAβ
Back to the GOTO/ACTIONS tables

ACTION

Table

GOTO Table

ACTION table determined only by transition diagram, ignores input

SRL grammars

A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

A reduce item N  α is applicable only when the look-ahead is in FOLLOW(N)

Differs from LR(0) only on the ACTION table

SLR ACTION table

Remember:

In contrast, GOTO table is indexed by state and a grammar symbol from the stack

Z  E \$

E  T

E  E + T

T  i

T ( E )

SLR ACTION table

vs.

SLR – use 1 token look-ahead

… as before…

T  i

T  i[E]

Are we done?

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q3

R

S → R 

q0

S’ → S

S → L = R

S → R

L → * R

L →  id

R → L

S

q1

S’ → S

q9

S → L = R 

q2

L

S → L = R

R → L 

=

R

q6

S → L = R

R →  L

L → * R

L → id

q5

id

*

L → id 

id

*

id

q4

L → * R

R → L

L → * R

L → id

*

L

q8

R → L 

L

q7

L → * R 

R

shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

• S → L  = R vs. R → L 
• FOLLOW(R) contains =
• S ⇒ L = R ⇒ * R = R
• SLR cannot resolve the conflict either
LR(1) grammars

In SLR: a reduce item N  α is applicable only when the look-ahead is in FOLLOW(N)

But FOLLOW(N) merges look-ahead for all alternatives for N

LR(1) keeps look-ahead with each LR item

Idea: a more refined notion of follows computed per item

LR(1) item

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, \$]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, \$]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

• LR(1) item is a pair
• LR(0) item
• Meaning
• We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token.
• Example
• The production L  id yields the following LR(1) items
-closure for LR(1)
• For every [A → α ● Bβ , c] in S
• for every production B→δ and every token b in the grammar such that b FIRST(βc)
• Add [B → ● δ , b] to S

q3

q0

(S → R ∙ , \$)

(S’ → ∙ S , \$)

(S → ∙ L = R , \$)

(S → ∙ R , \$)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , \$ )

(L → ∙ id , \$ )

(L → ∙ * R , \$ )

q11

q9

(S → L = R ∙ , \$)

(L → id ∙ , \$)

R

q1

S

(S’ → S ∙ , \$)

id

R

id

L

q2

q12

(S → L ∙ = R , \$)

(R → L ∙ , \$)

(R → L ∙ , \$)

=

id

*

*

q10

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

L

q6

q4

q5

id

(L → id ∙ , \$)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , \$)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , \$)

(L → * R ∙ , \$)

Back to the conflict

q2

(S → L ∙ = R , \$)

(R → L ∙ , \$)

=

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

q6

Is there a conflict now?

LALR

LR tables have large number of entries

Often don’t need such refined observation (and cost)

LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts

LALR not as powerful as LR(1)