This presentation is the property of its rightful owner.
1 / 73

# Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 PowerPoint PPT Presentation

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Review top-down parsing Recursive descent LL(1) parsing Start bottom-up parsing LR(k) SLR(k) LALR(k). Top-down parsing.

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

#### Presentation Transcript

Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 2

Mayer Goldberg and Roman Manevich

Ben-Gurion University

### Today

• Review top-down parsing

• Recursive descent

• LL(1) parsing

• Start bottom-up parsing

• LR(k)

• SLR(k)

• LALR(k)

### Top-down parsing

• Parser repeatedly faces the following problem

• Given program P starting with terminal t

• Non-terminal N with list of possible production rules: N α1 … N αk

• Predict which rule can should be used to derive P

### Recursive descent parsing

• Define a function for every nonterminal

• Every function work as follows

• Find applicable production rule

• Terminal function checks match with next input token

• Nonterminal function calls (recursively) other functions

• If there are several applicable productions for a nonterminal, use lookahead

### Boolean expressions example

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

not ( not true or false )

production to apply known from next token

E

E

E =>

notE =>

not ( E OP E ) =>

not ( not E OP E ) =>

not ( not LIT OP E ) =>

not ( not true OP E ) =>

not ( not true or E ) =>

not ( not true or LIT ) =>

not ( not true or false )

not

(

E

OP

E

)

not

LIT

or

LIT

true

false

### Flavors of top-down parsers

• Manually constructed

• Recursive descent (previous lecture, review now)

• Generated (this lecture)

• Based on pushdown automata

• Does not use recursion

### Recursive descent parsing

• Define a function for every nonterminal

• Every function work as follows

• Find applicable production rule

• Terminal function checks match with next input token

• Nonterminal function calls (recursively) other functions

• If there are several applicable productions for a nonterminal, use lookahead

### Matching tokens

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

match(token t) {

if (current == t)

current = next_token()

else

error

}

Variable current holds the current input token

### Functions for nonterminals

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

E() {

if (current  {TRUE, FALSE}) // E LIT

LIT();

else if (current == LPAREN)// E ( E OP E )

match(LPAREN); E(); OP(); E(); match(RPAREN);

else if (current == NOT)// E not E

match(NOT); E();

else

error;

}

LIT() {

if (current == TRUE) match(TRUE);

else if (current == FALSE) match(FALSE);

else error;

}

### Implementation via recursion

E() {

if (current  {TRUE, FALSE})LIT();

else if (current == LPAREN)match(LPARENT);E(); OP(); E();match(RPAREN);

else if (current == NOT)match(NOT); E();

else error;

}

E → LIT

| ( E OP E )

| not E

LIT → true

| false

OP → and

| or

| xor

LIT() {

if (current == TRUE)match(TRUE);

else if (current == FALSE)match(FALSE);

elseerror;

}

OP() {

if (current == AND)match(AND);

else if (current == OR)match(OR);

else if (current == XOR)match(XOR);

elseerror;

}

### How is prediction done?

p. 189

• For simplicity, let’s assume no null production rules

• See book for general case

• Find out the token that can appear first in a rule – FIRST sets

### FIRST sets

• For every production rule Aα

• Every token that can appear as first in α under some derivation for α

• In our Boolean expressions example

• FIRST( LIT ) = { true, false }

• FIRST( ( E OP E ) ) = { ‘(‘ }

• FIRST( not E ) = { not }

• No intersection between FIRST sets => can always pick a single rule

• If the FIRST sets intersect, may need longer lookahead

• LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens

• LL(1) is an important and useful class

### Computing FIRST sets

Assume no null productions A 

Initially, for all nonterminalsA, setFIRST(A) = { t | Atω for some ω }

Repeat the following until no changes occur:for each nonterminal A for each production A Bω set FIRST(A) = FIRST(A) ∪ FIRST(B)

This is known as fixed-point computation

### FIRST sets computation example

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

### 1. Initialization

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

### 2. Iterate 1

STMT if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

### 2. Iterate 2

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPRTERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

### 2. Iterate 3 – fixed-point

STMT if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

### LL(k) grammars

• A grammar is in the class LL(K) when it can be derived via:

• Top-down derivation

• Scanning the input from left to right (L)

• Producing the leftmost derivation (L)

• With lookahead of k tokens (k)

• For every two productions Aα and Aβ we have FIRST(α) ∩ FIRST(β) = {}(and FIRST(A) ∩ FOLLOW(A) = {} for null productions)

• A language is said to be LL(k) when it has an LL(k) grammar

• What can we do if grammar is not LL(k)?

### LL(k) parsing via pushdown automata

• Pushdown automaton uses

• Prediction stack

• Input stream

• Transition table

• nonterminals x tokens -> production alternative

• Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t

### LL(k) parsing via pushdown automata

• Two possible moves

• Prediction

• When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error

• Match

• When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t

• If (t ≠ T) syntax error

• Parsing terminates when prediction stack is empty

• If input is empty at that point, success. Otherwise, syntax error

### Model of non-recursivepredictive parser

Predictive Parsing program

Stack

Output

Parsing Table

### Example transition table

(1) E → LIT

(2) E → ( E OP E )

(3) E → not E

(4) LIT → true

(5) LIT → false

(6) OP → and

(7) OP → or

(8) OP → xor

Which rule should be used

Input tokens

Nonterminals

aacbb\$

A  aAb | c

abcbb\$

A  aAb | c

### Using top-down parsing approach

• Compute parsing table

• If table is conflict-free then we have an LL(k) parser

• If table contains conflicts investigate

• If grammar is ambiguous try to disambiguate

• Try using left-factoring/substitution/left-recursion elimination to remove conflicts

### Marking “end-of-file”

Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’  S \$where \$ is not part of the set of tokens

To parse an input P with G’ we change it into P\$

Simplifies top-down parsing with null productions and LR parsing

### Bottom-up parsing

• No problem with left recursion

• Widely used in practice

• Shift-reduce parsing: LR(k), SLR, LALR

• All follow the same pushdown-based algorithm

• Read input left-to-right producing rightmost derivation

• Differ on type of “LR Items”

• Parser generator CUP implements LALR

### Some terminology

• The opposite of derivation is called reduction

• Let Aα be a production rule

• Let βAµ be a sentence

• Replace left-hand side of rule in sentence:βAµ=> βαµ

• A handle is a substring that is reduced during a series of steps in a rightmost derivation

### Rightmost derivation example

1 + (2) + (3)

E  E + (E)

E i

E + (2) + (3)

Each non-leaf node represents a handle

E + (E) + (3)

Rightmost derivation in reverse

E + (3)

E

E + (E)

E

E

E

E

E

1

+

(

2

)

+

3

(

)

### LR item

To be matched

Input

N  αβ

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

N  αβ

Shift Item

N  αβ

Reduce Item

### Example

Z  exprEOF

expr  term | expr+ term

term  ID | (expr)

Z  E \$

E  T | E + T

T  i | ( E )

(just shorthand of the grammar on the top)

### Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

i

+

i

\$

Z  E \$

Why do we need these additional LR items?

Where do they come from?

What do they mean?

E  T

E  E + T

T  i

T  (E )

### -closure

{ Z  E \$,

Z  E \$

E  T | E + T

T  i | ( E )

E  T,

-closure({Z  E \$}) =

E  E + T,

T  i ,

T  ( E ) }

Given a set S of LR(0) items

If P  αNβ is in S

then for each rule N  in the grammarS must also contain N 

### Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

i

+

i

\$

Remember position from which we’re trying to reduce

Items denote possible future handles

Z  E \$

E  T

E  E + T

T  i

T  ( E )

### Example: parsing with LR items

Z  E \$

E  T | E + T

T  i | ( E )

i

+

i

\$

Match items with current token

Z  E \$

T  i

Reduce item!

E  T

E  E + T

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

T

+

i

\$

i

Z  E \$

Reduce item!

E  T

E  T

E  E + T

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

+

i

\$

T

i

Z  E \$

Reduce item!

E  T

E  T

E  E + T

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

+

i

\$

T

i

Z  E \$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

+

i

\$

T

i

Z  E \$

Z  E\$

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

+

T

\$

i

T

i

Z  E \$

Z  E\$

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

+

T

\$

T

i

i

Reduce item!

Z  E \$

Z  E\$

E  E+T

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

\$

E

+

T

T

i

i

Z  E \$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

E

\$

E

+

T

T

i

Reduce item!

i

Z  E \$

Z  E\$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

Z  E \$

E  T | E + T

T  i | ( E )

Z

E

\$

E

+

T

Reduce item!

i

T

Z  E \$

i

Z  E\$

Z  E\$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

### Computing item sets

• Initial set

• Z is in the start symbol

• -closure({ Zα | Zαis in the grammar } )

• Next set from a set S and the next symbol X

• step(S,X) = { NαXβ | NαXβ in the item set S}

• nextSet(S,X) = -closure(step(S,X))

reduce state

shift state

q6

E  T

T

T

q7

q0

T  (E)

E  T

E  E + T

T  i

T  (E)

Z  E\$

E  T

E  E + T

T  i

T  (E)

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

Z  E\$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

+

+

\$

)

q9

q2

Z  E\$

T  (E) 

T

q4

E  E + T

### GOTO/ACTION tables

empty – error move

ACTION

Table

GOTO Table

### LR(0) parser tables

• Two types of rows:

• Shift row – tells which state to GOTO for current token

• Reduce row – tells which rule to reduce (independent of current token)

• GOTO entries are blank

### LR parser data structures

+

i

\$

input

stack

q0

i

q5

• Input – remainder of text to be processed

• Stack – sequence of pairs N, qi

• N – symbol (terminal or non-terminal)

• qi – state at which decisions are made

• Initial stack contains q0

### LR(0) pushdown automaton

shift

i

+

i

\$

+

i

\$

input

input

stack

stack

q0

q0

i

q5

• Two moves: shift and reduce

• Shift move

• Remove first token from input

• Push it on the stack

• Compute next state based on GOTO table

• Push new state on the stack

### LR(0) pushdown automaton

ReduceT  i

+

i

\$

+

i

\$

input

input

stack

stack

q0

i

q5

q0

T

q6

• Reduce move

• Using a rule N α

• Symbols in α and their following states are removed from stack

• New state computed based on GOTO table (using top of stack, before pushing N)

• N is pushed on the stack

• New state pushed on top of N

### GOTO/ACTION table

Z  E \$

E  T

E  E + T

T  i

T ( E )

Warning: numbers mean different things!

rn = reduce using rule number n

sm = shift to state m

### GOTO/ACTION table

top is on the right

Z  E \$

E  T

E  E + T

T  i

T ( E )

### Are we done?

Can make a transition diagram for any grammar

Can make a GOTO table for every grammar

Cannot make a deterministic ACTION table for every grammar

### LR(0) conflicts

T

q0

Z  E\$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

T  i[E]

E

Shift/reduce conflict

Z  E \$

E  T

E  E + T

T  i

T  ( E )

T  i[E]

### LR(0) conflicts

T

q0

Z  E\$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

V  i

E

reduce/reduce conflict

Z  E \$

E  T

E  E + T

T  i

V  iT  ( E )

### LR(0) conflicts

• Any grammar with an -rule cannot be LR(0)

• Inherent shift/reduce conflict

• A – reduce item

• PαAβ – shift item

• A can always be predicted from P αAβ

### Back to the GOTO/ACTIONS tables

ACTION

Table

GOTO Table

ACTION table determined only by transition diagram, ignores input

### SRL grammars

A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

A reduce item N  α is applicable only when the look-ahead is in FOLLOW(N)

Differs from LR(0) only on the ACTION table

### SLR ACTION table

Remember:

In contrast, GOTO table is indexed by state and a grammar symbol from the stack

Z  E \$

E  T

E  E + T

T  i

T ( E )

### SLR ACTION table

vs.

SLR – use 1 token look-ahead

… as before…

T  i

T  i[E]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q3

R

S → R 

q0

S’ → S

S → L = R

S → R

L → * R

L →  id

R → L

S

q1

S’ → S

q9

S → L = R 

q2

L

S → L = R

R → L 

=

R

q6

S → L = R

R →  L

L → * R

L → id

q5

id

*

L → id 

id

*

id

q4

L → * R

R → L

L → * R

L → id

*

L

q8

R → L 

L

q7

L → * R 

R

### shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

• S → L  = R vs. R → L 

• FOLLOW(R) contains =

• S ⇒ L = R ⇒ * R = R

• SLR cannot resolve the conflict either

### LR(1) grammars

In SLR: a reduce item N  α is applicable only when the look-ahead is in FOLLOW(N)

But FOLLOW(N) merges look-ahead for all alternatives for N

LR(1) keeps look-ahead with each LR item

Idea: a more refined notion of follows computed per item

### LR(1) item

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, \$]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, \$]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

• LR(1) item is a pair

• LR(0) item

• Meaning

• We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token.

• Example

• The production L  id yields the following LR(1) items

### -closure for LR(1)

• For every [A → α ● Bβ , c] in S

• for every production B→δ and every token b in the grammar such that b FIRST(βc)

• Add [B → ● δ , b] to S

q3

q0

(S → R ∙ , \$)

(S’ → ∙ S , \$)

(S → ∙ L = R , \$)

(S → ∙ R , \$)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , \$ )

(L → ∙ id , \$ )

(L → ∙ * R , \$ )

q11

q9

(S → L = R ∙ , \$)

(L → id ∙ , \$)

R

q1

S

(S’ → S ∙ , \$)

id

R

id

L

q2

q12

(S → L ∙ = R , \$)

(R → L ∙ , \$)

(R → L ∙ , \$)

=

id

*

*

q10

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

L

q6

q4

q5

id

(L → id ∙ , \$)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , \$)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , \$)

(L → * R ∙ , \$)

### Back to the conflict

q2

(S → L ∙ = R , \$)

(R → L ∙ , \$)

=

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

q6

Is there a conflict now?

### LALR

LR tables have large number of entries

Often don’t need such refined observation (and cost)

LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts

LALR not as powerful as LR(1)