
Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2



Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Review top-down parsing Recursive descent LL(1) parsing Start bottom-up parsing LR(k) SLR(k) LALR(k). Top-down parsing.


Winter 2012-2013 Compiler Principles: Syntax Analysis (Parsing) – Part 2

Mayer Goldberg and Roman Manevich

Ben-Gurion University


Today

  • Review top-down parsing

    • Recursive descent

  • LL(1) parsing

  • Start bottom-up parsing

    • LR(k)

    • SLR(k)

    • LALR(k)


Top-down parsing

  • Parser repeatedly faces the following problem

    • Given program P starting with terminal t

    • Non-terminal N with list of possible production rules: N → α1 | … | αk

  • Predict which rule should be used to derive P


Recursive descent parsing

  • Define a function for every nonterminal

  • Every function works as follows

    • Find applicable production rule

    • Terminal function checks match with next input token

    • Nonterminal function calls (recursively) other functions

  • If there are several applicable productions for a nonterminal, use lookahead


Boolean expressions example

E → LIT | ( E OP E ) | not E

LIT → true | false

OP → and | or | xor

Input: not ( not true or false )

The production to apply is known from the next token:

E =>

not E =>

not ( E OP E ) =>

not ( not E OP E ) =>

not ( not LIT OP E ) =>

not ( not true OP E ) =>

not ( not true or E ) =>

not ( not true or LIT ) =>

not ( not true or false )

[Figure: parse tree for not ( not true or false )]


Flavors of top-down parsers

  • Manually constructed

    • Recursive descent (previous lecture, review now)

  • Generated (this lecture)

    • Based on pushdown automata

    • Does not use recursion



Matching tokens

E → LIT | ( E OP E ) | not E

LIT → true | false

OP → and | or | xor

Variable current holds the current input token

match(token t) {
  if (current == t)
    current = next_token();  // consume the token and advance
  else
    error;
}


Functions for nonterminals

E → LIT | ( E OP E ) | not E

LIT → true | false

OP → and | or | xor

E() {
  if (current ∈ {TRUE, FALSE}) {      // E → LIT
    LIT();
  } else if (current == LPAREN) {     // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {        // E → not E
    match(NOT); E();
  } else {
    error;
  }
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}


Implementation via recursion

E → LIT
  | ( E OP E )
  | not E

LIT → true
    | false

OP → and
   | or
   | xor

E() {
  if (current ∈ {TRUE, FALSE}) {
    LIT();
  } else if (current == LPAREN) {
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {
    match(NOT); E();
  } else {
    error;
  }
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
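The pseudocode above can be turned into a small runnable sketch. The following Python version is only an illustration, not the course's implementation: the class name Parser, the ParseError exception, the whitespace-based tokenizer, and the tuple-shaped tree it returns are all assumptions made for this example.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, text):
        # Assumption: tokens are separated by whitespace; "$" marks end of input.
        self.tokens = text.split() + ["$"]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Same role as match(token t) in the slides.
        if self.current == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, got {self.current!r}")

    def E(self):
        if self.current in ("true", "false"):    # E -> LIT
            return self.LIT()
        if self.current == "(":                  # E -> ( E OP E )
            self.match("(")
            left = self.E()
            op = self.OP()
            right = self.E()
            self.match(")")
            return (op, left, right)
        if self.current == "not":                # E -> not E
            self.match("not")
            return ("not", self.E())
        raise ParseError(f"unexpected token {self.current!r}")

    def LIT(self):
        tok = self.current
        if tok in ("true", "false"):
            self.match(tok)
            return tok
        raise ParseError(f"expected a literal, got {tok!r}")

    def OP(self):
        tok = self.current
        if tok in ("and", "or", "xor"):
            self.match(tok)
            return tok
        raise ParseError(f"expected an operator, got {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.E()
    parser.match("$")   # the whole input must be consumed
    return tree
```

Calling parse("not ( not true or false )") follows the same sequence of predictions as the derivation shown in the Boolean expressions example.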


How is prediction done?

p. 189

  • For simplicity, let’s assume no null production rules

    • See book for general case

  • Find out the token that can appear first in a rule – FIRST sets


FIRST sets

  • For every production rule A → α

    • FIRST(α) = all terminals that α can start with

    • Every token that can appear as first in α under some derivation for α

  • In our Boolean expressions example

    • FIRST( LIT ) = { true, false }

    • FIRST( ( E OP E ) ) = { ‘(‘ }

    • FIRST( not E ) = { not }

  • No intersection between FIRST sets => can always pick a single rule

  • If the FIRST sets intersect, may need longer lookahead

    • LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens

    • LL(1) is an important and useful class


Computing FIRST sets

Assume no null productions A → ε

Initially, for all nonterminals A, set FIRST(A) = { t | A → tω for some ω }

Repeat the following until no changes occur:
  for each nonterminal A
    for each production A → Bω
      set FIRST(A) = FIRST(A) ∪ FIRST(B)

This is known as a fixed-point computation


FIRST sets computation example

STMT → if EXPR then STMT
     | while EXPR do STMT
     | EXPR ;

EXPR → TERM -> id
     | zero? TERM
     | not EXPR
     | ++ id
     | -- id

TERM → id
     | constant


1. Initialization

2. Iterate 1

2. Iterate 2

2. Iterate 3 – fixed-point

[Tables: FIRST(STMT), FIRST(EXPR), and FIRST(TERM) after each step, computed over the STMT / EXPR / TERM grammar of this example; the third iteration changes nothing, so it is the fixed point]
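The fixed-point computation can be sketched in a few lines of Python. This is a minimal illustration under the same assumption as the slides (no null productions); the dictionary encoding of the grammar and the function name first_sets are choices made for this example.

```python
# Each nonterminal maps to its list of alternatives; a symbol that is not a
# key of the dictionary is treated as a terminal (token).
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"],
             ["zero?", "TERM"],
             ["not", "EXPR"],
             ["++", "id"],
             ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_sets(grammar):
    # Initialization: FIRST(A) = { t | A -> t w } for terminals t.
    first = {a: {rhs[0] for rhs in prods if rhs[0] not in grammar}
             for a, prods in grammar.items()}
    changed = True
    while changed:                      # repeat until no changes occur
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                b = rhs[0]
                # For A -> B w, add FIRST(B) to FIRST(A).
                if b in grammar and not first[b] <= first[a]:
                    first[a] |= first[b]
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
```

For this grammar, FIRST["TERM"] contains id and constant, FIRST["EXPR"] adds zero?, not, ++ and --, and FIRST["STMT"] adds if and while on top of FIRST["EXPR"].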


LL(k) grammars

  • A grammar is in the class LL(k) when its sentences can be derived via:

    • Top-down derivation

    • Scanning the input from left to right (L)

    • Producing the leftmost derivation (L)

    • With lookahead of k tokens (k)

    • For every two productions A → α and A → β we have FIRST(α) ∩ FIRST(β) = {} (and FIRST(A) ∩ FOLLOW(A) = {} for null productions)

  • A language is said to be LL(k) when it has an LL(k) grammar

  • What can we do if grammar is not LL(k)?


LL(k) parsing via pushdown automata

  • Pushdown automaton uses

    • Prediction stack

    • Input stream

    • Transition table

      • nonterminals × tokens → production alternative

      • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when current input starts with t


LL(k) parsing via pushdown automata

  • Two possible moves

    • Prediction

      • When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error

    • Match

      • When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t

      • If (t ≠ T) syntax error

  • Parsing terminates when prediction stack is empty

    • If input is empty at that point, success. Otherwise, syntax error


Model of non-recursive predictive parser

[Figure: the predictive parsing program, connected to the stack, the parsing table, and the output]


Example transition table

(1) E → LIT

(2) E → ( E OP E )

(3) E → not E

(4) LIT → true

(5) LIT → false

(6) OP → and

(7) OP → or

(8) OP → xor

[Table: rows are nonterminals, columns are input tokens; each entry says which rule should be used]
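The entries of such a table follow from the FIRST sets of the right-hand sides. The sketch below rebuilds them for the eight numbered rules; it is an illustration only, assumes no null productions, and the names RULES, first_of, and build_table are made up for this example.

```python
# Rule number -> (left-hand side, right-hand side), numbered as in the slide.
RULES = {
    1: ("E", ["LIT"]),
    2: ("E", ["(", "E", "OP", "E", ")"]),
    3: ("E", ["not", "E"]),
    4: ("LIT", ["true"]),
    5: ("LIT", ["false"]),
    6: ("OP", ["and"]),
    7: ("OP", ["or"]),
    8: ("OP", ["xor"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def first_of(symbol, seen=frozenset()):
    """FIRST of one symbol, assuming no null productions."""
    if symbol not in NONTERMINALS:
        return {symbol}                 # a token starts with itself
    if symbol in seen:                  # simple guard against left recursion
        return set()
    result = set()
    for lhs, rhs in RULES.values():
        if lhs == symbol:
            result |= first_of(rhs[0], seen | {symbol})
    return result


def build_table():
    """Map (nonterminal, token) to the rule number to predict."""
    table = {}
    for number, (lhs, rhs) in RULES.items():
        for token in first_of(rhs[0]):
            if (lhs, token) in table:   # two rules compete for one entry
                raise ValueError(f"conflict at [{lhs}, {token}]")
            table[(lhs, token)] = number
    return table


TABLE = build_table()
```

A missing key, such as ("E", "and"), corresponds to an empty table entry and therefore to a syntax error.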


Running parser example

Input: aacbb$

A → aAb | c


Illegal input example

Input: abcbb$

A → aAb | c
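The two inputs above can be run through a small table-driven parser. The following Python sketch follows the prediction and match moves described earlier; the hand-written TABLE, the function name ll1_parse, and the choice to treat each character as one token are assumptions for this illustration.

```python
# Transition table for A -> aAb | c: (nonterminal, token) -> alternative.
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}
NONTERMINALS = {"A"}


def ll1_parse(text, start="A"):
    """Return True if text (ending in "$") is derived from the start symbol."""
    tokens = list(text)            # assumption: one character per token
    pos = 0
    stack = [start]                # prediction stack, top at the end
    while stack:
        top = stack.pop()
        t = tokens[pos] if pos < len(tokens) else "$"
        if top in NONTERMINALS:
            # Prediction move: replace N by table[N, t].
            alternative = TABLE.get((top, t))
            if alternative is None:
                return False       # empty entry: syntax error
            stack.extend(reversed(alternative))
        else:
            # Match move: terminal on the stack must equal the next token.
            if top != t:
                return False
            pos += 1
    # Stack is empty: succeed only if nothing but the end marker remains.
    return tokens[pos:] == ["$"]
```

With this table, ll1_parse("aacbb$") succeeds, while ll1_parse("abcbb$") fails at the second token, because the entry for nonterminal A and token b is empty.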


Using top-down parsing approach

  • Compute parsing table

    • If table is conflict-free then we have an LL(k) parser

  • If table contains conflicts investigate

    • If grammar is ambiguous try to disambiguate

    • Try using left-factoring/substitution/left-recursion elimination to remove conflicts


Marking “end-of-file”

Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S’ and a new production rule S’ → S $, where $ is not part of the set of tokens

To parse an input P with G’ we change it into P$

Simplifies top-down parsing with null productions and LR parsing


Bottom-up parsing

  • No problem with left recursion

  • Widely used in practice

  • Shift-reduce parsing: LR(k), SLR, LALR

    • All follow the same pushdown-based algorithm

    • Read input left-to-right producing rightmost derivation

    • Differ on type of “LR Items”

  • Parser generator CUP implements LALR


Some terminology

  • The opposite of derivation is called reduction

    • Let A → α be a production rule

    • Let βAµ be a sentence

    • Derivation replaces the left-hand side of the rule in the sentence: βAµ => βαµ

    • Reduction goes the other way, replacing α by A: from βαµ back to βAµ

  • A handle is a substring that is reduced during a series of steps in a rightmost derivation


Rightmost derivation example

E → E + (E)

E → i

Rightmost derivation in reverse:

1 + (2) + (3)

E + (2) + (3)

E + (E) + (3)

E + (3)

E + (E)

E

Each non-leaf node represents a handle

[Figure: parse tree for 1 + (2) + (3)]


LR item

N → α ● β

  • α (left of the dot): already matched

  • β (right of the dot): to be matched against the input

Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β

LR items

N → α ● β    Shift item

N → αβ ●    Reduce item


Example

Z → expr EOF

expr → term | expr + term

term → ID | ( expr )

Z → E $

E → T | E + T

T → i | ( E )

(just shorthand of the grammar on the top)


Example: parsing with LR items

Z → E $

E → T | E + T

T → i | ( E )

Input: i + i $

Start item: Z → ● E $

Additional items: E → ● T, E → ● E + T, T → ● i, T → ● ( E )

Why do we need these additional LR items?

Where do they come from?

What do they mean?

ε-closure

Z → E $

E → T | E + T

T → i | ( E )

ε-closure({Z → ● E $}) = { Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ) }

Given a set S of LR(0) items

If P → α ● Nβ is in S

then for each rule N → γ in the grammar, S must also contain N → ● γ

Example: parsing with LR items (step by step)

Z → E $

E → T | E + T

T → i | ( E )

Input: i + i $

  • Start from the items Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )

    • Remember position from which we’re trying to reduce

    • Items denote possible future handles

  • Match items with current token i: T → i ● is a reduce item! Reduce i to T; the parser now sees T + i $

  • E → T ● is a reduce item! Reduce T to E; the parser now sees E + i $

  • Advance over E: Z → E ● $ and E → E ● + T

  • Match +: E → E + ● T, which adds T → ● i and T → ● ( E )

  • Match i and reduce it to T; the parser now sees E + T $

  • E → E + T ● is a reduce item! Reduce E + T to E; the parser now sees E $

  • Back at Z → E ● $: match $, so Z → E $ ● is a reduce item! Reduce to Z

Computing item sets

  • Initial set

    • Z is the start symbol

    • ε-closure({ Z → ● α | Z → α is in the grammar })

  • Next set from a set S and the next symbol X

    • step(S,X) = { N → αX ● β | N → α ● Xβ in the item set S }

    • nextSet(S,X) = ε-closure(step(S,X))
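These definitions translate almost directly into code. The Python sketch below is a minimal illustration for the grammar Z → E $, E → T | E + T, T → i | ( E ); representing an item as a pair (rule index, dot position) and the function names are assumptions made for this example, and the state numbers it produces need not match the q numbers used in the slides.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """If P -> a . N b is in the set, add N -> . g for every rule N -> g."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def step(items, symbol):
    """Move the dot over symbol in every item that expects it."""
    return {(r, d + 1) for r, d in items
            if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}


def next_set(items, symbol):
    return closure(step(items, symbol))


def item_sets():
    """Build all reachable item sets and the transitions between them."""
    start = closure({(i, 0) for i, (lhs, _) in enumerate(GRAMMAR) if lhs == "Z"})
    states = [start]
    transitions = {}
    for state in states:                # the list grows while we iterate
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            target = next_set(state, x)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), x)] = states.index(target)
    return states, transitions


STATES, TRANSITIONS = item_sets()
```

For this grammar the construction yields ten item sets, the same count as the states q0 to q9 of the automaton example.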


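LR(0) automaton example

States (each is either a shift state or a reduce state):

q0 (shift state): Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E )

q1 (shift state): Z → E ● $, E → E ● + T

q2 (reduce state): Z → E $ ●

q3 (shift state): E → E + ● T, T → ● i, T → ● ( E )

q4 (reduce state): E → E + T ●

q5 (reduce state): T → i ●

q6 (reduce state): E → T ●

q7 (shift state): T → ( ● E ), E → ● T, E → ● E + T, T → ● i, T → ● ( E )

q8 (shift state): T → ( E ● ), E → E ● + T

q9 (reduce state): T → ( E ) ●

Transitions:

q0 --E--> q1, q0 --T--> q6, q0 --i--> q5, q0 --(--> q7

q1 --+--> q3, q1 --$--> q2

q3 --T--> q4, q3 --i--> q5, q3 --(--> q7

q7 --E--> q8, q7 --T--> q6, q7 --i--> q5, q7 --(--> q7

q8 --+--> q3, q8 --)--> q9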

GOTO/ACTION tables

[Table: GOTO table and ACTION table side by side; an empty entry means an error move]


LR(0) parser tables

  • Two types of rows:

    • Shift row – tells which state to GOTO for current token

    • Reduce row – tells which rule to reduce (independent of current token)

      • GOTO entries are blank


LR parser data structures

[Figure: stack holding q0 i q5; remaining input + i $]

  • Input – remainder of text to be processed

  • Stack – sequence of pairs N, qi

    • N – symbol (terminal or non-terminal)

    • qi – state at which decisions are made

  • Initial stack contains q0


LR(0) pushdown automaton

[Figure: shift move. Before: stack q0, input i + i $. After: stack q0 i q5, input + i $]

  • Two moves: shift and reduce

  • Shift move

    • Remove first token from input

    • Push it on the stack

    • Compute next state based on GOTO table

    • Push new state on the stack

    • If new state is error – report error


LR(0) pushdown automaton

[Figure: reduce move using T → i. Before: stack q0 i q5, input + i $. After: stack q0 T q6, input + i $]

  • Reduce move

    • Using a rule N → α

    • Symbols in α and their following states are removed from stack

    • New state computed based on GOTO table (using top of stack, before pushing N)

    • N is pushed on the stack

    • New state pushed on top of N


GOTO/ACTION table

(1) Z → E $

(2) E → T

(3) E → E + T

(4) T → i

(5) T → ( E )

Warning: numbers mean different things!

rn = reduce using rule number n

sm = shift to state m


GOTO/ACTION table

Stack contents are listed with the top on the right

Z → E $

E → T

E → E + T

T → i

T → ( E )
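To make the shift and reduce moves concrete, here is a small Python sketch of an LR(0) driver for this grammar. It is an illustration under several assumptions: the state numbers follow the q0 to q9 labels of the LR(0) automaton example, the rules are numbered 1 to 5 in the order listed above, and the stack stores states only (the grammar symbols the slides pair with each state are implied by them). The names SHIFT, GOTO, REDUCE, and lr0_parse are invented for this example.

```python
# Rule number -> (left-hand side, length of right-hand side).
RULES = {1: ("Z", 2), 2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}

# Shift rows: (state, token) -> next state.
SHIFT = {
    (0, "i"): 5, (0, "("): 7,
    (1, "+"): 3, (1, "$"): 2,
    (3, "i"): 5, (3, "("): 7,
    (7, "i"): 5, (7, "("): 7,
    (8, "+"): 3, (8, ")"): 9,
}
# After a reduction: (state, nonterminal) -> next state.
GOTO = {(0, "E"): 1, (0, "T"): 6, (3, "T"): 4, (7, "E"): 8, (7, "T"): 6}
# Reduce rows: state -> rule number (independent of the current token).
REDUCE = {2: 1, 4: 3, 5: 4, 6: 2, 9: 5}


def lr0_parse(tokens):
    """Return the list of rule numbers used, in the order they are reduced."""
    stack = [0]                        # initial stack contains q0
    pos = 0
    reductions = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, length = RULES[rule]
            reductions.append(rule)
            if lhs == "Z":             # reducing the start rule: accept
                if pos != len(tokens):
                    raise ValueError("input continues after end marker")
                return reductions
            del stack[-length:]        # pop the states of the handle
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            token = tokens[pos] if pos < len(tokens) else None
            target = SHIFT.get((state, token))
            if target is None:         # empty entry: error move
                raise ValueError(f"syntax error at position {pos}")
            stack.append(target)
            pos += 1
```

For the token list ["i", "+", "i", "$"] the driver reduces with rules 4, 2, 4, 3, 1, which is the rightmost derivation in reverse.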


Are we done?

Can make a transition diagram for any grammar

Can make a GOTO table for every grammar

Cannot make a deterministic ACTION table for every grammar


LR(0) conflicts

Z → E $

E → T

E → E + T

T → i

T → ( E )

T → i[E]

q0: Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ), T → ● i[E]

q5 (reached from q0 on i): T → i ●, T → i ● [E]

Shift/reduce conflict

LR(0) conflicts

Z → E $

E → T

E → E + T

T → i

V → i

T → ( E )

q5 (reached from q0 on i): T → i ●, V → i ●

reduce/reduce conflict

LR(0) conflicts

  • Any grammar with an ε-rule cannot be LR(0)

  • Inherent shift/reduce conflict

    • A → ε ● – reduce item

    • P → α ● Aβ – shift item

    • A → ε ● can always be predicted from P → α ● Aβ


Back to the GOTO/ACTIONS tables

[Table: ACTION table next to the GOTO table]

ACTION table determined only by transition diagram, ignores input


SLR grammars

A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

A reduce item N → α ● is applicable only when the look-ahead is in FOLLOW(N)

Differs from LR(0) only on the ACTION table


SLR ACTION table

The SLR ACTION table is indexed by state and the look-ahead token from the input

Remember: in contrast, the GOTO table is indexed by state and a grammar symbol from the stack

Z → E $

E → T

E → E + T

T → i

T → ( E )


SLR ACTION table

SLR – use 1 token look-ahead vs. LR(0) – no look-ahead

… as before…

T → i

T → i[E]


Are we done?

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L


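[Figure: LR(0) automaton for this grammar]

q0: S’ → ● S, S → ● L = R, S → ● R, L → ● * R, L → ● id, R → ● L

q1: S’ → S ●

q2: S → L ● = R, R → L ●

q3: S → R ●

q4: L → * ● R, R → ● L, L → ● * R, L → ● id

q5: L → id ●

q6: S → L = ● R, R → ● L, L → ● * R, L → ● id

q7: L → * R ●

q8: R → L ●

q9: S → L = R ●

Transitions:

q0 --S--> q1, q0 --L--> q2, q0 --R--> q3, q0 --*--> q4, q0 --id--> q5

q2 --=--> q6

q4 --R--> q7, q4 --L--> q8, q4 --*--> q4, q4 --id--> q5

q6 --R--> q9, q6 --L--> q8, q6 --*--> q4, q6 --id--> q5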

shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2: S → L ● = R, R → L ●

q2 --=--> q6

q6: S → L = ● R, R → ● L, L → ● * R, L → ● id

  • S → L ● = R vs. R → L ●

  • FOLLOW(R) contains =

    • S ⇒ L = R ⇒ * R = R

  • SLR cannot resolve the conflict either
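The claim that FOLLOW(R) contains = can be checked mechanically. The Python sketch below computes FIRST and FOLLOW sets for this grammar by fixed-point iteration; it is an illustration that relies on the grammar having no null productions, and the function names are assumptions made for this example.

```python
GRAMMAR = [
    ("S'", ["S"]),
    ("S", ["L", "=", "R"]),
    ("S", ["R"]),
    ("L", ["*", "R"]),
    ("L", ["id"]),
    ("R", ["L"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]                  # no null productions: look at one symbol
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["S'"].add("$")               # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):    # something follows x in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                   # x ends the rule: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()
# In q2, "=" calls for a shift (S -> L . = R) and, because it is in
# FOLLOW(R), also for a reduce (R -> L .): SLR keeps the conflict.
SLR_CONFLICT_IN_Q2 = "=" in FOLLOW["R"]
```

Here FOLLOW["R"] and FOLLOW["L"] both come out as the set containing = and $, so the look-ahead = does not separate the shift from the reduce in q2.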


LR(1) grammars

In SLR: a reduce item N → α ● is applicable only when the look-ahead is in FOLLOW(N)

But FOLLOW(N) merges look-ahead for all alternatives for N

LR(1) keeps look-ahead with each LR item

Idea: a more refined notion of follows computed per item


LR(1) item

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

  • LR(1) item is a pair

    • LR(0) item

    • Look-ahead token

  • Meaning

    • We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token.

  • Example

    • The production L → id yields the following LR(1) items

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, $]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, $]


ε-closure for LR(1)

  • For every [A → α ● Bβ , c] in S

    • for every production B → δ and every token b in the grammar such that b ∈ FIRST(βc)

    • Add [B → ● δ , b] to S
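The closure rule can be tried out on the grammar S’ → S, S → L = R | R, L → * R | id, R → L. The Python sketch below is an illustration only: an item is encoded as (rule index, dot position, look-ahead), the grammar has no null productions so FIRST(βc) is just FIRST of its first symbol, and all function names are assumptions made for this example.

```python
GRAMMAR = [
    ("S'", ["S"]),
    ("S", ["L", "=", "R"]),
    ("S", ["R"]),
    ("L", ["*", "R"]),
    ("L", ["id"]),
    ("R", ["L"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def first_of_sequence(symbols):
    """FIRST of a non-empty sequence; valid because nothing derives epsilon."""
    x = symbols[0]
    return FIRST[x] if x in NONTERMINALS else {x}


def closure_lr1(items):
    """For [A -> a . B b, c] add [B -> . d, t] for every t in FIRST(b c)."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot, lookahead = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta_c = rhs[dot + 1:] + [lookahead]
            for token in first_of_sequence(beta_c):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    item = (i, 0, token)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return result


Q0 = closure_lr1({(0, 0, "$")})     # closure of [S' -> . S, $]
```

The resulting set has eight items: L → ● * R and L → ● id appear with both = and $ as look-ahead, while R → ● L appears only with $.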


[Figure: LR(1) automaton for this grammar]

q0: (S’ → ∙ S , $), (S → ∙ L = R , $), (S → ∙ R , $), (L → ∙ * R , =), (L → ∙ id , =), (R → ∙ L , $), (L → ∙ id , $), (L → ∙ * R , $)

q1: (S’ → S ∙ , $)

q2: (S → L ∙ = R , $), (R → L ∙ , $)

q3: (S → R ∙ , $)

q4: (L → * ∙ R , =), (R → ∙ L , =), (L → ∙ * R , =), (L → ∙ id , =), (L → * ∙ R , $), (R → ∙ L , $), (L → ∙ * R , $), (L → ∙ id , $)

q5: (L → id ∙ , $), (L → id ∙ , =)

q6: (S → L = ∙ R , $), (R → ∙ L , $), (L → ∙ * R , $), (L → ∙ id , $)

q7: (L → * R ∙ , =), (L → * R ∙ , $)

q8: (R → L ∙ , =), (R → L ∙ , $)

q9: (S → L = R ∙ , $)

q10: (L → * ∙ R , $), (R → ∙ L , $), (L → ∙ * R , $), (L → ∙ id , $)

q11: (L → id ∙ , $)

q12: (R → L ∙ , $)

q13: (L → * R ∙ , $)

Transitions:

q0 --S--> q1, q0 --L--> q2, q0 --R--> q3, q0 --*--> q4, q0 --id--> q5

q2 --=--> q6

q4 --R--> q7, q4 --L--> q8, q4 --*--> q4, q4 --id--> q5

q6 --R--> q9, q6 --L--> q12, q6 --*--> q10, q6 --id--> q11

q10 --R--> q13, q10 --L--> q12, q10 --*--> q10, q10 --id--> q11

Back to the conflict

q2: (S → L ∙ = R , $), (R → L ∙ , $)

q2 --=--> q6

q6: (S → L = ∙ R , $), (R → ∙ L , $), (L → ∙ * R , $), (L → ∙ id , $)

Is there a conflict now?


LALR

LR tables have large number of entries

Often don’t need such refined observation (and cost)

LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts

LALR not as powerful as LR(1)


See you next time