slide1
Download
Skip this Video
Download Presentation
Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

Loading in 2 Seconds...

play fullscreen
1 / 73

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 - PowerPoint PPT Presentation


  • 126 Views
  • Uploaded on

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Review top-down parsing Recursive descent LL(1) parsing Start bottom-up parsing LR(k) SLR(k) LALR(k). Top-down parsing.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2' - kovit


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 2

Mayer Goldberg and Roman Manevich

Ben-Gurion University

today
Today
  • Review top-down parsing
    • Recursive descent
  • LL(1) parsing
  • Start bottom-up parsing
    • LR(k)
    • SLR(k)
    • LALR(k)
top down parsing
Top-down parsing
  • Parser repeatedly faces the following problem
    • Given program P starting with terminal t
    • Non-terminal N with list of possible production rules: N α1 … N αk
  • Predict which rule can should be used to derive P
recursive descent parsing
Recursive descent parsing
  • Define a function for every nonterminal
  • Every function work as follows
    • Find applicable production rule
    • Terminal function checks match with next input token
    • Nonterminal function calls (recursively) other functions
  • If there are several applicable productions for a nonterminal, use lookahead
boolean expressions example
Boolean expressions example

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

not ( not true or false )

production to apply known from next token

E

E

E =>

notE =>

not ( E OP E ) =>

not ( not E OP E ) =>

not ( not LIT OP E ) =>

not ( not true OP E ) =>

not ( not true or E ) =>

not ( not true or LIT ) =>

not ( not true or false )

not

(

E

OP

E

)

not

LIT

or

LIT

true

false

flavors of top down parsers
Flavors of top-down parsers
  • Manually constructed
    • Recursive descent (previous lecture, review now)
  • Generated (this lecture)
    • Based on pushdown automata
    • Does not use recursion
recursive descent parsing1
Recursive descent parsing
  • Define a function for every nonterminal
  • Every function work as follows
    • Find applicable production rule
    • Terminal function checks match with next input token
    • Nonterminal function calls (recursively) other functions
  • If there are several applicable productions for a nonterminal, use lookahead
matching tokens
Matching tokens

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

match(token t) {

if (current == t)

current = next_token()

else

error

}

Variable current holds the current input token

functions for nonterminals
Functions for nonterminals

E  LIT | (E OP E) | not E

LIT true|false

OP and | or | xor

E() {

if (current  {TRUE, FALSE}) // E LIT

LIT();

else if (current == LPAREN) // E ( E OP E )

match(LPAREN); E(); OP(); E(); match(RPAREN);

else if (current == NOT) // E not E

match(NOT); E();

else

error;

}

LIT() {

if (current == TRUE) match(TRUE);

else if (current == FALSE) match(FALSE);

else error;

}

implementation via recursion
Implementation via recursion

E() {

if (current  {TRUE, FALSE}) LIT();

else if (current == LPAREN) match(LPARENT); E(); OP(); E(); match(RPAREN);

else if (current == NOT) match(NOT); E();

else error;

}

E → LIT

| ( E OP E )

| not E

LIT → true

| false

OP → and

| or

| xor

LIT() {

if (current == TRUE) match(TRUE);

else if (current == FALSE) match(FALSE);

else error;

}

OP() {

if (current == AND) match(AND);

else if (current == OR) match(OR);

else if (current == XOR) match(XOR);

else error;

}

how is prediction done
How is prediction done?

p. 189

  • For simplicity, let’s assume no null production rules
    • See book for general case
  • Find out the token that can appear first in a rule – FIRST sets
first sets
FIRST sets
  • For every production rule Aα
    • FIRST(α) = all terminals that α can start with
    • Every token that can appear as first in α under some derivation for α
  • In our Boolean expressions example
    • FIRST( LIT ) = { true, false }
    • FIRST( ( E OP E ) ) = { ‘(‘ }
    • FIRST( not E ) = { not }
  • No intersection between FIRST sets => can always pick a single rule
  • If the FIRST sets intersect, may need longer lookahead
    • LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens
    • LL(1) is an important and useful class
computing first sets
Computing FIRST sets

Assume no null productions A 

Initially, for all nonterminalsA, setFIRST(A) = { t | Atω for some ω }

Repeat the following until no changes occur:for each nonterminal A for each production A Bω set FIRST(A) = FIRST(A) ∪ FIRST(B)

This is known as fixed-point computation

first sets computation example
FIRST sets computation example

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

1 initialization
1. Initialization

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

2 iterate 1
2. Iterate 1

STMT if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

2 iterate 2
2. Iterate 2

STMT  if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPRTERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

2 iterate 3 fixed point
2. Iterate 3 – fixed-point

STMT if EXPR then STMT

| while EXPR do STMT

| EXPR ;

EXPR  TERM -> id

| zero? TERM

| not EXPR

| ++ id

| -- id

TERM  id

| constant

ll k grammars
LL(k) grammars
  • A grammar is in the class LL(K) when it can be derived via:
    • Top-down derivation
    • Scanning the input from left to right (L)
    • Producing the leftmost derivation (L)
    • With lookahead of k tokens (k)
    • For every two productions Aα and Aβ we have FIRST(α) ∩ FIRST(β) = {}(and FIRST(A) ∩ FOLLOW(A) = {} for null productions)
  • A language is said to be LL(k) when it has an LL(k) grammar
  • What can we do if grammar is not LL(k)?
ll k parsing via pushdown automata
LL(k) parsing via pushdown automata
  • Pushdown automaton uses
    • Prediction stack
    • Input stream
    • Transition table
      • nonterminals x tokens -> production alternative
      • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t
ll k parsing via pushdown automata1
LL(k) parsing via pushdown automata
  • Two possible moves
    • Prediction
      • When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error
    • Match
      • When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t
      • If (t ≠ T) syntax error
  • Parsing terminates when prediction stack is empty
    • If input is empty at that point, success. Otherwise, syntax error
model of non recursive predictive parser
Model of non-recursivepredictive parser

Predictive Parsing program

Stack

Output

Parsing Table

example transition table
Example transition table

(1) E → LIT

(2) E → ( E OP E )

(3) E → not E

(4) LIT → true

(5) LIT → false

(6) OP → and

(7) OP → or

(8) OP → xor

Which rule should be used

Input tokens

Nonterminals

running parser example
Running parser example

aacbb$

A  aAb | c

illegal input example
Illegal input example

abcbb$

A  aAb | c

using top down parsing approach
Using top-down parsing approach
  • Compute parsing table
    • If table is conflict-free then we have an LL(k) parser
  • If table contains conflicts investigate
    • If grammar is ambiguous try to disambiguate
    • Try using left-factoring/substitution/left-recursion elimination to remove conflicts
marking end of file
Marking “end-of-file”

Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ with a new production rule S’  S $where $ is not part of the set of tokens

To parse an input P with G’ we change it into P$

Simplifies top-down parsing with null productions and LR parsing

bottom up parsing
Bottom-up parsing
  • No problem with left recursion
  • Widely used in practice
  • Shift-reduce parsing: LR(k), SLR, LALR
    • All follow the same pushdown-based algorithm
    • Read input left-to-right producing rightmost derivation
    • Differ on type of “LR Items”
  • Parser generator CUP implements LALR
some terminology
Some terminology
  • The opposite of derivation is called reduction
    • Let Aα be a production rule
    • Let βAµ be a sentence
    • Replace left-hand side of rule in sentence:βAµ=> βαµ
  • A handle is a substring that is reduced during a series of steps in a rightmost derivation
rightmost derivation example
Rightmost derivation example

1 + (2) + (3)

E  E + (E)

E i

E + (2) + (3)

Each non-leaf node represents a handle

E + (E) + (3)

Rightmost derivation in reverse

E + (3)

E

E + (E)

E

E

E

E

E

1

+

(

2

)

+

3

(

)

lr item
LR item

To be matched

Already matched

Input

N  αβ

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

lr items
LR items

N  αβ

Shift Item

N  αβ

Reduce Item

example
Example

Z  exprEOF

expr  term | expr+ term

term  ID | (expr)

Z  E $

E  T | E + T

T  i | ( E )

(just shorthand of the grammar on the top)

example parsing with lr items
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

i

+

i

$

Z  E $

Why do we need these additional LR items?

Where do they come from?

What do they mean?

E  T

E  E + T

T  i

T  (E )

closure
-closure

{ Z  E $,

Z  E $

E  T | E + T

T  i | ( E )

E  T,

-closure({Z  E $}) =

E  E + T,

T  i ,

T  ( E ) }

Given a set S of LR(0) items

If P  αNβ is in S

then for each rule N  in the grammarS must also contain N 

example parsing with lr items1
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

i

+

i

$

Remember position from which we’re trying to reduce

Items denote possible future handles

Z  E $

E  T

E  E + T

T  i

T  ( E )

example parsing with lr items2
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

i

+

i

$

Match items with current token

Z  E $

T  i

Reduce item!

E  T

E  E + T

T  i

T  ( E )

example parsing with lr items3
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

T

+

i

$

i

Z  E $

Reduce item!

E  T

E  T

E  E + T

T  i

T  ( E )

example parsing with lr items4
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

+

i

$

T

i

Z  E $

Reduce item!

E  T

E  T

E  E + T

T  i

T  ( E )

example parsing with lr items5
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

+

i

$

T

i

Z  E $

Z  E$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

example parsing with lr items6
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

+

i

$

T

i

Z  E $

Z  E$

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

example parsing with lr items7
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

+

T

$

i

T

i

Z  E $

Z  E$

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

example parsing with lr items8
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

+

T

$

T

i

i

Reduce item!

Z  E $

Z  E$

E  E+T

E  E+T

E  T

T  i

E  E + T

E  E+ T

T  ( E )

T  i

T  ( E )

example parsing with lr items9
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

$

E

+

T

T

i

i

Z  E $

Z  E$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

example parsing with lr items10
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

E

$

E

+

T

T

i

Reduce item!

i

Z  E $

Z  E$

Z  E$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

example parsing with lr items11
Example: parsing with LR items

Z  E $

E  T | E + T

T  i | ( E )

Z

E

$

E

+

T

Reduce item!

i

T

Z  E $

i

Z  E$

Z  E$

E  T

E  E + T

E  E+ T

T  i

T  ( E )

computing item sets
Computing item sets
  • Initial set
    • Z is in the start symbol
    • -closure({ Zα | Zαis in the grammar } )
  • Next set from a set S and the next symbol X
    • step(S,X) = { NαXβ | NαXβ in the item set S}
    • nextSet(S,X) = -closure(step(S,X))
lr 0 automaton example
LR(0) automaton example

reduce state

shift state

q6

E  T

T

T

q7

q0

T  (E)

E  T

E  E + T

T  i

T  (E)

Z  E$

E  T

E  E + T

T  i

T  (E)

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

Z  E$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

+

+

$

)

q9

q2

Z  E$

T  (E) 

T

q4

E  E + T

goto action table s
GOTO/ACTION tables

empty – error move

ACTION

Table

GOTO Table

lr 0 parser table s
LR(0) parser tables
  • Two types of rows:
    • Shift row – tells which state to GOTO for current token
    • Reduce row – tells which rule to reduce (independent of current token)
      • GOTO entries are blank
lr parser data structures
LR parser data structures

+

i

$

input

stack

q0

i

q5

  • Input – remainder of text to be processed
  • Stack – sequence of pairs N, qi
    • N – symbol (terminal or non-terminal)
    • qi – state at which decisions are made
  • Initial stack contains q0
lr 0 pushdown automaton
LR(0) pushdown automaton

shift

i

+

i

$

+

i

$

input

input

stack

stack

q0

q0

i

q5

  • Two moves: shift and reduce
  • Shift move
    • Remove first token from input
    • Push it on the stack
    • Compute next state based on GOTO table
    • Push new state on the stack
    • If new state is error – report error
lr 0 pushdown automaton1
LR(0) pushdown automaton

ReduceT  i

+

i

$

+

i

$

input

input

stack

stack

q0

i

q5

q0

T

q6

  • Reduce move
    • Using a rule N α
    • Symbols in α and their following states are removed from stack
    • New state computed based on GOTO table (using top of stack, before pushing N)
    • N is pushed on the stack
    • New state pushed on top of N
goto action table
GOTO/ACTION table

Z  E $

E  T

E  E + T

T  i

T ( E )

Warning: numbers mean different things!

rn = reduce using rule number n

sm = shift to state m

goto action table1
GOTO/ACTION table

top is on the right

Z  E $

E  T

E  E + T

T  i

T ( E )

are we done
Are we done?

Can make a transition diagram for any grammar

Can make a GOTO table for every grammar

Cannot make a deterministic ACTION table for every grammar

lr 0 conflicts
LR(0) conflicts

T

q0

Z  E$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

T  i[E]

E

Shift/reduce conflict

Z  E $

E  T

E  E + T

T  i

T  ( E )

T  i[E]

lr 0 conflicts1
LR(0) conflicts

T

q0

Z  E$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

V  i

E

reduce/reduce conflict

Z  E $

E  T

E  E + T

T  i

V  iT  ( E )

lr 0 conflicts2
LR(0) conflicts
  • Any grammar with an -rule cannot be LR(0)
  • Inherent shift/reduce conflict
    • A – reduce item
    • PαAβ – shift item
    • A can always be predicted from P αAβ
back to the goto actions table s
Back to the GOTO/ACTIONS tables

ACTION

Table

GOTO Table

ACTION table determined only by transition diagram, ignores input

srl grammars
SRL grammars

A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

A reduce item N  α is applicable only when the look-ahead is in FOLLOW(N)

Differs from LR(0) only on the ACTION table

slr action table
SLR ACTION table

Lookahead token from the input

Remember:

In contrast, GOTO table is indexed by state and a grammar symbol from the stack

Z  E $

E  T

E  E + T

T  i

T ( E )

slr action table1
SLR ACTION table

vs.

SLR – use 1 token look-ahead

LR(0) – no look-ahead

… as before…

T  i

T  i[E]

are we done1
Are we done?

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

slide65

q3

R

S → R 

q0

S’ → S

S → L = R

S → R

L → * R

L →  id

R → L

S

q1

S’ → S

q9

S → L = R 

q2

L

S → L = R

R → L 

=

R

q6

S → L = R

R →  L

L → * R

L → id

q5

id

*

L → id 

id

*

id

q4

L → * R

R → L

L → * R

L → id

*

L

q8

R → L 

L

q7

L → * R 

R

shift reduce conflict
shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

  • S → L  = R vs. R → L 
  • FOLLOW(R) contains =
    • S ⇒ L = R ⇒ * R = R
  • SLR cannot resolve the conflict either
lr 1 grammars
LR(1) grammars

In SLR: a reduce item N  α is applicable only when the look-ahead is in FOLLOW(N)

But FOLLOW(N) merges look-ahead for all alternatives for N

LR(1) keeps look-ahead with each LR item

Idea: a more refined notion of follows computed per item

lr 1 item
LR(1) item

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, $]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, $]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

  • LR(1) item is a pair
    • LR(0) item
    • Look-ahead token
  • Meaning
    • We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token.
  • Example
    • The production L  id yields the following LR(1) items
closure for lr 1
-closure for LR(1)
  • For every [A → α ● Bβ , c] in S
    • for every production B→δ and every token b in the grammar such that b FIRST(βc)
    • Add [B → ● δ , b] to S
slide70

q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q11

q9

(S → L = R ∙ , $)

(L → id ∙ , $)

R

q1

S

(S’ → S ∙ , $)

id

R

id

L

q2

q12

(S → L ∙ = R , $)

(R → L ∙ , $)

(R → L ∙ , $)

=

id

*

*

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

L

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)

back to the conflict
Back to the conflict

q2

(S → L ∙ = R , $)

(R → L ∙ , $)

=

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

q6

Is there a conflict now?

slide72
LALR

LR tables have large number of entries

Often don’t need such refined observation (and cost)

LALR idea: find states with the same LR(0) component and merge their look-ahead component as long as there are no conflicts

LALR not as powerful as LR(1)

ad