Winter
Download
1 / 81

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 - PowerPoint PPT Presentation


  • 100 Views
  • Uploaded on

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Shift-reduce parsing How does a shift-reduce parser work? How do we construct a shift-reduce parse table? LR(1), SLR(1), LALR(1)

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3' - gilles


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 3

Mayer Goldberg and Roman Manevich

Ben-Gurion University


Today
Today

  • Shift-reduce parsing

    • How does a shift-reduce parser work?

    • How do we construct a shift-reduce parse table?

    • LR(1), SLR(1), LALR(1)

    • Meaning of shift-reduce / reduce-reduce conflicts

    • Behavior on left-recursion/right-recursion

  • Automatic parser generation (CUP)

    • Handling ambiguity


Model of an lr parser
Model of an LR parser

Input

Stack

LRParsing program

state

Output

symbol


Lr parser stack
LR parser stack

Sequence made of state, symbol pairs

For instance a possible stack for the grammarS  E $E  TE  E+ TT idT  (E)could be: 0 T 2 + 7 id 5


Form of lr parsing table
Form of LR parsing table

non-terminals

state

terminals

0

rk

gm

1

acc

...

gotopart

shift/reduceactions

sn

error

shift state n

reduce by rule k

gotostate m

accept


Lr parser table example
LR parser table example

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Shift move
Shift move

Input

Stack

LRParsing program

Output

If action[q, a] = sn


Result of shift
Result of shift

Input

Stack

LRParsing program

Output

If action[q, a] = sn


Reduce move
Reduce move

Input

Stack

LRParsing program

Output

2*|β|

  • If action[qn, a] = rk

  • Production: (k) A β

  • If β= σ1… σnTop of stack looks like q1 σ1…qnσn

  • goto[q, A] = qm


Result of reduce move
Result of reduce move

Input

Stack

LRParsing program

Output

2*|β|

  • If action[qn, a] = rk

  • Production: (k) A β

  • If β= σ1… σnTop of stack looks like q1 σ1…qnσn

  • goto[q, A] = qm


Accept move
Accept move

Input

Stack

LRParsing program

Output

If action[q, a] = accept

parsing completed


Error move
Error move

Input

Stack

LRParsing program

Output

  • If action[q, a] = error

  • parsing discovered a syntactic error


Parsing id id
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

Initialize with state 0


Parsing id id1
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

Initialize with state 0


Parsing id id2
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id3
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

pop id 5


Parsing id id4
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

push T 6


Parsing id id5
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id6
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id7
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id8
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id9
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id10
Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Constructing an lr parsing table
Constructing an LR parsing table

Construct a (determinized) transition diagram from LR items

If there are conflicts – stop

Fill table entries from diagram


Form of lr 0 items
Form of LR(0) items

To be matched

Already matched

Input

N  αβ

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β


Types of lr 0 items
Types of LR(0) items

N  αβ

Shift Item

N  αβ

Reduce Item


Lr 0 items enumeration example
LR(0) items enumeration example

LR(0) items

Grammar

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

All items can be obtained by placing a dot at every position for every production:


Operations for transition diagram construction
Operations for transition diagram construction

Initial = {S’S$}

For an item set IClosure(I) = Closure(I) + {Xµ is in grammar| NαXβ in I}

Goto(I, X) = { NαXβ | NαXβ in I}


Initial example
Initial example

Grammar

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E $}


Closure example
Closure example

Grammar

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E $}

Closure({S E $}) =S E $E TE E + TT id T  ( E )


Goto example
Goto example

Grammar

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E $}

Closure({S E $}) =S E $E TE E + TT id T  ( E )

Goto({S E $ , E E + T, T id}, E) = {S  E $, E  E + T}


Constructing the transition diagram
Constructing the transition diagram

  • Start with state 0 containing itemClosure({S E $})

  • Repeat until no new states are discovered

    • For every state p containing item set Ip, and symbol N, compute state q containing item setIq = Closure(goto(Ip, N))


Automaton construction example
Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

q7

T

q0

T

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E$

E  T

E  E + T

T  i

T  (E)

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

S  E$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

+

+

$

)

q9

q2

S  E$

T  (E) 

T

q4

E  E + T


Automaton construction example1
Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q0

S  E$

Initialize


Automaton construction example2
Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q0

S  E$

E  T

E  E + T

T  i

T  (E)

applyClosure


Automaton construction example3
Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

T

q0

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E$

E  T

E  E + T

T  i

T  (E)

(

q5

i

T  i

E

q1

S  E$

E  E+ T


Automaton construction example4
Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

q7

T

q0

T

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E$

E  T

E  E + T

T  i

T  (E)

non-terminal transition corresponds to goto action in parse table

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

Z  E$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

terminal transition corresponds to shift action in parse table

+

+

$

)

q9

q2

S  E$

T  (E) 

T

q4

E  E + T

a single reduce item corresponds to reduce action


Conflicts
Conflicts

Can construct a diagram for every grammar but some may introduce conflicts

shift-reduce conflict: an item set contains at least one shift item and one reduce item

reduce-reduce conflict: an item set contains two reduce items


Lr 0 conflicts
LR(0) conflicts

T

q0

S  E$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

T  i[E]

E

Shift/reduce conflict

S  E $

E  T

E  E + T

T  i

T  ( E )

T  i[E]


Lr 0 conflicts1
LR(0) conflicts

T

q0

S  E$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

V  i

E

reduce/reduce conflict

S  E $

E  T

E  E + T

T  i

V  iT  ( E )


Lr 0 conflicts2
LR(0) conflicts

  • Any grammar with an -rule cannot be LR(0)

  • Inherent shift/reduce conflict

    • A – reduce item

    • PαAβ – shift item

    • A can always be predicted from P αAβ


Lr variants
LR variants

  • LR(0) – what we’ve seen so far

  • SLR(0)

    • Removes infeasible reduce actions via FOLLOW set reasoning

  • LR(1)

    • LR(0) with one lookahead token in items

  • LALR(0)

    • LR(1) with merging of states with same LR(0) component


Srl parsing
SRL parsing

  • A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N

  • A reduce item N  α is applicable only when the lookahead is in FOLLOW(N)

    • If b is not in FOLLOW(N) we just proved there is no derivation S =>* βNαband thus it is safe to remove the reduce item from the conflicted state

  • Differs from LR(0) only on the ACTION table

    • Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take


Slr action table
SLR action table

Lookahead token from the input

vs.

SLR – use 1 token look-ahead

LR(0) – no look-ahead

… as before…

T  i

T  i[E]


Going beyond slr 0
Going beyond SLR(0)

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

Some common language constructs introduce conflicts even for SLR


q3

R

S → R 

q0

S’ → S

S → L = R

S → R

L → * R

L →  id

R → L

S

q1

S’ → S

q9

S → L = R 

q2

L

S → L = R

R → L 

=

R

q6

S → L = R

R →  L

L → * R

L → id

q5

id

*

L → id 

id

*

id

q4

L → * R

R → L

L → * R

L → id

*

L

q8

R → L 

L

q7

L → * R 

R


Shift reduce conflict
shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

  • S → L  = R vs. R → L 

  • FOLLOW(R) contains =

    • S ⇒ L = R ⇒ * R = R

  • SLR cannot resolve conflict


Inputs requiring shift reduce
Inputs requiring shift/reduce

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

For the input id the rightmost derivationS’ => S => R => L => id requires reducing in q2

For the input id = idS’ => S => L = R => L = L => L = id => id = idrequires shifting


Lr 1 grammars
LR(1) grammars

  • In SLR: a reduce item N  α is applicable only when the lookahead is in FOLLOW(N)

  • But FOLLOW(N) merges lookahead for all alternatives for N

    • Insensitive to the context of a given production

  • LR(1) keeps lookahead with each LR item

  • Idea: a more refined notion of follows computed per item


Lr 1 items
LR(1) items

LR(1) items

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, $]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, $]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

LR(0) items

[L → ● id]

[L → id ●]

  • LR(1) item is a pair

    • LR(0) item

    • Lookahead token

  • Meaning

    • We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token

  • Example

    • The production L  id yields the following LR(1) items


Closure for lr 1
-closure for LR(1)

  • For every [A → α ● Bβ , c] in S

    • for every production B→δ and every token b in the grammar such that b FIRST(βc)

    • Add [B → ● δ , b] to S


q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q11

q9

(S → L = R ∙ , $)

(L → id ∙ , $)

R

q1

S

(S’ → S ∙ , $)

id

R

id

L

q2

q12

(S → L ∙ = R , $)

(R → L ∙ , $)

(R → L ∙ , $)

=

id

*

*

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

L

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)


Back to the conflict
Back to the conflict

q2

(S → L ∙ = R , $)

(R → L ∙ , $)

=

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

q6

Is there a conflict now?


Lalr 1
LALR(1)

  • LR(1) tables have huge number of entries

  • Often don’t need such refined observation (and cost)

  • Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts

  • LALR(1) not as powerful as LR(1) in theory but works quite well in practice

    • Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice


q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q11

q9

(S → L = R ∙ , $)

(L → id ∙ , $)

R

q1

S

(S’ → S ∙ , $)

id

R

id

L

q2

q12

(S → L ∙ = R , $)

(R → L ∙ , $)

(R → L ∙ , $)

=

id

*

*

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

L

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)


q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q9

(S → L = R ∙ , $)

R

q1

S

(S’ → S ∙ , $)

R

L

q2

(S → L ∙ = R , $)

(R → L ∙ , $)

=

id

*

*

id

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

id

L

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)


Left right recursion
Left/Right- recursion

  • At home: create a simple grammar with left-recursion and one with right-recursion

  • Construct corresponding LR(0) parser

    • Any conflicts?

  • Run on simple input and observe behavior

    • Attempt to generalize observation for long inputs



High level structure
High-level structure

text

Lexerspec

.java

Lexical analyzer

JFlex

javac

Lexer.java

TPL.lex

(Token.java)

tokens

Parserspec

.java

Parser

CUP

javac

sym.javaParser.java

TPL.cup

AST


Expression calculator
Expression calculator

exprexpr + expr

| expr - expr

| expr * expr

| expr / expr

| - expr

| ( expr )

| number

  • Goals of expression calculator parser:

  • Is 2+3+4+5 a valid expression?

  • What is the meaning (value) of this expression?


Syntax analysis with cup
Syntax analysis with CUP

  • CUP – parser generator

  • Generates an LALR(1) Parser

  • Input: spec file

  • Output: a syntax analyzer

tokens

Parserspec

.java

Parser

CUP

javac

AST


Cup spec file
CUP spec file

  • Package and import specifications

  • User code components

  • Symbol (terminal and non-terminal) lists

    • Terminals go to sym.java

    • Types of AST nodes

  • Precedence declarations

  • The grammar

    • Semantic actions to construct AST


Expression calculator 1 st attempt

Symbol typeexplained later

Expression Calculator – 1st Attempt

terminal Integer NUMBER;

terminal PLUS, MINUS, MULT, DIV;

terminal LPAREN, RPAREN;

non terminal Integer expr;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr

| LPAREN expr RPAREN

| NUMBER

;


Ambiguities

+

*

+

+

+

+

*

+

a

a

a

a

b

b

c

c

b

b

c

c

Ambiguities

a * b + c

a + b + c


Expression calculator 2 nd attempt

Increasing

precedence

Expression Calculator – 2nd Attempt

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV;

terminal LPAREN, RPAREN;

terminal UMINUS;

non terminal Integer expr;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

Contextual precedence


Parsing ambiguous grammars using precedence declarations
Parsing ambiguous grammars using precedence declarations

  • Each terminal assigned with precedence

    • By default all terminals have lowest precedence

    • User can assign his own precedence

    • CUP assigns each production a precedence

      • Precedence of last terminal in production

      • or user-specified contextual precedence

  • On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce

  • In case of equal precedences left/right help resolve conflicts

    • left means reduce

    • right means shift

  • More information on precedence declarations in CUP’s manual


Resolving ambiguity

+

+

+

+

a

a

b

c

b

c

Resolving ambiguity

precedence left PLUS

a + b + c


Resolving ambiguity1

*

+

+

*

a

a

b

c

b

c

Resolving ambiguity

precedence left PLUSprecedence left MULT

a * b + c


Resolving ambiguity2
Resolving ambiguity

MINUS expr %prec UMINUS

-

-

-

-

b

a

b

a

- a - b


Resolving ambiguity3
Resolving ambiguity

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV;

terminal LPAREN, RPAREN;

terminal UMINUS;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

UMINUS never returnedby scanner(used only to define precedence)

Rule has precedence of UMINUS


More cup directives
More CUP directives

  • precedence nonassoc NEQ

    • Non-associative operators: < > == != etc.

    • 1<2<3 identified as an error (semantic error?)

  • start non-terminal

    • Specifies start non-terminal other than first non-terminal

    • Can change to test parts of grammar

  • Getting internal representation

    • Command line options:

      • -dump_grammar

      • -dump_states

      • -dump_tables

      • -dump


Scanner integration

Generated from tokendeclarations in .cup file

Scanner integration

import java_cup.runtime.*;

%%

%cup

%eofval{

return new Symbol(sym.EOF);

%eofval}

NUMBER=[0-9]+

%%

<YYINITIAL>”+” { return new Symbol(sym.PLUS); }

<YYINITIAL>”-” { return new Symbol(sym.MINUS); }

<YYINITIAL>”*” { return new Symbol(sym.MULT); }

<YYINITIAL>”/” { return new Symbol(sym.DIV); }

<YYINITIAL>”(” { return new Symbol(sym.LPAREN); }

<YYINITIAL>”)” { return new Symbol(sym.RPAREN); }

<YYINITIAL>{NUMBER} {

return new Symbol(sym.NUMBER, new Integer(yytext()));

}

<YYINITIAL>\n { }

<YYINITIAL>. { }

Parser gets terminals from the scanner


Recap
Recap

  • Package and import specifications and user code components

  • Symbol (terminal and non-terminal) lists

    • Define building-blocks of the grammar

  • Precedence declarations

    • May help resolve conflicts

  • The grammar

    • May introduce conflicts that have to be resolved


Assigning meaning
Assigning meaning

  • So far, only validation

  • Add Java code implementing semantic actions

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;


Assigning meaning1
Assigning meaning

  • Symbol labels used to name variables

  • RESULT names the left-hand side symbol

expr ::= expr:e1 PLUS expr:e2

{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}

| expr:e1 MINUS expr:e2

{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}

| expr:e1 MULT expr:e2

{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}

| expr:e1 DIV expr:e2

{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}

| MINUS expr:e1

{: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS

| LPAREN expr:e1 RPAREN

{: RESULT = e1; :}

| NUMBER:n

{: RESULT = n; :}

;


Building an ast
Building an AST

  • More useful representation of syntax tree

    • Less clutter

    • Actual level of detail depends on your design

  • Basis for semantic analysis

  • Later annotated with various information

    • Type information

    • Computed values


Parse tree vs ast
Parse tree vs. AST

expr

+

expr

+

expr

expr

expr

1

+

(

2

)

+

(

3

)

1

2

3


Ast construction
AST construction

  • AST Nodes constructed during parsing

    • Stored in push-down stack

  • Bottom-up parser

    • Grammar rules annotated with actions for AST construction

    • When node is constructed all children available (already constructed)

    • Node (RESULT) pushed on stack

  • Top-down parser

    • More complicated


Ast construction1

int_const

int_const

int_const

val = 3

val = 2

val = 1

plus

plus

e1

e1

e2

e2

AST construction

expr ::= expr:e1 PLUS expr:e2

{: RESULT = new plus(e1,e2); :}

| LPAREN expr:e RPAREN

{: RESULT = e; :}

| INT_CONST:i

{: RESULT = new int_const(…, i); :}

1 + (2) + (3)

expr + (2) + (3)

expr + (expr) + (3)

expr + (3)

expr + (expr)

expr

expr

expr

expr

expr

expr

1

+

(

2

)

+

(

3

)


Designing an ast
Designing an AST

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI;

terminal UMINUS;

non terminal Integer expr;

non terminal expr_list, expr_part;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr_list ::= expr_list expr_part

| expr_part

;

expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI

;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;



ad