Winter
This presentation is the property of its rightful owner.
Sponsored Links
1 / 81

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 PowerPoint PPT Presentation


  • 65 Views
  • Uploaded on
  • Presentation posted in: General

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Shift-reduce parsing How does a shift-reduce parser work? How do we construct a shift-reduce parse table? LR(1), SLR(1), LALR(1)

Download Presentation

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Winter 2012 2013 compiler principles syntax analysis parsing part 3

Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 3

Mayer Goldberg and Roman Manevich

Ben-Gurion University


Today

Today

  • Shift-reduce parsing

    • How does a shift-reduce parser work?

    • How do we construct a shift-reduce parse table?

    • LR(1), SLR(1), LALR(1)

    • Meaning of shift-reduce / reduce-reduce conflicts

    • Behavior on left-recursion/right-recursion

  • Automatic parser generation (CUP)

    • Handling ambiguity


Model of an lr parser

Model of an LR parser

Input

Stack

LRParsing program

state

Output

symbol


Lr parser stack

LR parser stack

Sequence made of state, symbol pairs

For instance a possible stack for the grammarS  E $E  TE  E+ TT idT  (E)could be: 0 T 2 + 7 id 5


Form of lr parsing table

Form of LR parsing table

non-terminals

state

terminals

0

rk

gm

1

acc

...

gotopart

shift/reduceactions

sn

error

shift state n

reduce by rule k

gotostate m

accept


Lr parser table example

LR parser table example

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Shift move

Shift move

Input

Stack

LRParsing program

Output

If action[q, a] = sn


Result of shift

Result of shift

Input

Stack

LRParsing program

Output

If action[q, a] = sn


Reduce move

Reduce move

Input

Stack

LRParsing program

Output

2*|β|

  • If action[qn, a] = rk

  • Production: (k) A β

  • If β= σ1… σnTop of stack looks like q1 σ1…qnσn

  • goto[q, A] = qm


Result of reduce move

Result of reduce move

Input

Stack

LRParsing program

Output

2*|β|

  • If action[qn, a] = rk

  • Production: (k) A β

  • If β= σ1… σnTop of stack looks like q1 σ1…qnσn

  • goto[q, A] = qm


Accept move

Accept move

Input

Stack

LRParsing program

Output

If action[q, a] = accept

parsing completed


Error move

Error move

Input

Stack

LRParsing program

Output

  • If action[q, a] = error

  • parsing discovered a syntactic error


Parsing id id

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

Initialize with state 0


Parsing id id1

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

Initialize with state 0


Parsing id id2

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id3

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

pop id 5


Parsing id id4

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

push T 6


Parsing id id5

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id6

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id7

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id8

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id9

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Parsing id id10

Parsing id+id$

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)


Constructing an lr parsing table

Constructing an LR parsing table

Construct a (determinized) transition diagram from LR items

If there are conflicts – stop

Fill table entries from diagram


Form of lr 0 items

Form of LR(0) items

To be matched

Already matched

Input

N  αβ

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β


Types of lr 0 items

Types of LR(0) items

N  αβ

Shift Item

N  αβ

Reduce Item


Lr 0 items enumeration example

LR(0) items enumeration example

LR(0) items

Grammar

(1) S E $

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

All items can be obtained by placing a dot at every position for every production:


Operations for transition diagram construction

Operations for transition diagram construction

Initial = {S’S$}

For an item set IClosure(I) = Closure(I) + {Xµ is in grammar| NαXβ in I}

Goto(I, X) = { NαXβ | NαXβ in I}


Initial example

Initial example

Grammar

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E $}


Closure example

Closure example

Grammar

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E $}

Closure({S E $}) =S E $E TE E + TT id T  ( E )


Goto example

Goto example

Grammar

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E $}

Closure({S E $}) =S E $E TE E + TT id T  ( E )

Goto({S E $ , E E + T, T id}, E) = {S  E $, E  E + T}


Constructing the transition diagram

Constructing the transition diagram

  • Start with state 0 containing itemClosure({S E $})

  • Repeat until no new states are discovered

    • For every state p containing item set Ip, and symbol N, compute state q containing item setIq = Closure(goto(Ip, N))


Automaton construction example

Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

q7

T

q0

T

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E$

E  T

E  E + T

T  i

T  (E)

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

S  E$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

+

+

$

)

q9

q2

S  E$

T  (E) 

T

q4

E  E + T


Automaton construction example1

Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q0

S  E$

Initialize


Automaton construction example2

Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q0

S  E$

E  T

E  E + T

T  i

T  (E)

applyClosure


Automaton construction example3

Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

T

q0

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E$

E  T

E  E + T

T  i

T  (E)

(

q5

i

T  i

E

q1

S  E$

E  E+ T


Automaton construction example4

Automaton construction example

(1) S E $

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

q7

T

q0

T

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E$

E  T

E  E + T

T  i

T  (E)

non-terminal transition corresponds to goto action in parse table

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

Z  E$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

terminal transition corresponds to shift action in parse table

+

+

$

)

q9

q2

S  E$

T  (E) 

T

q4

E  E + T

a single reduce item corresponds to reduce action


Conflicts

Conflicts

Can construct a diagram for every grammar but some may introduce conflicts

shift-reduce conflict: an item set contains at least one shift item and one reduce item

reduce-reduce conflict: an item set contains two reduce items


Lr 0 conflicts

LR(0) conflicts

T

q0

S  E$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

T  i[E]

E

Shift/reduce conflict

S  E $

E  T

E  E + T

T  i

T  ( E )

T  i[E]


Lr 0 conflicts1

LR(0) conflicts

T

q0

S  E$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

V  i

E

reduce/reduce conflict

S  E $

E  T

E  E + T

T  i

V  iT  ( E )


Lr 0 conflicts2

LR(0) conflicts

  • Any grammar with an -rule cannot be LR(0)

  • Inherent shift/reduce conflict

    • A – reduce item

    • PαAβ – shift item

    • A can always be predicted from P αAβ


Lr variants

LR variants

  • LR(0) – what we’ve seen so far

  • SLR(0)

    • Removes infeasible reduce actions via FOLLOW set reasoning

  • LR(1)

    • LR(0) with one lookahead token in items

  • LALR(0)

    • LR(1) with merging of states with same LR(0) component


Srl parsing

SRL parsing

  • A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N

  • A reduce item N  α is applicable only when the lookahead is in FOLLOW(N)

    • If b is not in FOLLOW(N) we just proved there is no derivation S =>* βNαband thus it is safe to remove the reduce item from the conflicted state

  • Differs from LR(0) only on the ACTION table

    • Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take


Slr action table

SLR action table

Lookahead token from the input

vs.

SLR – use 1 token look-ahead

LR(0) – no look-ahead

… as before…

T  i

T  i[E]


Going beyond slr 0

Going beyond SLR(0)

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

Some common language constructs introduce conflicts even for SLR


Winter 2012 2013 compiler principles syntax analysis parsing part 3

q3

R

S → R 

q0

S’ → S

S → L = R

S → R

L → * R

L →  id

R → L

S

q1

S’ → S

q9

S → L = R 

q2

L

S → L = R

R → L 

=

R

q6

S → L = R

R →  L

L → * R

L → id

q5

id

*

L → id 

id

*

id

q4

L → * R

R → L

L → * R

L → id

*

L

q8

R → L 

L

q7

L → * R 

R


Shift reduce conflict

shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

  • S → L  = R vs. R → L 

  • FOLLOW(R) contains =

    • S ⇒ L = R ⇒ * R = R

  • SLR cannot resolve conflict


Inputs requiring shift reduce

Inputs requiring shift/reduce

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

For the input id the rightmost derivationS’ => S => R => L => id requires reducing in q2

For the input id = idS’ => S => L = R => L = L => L = id => id = idrequires shifting


Lr 1 grammars

LR(1) grammars

  • In SLR: a reduce item N  α is applicable only when the lookahead is in FOLLOW(N)

  • But FOLLOW(N) merges lookahead for all alternatives for N

    • Insensitive to the context of a given production

  • LR(1) keeps lookahead with each LR item

  • Idea: a more refined notion of follows computed per item


Lr 1 items

LR(1) items

LR(1) items

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, $]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, $]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

LR(0) items

[L → ● id]

[L → id ●]

  • LR(1) item is a pair

    • LR(0) item

    • Lookahead token

  • Meaning

    • We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token

  • Example

    • The production L  id yields the following LR(1) items


Closure for lr 1

-closure for LR(1)

  • For every [A → α ● Bβ , c] in S

    • for every production B→δ and every token b in the grammar such that b FIRST(βc)

    • Add [B → ● δ , b] to S


Winter 2012 2013 compiler principles syntax analysis parsing part 3

q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q11

q9

(S → L = R ∙ , $)

(L → id ∙ , $)

R

q1

S

(S’ → S ∙ , $)

id

R

id

L

q2

q12

(S → L ∙ = R , $)

(R → L ∙ , $)

(R → L ∙ , $)

=

id

*

*

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

L

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)


Back to the conflict

Back to the conflict

q2

(S → L ∙ = R , $)

(R → L ∙ , $)

=

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

q6

Is there a conflict now?


Lalr 1

LALR(1)

  • LR(1) tables have huge number of entries

  • Often don’t need such refined observation (and cost)

  • Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts

  • LALR(1) not as powerful as LR(1) in theory but works quite well in practice

    • Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice


Winter 2012 2013 compiler principles syntax analysis parsing part 3

q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q11

q9

(S → L = R ∙ , $)

(L → id ∙ , $)

R

q1

S

(S’ → S ∙ , $)

id

R

id

L

q2

q12

(S → L ∙ = R , $)

(R → L ∙ , $)

(R → L ∙ , $)

=

id

*

*

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

L

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)


Winter 2012 2013 compiler principles syntax analysis parsing part 3

q3

q0

(S → R ∙ , $)

(S’ → ∙ S , $)

(S → ∙ L = R , $)

(S → ∙ R , $)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , $ )

(L → ∙ id , $ )

(L → ∙ * R , $ )

q9

(S → L = R ∙ , $)

R

q1

S

(S’ → S ∙ , $)

R

L

q2

(S → L ∙ = R , $)

(R → L ∙ , $)

=

id

*

*

id

q10

(S → L = ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

q6

q4

q5

id

(L → id ∙ , $)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

(L → * ∙ R , $)

(R → ∙ L , $)

(L → ∙ * R , $)

(L → ∙ id , $)

*

id

L

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , $)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , $)

(L → * R ∙ , $)


Left right recursion

Left/Right- recursion

  • At home: create a simple grammar with left-recursion and one with right-recursion

  • Construct corresponding LR(0) parser

    • Any conflicts?

  • Run on simple input and observe behavior

    • Attempt to generalize observation for long inputs


Automated parser generation via cup

Automated Parser Generation(via CUP)


High level structure

High-level structure

text

Lexerspec

.java

Lexical analyzer

JFlex

javac

Lexer.java

TPL.lex

(Token.java)

tokens

Parserspec

.java

Parser

CUP

javac

sym.javaParser.java

TPL.cup

AST


Expression calculator

Expression calculator

exprexpr + expr

| expr - expr

| expr * expr

| expr / expr

| - expr

| ( expr )

| number

  • Goals of expression calculator parser:

  • Is 2+3+4+5 a valid expression?

  • What is the meaning (value) of this expression?


Syntax analysis with cup

Syntax analysis with CUP

  • CUP – parser generator

  • Generates an LALR(1) Parser

  • Input: spec file

  • Output: a syntax analyzer

tokens

Parserspec

.java

Parser

CUP

javac

AST


Cup spec file

CUP spec file

  • Package and import specifications

  • User code components

  • Symbol (terminal and non-terminal) lists

    • Terminals go to sym.java

    • Types of AST nodes

  • Precedence declarations

  • The grammar

    • Semantic actions to construct AST


Expression calculator 1 st attempt

Symbol typeexplained later

Expression Calculator – 1st Attempt

terminal Integer NUMBER;

terminal PLUS, MINUS, MULT, DIV;

terminal LPAREN, RPAREN;

non terminal Integer expr;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr

| LPAREN expr RPAREN

| NUMBER

;


Ambiguities

+

*

+

+

+

+

*

+

a

a

a

a

b

b

c

c

b

b

c

c

Ambiguities

a * b + c

a + b + c


Expression calculator 2 nd attempt

Increasing

precedence

Expression Calculator – 2nd Attempt

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV;

terminal LPAREN, RPAREN;

terminal UMINUS;

non terminal Integer expr;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

Contextual precedence


Parsing ambiguous grammars using precedence declarations

Parsing ambiguous grammars using precedence declarations

  • Each terminal assigned with precedence

    • By default all terminals have lowest precedence

    • User can assign his own precedence

    • CUP assigns each production a precedence

      • Precedence of last terminal in production

      • or user-specified contextual precedence

  • On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce

  • In case of equal precedences left/right help resolve conflicts

    • left means reduce

    • right means shift

  • More information on precedence declarations in CUP’s manual


Resolving ambiguity

+

+

+

+

a

a

b

c

b

c

Resolving ambiguity

precedence left PLUS

a + b + c


Resolving ambiguity1

*

+

+

*

a

a

b

c

b

c

Resolving ambiguity

precedence left PLUSprecedence left MULT

a * b + c


Resolving ambiguity2

Resolving ambiguity

MINUS expr %prec UMINUS

-

-

-

-

b

a

b

a

- a - b


Resolving ambiguity3

Resolving ambiguity

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV;

terminal LPAREN, RPAREN;

terminal UMINUS;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

UMINUS never returnedby scanner(used only to define precedence)

Rule has precedence of UMINUS


More cup directives

More CUP directives

  • precedence nonassoc NEQ

    • Non-associative operators: < > == != etc.

    • 1<2<3 identified as an error (semantic error?)

  • start non-terminal

    • Specifies start non-terminal other than first non-terminal

    • Can change to test parts of grammar

  • Getting internal representation

    • Command line options:

      • -dump_grammar

      • -dump_states

      • -dump_tables

      • -dump


Scanner integration

Generated from tokendeclarations in .cup file

Scanner integration

import java_cup.runtime.*;

%%

%cup

%eofval{

return new Symbol(sym.EOF);

%eofval}

NUMBER=[0-9]+

%%

<YYINITIAL>”+” { return new Symbol(sym.PLUS); }

<YYINITIAL>”-” { return new Symbol(sym.MINUS); }

<YYINITIAL>”*” { return new Symbol(sym.MULT); }

<YYINITIAL>”/” { return new Symbol(sym.DIV); }

<YYINITIAL>”(” { return new Symbol(sym.LPAREN); }

<YYINITIAL>”)” { return new Symbol(sym.RPAREN); }

<YYINITIAL>{NUMBER} {

return new Symbol(sym.NUMBER, new Integer(yytext()));

}

<YYINITIAL>\n { }

<YYINITIAL>. { }

Parser gets terminals from the scanner


Recap

Recap

  • Package and import specifications and user code components

  • Symbol (terminal and non-terminal) lists

    • Define building-blocks of the grammar

  • Precedence declarations

    • May help resolve conflicts

  • The grammar

    • May introduce conflicts that have to be resolved


Assigning meaning

Assigning meaning

  • So far, only validation

  • Add Java code implementing semantic actions

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;


Assigning meaning1

Assigning meaning

  • Symbol labels used to name variables

  • RESULT names the left-hand side symbol

expr ::= expr:e1 PLUS expr:e2

{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}

| expr:e1 MINUS expr:e2

{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}

| expr:e1 MULT expr:e2

{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}

| expr:e1 DIV expr:e2

{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}

| MINUS expr:e1

{: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS

| LPAREN expr:e1 RPAREN

{: RESULT = e1; :}

| NUMBER:n

{: RESULT = n; :}

;


Building an ast

Building an AST

  • More useful representation of syntax tree

    • Less clutter

    • Actual level of detail depends on your design

  • Basis for semantic analysis

  • Later annotated with various information

    • Type information

    • Computed values


Parse tree vs ast

Parse tree vs. AST

expr

+

expr

+

expr

expr

expr

1

+

(

2

)

+

(

3

)

1

2

3


Ast construction

AST construction

  • AST Nodes constructed during parsing

    • Stored in push-down stack

  • Bottom-up parser

    • Grammar rules annotated with actions for AST construction

    • When node is constructed all children available (already constructed)

    • Node (RESULT) pushed on stack

  • Top-down parser

    • More complicated


Ast construction1

int_const

int_const

int_const

val = 3

val = 2

val = 1

plus

plus

e1

e1

e2

e2

AST construction

expr ::= expr:e1 PLUS expr:e2

{: RESULT = new plus(e1,e2); :}

| LPAREN expr:e RPAREN

{: RESULT = e; :}

| INT_CONST:i

{: RESULT = new int_const(…, i); :}

1 + (2) + (3)

expr + (2) + (3)

expr + (expr) + (3)

expr + (3)

expr + (expr)

expr

expr

expr

expr

expr

expr

1

+

(

2

)

+

(

3

)


Designing an ast

Designing an AST

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI;

terminal UMINUS;

non terminal Integer expr;

non terminal expr_list, expr_part;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr_list ::= expr_list expr_part

| expr_part

;

expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI

;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;


See you next time

See you next time


  • Login