# Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3 - PowerPoint PPT Presentation

1 / 81

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3. Mayer Goldberg and Roman Manevich Ben-Gurion University. Today. Shift-reduce parsing How does a shift-reduce parser work? How do we construct a shift-reduce parse table? LR(1), SLR(1), LALR(1)

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 3

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

#### Presentation Transcript

Winter 2012-2013Compiler PrinciplesSyntax Analysis (Parsing) – Part 3

Mayer Goldberg and Roman Manevich

Ben-Gurion University

### Today

• Shift-reduce parsing

• How does a shift-reduce parser work?

• How do we construct a shift-reduce parse table?

• LR(1), SLR(1), LALR(1)

• Meaning of shift-reduce / reduce-reduce conflicts

• Behavior on left-recursion/right-recursion

• Automatic parser generation (CUP)

• Handling ambiguity

### Model of an LR parser

Input

Stack

LRParsing program

state

Output

symbol

### LR parser stack

Sequence made of state, symbol pairs

For instance a possible stack for the grammarS  E \$E  TE  E+ TT idT  (E)could be: 0 T 2 + 7 id 5

### Form of LR parsing table

non-terminals

state

terminals

0

rk

gm

1

acc

...

gotopart

shift/reduceactions

sn

error

shift state n

reduce by rule k

gotostate m

accept

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

### Shift move

Input

Stack

LRParsing program

Output

If action[q, a] = sn

### Result of shift

Input

Stack

LRParsing program

Output

If action[q, a] = sn

### Reduce move

Input

Stack

LRParsing program

Output

2*|β|

• If action[qn, a] = rk

• Production: (k) A β

• If β= σ1… σnTop of stack looks like q1 σ1…qnσn

• goto[q, A] = qm

### Result of reduce move

Input

Stack

LRParsing program

Output

2*|β|

• If action[qn, a] = rk

• Production: (k) A β

• If β= σ1… σnTop of stack looks like q1 σ1…qnσn

• goto[q, A] = qm

### Accept move

Input

Stack

LRParsing program

Output

If action[q, a] = accept

parsing completed

### Error move

Input

Stack

LRParsing program

Output

• If action[q, a] = error

• parsing discovered a syntactic error

### Parsing id+id\$

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

Initialize with state 0

### Parsing id+id\$

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

Initialize with state 0

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

pop id 5

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

push T 6

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

### Constructing an LR parsing table

Construct a (determinized) transition diagram from LR items

If there are conflicts – stop

Fill table entries from diagram

### Form of LR(0) items

To be matched

Input

N  αβ

Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β

N  αβ

Shift Item

N  αβ

Reduce Item

### LR(0) items enumeration example

LR(0) items

Grammar

(1) S E \$

(2) E  T

(3) E  E+ T

(4) T id

(5) T  (E)

All items can be obtained by placing a dot at every position for every production:

### Operations for transition diagram construction

Initial = {S’S\$}

For an item set IClosure(I) = Closure(I) + {Xµ is in grammar| NαXβ in I}

Goto(I, X) = { NαXβ | NαXβ in I}

### Initial example

Grammar

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E \$}

### Closure example

Grammar

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E \$}

Closure({S E \$}) =S E \$E TE E + TT id T  ( E )

### Goto example

Grammar

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

Initial = {S E \$}

Closure({S E \$}) =S E \$E TE E + TT id T  ( E )

Goto({S E \$ , E E + T, T id}, E) = {S  E \$, E  E + T}

### Constructing the transition diagram

• Start with state 0 containing itemClosure({S E \$})

• Repeat until no new states are discovered

• For every state p containing item set Ip, and symbol N, compute state q containing item setIq = Closure(goto(Ip, N))

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

q7

T

q0

T

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E\$

E  T

E  E + T

T  i

T  (E)

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

S  E\$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

+

+

\$

)

q9

q2

S  E\$

T  (E) 

T

q4

E  E + T

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q0

S  E\$

Initialize

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q0

S  E\$

E  T

E  E + T

T  i

T  (E)

applyClosure

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

T

q0

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E\$

E  T

E  E + T

T  i

T  (E)

(

q5

i

T  i

E

q1

S  E\$

E  E+ T

### Automaton construction example

(1) S E \$

(2) E  T

(3) E  E + T

(4) T id

(5) T  ( E )

q6

E  T

q7

T

q0

T

T  (E)

E  T

E  E + T

T  i

T  (E)

S  E\$

E  T

E  E + T

T  i

T  (E)

non-terminal transition corresponds to goto action in parse table

(

q5

i

i

T  i

E

E

(

(

i

q1

q8

q3

Z  E\$

E  E+ T

T  (E)

E  E+T

E  E+T

T  i

T  (E)

terminal transition corresponds to shift action in parse table

+

+

\$

)

q9

q2

S  E\$

T  (E) 

T

q4

E  E + T

a single reduce item corresponds to reduce action

### Conflicts

Can construct a diagram for every grammar but some may introduce conflicts

shift-reduce conflict: an item set contains at least one shift item and one reduce item

reduce-reduce conflict: an item set contains two reduce items

### LR(0) conflicts

T

q0

S  E\$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

T  i[E]

E

Shift/reduce conflict

S  E \$

E  T

E  E + T

T  i

T  ( E )

T  i[E]

### LR(0) conflicts

T

q0

S  E\$

E  T

E  E + T

T  i

T  (E)

T  i[E]

(

q5

i

T  i

V  i

E

reduce/reduce conflict

S  E \$

E  T

E  E + T

T  i

V  iT  ( E )

### LR(0) conflicts

• Any grammar with an -rule cannot be LR(0)

• Inherent shift/reduce conflict

• A – reduce item

• PαAβ – shift item

• A can always be predicted from P αAβ

### LR variants

• LR(0) – what we’ve seen so far

• SLR(0)

• Removes infeasible reduce actions via FOLLOW set reasoning

• LR(1)

• LR(0) with one lookahead token in items

• LALR(0)

• LR(1) with merging of states with same LR(0) component

### SRL parsing

• A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N

• A reduce item N  α is applicable only when the lookahead is in FOLLOW(N)

• If b is not in FOLLOW(N) we just proved there is no derivation S =>* βNαband thus it is safe to remove the reduce item from the conflicted state

• Differs from LR(0) only on the ACTION table

• Now a row in the parsing table may contain both shift actions and reduce actions and we need to consult the current token to decide which one to take

### SLR action table

Lookahead token from the input

vs.

SLR – use 1 token look-ahead

LR(0) – no look-ahead

… as before…

T  i

T  i[E]

### Going beyond SLR(0)

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

Some common language constructs introduce conflicts even for SLR

q3

R

S → R 

q0

S’ → S

S → L = R

S → R

L → * R

L →  id

R → L

S

q1

S’ → S

q9

S → L = R 

q2

L

S → L = R

R → L 

=

R

q6

S → L = R

R →  L

L → * R

L → id

q5

id

*

L → id 

id

*

id

q4

L → * R

R → L

L → * R

L → id

*

L

q8

R → L 

L

q7

L → * R 

R

### shift/reduce conflict

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

• S → L  = R vs. R → L 

• FOLLOW(R) contains =

• S ⇒ L = R ⇒ * R = R

• SLR cannot resolve conflict

### Inputs requiring shift/reduce

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

q2

S → L = R

R → L 

=

q6

S → L = R

R →  L

L → * R

L → id

For the input id the rightmost derivationS’ => S => R => L => id requires reducing in q2

For the input id = idS’ => S => L = R => L = L => L = id => id = idrequires shifting

### LR(1) grammars

• In SLR: a reduce item N  α is applicable only when the lookahead is in FOLLOW(N)

• But FOLLOW(N) merges lookahead for all alternatives for N

• Insensitive to the context of a given production

• LR(1) keeps lookahead with each LR item

• Idea: a more refined notion of follows computed per item

### LR(1) items

LR(1) items

[L → ● id, *]

[L → ● id, =]

[L → ● id, id]

[L → ● id, \$]

[L → id ●, *]

[L → id ●, =]

[L → id ●, id]

[L → id ●, \$]

(0) S’ → S

(1) S → L = R

(2) S → R

(3) L → * R

(4) L → id

(5) R → L

LR(0) items

[L → ● id]

[L → id ●]

• LR(1) item is a pair

• LR(0) item

• Meaning

• We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token

• Example

• The production L  id yields the following LR(1) items

### -closure for LR(1)

• For every [A → α ● Bβ , c] in S

• for every production B→δ and every token b in the grammar such that b FIRST(βc)

• Add [B → ● δ , b] to S

q3

q0

(S → R ∙ , \$)

(S’ → ∙ S , \$)

(S → ∙ L = R , \$)

(S → ∙ R , \$)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , \$ )

(L → ∙ id , \$ )

(L → ∙ * R , \$ )

q11

q9

(S → L = R ∙ , \$)

(L → id ∙ , \$)

R

q1

S

(S’ → S ∙ , \$)

id

R

id

L

q2

q12

(S → L ∙ = R , \$)

(R → L ∙ , \$)

(R → L ∙ , \$)

=

id

*

*

q10

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

L

q6

q4

q5

id

(L → id ∙ , \$)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , \$)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , \$)

(L → * R ∙ , \$)

### Back to the conflict

q2

(S → L ∙ = R , \$)

(R → L ∙ , \$)

=

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

q6

Is there a conflict now?

### LALR(1)

• LR(1) tables have huge number of entries

• Often don’t need such refined observation (and cost)

• Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts

• LALR(1) not as powerful as LR(1) in theory but works quite well in practice

• Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice

q3

q0

(S → R ∙ , \$)

(S’ → ∙ S , \$)

(S → ∙ L = R , \$)

(S → ∙ R , \$)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , \$ )

(L → ∙ id , \$ )

(L → ∙ * R , \$ )

q11

q9

(S → L = R ∙ , \$)

(L → id ∙ , \$)

R

q1

S

(S’ → S ∙ , \$)

id

R

id

L

q2

q12

(S → L ∙ = R , \$)

(R → L ∙ , \$)

(R → L ∙ , \$)

=

id

*

*

q10

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

L

q6

q4

q5

id

(L → id ∙ , \$)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

*

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , \$)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , \$)

(L → * R ∙ , \$)

q3

q0

(S → R ∙ , \$)

(S’ → ∙ S , \$)

(S → ∙ L = R , \$)

(S → ∙ R , \$)

(L → ∙ * R , = )

(L → ∙ id , = )

(R → ∙ L , \$ )

(L → ∙ id , \$ )

(L → ∙ * R , \$ )

q9

(S → L = R ∙ , \$)

R

q1

S

(S’ → S ∙ , \$)

R

L

q2

(S → L ∙ = R , \$)

(R → L ∙ , \$)

=

id

*

*

id

q10

(S → L = ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

q6

q4

q5

id

(L → id ∙ , \$)

(L → id ∙ , =)

(L → * ∙ R , =)

(R → ∙ L , =)

(L → ∙ * R , =)

(L → ∙ id , =)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

(L → * ∙ R , \$)

(R → ∙ L , \$)

(L → ∙ * R , \$)

(L → ∙ id , \$)

*

id

L

L

q8

R

q7

(R → L ∙ , =)

(R → L ∙ , \$)

R

q13

(L → * R ∙ , =)

(L → * R ∙ , \$)

(L → * R ∙ , \$)

### Left/Right- recursion

• At home: create a simple grammar with left-recursion and one with right-recursion

• Construct corresponding LR(0) parser

• Any conflicts?

• Run on simple input and observe behavior

• Attempt to generalize observation for long inputs

### High-level structure

text

Lexerspec

.java

Lexical analyzer

JFlex

javac

Lexer.java

TPL.lex

(Token.java)

tokens

Parserspec

.java

Parser

CUP

javac

sym.javaParser.java

TPL.cup

AST

### Expression calculator

exprexpr + expr

| expr - expr

| expr * expr

| expr / expr

| - expr

| ( expr )

| number

• Goals of expression calculator parser:

• Is 2+3+4+5 a valid expression?

• What is the meaning (value) of this expression?

### Syntax analysis with CUP

• CUP – parser generator

• Generates an LALR(1) Parser

• Input: spec file

• Output: a syntax analyzer

tokens

Parserspec

.java

Parser

CUP

javac

AST

### CUP spec file

• Package and import specifications

• User code components

• Symbol (terminal and non-terminal) lists

• Terminals go to sym.java

• Types of AST nodes

• Precedence declarations

• The grammar

• Semantic actions to construct AST

Symbol typeexplained later

### Expression Calculator – 1st Attempt

terminal Integer NUMBER;

terminal PLUS, MINUS, MULT, DIV;

terminal LPAREN, RPAREN;

non terminal Integer expr;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr

| LPAREN expr RPAREN

| NUMBER

;

+

*

+

+

+

+

*

+

a

a

a

a

b

b

c

c

b

b

c

c

a * b + c

a + b + c

Increasing

precedence

### Expression Calculator – 2nd Attempt

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV;

terminal LPAREN, RPAREN;

terminal UMINUS;

non terminal Integer expr;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

Contextual precedence

### Parsing ambiguous grammars using precedence declarations

• Each terminal assigned with precedence

• By default all terminals have lowest precedence

• User can assign his own precedence

• CUP assigns each production a precedence

• Precedence of last terminal in production

• or user-specified contextual precedence

• On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce

• In case of equal precedences left/right help resolve conflicts

• left means reduce

• right means shift

• More information on precedence declarations in CUP’s manual

+

+

+

+

a

a

b

c

b

c

### Resolving ambiguity

precedence left PLUS

a + b + c

*

+

+

*

a

a

b

c

b

c

### Resolving ambiguity

precedence left PLUSprecedence left MULT

a * b + c

### Resolving ambiguity

MINUS expr %prec UMINUS

-

-

-

-

b

a

b

a

- a - b

### Resolving ambiguity

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV;

terminal LPAREN, RPAREN;

terminal UMINUS;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

UMINUS never returnedby scanner(used only to define precedence)

Rule has precedence of UMINUS

### More CUP directives

• precedence nonassoc NEQ

• Non-associative operators: < > == != etc.

• 1<2<3 identified as an error (semantic error?)

• start non-terminal

• Specifies start non-terminal other than first non-terminal

• Can change to test parts of grammar

• Getting internal representation

• Command line options:

• -dump_grammar

• -dump_states

• -dump_tables

• -dump

Generated from tokendeclarations in .cup file

### Scanner integration

import java_cup.runtime.*;

%%

%cup

%eofval{

return new Symbol(sym.EOF);

%eofval}

NUMBER=[0-9]+

%%

<YYINITIAL>”+” { return new Symbol(sym.PLUS); }

<YYINITIAL>”-” { return new Symbol(sym.MINUS); }

<YYINITIAL>”*” { return new Symbol(sym.MULT); }

<YYINITIAL>”/” { return new Symbol(sym.DIV); }

<YYINITIAL>”(” { return new Symbol(sym.LPAREN); }

<YYINITIAL>”)” { return new Symbol(sym.RPAREN); }

<YYINITIAL>{NUMBER} {

return new Symbol(sym.NUMBER, new Integer(yytext()));

}

<YYINITIAL>\n { }

<YYINITIAL>. { }

Parser gets terminals from the scanner

### Recap

• Package and import specifications and user code components

• Symbol (terminal and non-terminal) lists

• Define building-blocks of the grammar

• Precedence declarations

• May help resolve conflicts

• The grammar

• May introduce conflicts that have to be resolved

### Assigning meaning

• So far, only validation

• Add Java code implementing semantic actions

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;

### Assigning meaning

• Symbol labels used to name variables

• RESULT names the left-hand side symbol

expr ::= expr:e1 PLUS expr:e2

{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}

| expr:e1 MINUS expr:e2

{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}

| expr:e1 MULT expr:e2

{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}

| expr:e1 DIV expr:e2

{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}

| MINUS expr:e1

{: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS

| LPAREN expr:e1 RPAREN

{: RESULT = e1; :}

| NUMBER:n

{: RESULT = n; :}

;

### Building an AST

• More useful representation of syntax tree

• Less clutter

• Actual level of detail depends on your design

• Basis for semantic analysis

• Later annotated with various information

• Type information

• Computed values

expr

+

expr

+

expr

expr

expr

1

+

(

2

)

+

(

3

)

1

2

3

### AST construction

• AST Nodes constructed during parsing

• Stored in push-down stack

• Bottom-up parser

• Grammar rules annotated with actions for AST construction

• When node is constructed all children available (already constructed)

• Node (RESULT) pushed on stack

• Top-down parser

• More complicated

int_const

int_const

int_const

val = 3

val = 2

val = 1

plus

plus

e1

e1

e2

e2

### AST construction

expr ::= expr:e1 PLUS expr:e2

{: RESULT = new plus(e1,e2); :}

| LPAREN expr:e RPAREN

{: RESULT = e; :}

| INT_CONST:i

{: RESULT = new int_const(…, i); :}

1 + (2) + (3)

expr + (2) + (3)

expr + (expr) + (3)

expr + (3)

expr + (expr)

expr

expr

expr

expr

expr

expr

1

+

(

2

)

+

(

3

)

### Designing an AST

terminal Integer NUMBER;

terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI;

terminal UMINUS;

non terminal Integer expr;

non terminal expr_list, expr_part;

precedence left PLUS, MINUS;

precedence left DIV, MULT;

precedence left UMINUS;

expr_list ::= expr_list expr_part

| expr_part

;

expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI

;

expr ::= expr PLUS expr

| expr MINUS expr

| expr MULT expr

| expr DIV expr

| MINUS expr %prec UMINUS

| LPAREN expr RPAREN

| NUMBER

;