Compiler construction in4020 lecture 4
Sponsored Links
This presentation is the property of its rightful owner.
1 / 50

Compiler construction in4020 – lecture 4 PowerPoint PPT Presentation


  • 90 Views
  • Uploaded on
  • Presentation posted in: General

Compiler construction in4020 – lecture 4. Koen Langendoen Delft University of Technology The Netherlands. program text. lexical analysis. tokens. parser. language. syntax analysis. generator. grammar. AST. context handling. annotated AST. Summary of lecture 3.

Download Presentation

Compiler construction in4020 – lecture 4

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Compiler constructionin4020 – lecture 4

Koen Langendoen

Delft University of Technology

The Netherlands


program text

lexical analysis

tokens

parser

language

syntax analysis

generator

grammar

AST

context handling

annotated AST

Summary of lecture 3

  • syntax analysis: tokens  AST

  • top-down parsing

    • recursive descent

    • push-down automaton

    • making grammars LL(1)


Quiz

2.31Can you create a top-down parser for the following grammars?

(a) S ‘(‘ S ‘)’ | ‘)’

(b) S ‘(‘ S ‘)’ | 

(c) S ‘(‘ S ‘)’ | ‘)’ | 


program text

lexical analysis

tokens

parser

language

syntax analysis

generator

grammar

AST

context handling

annotated AST

Overview

  • syntax analysis: tokens  AST

  • bottom-up parsing

    • push-down automaton

    • ACTION/GOTO tables

    • LR(0), SLR(1), LR(1), LALR(1)


rest_expression

expression

term

rest_expr

IDENT

IDENT

IDENT

aap + ( noot + mies )

Bottom-up (LR) parsing

  • Left-to-right parse, Rightmost-derivation

  • create a node when all

    children are present

  • handle: nodes representing

    the right-hand side of a

    production


LR(0) parsing

  • running example: expression grammar

    input expression EOF

    expression  expression ‘+’ term | term

    term  IDENTIFIER | ‘(’ expression ‘)’

  • short-hand notation

    Z  E $

    E  E ‘+’ T | T

    T  i | ‘(’ E ‘)


LR(0) parsing

  • running example: expression grammar

    input expression EOF

    expression  expression ‘+’ term | term

    term  IDENTIFIER | ‘(’ expression ‘)’

  • short-hand notation

    Z  E $

    E  E ‘+’ T

    E  T

    T  i

    T  ‘(’ E ‘)’


Z   E $

LR(0) parsing

  • keep track of progress inside potential

    handles when consuming input tokens

  • LR items: N  

  • initial set

  • -closure: expand dots in front of non-terminals

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

S0


input

i + i $

stack

S0

LR(0) parsing

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

  • shift input token (i) onto the stack

  • compute new state


LR(0) parsing

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 i S1

+ i $

  • reduce handle on top of the stack

  • compute new state


LR(0) parsing

  • reduce handle on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 T S2

+ i $

i


LR(0) parsing

  • shift input token on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 E S3

+ i $

T

i


LR(0) parsing

  • shift input token on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 E S3 + S4

i $

T

i


LR(0) parsing

  • reduce handle on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 E S3 + S4 i S1

$

T

i


LR(0) parsing

  • reduce handle on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 E S3 + S4 T S5

$

T

i

i


LR(0) parsing

  • shift input token on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 E S3

$

E

+

T

T

i

i


LR(0) parsing

  • reduce handle on top of the stack

  • compute new state

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 E S3 $ S6

E

+

T

T

i

i


LR(0) parsing

  • accept!

Z  E $

E  E ‘+’ T

E  T

T  i

T  ‘(’ E ‘)’

stack

input

S0 Z

E

$

E

+

T

T

i

i


S0

Z   E $

E   E ‘+’ T

E   T

T   i

T   ‘(’ E ‘)’

S3

Z  E  $

E  E  ‘+’ T

S5

E  E + T 

Transition diagram

S2

T

E  T 

i

S1

T  i 

E

i

S4

E  E ‘+’  T

T   i

T   ‘(’ E ‘)’

‘+’

$

T

S6

Z  E $ 


Exercise (8 min.)

  • complete the transition diagram for the LR(0) automaton

  • can you think of a single input expression that causes all states to be used? If yes, give an example. If no, explain.


Answers


S0

Z   E $

E   E ‘+’ T

E   T

T   i

T   ‘(’ E ‘)’

S3

Z  E  $

E  E  ‘+’ T

S5

E  E + T 

Answers (fig 2.89)

S2

S7

T  ‘(’  E ‘)’

E   E ‘+’ T

E   T

T   i

T   ‘(’ E ‘)’

T

T

E  T 

‘(’

i

i

S1

T  i 

E

‘(’

E

‘(’

i

S4

S8

E  E ‘+’  T

T   i

T   ‘(’ E ‘)’

‘+’

‘+’

T  ‘(‘ E  ‘)’

E  E  ‘+’ T

$

‘)’

T

S6

S9

Z  E $ 

T  ‘(‘ E ‘)’ 


Answers

The following expression exercises all states

( i ) + i


The LR tables


LR(0) parsingconcise notation


The LR push-down automaton

SWITCH action_table[top of stack]:

CASE “shift”:

see book;

CASE (“reduce”, N ):

POP the symbols of  FROM the stack;

SET state TO top of stack;

PUSHNON the stack;

SET new state TO goto_table[state,N];

PUSH new state ON the stack;

CASE empty:

ERROR;


Break


LR(0) conflicts

  • shift-reduce conflict

    • array indexing: T  i [ E ]

      T  i [ E ](shift)

      T  i(reduce)

    • -rule: RestExpr 

      Expr  Term  RestExpr(shift)

      RestExpr (reduce)


LR(0) conflicts

  • reduce-reduce conflict

    • assignment statement: Z  V := E $

      V  i(reduce)

      T  i(reduce)

  • typical LR(0) table contains many conflicts


Handling LR(0) conflicts

  • solution: use a one-token look-ahead

    two-dimensional ACTION table [state,token]

  • different construction of ACTION table

    • SLR(1) – Simple LR

    • LR(1)

    • LALR(1) – Look-Ahead LR


SLR(1) parsing

  • solves (some) shift-reduce conflicts

  • reduce N  iff token FOLLOW(N)

    FOLLOW(T) = { ‘+’, ‘)’, $ }

    FOLLOW(E) = { ‘+’, ‘)’, $ }

    FOLLOW(Z) = { $ }


SLR(1) ACTION table


SLR(1) ACTION/GOTO table

1: Z  E $

2: E  E ‘+’ T

3: E  T

4: T  i

5: T  ‘(’ E ‘)’

sn – shift to

state n

rn – reduce

rule n


SLR(1) ACTION/GOTO table

1: Z  E $

2: E  E ‘+’ T

3: E  T

4: T  i

5: T  ‘(’ E ‘)’

6: T  i ‘[‘ E ‘]’

sn – shift to

state n

rn – reduce

rule n


SLR(1) ACTION/GOTO table

1: Z  E $

2: E  E ‘+’ T

3: E  T

4: T  i

5: T  ‘(’ E ‘)’

6: T  i ‘[‘ E ‘]’

sn – shift to

state n

rn – reduce

rule n


SLR(1) ACTION/GOTO table

1: Z  E $

2: E  E ‘+’ T

3: E  T

4: T  i

5: T  ‘(’ E ‘)’

6: T  i ‘[‘ E ‘]’

sn – shift to

state n

rn – reduce

rule n


Unfortunately ...

  • SLR(1) leaves many shift-reduce conflicts unsolved

  • problem: FOLLOW(N) set is a union of all contexts in which N may occur

  • example

    S  A | x b

    A  a A b | x


LR(0) automaton

S3

S5

S  A 

A  x 

A

x

S0

S   A

S   x b

A   a A b

A   x

S4

A  a  A b

A   a A b

A   x

a

a

x

A

S1

S6

S  x  b

A  x 

1: S  A

2: S  x b

3: A  a A b

4: A  x

A  a A  b

b

b

S2

S7

S  x b 

A  a A b 


Exercise (6 min.)

  • derive the SLR(1) ACTION/GOTO table (with shift-reduce conflict) for the following grammar:

    S  A | x b

    A  a A b | x


Answers


Answers

1: S  A

2: S  x b

3: A  a A b

4: A  x

FOLLOW(S) = {$}

FOLLOW(A) = {$,b}


LR(1) parsing

  • maintain follow set per item

    LR(1) item:N   {}

  •  - closure for LR(1) item sets:

    if set S contains an item P  N  {} then

    foreach production rule N  

    S must contain the item N   {}

    where  = FIRST(  {} )


LR(1) automaton

S3

S5

S  A  {$}

A  x  {b}

A

x

x

S0

S   A {$}

S   x b {$}

A   a A b {$}

A   x {$}

S4

S8

A  a  A b {$}

A   a A b {b}

A   x {b}

A  a  A b {b}

A   a A b {b}

A   x {b}

a

a

x

A

A

A

S1

S6

S9

S  x  b {$}

A  x  {$}

A  a A  b {$}

A  a A  b {b}

b

b

b

S2

S7

S10

S  x b  {$}

A  a A b  {$}

A  a A b  {b}


LALR(1) parsing

  • LR tables are big

  • combine “equal” sets by merging look-ahead sets


LALR(1) automaton

S3

S5

S  A  {$}

A  x  {b}

A

x

x

S0

S   A {$}

S   x b {$}

A   a A b {$}

A   x {$}

S4

S8

A  a  A b {$}

A   a A b {b}

A   x {b}

A  a  A b {b}

A   a A b {b}

A   x {b}

a

a

x

A

A

A

S1

S6

S9

S  x  b {$}

A  x  {$}

A  a A  b {$}

A  a A  b {b}

b

b

b

S2

S7

S10

S  x b  {$}

A  a A b  {$}

A  a A b  {b}


LALR(1) automaton

S3

S5

S  A  {$}

A  x  {b}

A

x

S0

S   A {$}

S   x b {$}

A   a A b {$}

A   x {$}

S   A {$}

S   x b {$}

A   a A b {b,$}

A   x {b,$}

S4

A  a  A b {b,$}

A   a A b {b,$}

A   x {b,$}

A  a  A b {b,$}

A   a A b {b,$}

A   x {b,$}

a

a

x

A

A

S1

S6

S  x  b {$}

A  x  {$}

S  x  b {$}

A  x  {b,$}

A  a A  b {b,$}

A  a A  b {b,$}

b

b

S2

S7

A  a A b  {b,$}

S  x b  {$}

A  a A b  {b,$}


LALR(1) ACTION/GOTO table

1: S  A

2: S  x b

3: A  a A b

4: A  x


+

*

*

E

E

+

E

E

E

E

Making grammars LR(1) – or not?

  • grammars are often ambiguous

    E  E ‘+’ E | E ‘*’ E | i

  • handle shift-reduce conflicts

    • (default) longest match: shift

    • precedence directives

      input: i * i + i

      E  E ‘+’ E

      E  E ‘*’ E 

reduce

shift


Summary

  • syntax analysis: tokens  AST

  • bottom-up parsing

    • push-down automaton

    • ACTION/GOTO tables

      • LR(0)NO look-ahead

      • SLR(1)one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts

      • LR(1)SLR(1), but FOLLOW set per item

      • LALR(1)LR(1), but “equal” states are merged


Homework

  • study sections:

    • 1.10 closure algorithm

    • 2.2.5.8 error handling in LR(1) parsers

  • print handout for next week [blackboard]

  • find a partner for the “practicum”

  • register your group

    • send e-mail to [email protected]


  • Login