Compiler construction in4020 – lecture 4

Koen Langendoen, Delft University of Technology, The Netherlands.

Presentation Transcript


  1. Compiler construction in4020 – lecture 4 • Koen Langendoen, Delft University of Technology, The Netherlands

  2. Summary of lecture 3 • syntax analysis: tokens → AST • top-down parsing • recursive descent • push-down automaton • making grammars LL(1) [Figure: compiler front-end pipeline with the labels program text, lexical analysis, tokens, syntax analysis, AST, context handling, annotated AST; parser generator, language grammar]

  3. Quiz 2.31: Can you create a top-down parser for the following grammars? (a) S → ‘(’ S ‘)’ | ‘)’ (b) S → ‘(’ S ‘)’ | ε (c) S → ‘(’ S ‘)’ | ‘)’ | ε

  4. Overview • syntax analysis: tokens → AST • bottom-up parsing • push-down automaton • ACTION/GOTO tables • LR(0), SLR(1), LR(1), LALR(1) [Figure: compiler front-end pipeline, as on slide 2]

  5. Bottom-up (LR) parsing • Left-to-right parse, Rightmost derivation (traced in reverse) • create a node when all children are present • handle: nodes representing the right-hand side of a production [Figure: partially built parse tree over the input aap + ( noot + mies ), with the node labels expression, term, rest_expression and three IDENT leaves]

  6. LR(0) parsing • running example: expression grammar: input → expression EOF; expression → expression ‘+’ term | term; term → IDENTIFIER | ‘(’ expression ‘)’ • short-hand notation: Z → E $; E → E ‘+’ T | T; T → i | ‘(’ E ‘)’

  7. LR(0) parsing • running example: expression grammar: input → expression EOF; expression → expression ‘+’ term | term; term → IDENTIFIER | ‘(’ expression ‘)’ • short-hand notation, one rule per alternative: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’

  8. LR(0) parsing • keep track of progress inside potential handles when consuming input tokens • LR items: N → α · β • initial set S0: Z → · E $ • ε-closure: expand dots in front of non-terminals • grammar: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’
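
The ε-closure step on this slide can be sketched in a few lines of Python. This is only an illustration, not the lecture's own code: the item encoding `(rule_index, dot_position)` and the helper names `closure` and `show` are assumptions made for the example.

```python
# Grammar from the slides; each rule is (left-hand side, right-hand side).
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Expand every dot that stands in front of a non-terminal.

    An item is a pair (rule_index, dot_position); this encoding is an
    assumption of the sketch, not notation from the slides.
    """
    result = set(items)
    worklist = list(items)
    while worklist:
        rule, dot = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def show(item):
    """Render an item such as (1, 2) as 'E -> E + . T'."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


S0 = closure({(0, 0)})          # closure of the initial item Z -> . E $
for item in sorted(S0):
    print(show(item))
```

Starting from the single item Z → · E $, the closure pulls in one item for every rule of E and T, which is the content of state S0 in the transition diagram.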

  9. LR(0) parsing • grammar: Z → E $; E → E ‘+’ T; E → T; T → i; T → ‘(’ E ‘)’ • shift input token (i) onto the stack • compute new state • stack: S0 • input: i + i $

  10. LR(0) parsing • reduce handle on top of the stack • compute new state • stack: S0 i S1 • input: + i $

  11. LR(0) parsing • reduce handle on top of the stack • compute new state • stack: S0 T S2 • input: + i $ • tree so far: T(i)

  12. LR(0) parsing • shift input token on top of the stack • compute new state • stack: S0 E S3 • input: + i $ • tree so far: E(T(i))

  13. LR(0) parsing • shift input token on top of the stack • compute new state • stack: S0 E S3 + S4 • input: i $ • tree so far: E(T(i))

  14. LR(0) parsing • reduce handle on top of the stack • compute new state • stack: S0 E S3 + S4 i S1 • input: $ • tree so far: E(T(i))

  15. LR(0) parsing • reduce handle on top of the stack • compute new state • stack: S0 E S3 + S4 T S5 • input: $ • trees so far: E(T(i)) and T(i)

  16. LR(0) parsing • shift input token on top of the stack • compute new state • stack: S0 E S3 • input: $ • tree so far: E(E(T(i)) ‘+’ T(i))

  17. LR(0) parsing • reduce handle on top of the stack • compute new state • stack: S0 E S3 $ S6 • input: (empty) • tree so far: E(E(T(i)) ‘+’ T(i))

  18. LR(0) parsing • accept! • stack: S0 Z • input: (empty) • tree: Z(E(E(T(i)) ‘+’ T(i)) $)

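  19. Transition diagram (partial LR(0) automaton)
      S0: Z → · E $; E → · E ‘+’ T; E → · T; T → · i; T → · ‘(’ E ‘)’
      S1: T → i ·
      S2: E → T ·
      S3: Z → E · $; E → E · ‘+’ T
      S4: E → E ‘+’ · T; T → · i; T → · ‘(’ E ‘)’
      S5: E → E ‘+’ T ·
      S6: Z → E $ ·
      transitions: S0 –i→ S1; S0 –T→ S2; S0 –E→ S3; S3 –‘+’→ S4; S3 –$→ S6; S4 –i→ S1; S4 –T→ S5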

  20. Exercise (8 min.) • complete the transition diagram for the LR(0) automaton • can you think of a single input expression that causes all states to be used? If yes, give an example. If no, explain.

  21. Answers

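  22. Answers (fig 2.89): the complete LR(0) automaton
      S0: Z → · E $; E → · E ‘+’ T; E → · T; T → · i; T → · ‘(’ E ‘)’
      S1: T → i ·
      S2: E → T ·
      S3: Z → E · $; E → E · ‘+’ T
      S4: E → E ‘+’ · T; T → · i; T → · ‘(’ E ‘)’
      S5: E → E ‘+’ T ·
      S6: Z → E $ ·
      S7: T → ‘(’ · E ‘)’; E → · E ‘+’ T; E → · T; T → · i; T → · ‘(’ E ‘)’
      S8: T → ‘(’ E · ‘)’; E → E · ‘+’ T
      S9: T → ‘(’ E ‘)’ ·
      transitions: S0 –i→ S1; S0 –T→ S2; S0 –E→ S3; S0 –‘(’→ S7; S3 –‘+’→ S4; S3 –$→ S6; S4 –i→ S1; S4 –T→ S5; S4 –‘(’→ S7; S7 –i→ S1; S7 –T→ S2; S7 –E→ S8; S7 –‘(’→ S7; S8 –‘+’→ S4; S8 –‘)’→ S9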
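
The completed automaton can also be derived mechanically. The following is a minimal sketch, assuming items are encoded as `(rule_index, dot_position)` pairs; the function names `closure`, `goto` and `build_automaton` are choices made for this example. The state numbers it assigns depend on discovery order and will generally not match the S0..S9 labels of the figure.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add N -> . gamma for every non-terminal N that follows a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        rule, dot = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)


def build_automaton():
    """Return (list of item sets, {(state_index, symbol): state_index})."""
    start = closure({(0, 0)})
    states, transitions, worklist = [start], {}, [start]
    while worklist:
        state = worklist.pop()
        # Every symbol that appears directly after a dot labels a transition.
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                worklist.append(target)
            transitions[states.index(state), symbol] = states.index(target)
    return states, transitions


states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

For this grammar the sketch finds ten item sets and fifteen transitions, the same counts as in the figure.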

  23. Answers • the following expression exercises all states: ( i ) + i

  24. The LR tables

  25. LR(0) parsing: concise notation

  26. The LR push-down automaton
      SWITCH action_table[top of stack]:
          CASE “shift”: see book;
          CASE (“reduce”, N → α):
              POP the symbols of α FROM the stack;
              SET state TO top of stack;
              PUSH N ON the stack;
              SET new state TO goto_table[state, N];
              PUSH new state ON the stack;
          CASE empty: ERROR;
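
A runnable version of this push-down automaton, specialised to the expression grammar, might look as follows. It is a sketch under several assumptions: the `ACTION` and `GOTO` dictionaries were filled in by hand from the LR(0) automaton of fig 2.89, the shift case (left to the book on the slide) is implemented as "look up the token in the GOTO table", tokens are single characters, and `parse` returns the set of visited states purely for illustration.

```python
SHIFT = "shift"

# LR(0) ACTION table: one action per state, no look-ahead.
# A reduce action is (left-hand side, length of right-hand side).
ACTION = {
    0: SHIFT, 3: SHIFT, 4: SHIFT, 7: SHIFT, 8: SHIFT,
    1: ("T", 1),   # T -> i
    2: ("E", 1),   # E -> T
    5: ("E", 3),   # E -> E '+' T
    6: ("Z", 2),   # Z -> E $
    9: ("T", 3),   # T -> '(' E ')'
}

# GOTO table for terminals and non-terminals; missing entries mean "error".
GOTO = {
    (0, "i"): 1, (0, "("): 7, (0, "E"): 3, (0, "T"): 2,
    (3, "+"): 4, (3, "$"): 6,
    (4, "i"): 1, (4, "("): 7, (4, "T"): 5,
    (7, "i"): 1, (7, "("): 7, (7, "E"): 8, (7, "T"): 2,
    (8, "+"): 4, (8, ")"): 9,
}


def parse(tokens):
    """Parse `tokens` (ending in '$'); return the set of states visited.

    Raises SyntaxError when a table entry is empty.
    """
    stack = [0]                      # states and symbols alternate, state on top
    visited = {0}
    pos = 0
    while True:
        state = stack[-1]
        action = ACTION[state]
        if action == SHIFT:
            if pos >= len(tokens) or (state, tokens[pos]) not in GOTO:
                raise SyntaxError(f"unexpected input at position {pos}")
            token = tokens[pos]
            pos += 1
            stack += [token, GOTO[state, token]]
        else:
            lhs, length = action
            del stack[-2 * length:]  # pop the handle (symbols and their states)
            if lhs == "Z":           # reduced to the start symbol: accept
                if pos != len(tokens):
                    raise SyntaxError("input after end-of-file marker")
                return visited
            state = stack[-1]
            stack += [lhs, GOTO[state, lhs]]
        visited.add(stack[-1])


print(sorted(parse(list("i+i$"))))
print(sorted(parse(list("(i)+i$"))))
```

The first call walks through the same states as the trace for i + i $, and the second call confirms that ( i ) + i visits every state of the automaton.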

  27. Break

  28. LR(0) conflicts • shift-reduce conflict • array indexing: adding T → i ‘[’ E ‘]’ gives a state with T → i · ‘[’ E ‘]’ (shift) and T → i · (reduce) • ε-rule: adding RestExpr → ε gives a state with Expr → Term · RestExpr (shift) and RestExpr → · (reduce)

  29. LR(0) conflicts • reduce-reduce conflict • assignment statement: adding Z → V := E $ gives a state with V → i · (reduce) and T → i · (reduce) • typical LR(0) table contains many conflicts
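
Both kinds of conflict can be detected by inspecting a single item set. The sketch below assumes items are written as `(lhs, rhs, dot)` triples and treats every item whose dot is not at the end as a shift item; the function name `conflicts` is a choice made for this example.

```python
def conflicts(item_set):
    """Return the kinds of LR(0) conflict present in one item set."""
    reduces = [item for item in item_set if item[2] == len(item[1])]
    shifts = [item for item in item_set if item[2] < len(item[1])]
    found = set()
    if reduces and shifts:
        found.add("shift-reduce")
    if len(reduces) > 1:
        found.add("reduce-reduce")
    return found


# Array indexing: T -> i . '[' E ']'  versus  T -> i .
array_state = {("T", ("i", "[", "E", "]"), 1), ("T", ("i",), 1)}

# Epsilon rule: Expr -> Term . RestExpr  versus  RestExpr -> .
epsilon_state = {("Expr", ("Term", "RestExpr"), 1), ("RestExpr", (), 0)}

# Assignment statement: V -> i .  versus  T -> i .
assign_state = {("V", ("i",), 1), ("T", ("i",), 1)}

for name, state in [("array", array_state),
                    ("epsilon", epsilon_state),
                    ("assignment", assign_state)]:
    print(name, sorted(conflicts(state)))
```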

  30. Handling LR(0) conflicts • solution: use a one-token look-ahead • two-dimensional ACTION table [state, token] • different construction of ACTION table • SLR(1) – Simple LR • LR(1) • LALR(1) – Look-Ahead LR

  31. SLR(1) parsing • solves (some) shift-reduce conflicts • reduce N → α iff token ∈ FOLLOW(N) • FOLLOW(T) = { ‘+’, ‘)’, $ } • FOLLOW(E) = { ‘+’, ‘)’, $ } • FOLLOW(Z) = { $ }
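
The FOLLOW sets on this slide can be recomputed with a small fixed-point iteration. This sketch assumes the grammar has no ε-productions (true for the running example) and seeds FOLLOW(Z) with $ so that the result matches the slide; the function names are illustrative.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets, assuming the grammar has no epsilon-productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="Z"):
    """FOLLOW sets by iterating until nothing changes any more."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")           # convention: end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for pos, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if pos + 1 < len(rhs):           # something follows inside the rule
                    nxt = rhs[pos + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                            # at the end: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()
for name in sorted(FOLLOW):
    print(name, sorted(FOLLOW[name]))
```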

  32. SLR(1) ACTION table

  33. SLR(1) ACTION/GOTO table • 1: Z → E $ • 2: E → E ‘+’ T • 3: E → T • 4: T → i • 5: T → ‘(’ E ‘)’ • sn – shift to state n • rn – reduce rule n

  34. SLR(1) ACTION/GOTO table • 1: Z → E $ • 2: E → E ‘+’ T • 3: E → T • 4: T → i • 5: T → ‘(’ E ‘)’ • 6: T → i ‘[’ E ‘]’ • sn – shift to state n • rn – reduce rule n

  35. SLR(1) ACTION/GOTO table • 1: Z → E $ • 2: E → E ‘+’ T • 3: E → T • 4: T → i • 5: T → ‘(’ E ‘)’ • 6: T → i ‘[’ E ‘]’ • sn – shift to state n • rn – reduce rule n

  36. SLR(1) ACTION/GOTO table • 1: Z → E $ • 2: E → E ‘+’ T • 3: E → T • 4: T → i • 5: T → ‘(’ E ‘)’ • 6: T → i ‘[’ E ‘]’ • sn – shift to state n • rn – reduce rule n

  37. Unfortunately ... • SLR(1) leaves many shift-reduce conflicts unsolved • problem: FOLLOW(N) set is a union of all contexts in which N may occur • example: S → A | x b; A → a A b | x

  38. LR(0) automaton • 1: S → A • 2: S → x b • 3: A → a A b • 4: A → x
      S0: S → · A; S → · x b; A → · a A b; A → · x
      S1: S → x · b; A → x ·
      S2: S → x b ·
      S3: S → A ·
      S4: A → a · A b; A → · a A b; A → · x
      S5: A → x ·
      S6: A → a A · b
      S7: A → a A b ·
      transitions: S0 –A→ S3; S0 –x→ S1; S0 –a→ S4; S1 –b→ S2; S4 –a→ S4; S4 –x→ S5; S4 –A→ S6; S6 –b→ S7

  39. Exercise (6 min.) • derive the SLR(1) ACTION/GOTO table (with shift-reduce conflict) for the following grammar: S → A | x b; A → a A b | x

  40. Answers

  41. Answers • 1: S → A • 2: S → x b • 3: A → a A b • 4: A → x • FOLLOW(S) = {$} • FOLLOW(A) = {$, b}
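
The ACTION part of the answer can be reproduced from the LR(0) item sets and the FOLLOW sets. The sketch below hard-codes the item sets and shift targets of the LR(0) automaton on slide 38 and the FOLLOW sets given above; the dictionary layout and the `sn`/`rn` strings are conventions chosen for the example.

```python
RULES = {
    1: ("S", ("A",)),
    2: ("S", ("x", "b")),
    3: ("A", ("a", "A", "b")),
    4: ("A", ("x",)),
}
FOLLOW = {"S": {"$"}, "A": {"$", "b"}}
TERMINALS = {"a", "b", "x", "$"}

# LR(0) item sets as (rule number, dot position), copied from the automaton.
STATES = {
    0: [(1, 0), (2, 0), (3, 0), (4, 0)],
    1: [(2, 1), (4, 1)],
    2: [(2, 2)],
    3: [(1, 1)],
    4: [(3, 1), (3, 0), (4, 0)],
    5: [(4, 1)],
    6: [(3, 2)],
    7: [(3, 3)],
}
# Transitions on terminals, copied from the automaton.
SHIFT_TARGET = {(0, "x"): 1, (0, "a"): 4, (1, "b"): 2,
                (4, "a"): 4, (4, "x"): 5, (6, "b"): 7}


def action_table():
    """Map (state, token) to the set of SLR(1) actions for that entry."""
    table = {}
    for state, items in STATES.items():
        for rule, dot in items:
            lhs, rhs = RULES[rule]
            if dot == len(rhs):
                # Reduce item: allowed only on tokens in FOLLOW(lhs).
                for token in FOLLOW[lhs]:
                    table.setdefault((state, token), set()).add(f"r{rule}")
            elif rhs[dot] in TERMINALS:
                # Shift item: dot in front of a terminal.
                target = SHIFT_TARGET[state, rhs[dot]]
                table.setdefault((state, rhs[dot]), set()).add(f"s{target}")
    return table


table = action_table()
conflicts = {key: acts for key, acts in table.items() if len(acts) > 1}
for (state, token), acts in sorted(conflicts.items()):
    print(f"state S{state}, token {token}: {sorted(acts)}")
```

The only entry with two actions is state S1 on token b: shifting continues S → x · b, while reducing A → x is permitted because b ∈ FOLLOW(A).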

  42. LR(1) parsing • maintain follow set per item • LR(1) item: N → α · β {σ} • ε-closure for LR(1) item sets: if set S contains an item P → α · N β {σ}, then for each production rule N → γ, S must contain the item N → · γ {τ}, where τ = FIRST( β {σ} )
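
The LR(1) closure rule can be tried out on the example grammar S → A | x b, A → a A b | x. In this sketch an item is a triple `(rule_index, dot_position, look_ahead)` with a single look-ahead token, so an item with the set {b,$} is stored as two triples. The `FIRST` table is written out by hand, and the helper assumes a grammar without ε-productions; all names are illustrative.

```python
GRAMMAR = [
    ("S", ("A",)),
    ("S", ("x", "b")),
    ("A", ("a", "A", "b")),
    ("A", ("x",)),
]
NONTERMINALS = {"S", "A"}
FIRST = {"S": {"a", "x"}, "A": {"a", "x"}}   # worked out by hand


def first_of(beta, look_ahead):
    """FIRST(beta {look_ahead}) for a grammar without epsilon-productions."""
    if not beta:
        return {look_ahead}
    head = beta[0]
    return FIRST[head] if head in NONTERMINALS else {head}


def closure(items):
    """LR(1) closure: new items get their look-ahead from FIRST(beta {sigma})."""
    result = set(items)
    worklist = list(items)
    while worklist:
        rule, dot, look_ahead = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        for tau in first_of(rhs[dot + 1:], look_ahead):
            for index, (lhs, _) in enumerate(GRAMMAR):
                item = (index, 0, tau)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


# Initial state: both S rules with look-ahead $.
S0 = closure({(0, 0, "$"), (1, 0, "$")})
# State reached from S0 on 'a': kernel item A -> a . A b {$}.
S4 = closure({(2, 1, "$")})
print(sorted(S0))
print(sorted(S4))
```

In the second state the items added by the closure carry the look-ahead b rather than $, because the A being expanded is followed by b inside A → a · A b.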

  43. LR(1) automaton
      S0: S → · A {$}; S → · x b {$}; A → · a A b {$}; A → · x {$}
      S1: S → x · b {$}; A → x · {$}
      S2: S → x b · {$}
      S3: S → A · {$}
      S4: A → a · A b {$}; A → · a A b {b}; A → · x {b}
      S5: A → x · {b}
      S6: A → a A · b {$}
      S7: A → a A b · {$}
      S8: A → a · A b {b}; A → · a A b {b}; A → · x {b}
      S9: A → a A · b {b}
      S10: A → a A b · {b}
      transitions: S0 –A→ S3; S0 –x→ S1; S0 –a→ S4; S1 –b→ S2; S4 –x→ S5; S4 –a→ S8; S4 –A→ S6; S6 –b→ S7; S8 –x→ S5; S8 –a→ S8; S8 –A→ S9; S9 –b→ S10

  44. LALR(1) parsing • LR(1) tables are big • combine “equal” sets, i.e. item sets that differ only in their look-ahead sets, by merging those look-ahead sets
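
Merging "equal" sets amounts to grouping LR(1) states by their core (the items with the look-aheads stripped) and taking the union of the look-aheads per item. The sketch below applies this to a few states of the LR(1) automaton for S → A | x b, A → a A b | x; the item encoding `(lhs, rhs, dot, look_ahead)` and the function name `merge_equal_cores` are assumptions of the example.

```python
from collections import defaultdict


def merge_equal_cores(states):
    """Merge LR(1) states with identical cores.

    Each state is a set of items (lhs, rhs, dot, look_ahead). The result is
    a list of dicts mapping a core item (lhs, rhs, dot) to its merged
    look-ahead set.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for state in states:
        core = frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)
        for lhs, rhs, dot, look_ahead in state:
            merged[core][lhs, rhs, dot].add(look_ahead)
    return [dict(items) for items in merged.values()]


A_RULE = ("A", ("a", "A", "b"))
lr1_states = [
    {("S", ("x", "b"), 1, "$"), ("A", ("x",), 1, "$")},   # S1
    {("A", ("x",), 1, "b")},                              # S5
    {A_RULE + (2, "$")},                                  # S6: A -> a A . b {$}
    {A_RULE + (3, "$")},                                  # S7: A -> a A b . {$}
    {A_RULE + (2, "b")},                                  # S9: A -> a A . b {b}
    {A_RULE + (3, "b")},                                  # S10: A -> a A b . {b}
]

lalr_states = merge_equal_cores(lr1_states)
for state in lalr_states:
    print({item: sorted(las) for item, las in state.items()})
```

Of the six input states, S6 merges with S9 and S7 with S10, each ending up with the look-ahead set {b,$}. S1 and S5 both contain A → x ·, but their cores differ, so they stay separate and the reduce in S1 keeps the look-ahead {$}.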

  45. LALR(1) automaton • starting point: the LR(1) automaton of slide 43, with states S0 to S10

  46. LALR(1) automaton • states with equal cores are merged: S4 with S8, S6 with S9, S7 with S10 • merged items carry the union of the look-ahead sets, e.g. S4: A → a · A b {b,$}; S6: A → a A · b {b,$}; S7: A → a A b · {b,$} • unmerged states keep their look-aheads, e.g. S2: S → x b · {$}; S3: S → A · {$}; S5: A → x · {b} [Figure: the resulting automaton with states S0 to S7]

  47. LALR(1) ACTION/GOTO table • 1: S → A • 2: S → x b • 3: A → a A b • 4: A → x

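  48. Making grammars LR(1) – or not? • grammars are often ambiguous: E → E ‘+’ E | E ‘*’ E | i • handle shift-reduce conflicts • (default) longest match: shift • precedence directives • example input: i * i + i; after E ‘*’ E with look-ahead ‘+’, the items E → E · ‘+’ E (shift) and E → E ‘*’ E · (reduce) conflict [Figure: the two possible parse trees for i * i + i, one with ‘+’ at the root and one with ‘*’ at the root]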
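
How a precedence directive settles such a conflict can be sketched as a comparison between the operator of the rule that could be reduced and the look-ahead token. The precedence levels and the left-associativity below are hypothetical directives chosen for the example (in the style of yacc's `%left`); the slide does not spell them out.

```python
# Hypothetical precedence directives: higher number binds tighter.
# Both operators are assumed to be left-associative.
PRECEDENCE = {"+": 1, "*": 2}


def resolve(rule_operator, look_ahead):
    """Choose between reducing E -> E <rule_operator> E and shifting <look_ahead>."""
    if PRECEDENCE[rule_operator] > PRECEDENCE[look_ahead]:
        return "reduce"      # the rule on the stack binds tighter
    if PRECEDENCE[rule_operator] < PRECEDENCE[look_ahead]:
        return "shift"       # the incoming operator binds tighter
    return "reduce"          # equal precedence: left-associative, so reduce


# Input i * i + i: E '*' E is on the stack and '+' is the look-ahead.
print(resolve("*", "+"))
# Input i + i * i: E '+' E is on the stack and '*' is the look-ahead.
print(resolve("+", "*"))
```

With these directives the parser reduces E ‘*’ E before shifting ‘+’, which yields the tree with ‘+’ at the root. The default "longest match" rule would shift instead and produce the other tree.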

  49. Summary • syntax analysis: tokens → AST • bottom-up parsing • push-down automaton • ACTION/GOTO tables • LR(0): no look-ahead • SLR(1): one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts • LR(1): SLR(1), but FOLLOW set per item • LALR(1): LR(1), but “equal” states are merged

  50. Homework • study sections: • 1.10 closure algorithm • 2.2.5.8 error handling in LR(1) parsers • print handout for next week [blackboard] • find a partner for the “practicum” • register your group • send e-mail to koen@pds.twi.tudelft.nl