LR Parsing

Compiler. Baojian Hua, bjhua@ustc.edu.cn


  1. LR Parsing Compiler Baojian Hua bjhua@ustc.edu.cn

  2. Front End • source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

  3. Parsing • The parser translates the source program into abstract syntax trees • Token sequence: • returned from the lexer • Abstract syntax tree: • checks the validity of programs • forms the compiler's internal data structure for programs • Must take the program syntax into account

  4. Conceptually • token sequence + language syntax → parser → abstract syntax tree

  5. Predictive Parsing • Grammars encode enough information to choose a production rule when an input terminal is seen • LL(1) pros: • simple, easy to implement • efficient • Cons: • requires grammar rewriting • ugly

  6. Today’s Topic • Bottom-up Parsing • shift-reduce parsing, LR parsing • This is the predominant algorithm used by automatic YACC-like parser generators • YACC, bison, CUP, etc.

  7. Bottom-up Parsing • Grammar:
      1 S := exp
      2 exp := exp + term
      3 exp := term
      4 term := term * factor
      5 term := factor
      6 factor := ID
      7 factor := INT
      • Reducing the input 2 + 3 * 4 step by step:
      2 + 3 * 4
      factor + 3 * 4
      term + 3 * 4
      exp + 3 * 4
      exp + factor * 4
      exp + term * 4
      exp + term * factor
      exp + term
      exp
      S
      • A reverse of a rightmost derivation!

  8. Dot notation • As a convenient notation, we mark how much of the input we have consumed with a dot symbol (.) • Example: exp . + 3 * 4, where everything left of the dot has been consumed and everything right of it is the remaining input

  9. Bottom-up Parsing 2 + 3 * 4 factor+ 3 * 4 term+ 3 * 4 exp + 3 * 4 exp + factor* 4 exp + term * 4  exp + term * factor  exp + term  exp  S  2 + 3 * 4 factor + 3 * 4 term + 3 * 4 exp + 3 * 4 exp + factor * 4 exp + term * 4 exp + term * factor exp + term exp S

  10. Another View • Consumed part on the left, remaining input on the right:
      Left                   Remaining input
      (empty)                2 + 3 * 4
      2                      + 3 * 4
      factor                 + 3 * 4
      term                   + 3 * 4
      exp                    + 3 * 4
      exp +                  3 * 4
      exp + 3                * 4
      exp + factor           * 4
      exp + term             * 4
      exp + term *           4
      exp + term * 4         (empty)
      exp + term * factor    (empty)
      exp + term             (empty)
      exp                    (empty)
      S                      (empty)
      • Grammar: S := exp; exp := exp + term; exp := term; term := term * factor; term := factor; factor := ID; factor := INT • What is the data structure on the left?

  11. Producing a rightmost derivation in reverse • We do two things: • shift a token (terminal) onto the stack, or • reduce the top n symbols on the stack by a production • When we reduce by a production A := β: • β is on the top of the stack; pop β • and push A • Key problem: when to shift and when to reduce?

  12. Yet Another View • [Figure: the stack/input trace for 2 + 3 * 4 up to the stack content exp, next to the partial parse tree built so far (E over T over F over 2)]

  13. Yet Another View • [Figure: the same trace continued up to the stack content exp + term with remaining input * 4, next to the growing parse tree with nodes S, E, +, T, *, F over the leaves 2, 3, 4]

  14. A shift-reduce parser • Two components: • Stack: holds the viable prefixes • Input stream: holds the remaining source • Four actions: • shift: push a token from the input stream onto the stack • reduce: the right-hand side β of a production A := β is at the top of the stack; pop β, push A • accept: success • error: syntax error discovered
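The mechanics of the shift and reduce actions can be sketched in a few lines of Python. This is only an illustration: the action script is written by hand (deciding when to shift or reduce is exactly the problem the LR tables solve), and the names `PRODUCTIONS`, `run`, `TOKENS` and `SCRIPT` are assumptions made for this example. The input 2 + 3 * 4 is assumed to be lexed as INT + INT * INT.

```python
# Productions of the expression grammar: number -> (lhs, rhs).
PRODUCTIONS = {
    1: ("S", ["exp"]),
    2: ("exp", ["exp", "+", "term"]),
    3: ("exp", ["term"]),
    4: ("term", ["term", "*", "factor"]),
    5: ("term", ["factor"]),
    6: ("factor", ["ID"]),
    7: ("factor", ["INT"]),
}


def run(tokens, script):
    """Apply a hand-written list of actions; return the (stack, input) trace."""
    stack, rest = [], list(tokens)
    trace = [(list(stack), list(rest))]
    for action in script:
        if action == "shift":
            # shift: move the next input token onto the stack
            stack.append(rest.pop(0))
        else:
            # reduce: the right-hand side must be on top of the stack
            lhs, rhs = PRODUCTIONS[action]
            assert stack[-len(rhs):] == rhs, f"cannot reduce by {action}"
            del stack[-len(rhs):]
            stack.append(lhs)
        trace.append((list(stack), list(rest)))
    return trace


TOKENS = ["INT", "+", "INT", "*", "INT"]  # 2 + 3 * 4 after lexing (assumed)
SCRIPT = ["shift", 7, 5, 3, "shift", "shift", 7, 5,
          "shift", "shift", 7, 4, 2, 1]

if __name__ == "__main__":
    for stack, rest in run(TOKENS, SCRIPT):
        print(" ".join(stack).ljust(24), " ".join(rest))
```

Reading the reduce steps of the script in order (7, 5, 3, 7, 5, 7, 4, 2, 1) gives the rightmost derivation in reverse.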

  15. Table-driven LR(k) parsers • Lexer → tokens → Parser Loop (with its Stack) → AST • Grammar → Parser Generator → Action table & GOTO table, which drive the Parser Loop

  16. An LR parser • Put S on the stack in state s_0 • The parser configuration is (S, s_0, X_1, s_1, X_2, s_2, …, X_m, s_m; a_i a_{i+1} … a_n $) • do forever: • read a_i • if action[a_i, s_m] is "shift s", the new configuration is (S, s_0, X_1, s_1, …, X_m, s_m, a_i, s; a_{i+1} … a_n $) • if action[a_i, s_m] is "reduce A := β", the new configuration is (S, s_0, X_1, s_1, …, X_{m-|β|}, s_{m-|β|}, A, s; a_i a_{i+1} … a_n $), where s = goto[s_{m-|β|}, A] • if action[a_i, s_m] is "accept", DONE • if action[a_i, s_m] is "error", handle the error
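A minimal Python sketch of this driver loop follows. It makes several assumptions that are not on the slide: the grammar is the small one S -> E $, E -> T + E | T, T -> x; the `ACTION` and `GOTO` dictionaries were derived by hand as an SLR table for that grammar; and the stack holds only states, since the grammar symbols are implied by them.

```python
# Productions: number -> (lhs, length of rhs).
#   1: E -> T + E    2: E -> T    3: T -> x
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1)}

# Hand-derived SLR table for the grammar above (an assumption of this sketch).
ACTION = {
    (1, "x"): ("shift", 5),
    (2, "$"): ("accept", None),
    (3, "+"): ("shift", 4),
    (3, "$"): ("reduce", 2),
    (4, "x"): ("shift", 5),
    (5, "+"): ("reduce", 3),
    (5, "$"): ("reduce", 3),
    (6, "$"): ("reduce", 1),
}
GOTO = {(1, "E"): 2, (1, "T"): 3, (4, "E"): 6, (4, "T"): 3}


def parse(tokens):
    """Return the productions used, i.e. a rightmost derivation in reverse."""
    tokens = list(tokens) + ["$"]
    states = [1]                 # state stack, starting in the initial state
    pos, reductions = 0, []
    while True:
        # Missing table entries mean "error".
        kind, arg = ACTION.get((states[-1], tokens[pos]), ("error", None))
        if kind == "shift":
            states.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, n = PRODS[arg]
            del states[-n:]                         # pop |rhs| states
            states.append(GOTO[(states[-1], lhs)])  # then follow the goto
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")


if __name__ == "__main__":
    print(parse(["x", "+", "x"]))
```

Keeping only states on the stack is a common simplification; storing (symbol, state) pairs as in the configuration above would work equally well.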

  17. Generating LR parsers • In order to generate an LR parser, we must create the action and GOTO tables • Many different ways to do this • We will start here with the simplest approach, called LR(0) • Left-to-right parsing, Rightmost derivation, 0 lookahead

  18. Item • LR(0) items have the form [production-with-dot] • For example, X -> A B C has 4 items: • [X := . A B C] • [X := A . B C] • [X := A B . C] • [X := A B C .]

  19. What do items mean? • [X := . α β γ] • input is consistent with X := α β γ • [X := α . β γ] • input is consistent with X := α β γ and we have already recognized α • [X := α β . γ] • input is consistent with X := α β γ and we have already recognized α β • [X := α β γ .] • input is consistent with X := α β γ and we can reduce to X
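One possible in-memory representation of items is a (lhs, rhs, dot position) triple. The sketch below is an assumption made for illustration; the helper names `lr0_items` and `show` are not from the slides.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of the production lhs := rhs, as (lhs, rhs, dot) triples."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with '.' marking how much has been recognized."""
    lhs, rhs, dot = item
    body = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"[{lhs} := {' '.join(body)}]"


if __name__ == "__main__":
    for item in lr0_items("X", ["A", "B", "C"]):
        print(show(item))
```

A production with n right-hand-side symbols always has n + 1 items, one per dot position.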

  20. LR(0) Items • [Figure: LR(0) automaton with states 1-9, whose item sets are those of S -> ( L ) | x and L -> S | L , S; the production list on this slide reads 0: S' -> S $, 1: S -> x S, 2: S -> y x]

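  21. LR(0) Items • Grammar: 0: S' -> S $; 1: S -> ( L ); 2: S -> x; 3: L -> S; 4: L -> L , S • States:
      1: S' -> . S $; S -> . ( L ); S -> . x
      2: S -> x .
      3: S -> ( . L ); L -> . S; L -> . L , S; S -> . ( L ); S -> . x
      4: S' -> S . $
      5: S -> ( L . ); L -> L . , S
      6: S -> ( L ) .
      7: L -> S .
      8: L -> L , . S; S -> . ( L ); S -> . x
      9: L -> L , S .
      • Transitions: 1 -x-> 2; 1 -(-> 3; 1 -S-> 4; 3 -x-> 2; 3 -(-> 3; 3 -S-> 7; 3 -L-> 5; 5 -)-> 6; 5 -,-> 8; 8 -x-> 2; 8 -(-> 3; 8 -S-> 9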
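The item sets of such an automaton can be computed with the usual closure and goto operations. The following Python sketch assumes the grammar S -> ( L ) | x, L -> S | L , S and leaves the end marker $ implicit in production 0; the function names and the (production index, dot position) item encoding are choices made for this example. States are numbered in discovery order, so the numbers need not match the figure.

```python
GRAMMAR = [
    ("S'", ("S",)),          # 0 (the end marker $ is left implicit here)
    ("S", ("(", "L", ")")),  # 1
    ("S", ("x",)),           # 2
    ("L", ("S",)),           # 3
    ("L", ("L", ",", "S")),  # 4
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add [B := . gamma] for every nonterminal B that follows a dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def build_states():
    """Return (states, edges); edges maps (state index, symbol) to a state index."""
    start = closure({(0, 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges


if __name__ == "__main__":
    states, edges = build_states()
    print(len(states), "states,", len(edges), "transitions")
```

Edges on terminals become shift entries in the action table, and edges on nonterminals become GOTO entries.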

  22. LR(0) table construction • Construct the LR(0) item sets • Item set Ii becomes state i • Parsing actions at state i are: • if [A := α . a β] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j" • if [A := α .] ∈ Ii and A ≠ S', then action[i, a] = "reduce by A := α" for every terminal a • if [S' := S .] ∈ Ii, then action[i, $] = "accept"

  23. LR(0) table construction, cont’d • GOTO table for non-terminals: GOTO[i,A] = j if GOTO(Ii, A) = Ij • Empty entries are “error”

  24. LR(0) Table

  25. Problems with LR(0) • For every item of the form X -> α . • the parser blindly reduces to X, whatever the next token is, and then follows with a "goto" • this never misses an error, but it may postpone the detection of some errors

  26. Problems with LR(0) • [Figure: the LR(0) automaton of slide 21, repeated] • Consider this input: x 5 $

  27. Problems with LR(0)

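  28. Another Example • Grammar: 0: S -> E $; 1: E -> T + E; 2: E -> T; 3: T -> x • States:
      1: S -> . E $; E -> . T + E; E -> . T; T -> . x
      2: S -> E . $
      3: E -> T . + E; E -> T .
      4: E -> T + . E; E -> . T + E; E -> . T; T -> . x
      5: T -> x .
      6: E -> T + E .
      • Transitions: 1 -E-> 2; 1 -T-> 3; 1 -x-> 5; 3 -+-> 4; 4 -T-> 3; 4 -x-> 5; 4 -E-> 6 • A shift-reduce conflict in state 3: on +, LR(0) can either shift or reduce by E -> T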

  29. LR(0) Parse Table

  30. SLR table construction • Construct the LR(0) item sets • Item set Ii becomes state i • Parsing actions at state i are: • if [A := α . a β] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j" • if [A := α .] ∈ Ii and A ≠ S', then action[i, a] = "reduce by A := α" for all a ∈ FOLLOW(A) • if [S' := S .] ∈ Ii, then action[i, $] = "accept" • GOTO table for non-terminals: • GOTO[i, A] = j if GOTO(Ii, A) = Ij • Empty entries are "error"
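The only new ingredient compared with LR(0) is FOLLOW(A). As a rough sketch, FOLLOW sets can be computed by fixed-point iteration. The code below assumes the grammar S -> ( L ) | x, L -> S | L , S, assumes that no production is empty (so FIRST of a string is FIRST of its first symbol), and seeds FOLLOW(S') with $; the function names are made up for this example.

```python
GRAMMAR = [
    ("S'", ["S"]),
    ("S", ["(", "L", ")"]),
    ("S", ["x"]),
    ("L", ["S"]),
    ("L", ["L", ",", "S"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets for a grammar without empty productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="S'"):
    """FOLLOW sets; the start symbol is followed by the end marker $."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    # whatever can start the next symbol can follow sym
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # sym ends the production: FOLLOW(lhs) carries over
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


if __name__ == "__main__":
    for name, terminals in sorted(follow_sets().items()):
        print(name, sorted(terminals))
```

With these sets, a complete item [A := α .] contributes a reduce entry only in the columns listed in FOLLOW(A), instead of in every column.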

  31. Reduce LR(0) Table • [Figure: the LR(0) automaton of slide 21, repeated] • Follow sets: FOLLOW(S') = { $ }, FOLLOW(S) = { $, ",", ")" }, FOLLOW(L) = { ",", ")" }

  32. Reduce LR(0) Table

  33. Resolve Shift-reduce Conflict • [Figure: the LR(0) automaton of slide 28, repeated] • Follow sets: FOLLOW(S) = { $ }, FOLLOW(E) = { $ }, FOLLOW(T) = { +, $ } • In state 3, E -> T is now reduced only on $, so + is an unambiguous shift

  34. Resolve Shift-reduce Conflict

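  35. Problems with SLR • Grammar: S' := S $; S := L = R | R; L := * R | id; R := L • LR(0) states:
      0: S' := . S $; S := . L = R; S := . R; L := . * R; L := . id; R := . L
      1: S' := S . $
      2: S := L . = R; R := L .
      3: S := R .
      4: L := * . R; R := . L; L := . * R; L := . id
      5: L := id .
      6: S := L = . R; R := . L; L := . * R; L := . id
      7: L := * R .
      8: R := L .
      9: S := L = R .
      • Transitions: 0 -S-> 1; 0 -L-> 2; 0 -R-> 3; 0 -*-> 4; 0 -id-> 5; 2 -=-> 6; 4 -R-> 7; 4 -L-> 8; 4 -*-> 4; 4 -id-> 5; 6 -R-> 9; 6 -L-> 8; 6 -*-> 4; 6 -id-> 5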

  36. Problems with SLR • Grammar: S := L = R | R; L := * R | id; R := L • State 2 contains S := L . = R and R := L . • SLR reduces on ALL terminals in the FOLLOW set • FOLLOW(R) = FOLLOW(L), so '=' ∈ FOLLOW(R) • But we should never reduce R := L on '=' • Thus, there should be no reduction on '=' in state 2 • Why does this happen, and how can we solve it?

  37. LR(1) Items • [X := α . β, a] means: • α is at the top of the stack • the upcoming input is derivable from β a • In other words, when we reduce by X := α β, a had better be the lookahead symbol • Or, put "reduce by X := α β" in action[s, a] only

  38. LR(1) table construction • Construct the LR(1) item sets • Item set Ii becomes state i • Parsing actions at state i are: • if [A := α . a β, b] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j" • if [A := α ., b] ∈ Ii and A ≠ S', then action[i, b] = "reduce by A := α" (for the lookahead b only) • if [S' := S ., $] ∈ Ii, then action[i, $] = "accept" • GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij • Empty entries are "error" • The initial state is the item set containing [S' := . S, $]
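The step that differs most from LR(0) is the closure, which has to propagate lookaheads: for an item [A := α . B β, a], every item [B := . γ, b] with b in FIRST(β a) is added. The Python sketch below assumes the grammar S := L = R | R, L := * R | id, R := L, assumes no empty productions (so FIRST of a string is FIRST of its first symbol), and encodes items as (production index, dot position, lookahead); all names are chosen for this example.

```python
GRAMMAR = [
    ("S'", ("S",)),           # 0
    ("S", ("L", "=", "R")),   # 1
    ("S", ("R",)),            # 2
    ("L", ("*", "R")),        # 3
    ("L", ("id",)),           # 4
    ("R", ("L",)),            # 5
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets for a grammar without empty productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """LR(1) closure over (production index, dot position, lookahead) items."""
    result, work = set(items), list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:] + (look,)      # beta followed by the lookahead a
        head = rest[0]
        lookaheads = FIRST[head] if head in NONTERMINALS else {head}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    if (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return result


if __name__ == "__main__":
    for prod, dot, look in sorted(closure({(0, 0, "$")})):
        lhs, rhs = GRAMMAR[prod]
        body = list(rhs[:dot]) + ["."] + list(rhs[dot:])
        print(f"[{lhs} := {' '.join(body)}, {look}]")
```

In the resulting initial state, the L items carry the lookaheads = and $, while R := . L carries only $, which is what later keeps R := L from being reduced on '='.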

  39. LR(1) Items (part) • Grammar: S' := S $; S := L = R | R; L := * R | id; R := L • States shown:
      0: S' := . S, $; S := . L = R, $; S := . R, $; L := . * R, =/$; L := . id, =/$; R := . L, $
      1: S' := S ., $
      2: S := L . = R, $; R := L ., $
      • Transition shown: 0 -L-> 2

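  40. More • Grammar: S := L = R | R; L := * R | id; R := L • LR(1) states shown (the remaining ones are omitted as "others"):
      0: S' := . S, $; S := . L = R, $; S := . R, $; L := . * R, =/$; L := . id, =/$; R := . L, $
      1: S' := S ., $
      2: S := L . = R, $; R := L ., $
      3: S := R ., $
      4: L := * . R, =/$; R := . L, =/$; L := . * R, =/$; L := . id, =/$
      5: L := id ., =/$
      6: S := L = . R, $; R := . L, $; L := . * R, $; L := . id, $
      7: L := * R ., =/$
      8: R := L ., =/$
      9: S := L = R ., $
      10: R := L ., $
      11: L := id ., $
      • Transitions among the states shown: 0 -S-> 1; 0 -L-> 2; 0 -R-> 3; 0 -*-> 4; 0 -id-> 5; 2 -=-> 6; 4 -R-> 7; 4 -L-> 8; 4 -*-> 4; 4 -id-> 5; 6 -R-> 9; 6 -L-> 10; 6 -id-> 11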

  41. Notice similar states? • [Figure: the LR(1) automaton of slide 40, repeated]

  42. Notice similar states? • [Figure: the LR(1) automaton of slide 40, rearranged so that states differing only in their lookaheads sit side by side] • 5: L := id ., =/$ and 11: L := id ., $ • 8: R := L ., =/$ and 10: R := L ., $

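  43. LALR • Grammar: S' := S; S := C C; C := c C | d • LR(1) states:
      0: S' := . S, $; S := . C C, $; C := . c C, c/d; C := . d, c/d
      1: S' := S ., $
      2: S := C . C, $; C := . c C, $; C := . d, $
      3: C := c . C, c/d; C := . c C, c/d; C := . d, c/d
      4: C := d ., c/d
      5: S := C C ., $
      6: C := c . C, $; C := . c C, $; C := . d, $
      7: C := d ., $
      8: C := c C ., c/d
      9: C := c C ., $
      • Transitions: 0 -S-> 1; 0 -C-> 2; 0 -c-> 3; 0 -d-> 4; 2 -C-> 5; 2 -c-> 6; 2 -d-> 7; 3 -C-> 8; 3 -c-> 3; 3 -d-> 4; 6 -C-> 9; 6 -c-> 6; 6 -d-> 7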

  44. LALR • Grammar: S := C C; C := c C | d • States with a common core are merged, taking the union of their lookaheads:
      3 + 6 -> 36: C := c . C, c/d/$; C := . c C, c/d/$; C := . d, c/d/$
      4 + 7 -> 47: C := d ., c/d/$
      8 + 9 -> 89: C := c C ., c/d/$
      • States 0, 1, 2 and 5 are unchanged; the figure still shows the old states 6, 7 and 9 (lookahead $) next to their merged replacements

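  45. LALR • Grammar: S := C C; C := c C | d • LALR states after merging:
      0: S' := . S, $; S := . C C, $; C := . c C, c/d; C := . d, c/d
      1: S' := S ., $
      2: S := C . C, $; C := . c C, $; C := . d, $
      5: S := C C ., $
      36: C := c . C, c/d/$; C := . c C, c/d/$; C := . d, c/d/$
      47: C := d ., c/d/$
      89: C := c C ., c/d/$
      • Transitions: 0 -S-> 1; 0 -C-> 2; 0 -c-> 36; 0 -d-> 47; 2 -C-> 5; 2 -c-> 36; 2 -d-> 47; 36 -C-> 89; 36 -c-> 36; 36 -d-> 47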

  46. LALR Construction • Merge LR(1) item sets (states) that have a common core • Change the GOTO table to reflect the merges • Merging can introduce reduce/reduce conflicts • It cannot introduce new shift/reduce conflicts
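The merge step itself is small. In the Python sketch below, an LR(1) state is a set of (core item, lookahead) pairs and states are grouped by their set of cores. The state contents are typed in by hand from the grammar S := C C, C := c C | d, and the numbering 3 to 9 is an assumption of this example; the helper names are likewise made up.

```python
from collections import defaultdict


def state(cores, lookaheads):
    """Build an LR(1) state as a set of (core item, lookahead) pairs."""
    return {(core, a) for core in cores for a in lookaheads}


SHIFT_C = ["C := c . C", "C := . c C", "C := . d"]

# Hand-written LR(1) states (numbering assumed for this example).
LR1_STATES = {
    3: state(SHIFT_C, "cd"),
    4: state(["C := d ."], "cd"),
    5: state(["S := C C ."], "$"),
    6: state(SHIFT_C, "$"),
    7: state(["C := d ."], "$"),
    8: state(["C := c C ."], "cd"),
    9: state(["C := c C ."], "$"),
}


def core(lr1_state):
    """The core of a state: its items with the lookaheads dropped."""
    return frozenset(item for item, _ in lr1_state)


def merge_by_core(states):
    """Merge states with equal cores; lookaheads are unioned."""
    groups = defaultdict(list)
    for name in sorted(states):
        groups[core(states[name])].append(name)
    merged = {}
    for names in groups.values():
        key = "".join(str(n) for n in names)      # e.g. 3 and 6 become "36"
        merged[key] = set().union(*(states[n] for n in names))
    return merged


if __name__ == "__main__":
    for name, items in sorted(merge_by_core(LR1_STATES).items()):
        print(name, sorted(items))
```

A full LALR construction would also redirect every GOTO and shift entry that pointed at an old state to the merged one; that bookkeeping is omitted here.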

  47. Ambiguous Grammars • No ambiguous grammar is LR(k) • hence none can be parsed directly by an LR(k) parser • Nevertheless, some ambiguous grammars are well understood and can be parsed by LR(k) parsers with some tricks • precedence • associativity • dangling-else

  48. Precedence • Grammar: E := E * E | E + E | id • LR(0) item sets shown:
      S' := . E $; E := . E * E; E := . E + E; E := . id
      S' := E . $; E := E . * E; E := E . + E
      E := E * . E; E := . E * E; E := . E + E; E := . id
      E := E + . E; E := . E * E; E := . E + E; E := . id
      E := E * E .; E := E . * E; E := E . + E
      E := E + E .; E := E . * E; E := E . + E
      • The last two item sets have shift/reduce conflicts on both * and +

  49. Precedence • [Figure: the item sets of slide 48, repeated, with the conflicts resolved for the grammar E := E * E | E + E | id] • In the item set containing E := E * E . : reduce on +, reduce on * • In the item set containing E := E + E . : reduce on +, shift on * • This makes * bind tighter than + and both operators left-associative • What if we want both + and * right-associative?
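The resolution rule used by YACC-style generators can be sketched as a comparison between the precedence of the operator in the completed production and that of the lookahead token, with associativity breaking ties. The Python below is an illustration only; the tables `PRECEDENCE` and `ASSOCIATIVITY` and the function `resolve` are assumptions of this example, not part of any particular tool.

```python
PRECEDENCE = {"+": 1, "*": 2}                  # a higher number binds tighter
ASSOCIATIVITY = {"+": "left", "*": "left"}


def resolve(rule_op, lookahead, assoc=ASSOCIATIVITY):
    """Choose between reducing E := E rule_op E . and shifting `lookahead`."""
    if PRECEDENCE[rule_op] > PRECEDENCE[lookahead]:
        return "reduce"      # the completed operator binds tighter
    if PRECEDENCE[rule_op] < PRECEDENCE[lookahead]:
        return "shift"       # the upcoming operator binds tighter
    # Same precedence: left-associative reduces, right-associative shifts.
    return "reduce" if assoc[rule_op] == "left" else "shift"


if __name__ == "__main__":
    for rule_op in "*+":
        for lookahead in "+*":
            print(f"E := E {rule_op} E .  on {lookahead}:",
                  resolve(rule_op, lookahead))
```

Passing `{"+": "right", "*": "right"}` as `assoc` turns the two same-operator cases into shifts, which is one way to explore the right-associative variant.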

  50. Parser Implementation • Implementation options: • Write a parser from scratch • not as boring as writing a lexer, but not as simple as you may imagine • Use an automatic parser generator • very general and robust; sometimes not quite as efficient as hand-written parsers • nevertheless, good for lazy compiler writers • Both are used extensively in production compilers
