Slides by Baojian Hua on LR parsing in compilers: bottom-up (shift-reduce) parsing, LR(0), SLR, LR(1) and LALR table construction, and handling ambiguous grammars.
 
                
LR Parsing Compiler Baojian Hua bjhua@ustc.edu.cn
Front End • source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR
Parsing • The parser translates the source program into abstract syntax trees • Token sequence: • returned from the lexer • Abstract syntax tree: • checks the validity of programs • forms the compiler's internal data structure for programs • The parser must take the program syntax into account
Conceptually • token sequence + language syntax -> parser -> abstract syntax tree
Predictive Parsing • The grammar encodes enough information to choose a production rule as soon as the next input terminals are seen • LL(1) pros: • simple, easy to implement • efficient • Cons: • requires grammar rewriting • the rewritten grammar is ugly
Today’s Topic • Bottom-up Parsing • shift-reduce parsing, LR parsing • This is the predominant algorithm used by automatic YACC-like parser generators • YACC, bison, CUP, etc.
Bottom-up Parsing
• Grammar: 1 S := exp • 2 exp := exp + term • 3 exp := term • 4 term := term * factor • 5 term := factor • 6 factor := ID • 7 factor := INT
• Reducing the input 2 + 3 * 4, one step per line:
  2 + 3 * 4
  factor + 3 * 4
  term + 3 * 4
  exp + 3 * 4
  exp + factor * 4
  exp + term * 4
  exp + term * factor
  exp + term
  exp
  S
• This is a rightmost derivation in reverse!
Dot notation • As a convenient notation, we mark how much of the input we have consumed with a dot (·) • Example: exp · + 3 * 4 • everything to the left of the dot has been consumed, everything to the right is the remaining input
Bottom-up Parsing
• The same reductions, with the dot marking how much input has been consumed:
  · 2 + 3 * 4
  factor · + 3 * 4
  term · + 3 * 4
  exp · + 3 * 4
  exp + factor · * 4
  exp + term · * 4
  exp + term * factor ·
  exp + term ·
  exp ·
  S ·
Another View
• Grammar: S := exp • exp := exp + term • exp := term • term := term * factor • term := factor • factor := ID • factor := INT
  Left of the dot           Remaining input
  (empty)                   2 + 3 * 4
  2                         + 3 * 4
  factor                    + 3 * 4
  term                      + 3 * 4
  exp                       + 3 * 4
  exp +                     3 * 4
  exp + 3                   * 4
  exp + factor              * 4
  exp + term                * 4
  exp + term *              4
  exp + term * 4            (empty)
  exp + term * factor       (empty)
  exp + term                (empty)
  exp                       (empty)
  S                         (empty)
• What data structure is the left column?
Producing a rightmost derivation in reverse • We do two things: • shift a token (terminal) onto the stack, or • reduce the top n symbols on the stack by a production • When we reduce by a production A := β • β is on the top of the stack; pop β • and push A • Key problem: when to shift and when to reduce?
Yet Another View • [Figure: the stack/input trace for 2 + 3 * 4 shown next to the parse tree built so far. When the stack holds exp, the tree is the chain E, T, F, 2. The finished tree has S at the root over E + T, where the left E derives 2 through T and F, and the right T expands to T * F, deriving 3 * 4.]
A shift-reduce parser • Two components: • Stack: holds the viable prefixes • Input stream: holds the remaining source • Four actions: • shift: push a token from the input stream onto the stack • reduce: the right-hand side β of A := β is at the top of the stack; pop β, push A • accept: success • error: syntax error discovered
Table-driven LR(k) parsers • Lexer -> tokens -> Parser Loop -> AST • The parser loop works with a stack and with the action table & GOTO table • Grammar -> Parser Generator -> action table & GOTO table
An LR parser
• Put S on the stack in state s0
• A parser configuration is: (S, s0, X1, s1, X2, s2, … Xm, sm; ai ai+1 … an $)
• do forever:
  • read ai
  • if action[sm, ai] is "shift s", the configuration becomes (S, s0, X1, s1, X2, s2, … Xm, sm, ai, s; ai+1 … an $)
  • if action[sm, ai] is "reduce A := β", the configuration becomes (S, s0, X1, s1, X2, s2, … Xm-|β|, sm-|β|, A, s; ai ai+1 … an $), where s = goto[sm-|β|, A]
  • if action[sm, ai] is accept, DONE
  • if action[sm, ai] is error, handle the error
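The loop above can be sketched in Python. This is a minimal illustration, not the slides' implementation: the tables are written by hand for the small grammar S -> E$, E -> T+E | T, T -> x used later in these slides (SLR entries, state numbers 1 to 6), and the names `ACTION`, `GOTO`, `PRODUCTIONS` and `lr_parse` are assumptions made for this example. As a simplification, the stack stores only states, since the grammar symbols are implied by them.

```python
# Hand-built SLR tables for:  0: S -> E $   1: E -> T + E   2: E -> T   3: T -> x
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1)}  # number -> (lhs, |rhs|)

ACTION = {
    (1, "x"): ("shift", 5),
    (2, "$"): ("accept", None),
    (3, "+"): ("shift", 4),
    (3, "$"): ("reduce", 2),
    (4, "x"): ("shift", 5),
    (5, "+"): ("reduce", 3),
    (5, "$"): ("reduce", 3),
    (6, "$"): ("reduce", 1),
}
GOTO = {(1, "E"): 2, (1, "T"): 3, (4, "E"): 6, (4, "T"): 3}


def lr_parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    stack = [1]                          # state stack; state 1 is the initial state
    tokens = list(tokens) + ["$"]        # append the end marker
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, tok), ("error", None))
        if kind == "shift":
            stack.append(arg)            # push the new state, consume the token
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-rhs_len:]         # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:                            # empty table entry
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
```

For example, `lr_parse(["x", "+", "x"])` reduces by productions 3, 3, 2, 1, which is the rightmost derivation of `x + x` read backwards.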
Generating LR parsers • In order to generate an LR parser, we must create the action and GOTO tables • Many different ways to do this • We will start here with the simplest approach, called LR(0) • Left-to-right parsing, Rightmost derivation, 0 lookahead
Item • LR(0) items have the form: [production-with-dot] • For example, X := A B C has 4 items • [X := · A B C] • [X := A · B C] • [X := A B · C] • [X := A B C ·]
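As a small illustration, an item can be represented as a production plus a dot position. The sketch below enumerates the items of one production; the function names `lr0_items` and `show` are made up for this example.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs := rhs, as (lhs, rhs, dot) triples."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item in the bracket notation used on the slides."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["·"] + list(rhs[dot:])
    return f"[{lhs} := {' '.join(symbols)}]"


items = lr0_items("X", ["A", "B", "C"])
```

A production with n right-hand-side symbols has n + 1 items, one per dot position.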
What do items mean? • [X := · α β γ] • input is consistent with X := α β γ • [X := α · β γ] • input is consistent with X := α β γ and we have already recognized α • [X := α β · γ] • input is consistent with X := α β γ and we have already recognized α β • [X := α β γ ·] • input is consistent with X := α β γ and we can reduce to X
LR(0) Items
• Grammar: 0: S' -> S$ • 1: S -> (L) • 2: S -> x • 3: L -> S • 4: L -> L, S
• State 1: S' -> · S $; S -> · (L); S -> · x
• State 2: S -> x ·
• State 3: S -> ( · L); L -> · S; L -> · L, S; S -> · (L); S -> · x
• State 4: S' -> S · $
• State 5: S -> (L · ); L -> L · , S
• State 6: S -> (L) ·
• State 7: L -> S ·
• State 8: L -> L, · S; S -> · (L); S -> · x
• State 9: L -> L, S ·
• Transitions: 1 --x--> 2; 1 --(--> 3; 1 --S--> 4; 3 --x--> 2; 3 --(--> 3; 3 --L--> 5; 3 --S--> 7; 5 --)--> 6; 5 --,--> 8; 8 --x--> 2; 8 --(--> 3; 8 --S--> 9
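LR(0) table construction
• Construct the LR(0) item sets
• Item set Ii becomes state i
• Parsing actions at state i are:
  • if [A := α · a β] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j"
  • if [A := α ·] ∈ Ii and A ≠ S', then action[i, a] = "reduce by A := α" for every terminal a
  • if [S' := S ·] ∈ Ii, then action[i, $] = "accept" (with an explicit end marker, as in the example grammars, this item is written S' -> S · $)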
LR(0) table construction, cont’d • GOTO table for non-terminals: GOTO[i,A] = j if GOTO(Ii, A) = Ij • Empty entries are “error”
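The item sets and the GOTO function can be computed with the usual closure and goto operations. The following is a minimal sketch for the example grammar S' -> S$, S -> (L) | x, L -> S | L, S; the representation (an item is a pair of production number and dot position) and all function names are assumptions of this example. States are numbered in discovery order, so the numbers differ from those on the slides.

```python
GRAMMAR = [                      # production number -> (lhs, rhs)
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("x",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add [B := · γ] for every nonterminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for n, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (n, 0) not in result:
                    result.add((n, 0))
                    work.append((n, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def lr0_states():
    """Return (states, transitions) of the LR(0) automaton."""
    states, transitions = [closure({(0, 0)})], {}
    for state in states:                        # the list grows while we iterate
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols - {"$"}):     # the end marker is never shifted
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions


states, transitions = lr0_states()
```

Transitions on terminals become shift entries in the action table, and transitions on nonterminals become GOTO entries.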
Problems with LR(0) • For every item of the form X -> α · • the parser blindly reduces to X, followed by a "goto" • This never misses an error, but it may postpone the detection of some errors
Problems with LR(0) • [Figure: the LR(0) automaton, states 1 to 9, for the grammar 0: S' -> S$, 1: S -> (L), 2: S -> x, 3: L -> S, 4: L -> L, S] • Consider this input: x 5 $
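Another Example
• Grammar: 0: S -> E$ • 1: E -> T+E • 2: E -> T • 3: T -> x
• State 1: S -> · E $; E -> · T + E; E -> · T; T -> · x
• State 2: S -> E · $
• State 3: E -> T · + E; E -> T ·
• State 4: E -> T + · E; E -> · T + E; E -> · T; T -> · x
• State 5: T -> x ·
• State 6: E -> T + E ·
• Transitions: 1 --E--> 2; 1 --T--> 3; 1 --x--> 5; 3 --+--> 4; 4 --E--> 6; 4 --T--> 3; 4 --x--> 5
• State 3 has a shift-reduce conflict: shift on + (E -> T · + E) or reduce by E -> T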
SLR table construction
• Construct the LR(0) item sets
• Item set Ii becomes state i
• Parsing actions at state i are:
  • if [A := α · a β] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j"
  • if [A := α ·] ∈ Ii and A ≠ S', then action[i, a] = "reduce by A := α" for all a ∈ FOLLOW(A)
  • if [S' := S ·] ∈ Ii, then action[i, $] = "accept"
• GOTO table for non-terminals:
  • GOTO[i, A] = j if GOTO(Ii, A) = Ij
• Empty entries are "error"
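The only new ingredient compared with LR(0) is FOLLOW. Below is a minimal sketch of a fixed-point FOLLOW computation, under two assumptions that hold for the example grammars in these slides: no production has an empty right-hand side, and FOLLOW of the start symbol is seeded with $. The function name `follow_sets` and the grammar encoding are made up for this example.

```python
def follow_sets(grammar):
    """FOLLOW sets for a grammar given as a list of (lhs, rhs) pairs.

    Assumes every rhs is non-empty; grammar[0] is the start production.
    """
    nonterminals = {lhs for lhs, _ in grammar}

    # FIRST of each nonterminal, by fixed-point iteration.
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[rhs[0]] if rhs[0] in nonterminals else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True

    follow = {n: set() for n in nonterminals}
    follow[grammar[0][0]].add("$")        # assumption: seed the start symbol with $
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                if i + 1 < len(rhs):      # something follows sym in this rhs
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:                     # sym ends the rhs: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


E_GRAMMAR = [("S", ("E", "$")), ("E", ("T", "+", "E")), ("E", ("T",)), ("T", ("x",))]
follow = follow_sets(E_GRAMMAR)
```

For this grammar `+` is not in FOLLOW(E), so a state containing both E -> T · + E and E -> T · gets a shift on `+` and a reduce only on `$`.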
Reduce LR(0) Table • [Figure: the LR(0) automaton, states 1 to 9, for the grammar 0: S' -> S$, 1: S -> (L), 2: S -> x, 3: L -> S, 4: L -> L, S] • Follow sets: • FOLLOW(S') = {$} • FOLLOW(S) = {$, ",", ")"} • FOLLOW(L) = {",", ")"}
Resolve Shift-reduce Conflict • [Figure: the LR(0) automaton, states 1 to 6, for the grammar 0: S -> E$, 1: E -> T+E, 2: E -> T, 3: T -> x] • Follow sets: • FOLLOW(S) = {$} • FOLLOW(E) = {$} • FOLLOW(T) = {+, $} • In state 3 (E -> T · + E; E -> T ·) SLR reduces by E -> T only on $, because + is not in FOLLOW(E); on + it shifts, so the conflict disappears
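Problems with SLR
• Grammar: S' := S$ • S := L = R | R • L := * R | id • R := L
• State 0: S' := · S $; S := · L = R; S := · R; L := · * R; L := · id; R := · L
• State 1: S' := S · $
• State 2: S := L · = R; R := L ·
• State 3: S := R ·
• State 4: L := * · R; R := · L; L := · * R; L := · id
• State 5: L := id ·
• State 6: S := L = · R; R := · L; L := · * R; L := · id
• State 7: L := * R ·
• State 8: R := L ·
• State 9: S := L = R ·
• Transitions: 0 --S--> 1; 0 --L--> 2; 0 --R--> 3; 0 --*--> 4; 0 --id--> 5; 2 --=--> 6; 4 --R--> 7; 4 --L--> 8; 4 --*--> 4; 4 --id--> 5; 6 --R--> 9; 6 --L--> 8; 6 --*--> 4; 6 --id--> 5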
Problems with SLR • Grammar: S := L = R | R • L := * R | id • R := L • State 2: S := L · = R; R := L · • SLR reduces on ALL terminals in the FOLLOW set • FOLLOW(R) = FOLLOW(L) = {=, $} • But we should never reduce R := L on '=' • Thus, there should be no reduction on '=' in state 2, yet SLR adds one, which conflicts with the shift • Why does this happen, and how can we solve it?
LR(1) Items • [X := α · β, a] means • α is at the top of the stack • the remaining input is derivable from β a • In other words, when we reduce by X := α β, a had better be the lookahead symbol • Or: put "reduce by X := α β" in action[s, a] only
LR(1) table construction
• Construct the LR(1) item sets
• Item set Ii becomes state i
• Parsing actions at state i are:
  • if [A := α · a β, b] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = "shift j"
  • if [A := α ·, b] ∈ Ii and A ≠ S', then action[i, b] = "reduce by A := α" (only for the lookahead b)
  • if [S' := S ·, $] ∈ Ii, then action[i, $] = "accept"
• GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij
• Empty entries are "error"
• The initial state is the item set containing [S' := · S, $]
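The lookaheads come from the LR(1) closure: for an item [A := α · B β, a], every production B := γ is added with each lookahead in FIRST(β a). The sketch below does this for the grammar S' := S, S := L = R | R, L := * R | id, R := L. It assumes no production is empty, so FIRST(β a) is FIRST of the first symbol of β, or {a} when β is empty; an item is a hypothetical triple (production number, dot, lookahead).

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first(symbol):
    """Terminals that can start `symbol` (valid because no rhs is empty)."""
    seen, work, result = set(), [symbol], set()
    while work:
        sym = work.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in NONTERMINALS:
            work.extend(rhs[0] for lhs, rhs in GRAMMAR if lhs == sym)
        else:
            result.add(sym)
    return result


def closure(items):
    """LR(1) closure of a set of (production, dot, lookahead) items."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        lookaheads = first(rest[0]) if rest else {look}   # FIRST(β a)
        for n, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    if (n, 0, b) not in result:
                        result.add((n, 0, b))
                        work.append((n, 0, b))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`, keep each lookahead, then take the closure."""
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


state0 = closure({(0, 0, "$")})
state2 = goto(state0, "L")
# Lookaheads on which state 2 may reduce (items with the dot at the end).
reduce_on = {a for p, d, a in state2 if d == len(GRAMMAR[p][1])}
```

Here `reduce_on` contains only `$`, so the reduction by R := L is no longer entered for `=` and the SLR conflict in state 2 does not arise.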
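LR(1) Items (part)
• Grammar: S' := S$ • S := L = R | R • L := * R | id • R := L
• State 0: S' := · S, $; S := · L = R, $; S := · R, $; L := · * R, =/$; L := · id, =/$; R := · L, $
• State 1: S' := S ·, $
• State 2: S := L · = R, $; R := L ·, $
• Transitions: 0 --S--> 1; 0 --L--> 2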
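More LR(1) items
• Grammar: S := L = R | R • L := * R | id • R := L
• State 0: S' := · S, $; S := · L = R, $; S := · R, $; L := · * R, =/$; L := · id, =/$; R := · L, $
• State 1: S' := S ·, $
• State 2: S := L · = R, $; R := L ·, $
• State 3: S := R ·, $
• State 4: L := * · R, =/$; R := · L, =/$; L := · * R, =/$; L := · id, =/$
• State 5: L := id ·, =/$
• State 6: S := L = · R, $; R := · L, $; L := · * R, $; L := · id, $
• State 7: L := * R ·, =/$
• State 8: R := L ·, =/$
• State 9: S := L = R ·, $
• State 10: R := L ·, $
• State 11: L := id ·, $
• others (reached from state 6 on *)
• Transitions: 0 --S--> 1; 0 --L--> 2; 0 --R--> 3; 0 --*--> 4; 0 --id--> 5; 2 --=--> 6; 4 --R--> 7; 4 --L--> 8; 4 --*--> 4; 4 --id--> 5; 6 --R--> 9; 6 --L--> 10; 6 --id--> 11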
Notice similar states? • [Figure: the LR(1) automaton for S := L = R | R, L := * R | id, R := L, with similar states placed side by side] • State 5 (L := id ·, =/$) and state 11 (L := id ·, $) have the same core • State 8 (R := L ·, =/$) and state 10 (R := L ·, $) have the same core • They differ only in their lookaheads
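LALR
• Grammar: S := CC • C := cC | d
• LR(1) states:
• State 0: S' := · S, $; S := · C C, $; C := · c C, c/d; C := · d, c/d
• State 1: S' := S ·, $
• State 2: S := C · C, $; C := · c C, $; C := · d, $
• State 3: C := c · C, c/d; C := · c C, c/d; C := · d, c/d
• State 4: C := d ·, c/d
• State 5: S := C C ·, $
• State 6: C := c · C, $; C := · c C, $; C := · d, $
• State 7: C := d ·, $
• State 8: C := c C ·, c/d
• State 9: C := c C ·, $
• Transitions: 0 --S--> 1; 0 --C--> 2; 0 --c--> 3; 0 --d--> 4; 2 --C--> 5; 2 --c--> 6; 2 --d--> 7; 3 --C--> 8; 3 --c--> 3; 3 --d--> 4; 6 --C--> 9; 6 --c--> 6; 6 --d--> 7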
LALR
• Grammar: S := CC • C := cC | d
• Merge the states with common cores: 3 and 6 become 36, 4 and 7 become 47, 8 and 9 become 89
• State 0: S' := · S, $; S := · C C, $; C := · c C, c/d; C := · d, c/d
• State 1: S' := S ·, $
• State 2: S := C · C, $; C := · c C, $; C := · d, $
• State 36: C := c · C, c/d/$; C := · c C, c/d/$; C := · d, c/d/$
• State 47: C := d ·, c/d/$
• State 5: S := C C ·, $
• State 89: C := c C ·, c/d/$
• Transitions: 0 --S--> 1; 0 --C--> 2; 0 --c--> 36; 0 --d--> 47; 2 --C--> 5; 2 --c--> 36; 2 --d--> 47; 36 --C--> 89; 36 --c--> 36; 36 --d--> 47
LALR Construction • Merge item sets (states) with common cores, taking the union of their lookaheads • Change the GOTO table to reflect the merges • Merging can introduce reduce/reduce conflicts • Merging cannot introduce shift/reduce conflicts
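The merging step can be sketched as grouping LR(1) states by core. In this illustration a state is a set of hypothetical (core, lookahead) pairs with the core written as a string, and the merged state is named by concatenating the old state numbers, as on the slides. Only some of the states of the grammar S := CC, C := cC | d are listed, and the GOTO update is left out.

```python
from collections import defaultdict

# A few LR(1) states of  S := CC,  C := cC | d  (state number -> items).
LR1_STATES = {
    3: {("C := c · C", "c"), ("C := c · C", "d"),
        ("C := · c C", "c"), ("C := · c C", "d"),
        ("C := · d", "c"), ("C := · d", "d")},
    4: {("C := d ·", "c"), ("C := d ·", "d")},
    5: {("S := C C ·", "$")},
    6: {("C := c · C", "$"), ("C := · c C", "$"), ("C := · d", "$")},
    7: {("C := d ·", "$")},
    8: {("C := c C ·", "c"), ("C := c C ·", "d")},
    9: {("C := c C ·", "$")},
}


def merge_by_core(states):
    """Group states with the same core and union their items."""
    groups = defaultdict(list)
    for name, items in states.items():
        core = frozenset(c for c, _ in items)     # drop the lookaheads
        groups[core].append(name)
    merged = {}
    for names in groups.values():
        new_name = "".join(str(n) for n in sorted(names))
        merged[new_name] = set().union(*(states[n] for n in names))
    return merged


lalr_states = merge_by_core(LR1_STATES)
```

After merging, state 47 reduces by C := d on c, d and $, where the LR(1) states 4 and 7 each reduced on a subset of those lookaheads.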
Ambiguous Grammars • No ambiguous grammar is LR(k) • hence it cannot be parsed by a plain LR(k) parser without conflicts • Nevertheless, some ambiguous grammars are well understood and can be parsed by LR(k) parsers with a few tricks • precedence • associativity • dangling-else
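Precedence
• Grammar: E := E*E | E+E | id
• Item sets in the automaton:
• S' := · E $; E := · E * E; E := · E + E; E := · id
• S' := E · $; E := E · * E; E := E · + E
• E := E * · E; E := · E * E; E := · E + E; E := · id
• E := E + · E; E := · E * E; E := · E + E; E := · id
• E := E * E ·; E := E · * E; E := E · + E
• E := E + E ·; E := E · * E; E := E · + E
• The last two item sets have shift/reduce conflicts on both * and +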
Precedence • Grammar: E := E*E | E+E | id • [Figure: the same automaton, with the conflicts resolved so that * binds tighter than + and both are left-associative] • In the state containing E := E * E ·: reduce on +, reduce on * • In the state containing E := E + E ·: reduce on +, shift on * • What if we want both + and * right-associative?
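The choices above follow a simple rule, the one used by yacc-style generators: compare the precedence of the operator in the completed production with that of the lookahead, and fall back on associativity when they are equal. A minimal sketch, with the table names and the `resolve` function invented for this example:

```python
PRECEDENCE = {"+": 1, "*": 2}                   # larger number binds tighter
LEFT = {"+": "left", "*": "left"}
RIGHT = {"+": "right", "*": "right"}


def resolve(rule_op, lookahead, assoc=LEFT):
    """Pick 'shift' or 'reduce' when E := E rule_op E · meets `lookahead`."""
    if PRECEDENCE[rule_op] > PRECEDENCE[lookahead]:
        return "reduce"                         # the finished rule binds tighter
    if PRECEDENCE[rule_op] < PRECEDENCE[lookahead]:
        return "shift"                          # the incoming operator binds tighter
    # Same precedence: left-associative reduces, right-associative shifts.
    return "reduce" if assoc[rule_op] == "left" else "shift"
```

With `assoc=RIGHT`, the two equal-precedence entries change from reduce to shift (`+` after `E + E ·`, and `*` after `E * E ·`), while the entries decided by precedence stay the same.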
Parser Implementation • Implementation options: • Write a parser from scratch • not as boring as writing a lexer, but not as simple as you may imagine • Use an automatic parser generator • very general and robust, though sometimes not quite as efficient as a hand-written parser • nevertheless, good for lazy compiler writers • Both are used extensively in production compilers