1 / 34

CS412/413

CS412/413. Introduction to Compilers and Translators Spring ’99 Lecture 5: Bottom-up parsing. Outline. Creating LL(1) grammars Limitations of LL(1) grammars Bottom-up parsing LR(0) parser construction. Administration. Should have received mail about group assignments by now

varen
Download Presentation

CS412/413

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 5: Bottom-up parsing

  2. Outline • Creating LL(1) grammars • Limitations of LL(1) grammars • Bottom-up parsing • LR(0) parser construction CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  3. Administration • Should have received mail about group assignments by now • Homework 1 due next class (Friday) • Monday considered 2 days late (-20%), Tuesday 3 days (-40%) • No class next Monday (Feb 8) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  4. Programming Assignment • Due Monday, Feb 15 • Implement a lexer for Iota language • Do not need to implement DFA construction • Opportunity to work as group • We expect high quality CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  5. Review • Can construct recursive descent parsers for LL(1) grammars Language grammar How to perform this step? LL(1) grammar predictive parse table recursive-descent parser recursive-descent parser w/ AST generation CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  6. Grammars • Have been using grammar for language of “sums with parentheses” • Original grammar: S S + E| E E number | ( S ) • LL(1) grammar for same language: S  ES’ S’   | + S E number | ( S ) (1+(3+4))+5 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  7. (1 + 2 + (3 + 4)) + 5 Left-recursive vs Right-recursive • Original grammar was left-recursive SS+ E S E • LL(1) grammar is right-recursive : parsed top-down S  E S’ S’   | + S • Left-recursive grammars don’t work with top-down parsing -- need an arbitrary amount of look-ahead S  E + S S  E S S + E (...) + (...) + (...) + (...) ... S + E S + E CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  8. How to create an LL(1) grammar • Write a right-recursive grammar S  E + S S  E • Left-factor common prefixes, place suffix in new non-terminal S  E S’ S’   S’  + S CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  9. (1 + 2 + (3 + 4)) + 5 Right Recursion S • Right recursion : right-associative E S’ + ( S ) + S + 5 E S’ 5 1 + 1 + S 2 + E S’ 3 4 2 + S E S’ • Left recursion : left-associative +  ( S ) + 5 E S’ + + + S 3 E 1 2 3 4 4 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  10. Associativity • We can provide left-associativity by massaging the recursive-descent code void parse_S() { switch (token) { case ‘(’: case number: parse_E(); parse_S’(); return; default: throw new ParseError(); } } void parse_S’() { switch (token) { case ‘+’: token = input.read(); parse_S(); return; case ‘)‘: return; case EOF: return; default: throw new ParseError(); } } CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  11. Associativity void parse_S() {// parses a sequence of E + E + E ... switch (token) { case ‘(’: case number: parse_E(); switch (token) { case ‘+’: token = input.read(); parse_S(); return; case ‘)‘: return; case EOF: return; default: throw new ParseError(); } return; default: throw new ParseError(); } } tail recursion CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  12. + 3 4 Flattening Associative Operators void parse_S () { // parses an arbitrary sequence of E + E + E ... while (true) { switch (token) { case ‘(’: case number: parse_E (); switch (token) { case ‘+’: token = input.read(); break; case ‘)‘: case EOF: return; default: throw new ParseError(); } break; default: throw new ParseError(); } } } (1 + 2 + (3+4)) + 5 + + 5 1 2 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  13. Summary • Now have complete recipe for building a parser Language grammar LL(1) grammar predictive parse table recursive-descent parser recursive-descent parser w/ AST generation CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  14. Bottom-up parsing • A more powerful parsing technology • LR grammars -- more power than LL • can handle left-recursive grammars, virtually all programming languages • More natural expression of programming language syntax • Shift-reduce parsers • automatic parser generators (e.g. yacc) • detect errors as soon as possible • allows better error recovery CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  15. S S + E| E E number | ( S ) Top-down parsing (1+2+(3+4))+5 S S+E E+E  (S)+E  (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E(1+2+E)+E ... • In left-most derivation, entire tree above a token (2) has been expanded when encountered • Must be able to predict! S S + E E 5 ( S ) S + E ( S ) S + E E S + E 2 4 1 E 3 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  16. S S + E| E E number | ( S ) Bottom-up parsing • Right-most derivation-- backward • Start with the tokens • End with the start symbol (1+2+(3+4))+5 (E+2+(3+4))+5  (S+2+(3+4))+5 (S+E+(3+4))+5  (S+(3+4))+5  (S+(E+4))+5 (S+(S+4))+5 (S+(S+E))+5  (S+(S))+5 (S+E)+5  (S)+5  E+5  S+E S CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  17. S S + E| E E number | ( S ) Bottom-up parsing (1+2+(3+4))+5 (1+2+(3+4))+5 (E+2+(3+4))+5  (1 +2+(3+4))+5 (S+2+(3+4))+5  (1 +2+(3+4))+5 (S+E+(3+4))+5  (1+2 +(3+4))+5 (S+(3+4))+5  (1+2+(3 +4))+5 (S+(E+4))+5  (1+2+(3 +4))+5 (S+(S+4))+5  (1+2+(3 +4))+5 (S+(S+E))+5  (1+2+(3+4 ))+5 (S+(S))+5  (1+2+(3+4 ))+5 (S+E)+5  (1+2+(3+4) )+5 (S)+5  (1+2+(3+4) )+5 E+5  (1+2+(3+4)) +5 S+E(1+2+(3+4))+5 S(1+2+(3+4))+5 right-most derivation CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  18. S S + E| E E number | ( S ) Bottom-up parsing • (1+2+(3+4))+5 (E+2+(3+4))+5  (S+2+(3+4))+5 (S+E+(3+4))+5 … • Advantage of bottom-up parsing: can select productions based on more information S S + E E 5 ( S ) S + E ( S ) S+E E S + E 2 4 1 E 3 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  19. Top-down vs. Bottom-up Bottom-up: Don’t need to figure out as much of the parse tree for a given amount of input scanned unscanned scanned unscanned Top-down Bottom-up CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  20. Shift-reduce parsing • Parsing is a sequence of shift and reduce operations • Parser state is a stack of terminals and non-terminals (grows to the right) • Unconsumed input is a string of terminals • Current derivation step is always stack+input • Shift -- push head of input onto stack stack input ( 1+2+(3+4))+5 (1 +2+(3+4))+5 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  21. Reduce • Replace symbols  in top of stack with non-terminal symbol X, corresponding to production X   (pop , push X) stack input (S+E +(3+4))+5 reduce S S+E (S +(3+4))+5 • What effect does this have on derivation? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  22. S S + E| E E number | ( S ) Shift-reduce parsing derivation input stream action stack (1+2+(3+4))+5 (1+2+(3+4))+5 shift (1+2+(3+4))+5  ( 1+2+(3+4))+5 shift (1+2+(3+4))+5  (1 +2+(3+4))+5 reduceEnum (E+2+(3+4))+5  (E +2+(3+4))+5 reduceS  E (S+2+(3+4))+5  (S +2+(3+4))+5 shift (S+2+(3+4))+5  (S+ 2+(3+4))+5 shift (S+2+(3+4))+5  (S+2 +(3+4))+5 reduceEnum (S+E+(3+4))+5  (S+E +(3+4))+5 reduce S S+E (S+(3+4))+5  (S +(3+4))+5 shift (S+(3+4))+5  (S+ (3+4))+5 shift (S+(3+4))+5  (S+( 3+4))+5 shift (S+(3+4))+5  (S+(3 +4))+5 reduce Enum CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  23. Problem • How do we know which action to take -- whether to shift or reduce, and which production? • Sometimes can reduce but shouldn’t • e.g., X   can always be reduced • Sometimes can reduce in different ways CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  24. Action Selection Problem • Given stack  and input symbol b, should we • shift b onto the stack (making it b) • reduce some production X   assuming that stack has the form   (making it X) • Should apply reduction X   depending on what stack prefix  is -- but  is different for different possible reductions, since ’s have different length. How to keep track? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  25. Parser States • Idea: summarize all possible stack prefixes  as a parser state • A state transition function updates the parser state as shifts and reductions are performed: DFA • Summarizing discards information • affects what grammars parser handles • affects size of DFA (number of states) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  26. LR(0) parser • Left-to-right scanning, Right-most derivation, zero look-ahead characters • Too weak to handle most language grammars (including this one) • But will help us understand how to build better parsers CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  27. LR(0) states • A state is a set of items • An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production • Stuff before “.” already on stack (beginnings of possible ’s) • Stuff after “.” : what we might see next • The prefixes  represented by state E number . E  ( .S ) state item CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  28. An LR(0) grammar: non-empty lists S  ( L ) S  id L  S L  L , S x (x,y) (x, (y,z), w) ((((x)))) (x, (y, (z, w))) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  29. S  ( L ) | id L  S | L, S Closure S ’  . S $ S  . ( L ) S  . id start state Closure S ’  .S $ • Closure of a state adds items for all productions whose LHS occurs in an item in the state, just after “.” • Added items have the “.”located at the beginning • Like NFA  DFA conversion CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  30. Applying shift actions S  ( .L ) L  .S L  .L , S S  . ( L ) S  . id S  ( L ) | id L  S | L , S S ’  . S $ S  . ( L ) S  . id ( ( id id S  id . In new state, include all items that have appropriate input symbol just after dot, and advance dot in those items. (and take closure) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  31. Applying reduce actions S  ( .L ) L  .S L  .L , S S  . ( L ) S  . id S  ( L.) L  L . , S • Need to set state after reducing • On reduction, pop back to old state and take DFA transition on non-terminal reduced L S ’  . S $ S  . ( L ) S  . id ( S ( L  S. id id S  id . states causing reductions CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  32. Full DFA (Appel p. 63) 8 9 2 L  L , . S S  . ( L ) S  . id 1 id S id S ’  . S $ S  . ( L ) S  . id S  id . L  L , S . id 3 S  ( .L ) L  .S L  .L , S S  . ( L ) S  . id ( 5 L S  ( L.) L  L . , S S ) ( S 6 S  ( L ). 4 7 L  S. S ’  S . $ $ final state CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  33. Idea: stack is labeled w/state Let’s try parsing ((x),y) derivation stack input action ((x),y)  1 ((x),y) shift, goto 3 ((x),y)  1 (3 (x),y)shift, goto 3 ((x),y)  1 (3 (3 x),y)shift, goto 2 ((x),y)  1 (3 (3 x2 ),y)reduce Sid ((S),y)  1 (3 (3 S7 ),y)reduce LS ((L),y)  1 (3 (3 L5 ),y)shift, goto 6 ((L),y)  1 (3 (3 L5)6 ,y)reduce S(L) (S,y)  1 (3 S7 ,y)reduce LS (L,y)  1 (3 L5 ,y)shift, goto 8 (L,y)  1 (3 L5 , 8 y)shift, goto 9 (L,y)  1 (3 L5 , 8 y2 )reduce Sid (L,S)  1 (3 L5 , 8 S9 )reduce LL , S (L)  1 (3 L5 )shift, goto 6 (L)  1 (3 L5 )6 reduce S(L) S 1S4$ done S  ( L ) | id L  S | L, S

  34. Summary • Grammars can be parsed bottom-up using a DFA + stack • State construction converts grammar into states that capture information needed to know what action to take • Stack entries labeled by state index • Next time: SLR, LR(1) parsers, automatic parser generators CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

More Related