
CS 381 - Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour


dinesh

Presentation Transcript


  1. CS 381 - Summer 2005: Top-down and Bottom-up Parsing - a whirlwind tour. June 20, 2005. Slide acknowledgment: Radu Rugina, CS 412

  2. Simplified Compiler Structure
     Source code: if (b == 0) a = b;
       ↓ Front end (machine-independent): understand source code
     Intermediate code
       ↓ Optimizer: optimize
     Intermediate code
       ↓ Back end (machine-dependent): generate assembly code
     Assembly code: cmp $0,ecx  cmovz edx,ecx

  3. Simplified Front-End Structure
     Source code (character stream): if (b == 0) a = b;
       ↓ Lexical Analysis
     Token stream: if ( b == 0 ) a = b ;
       ↓ Syntax Analysis (Parsing)
     Abstract Syntax Tree (AST): if, with children == (b, 0) and = (a, b)
       ↓ Semantic Analysis
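
The lexical-analysis step on this slide can be illustrated with a minimal sketch. The class name `LexerSketch` and the token classes it recognizes (identifiers, integers, `==`, single-character symbols) are assumptions made for this example, not something the slides specify.

```java
import java.util.ArrayList;
import java.util.List;

final class LexerSketch {
    /** Splits a character stream into tokens: words, integers, "==", and single symbols. */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }   // whitespace separates tokens
            int start = i;
            if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            } else if (c == '=' && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                i += 2;                                         // two-character operator ==
            } else {
                i++;                                            // any other character is its own token
            }
            tokens.add(src.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [if, (, b, ==, 0, ), a, =, b, ;]
        System.out.println(tokenize("if (b == 0) a = b;"));
    }
}
```

The parser in the following slides reads this token stream one token at a time, left to right.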

  4. Parse Tree vs. AST
     [Figure: parse tree (concrete syntax) and abstract syntax tree for (1+2+(3+4))+5, side by side]
     • The parse tree is also called “concrete syntax”
     • The abstract syntax tree discards (abstracts) unneeded information

  5. How to build an AST
     • Need to find a derivation for the program in the grammar
     • Want an efficient algorithm
       • should only read the token stream once
       • exponential brute-force search is out of the question
       • even CKY (cubic in the input length) is too slow
     • Two main ways to parse:
       • top-down parsing (recursive descent)
       • bottom-up parsing (shift-reduce)

  6. Parsing Top-down
     Grammar: S → E + S | E    E → num | ( S )
     Goal: construct a leftmost derivation of the string while reading in the token stream
     Partly-derived string    Lookahead    Input (parsed part, then unparsed part)
     S                        (            (1+2+(3+4))+5
     → E+S                    (            (1+2+(3+4))+5
     → (S)+S                  1            (1+2+(3+4))+5
     → (E+S)+S                1            (1+2+(3+4))+5
     → (1+S)+S                2            (1+2+(3+4))+5
     → (1+E+S)+S              2            (1+2+(3+4))+5
     → (1+2+S)+S              2            (1+2+(3+4))+5
     → (1+2+E)+S              (            (1+2+(3+4))+5
     → (1+2+(S))+S            3            (1+2+(3+4))+5
     → (1+2+(E+S))+S          3            (1+2+(3+4))+5

  7. Problem
     Grammar: S → E + S | E    E → num | ( S )
     • Want to decide which production to apply based on the next symbol
       (1):    S → E → (S) → (E) → (1)
       (1)+2:  S → E+S → (S)+S → (E)+S → (1)+E → (1)+2
     • Why is this hard?

  8. Grammar is Problem
     • This grammar cannot be parsed top-down with only a single look-ahead symbol
     • Not LL(1) = Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
     • Is it LL(k) for some k?
     • Can rewrite the grammar to allow top-down parsing: create an LL(1) grammar for the same language

  9. Making a grammar LL(1)
     Original grammar:
       S → E + S
       S → E
       E → num
       E → ( S )
     • Problem: can’t decide which S production to apply until we see the symbol after the first expression
     • Left-factoring: factor the common S prefix, add a new non-terminal S' at the decision point. S' derives (+E)*
     Left-factored grammar:
       S → E S'
       S' → ε
       S' → + S
       E → num
       E → ( S )

  10. Parsing with new grammar
      Grammar: S → E S'    S' → ε | + S    E → num | ( S )
      Input: (1+2+(3+4))+5
      Partly-derived string           Lookahead
      S                               (
      → E S'                          (
      → (S) S'                        1
      → (E S') S'                     1
      → (1 S') S'                     +
      → (1+E S') S'                   2
      → (1+2 S') S'                   +
      → (1+2+S) S'                    (
      → (1+2+E S') S'                 (
      → (1+2+(S) S') S'               3
      → (1+2+(E S') S') S'            3
      → (1+2+(3 S') S') S'            +
      → (1+2+(3+E) S') S'             4

  11. Predictive Parsing
      • LL(1) grammar:
        • for a given non-terminal, the look-ahead symbol uniquely determines the production to apply
        • top-down parsing = predictive parsing
      • Driven by a predictive parsing table: non-terminals × terminals → productions

  12. Using Table
      Grammar: S → E S'    S' → ε | + S    E → num | ( S )
      Input: (1+2+(3+4))+5
      Partly-derived string    Lookahead
      S                        (
      → E S'                   (
      → (S) S'                 1
      → (E S') S'              1
      → (1 S') S'              +
      → (1+S) S'               2
      → (1+E S') S'            2
      → (1+2 S') S'            +
      Parsing table:
            num       +        (          )      $
      S     → E S'             → E S'
      S'              → + S               → ε    → ε
      E     → num              → ( S )
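
One way to use the table directly, without writing one procedure per non-terminal, is a loop over an explicit stack. The sketch below is only an illustration: the class name `TableDrivenLL1` and the one-character encoding (`P` for S', `n` for num, `$` for end of input, the empty string for ε) are assumptions of this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

final class TableDrivenLL1 {
    // Key = non-terminal followed by look-ahead terminal; value = right-hand side to predict.
    // Assumed encoding: 'P' stands for S', 'n' for num, "" for epsilon.
    static final Map<String, String> TABLE = Map.of(
        "Sn", "EP", "S(", "EP",
        "P+", "+S", "P)", "", "P$", "",
        "En", "n", "E(", "(S)");

    /** Drops whitespace, collapses each run of digits into the terminal 'n', appends '$'. */
    static String terminals(String input) {
        return input.replaceAll("\\s+", "").replaceAll("[0-9]+", "n") + "$";
    }

    static boolean parse(String input) {
        String in = terminals(input);
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');
        stack.push('S');                                   // start symbol on top
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            char look = in.charAt(pos);
            if (top == 'S' || top == 'P' || top == 'E') {
                String rhs = TABLE.get("" + top + look);
                if (rhs == null) return false;             // blank table entry: syntax error
                for (int i = rhs.length() - 1; i >= 0; i--) stack.push(rhs.charAt(i));
            } else if (top == look) {
                pos++;                                     // terminal matches: consume it
            } else {
                return false;
            }
        }
        return pos == in.length();
    }

    public static void main(String[] args) {
        System.out.println(parse("(1+2+(3+4))+5"));        // prints true
    }
}
```

The right-hand side is pushed in reverse so that its first symbol ends up on top of the stack, which is what makes the derivation leftmost.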

  13. How to Implement?
      • Table can be converted easily into a recursive-descent parser
            num       +        (          )      $
      S     → E S'             → E S'
      S'              → + S               → ε    → ε
      E     → num              → ( S )
      • Three procedures: parse_S, parse_S', parse_E

  14. Recursive-Descent Parser
      void parse_S() {
        switch (token) {                                  // token is the lookahead token
          case num: parse_E(); parse_S'(); return;        // S → E S'
          case '(': parse_E(); parse_S'(); return;        // S → E S'
          default:  throw new ParseError();               // blank table entry
        }
      }
            number    +        (          )      $
      S     → E S'             → E S'
      S'              → + S               → ε    → ε
      E     → number           → ( S )

  15. Recursive-Descent Parser
      void parse_S'() {
        switch (token) {
          case '+': token = input.read(); parse_S(); return;   // S' → + S
          case ')': return;                                    // S' → ε
          case EOF: return;                                    // S' → ε
          default:  throw new ParseError();
        }
      }
            number    +        (          )      $
      S     → E S'             → E S'
      S'              → + S               → ε    → ε
      E     → number           → ( S )

  16. Recursive-Descent Parser
      void parse_E() {
        switch (token) {
          case number: token = input.read(); return;           // E → number
          case '(':                                            // E → ( S )
            token = input.read();
            parse_S();
            if (token != ')') throw new ParseError();
            token = input.read();
            return;
          default: throw new ParseError();
        }
      }
            number    +        (          )      $
      S     → E S'             → E S'
      S'              → + S               → ε    → ε
      E     → number           → ( S )
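
The three procedures from the last three slides can be combined into one compilable recognizer. This is a sketch under stated assumptions: the class name `RecursiveDescentSketch`, the character-level scanning, and the `'$'` end marker are choices made here; the slides leave `token`, `input`, and `ParseError` unspecified.

```java
final class RecursiveDescentSketch {
    static final class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    private final String in;   // input without whitespace, plus '$' standing in for EOF
    private int pos = 0;       // index of the lookahead token

    private RecursiveDescentSketch(String input) {
        this.in = input.replaceAll("\\s+", "") + "$";
    }

    private char token() { return in.charAt(pos); }

    private ParseError error() {
        return new ParseError("unexpected '" + token() + "' at position " + pos);
    }

    private void parseS() {                       // S -> E S'
        if (Character.isDigit(token()) || token() == '(') { parseE(); parseSPrime(); }
        else throw error();
    }

    private void parseSPrime() {                  // S' -> + S | epsilon
        switch (token()) {
            case '+': pos++; parseS(); return;
            case ')': case '$': return;           // epsilon: consume nothing
            default: throw error();
        }
    }

    private void parseE() {                       // E -> num | ( S )
        if (Character.isDigit(token())) {
            while (Character.isDigit(token())) pos++;   // a run of digits is one num token
        } else if (token() == '(') {
            pos++;
            parseS();
            if (token() != ')') throw error();
            pos++;
        } else {
            throw error();
        }
    }

    /** Returns true when the whole input derives from S. */
    static boolean accepts(String input) {
        RecursiveDescentSketch p = new RecursiveDescentSketch(input);
        try {
            p.parseS();
            return p.pos == p.in.length() - 1;    // everything up to the EOF marker was consumed
        } catch (ParseError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(1 + 2 + (3 + 4)) + 5"));   // prints true
    }
}
```

Each `switch` or `if` mirrors one row of the parsing table: the lookahead selects the production, and a blank entry becomes a `ParseError`.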

  17. Call Tree = Parse Tree
      [Figure: parse tree for (1 + 2 + (3 + 4)) + 5 under the LL(1) grammar, next to the tree of calls to parse_S, parse_S', and parse_E made while parsing it; the two trees have the same shape]

  18. How to Construct Parsing Tables
      • There exists an algorithm for automatically generating a predictive parse table from a grammar (take 412 for details)
      Grammar: S → E S'    S' → ε | + S    E → number | ( S )
            number    +        (          )      $
      S     → E S'             → E S'
      S'              → + S               → ε    → ε
      E     → number           → ( S )
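
The slide defers the algorithm to CS 412. For orientation, the standard textbook construction computes nullable, FIRST, and FOLLOW sets by fixed-point iteration and then fills the table; the sketch below applies it to this grammar. It is not taken from the slides, and the class name `LL1TableBuilder` and the encoding (`P` for S', `n` for number, `""` for ε, `'\0'` as an internal "can derive ε" marker) are assumptions of this example.

```java
import java.util.*;

final class LL1TableBuilder {
    // Assumed encoding: 'P' stands for S', 'n' for number, "" for epsilon, '$' for end of input.
    static final Map<Character, List<String>> GRAMMAR = new LinkedHashMap<>();
    static {
        GRAMMAR.put('S', List.of("EP"));
        GRAMMAR.put('P', List.of("", "+S"));
        GRAMMAR.put('E', List.of("n", "(S)"));
    }
    static final Map<Character, Set<Character>> FIRST = new HashMap<>(), FOLLOW = new HashMap<>();
    static final Set<Character> NULLABLE = new HashSet<>();

    static boolean isNonTerminal(char c) { return GRAMMAR.containsKey(c); }

    /** FIRST of a symbol string; contains the marker '\0' when the whole string can derive epsilon. */
    static Set<Character> firstOf(String s) {
        Set<Character> out = new TreeSet<>();
        for (char c : s.toCharArray()) {
            if (!isNonTerminal(c)) { out.add(c); return out; }
            out.addAll(FIRST.get(c));
            if (!NULLABLE.contains(c)) return out;
        }
        out.add('\0');
        return out;
    }

    /** Returns the table as a map from (non-terminal + look-ahead) to the predicted right-hand side. */
    static Map<String, String> build() {
        for (char nt : GRAMMAR.keySet()) { FIRST.put(nt, new TreeSet<>()); FOLLOW.put(nt, new TreeSet<>()); }
        FOLLOW.get('S').add('$');
        boolean changed = true;
        while (changed) {                                   // iterate until no set grows
            changed = false;
            for (Map.Entry<Character, List<String>> e : GRAMMAR.entrySet()) {
                char lhs = e.getKey();
                for (String rhs : e.getValue()) {
                    Set<Character> f = firstOf(rhs);
                    if (f.remove('\0')) changed |= NULLABLE.add(lhs);
                    changed |= FIRST.get(lhs).addAll(f);
                    for (int i = 0; i < rhs.length(); i++) {
                        char c = rhs.charAt(i);
                        if (!isNonTerminal(c)) continue;
                        Set<Character> rest = firstOf(rhs.substring(i + 1));
                        if (rest.remove('\0')) changed |= FOLLOW.get(c).addAll(FOLLOW.get(lhs));
                        changed |= FOLLOW.get(c).addAll(rest);
                    }
                }
            }
        }
        Map<String, String> table = new TreeMap<>();
        for (Map.Entry<Character, List<String>> e : GRAMMAR.entrySet()) {
            for (String rhs : e.getValue()) {
                Set<Character> look = firstOf(rhs);
                if (look.remove('\0')) look.addAll(FOLLOW.get(e.getKey()));   // epsilon case uses FOLLOW
                for (char t : look) {
                    if (table.put("" + e.getKey() + t, rhs) != null)
                        throw new IllegalStateException("two productions in one cell: not LL(1)");
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

For this grammar the result has the same seven filled cells as the table on the slide; a grammar that is not LL(1) would put two productions into one cell, which the sketch reports as an error.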

  19. Summary for top-down parsing
      • LL(k) grammars
        • left-to-right scanning
        • leftmost derivation
        • can determine what production to apply from the next k symbols
        • Can automatically build predictive parsing tables
      • Predictive parsers
        • Can be easily built for LL(k) grammars from the parsing tables
        • Also called recursive-descent, or top-down parsers

  20. Top-Down Parsing Summary
      Language grammar
        ↓ Left-recursion elimination, left-factoring
      LL(1) grammar
        ↓
      predictive parsing table
        ↓
      recursive-descent parser
        ↓
      parser with AST generation
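
The last step of this pipeline, a parser with AST generation, can be sketched by having each parse procedure return a tree node instead of nothing. The `Node` class, the class name `AstParserSketch`, and the prefix-notation `toString` are assumptions of this example; the slides do not show their AST classes.

```java
final class AstParserSketch {
    /** AST node: a number leaf (left == null) or an addition of two subtrees. */
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value) { this.value = value; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.value = 0; this.left = left; this.right = right; }
        int eval() { return left == null ? value : left.eval() + right.eval(); }
        @Override public String toString() {
            return left == null ? Integer.toString(value) : "(+ " + left + " " + right + ")";
        }
    }

    private final String in;   // input without whitespace, plus '$' as end marker
    private int pos = 0;

    private AstParserSketch(String input) { in = input.replaceAll("\\s+", "") + "$"; }

    static Node parse(String input) {
        AstParserSketch p = new AstParserSketch(input);
        Node root = p.parseS();
        if (p.pos != p.in.length() - 1) throw new IllegalArgumentException("trailing input at " + p.pos);
        return root;
    }

    // S -> E S': an optional "+ S" tail becomes the right child of a '+' node.
    private Node parseS() {
        Node left = parseE();
        if (in.charAt(pos) == '+') { pos++; return new Node(left, parseS()); }
        return left;                              // S' -> epsilon creates no node
    }

    // E -> num | ( S )
    private Node parseE() {
        char c = in.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (Character.isDigit(in.charAt(pos))) pos++;
            return new Node(Integer.parseInt(in.substring(start, pos)));
        }
        if (c == '(') {
            pos++;
            Node inner = parseS();
            if (in.charAt(pos) != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return inner;                         // parentheses leave no trace in the AST
        }
        throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
    }

    public static void main(String[] args) {
        Node ast = parse("(1+2+(3+4))+5");
        System.out.println(ast + " = " + ast.eval());   // prints (+ (+ 1 (+ 2 (+ 3 4))) 5) = 15
    }
}
```

The helper non-terminal S' and the parentheses never appear in the result, which is the sense in which the AST discards information that the parse tree keeps.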

  21. Now: Bottom-up Parsing
      • A more powerful parsing technology
      • LR grammars: more expressive than LL
        • construct right-most derivation of program
        • virtually all programming languages
        • easier to express programming language syntax
      • Shift-reduce parsers
        • Parsers for LR grammars
        • automatic parser generators (e.g. yacc, CUP)

  22. Bottom-up Parsing
      Grammar: S → S + E | E    E → num | ( S )
      • Right-most derivation, constructed backward
      • Start with the tokens
      • End with the start symbol
      (1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 ← (S+(3+4))+5 ← (S+(E+4))+5 ← (S+(S+4))+5 ← (S+(S+E))+5 ← (S+(S))+5 ← (S+E)+5 ← (S)+5 ← E+5 ← S+E ← S

  23. Progress of Bottom-up Parsing
      Derivation step (read bottom to top, this is the right-most derivation)    Input so far
      (1+2+(3+4))+5 ←      (1+2+(3+4))+5
      (E+2+(3+4))+5 ←      (1 +2+(3+4))+5
      (S+2+(3+4))+5 ←      (1 +2+(3+4))+5
      (S+E+(3+4))+5 ←      (1+2 +(3+4))+5
      (S+(3+4))+5 ←        (1+2+(3+4))+5
      (S+(E+4))+5 ←        (1+2+(3+4))+5
      (S+(S+4))+5 ←        (1+2+(3+4))+5
      (S+(S+E))+5 ←        (1+2+(3+4))+5
      (S+(S))+5 ←          (1+2+(3+4))+5
      (S+E)+5 ←            (1+2+(3+4))+5
      (S)+5 ←              (1+2+(3+4) )+5
      E+5 ←                (1+2+(3+4))+5
      S+E ←                (1+2+(3+4))+5
      S                    (1+2+(3+4))+5

  24. Bottom-up Parsing
      Grammar: S → S + E | E    E → num | ( S )
      • (1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 …
      • Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned
      [Figure: parse tree for (1+2+(3+4))+5 under this grammar]

  25. Top-down Parsing
      Grammar: S → S + E | E    E → num | ( S )
      (1+2+(3+4))+5: S → S+E → E+E → (S)+E → (S+E)+E → (S+E+E)+E → (E+E+E)+E → (1+E+E)+E → (1+2+E)+E ...
      • In a left-most derivation, the entire tree above a token (e.g. 2) has already been expanded when the token is encountered
      [Figure: parse tree for (1+2+(3+4))+5 under this grammar]

  26. Top-down vs. Bottom-up
      • Bottom-up: Don’t need to figure out as much of the parse tree for a given amount of input
      [Figure: two parse-tree outlines, labeled Top-down and Bottom-up, each divided into scanned and unscanned input]

  27. Shift-reduce Parsing
      • Parsing actions: a sequence of shift and reduce operations
      • Parser state: a stack of terminals and non-terminals (grows to the right)
      • Current derivation step = always stack + input
      Derivation step        Stack    Unconsumed input
      (1+2+(3+4))+5 ←                 (1+2+(3+4))+5
      (E+2+(3+4))+5 ←        (E       +2+(3+4))+5
      (S+2+(3+4))+5 ←        (S       +2+(3+4))+5
      (S+E+(3+4))+5 ←        (S+E     +(3+4))+5

  28. Shift-reduce Parsing
      • Parsing is a sequence of shifts and reduces
      • Shift: move the look-ahead token to the stack
        Stack    Input            Action
        (        1+2+(3+4))+5     shift 1
        (1       +2+(3+4))+5
      • Reduce: replace symbols α on top of the stack with non-terminal symbol X, corresponding to production X → α (pop α, push X)
        Stack    Input            Action
        (S+E     +(3+4))+5        reduce S → S+E
        (S       +(3+4))+5

  29. Shift-reduce Parsing
      Grammar: S → S + E | E    E → num | ( S )
      Derivation           Stack    Input stream       Action
      (1+2+(3+4))+5 ←               (1+2+(3+4))+5      shift
      (1+2+(3+4))+5 ←      (        1+2+(3+4))+5       shift
      (1+2+(3+4))+5 ←      (1       +2+(3+4))+5        reduce E → num
      (E+2+(3+4))+5 ←      (E       +2+(3+4))+5        reduce S → E
      (S+2+(3+4))+5 ←      (S       +2+(3+4))+5        shift
      (S+2+(3+4))+5 ←      (S+      2+(3+4))+5         shift
      (S+2+(3+4))+5 ←      (S+2     +(3+4))+5          reduce E → num
      (S+E+(3+4))+5 ←      (S+E     +(3+4))+5          reduce S → S+E
      (S+(3+4))+5 ←        (S       +(3+4))+5          shift
      (S+(3+4))+5 ←        (S+      (3+4))+5           shift
      (S+(3+4))+5 ←        (S+(     3+4))+5            shift
      (S+(3+4))+5 ←        (S+(3    +4))+5             reduce E → num
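
The trace above can be reproduced with a deliberately naive sketch that inspects the top of the stack, reduces whenever a right-hand side matches (preferring S → S + E over S → E), and shifts otherwise. This greedy rule is an assumption that happens to work for this small grammar; it is not the general LR method, which selects actions from a table. The class name `ShiftReduceSketch` and the use of `n` for num are also choices made here.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftReduceSketch {
    /** Returns the actions taken, ending in "accept"; throws on a syntax error. */
    static List<String> parse(String input) {
        if (!input.matches("[0-9+()\\s]*")) throw new IllegalArgumentException("illegal character");
        // Drop whitespace and collapse each run of digits into the terminal 'n' (num).
        String in = input.replaceAll("\\s+", "").replaceAll("[0-9]+", "n");
        StringBuilder stack = new StringBuilder();          // grows to the right
        List<String> actions = new ArrayList<>();
        int pos = 0;
        while (true) {
            String s = stack.toString();
            if (s.endsWith("n"))        { reduce(stack, 1, 'E'); actions.add("reduce E -> num"); }
            else if (s.endsWith("(S)")) { reduce(stack, 3, 'E'); actions.add("reduce E -> ( S )"); }
            else if (s.endsWith("S+E")) { reduce(stack, 3, 'S'); actions.add("reduce S -> S + E"); }
            else if (s.endsWith("E"))   { reduce(stack, 1, 'S'); actions.add("reduce S -> E"); }
            else if (pos < in.length()) { stack.append(in.charAt(pos++)); actions.add("shift"); }
            else if (s.equals("S"))     { actions.add("accept"); return actions; }
            else throw new IllegalArgumentException("syntax error, stack: " + s);
        }
    }

    /** Pops popCount symbols and pushes the non-terminal (pop the right-hand side, push X). */
    private static void reduce(StringBuilder stack, int popCount, char nonTerminal) {
        stack.setLength(stack.length() - popCount);
        stack.append(nonTerminal);
    }

    public static void main(String[] args) {
        parse("(1+2+(3+4))+5").forEach(System.out::println);
    }
}
```

Checking `S+E` before the bare `E` is the hand-coded answer to "sometimes can reduce in different ways"; the order of the tests is exactly the kind of decision an LR table makes systematically.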

  30. Problem
      • How do we know which action to take: whether to shift or reduce, and which production?
      • Issues:
        • Sometimes can reduce but shouldn’t
        • Sometimes can reduce in different ways

  31. Action Selection Problem
      • Given stack σ and look-ahead symbol b, should the parser:
        • shift b onto the stack (making it σb), or
        • reduce X → α, assuming that the stack has the form βα (making it βX)?
      • If the stack has the form βα, should apply reduction X → α (or shift) depending on the stack prefix β
      • β is different for different possible reductions, since the α’s have different lengths

  32. LR Parsing Engine
      • Basic mechanism:
        • Use a set of parser states
        • Use a stack with alternating symbols and states
          • E.g.: 1 ( 6 S 10 + 5
        • Use a parsing table to:
          • Determine what action to apply (shift/reduce)
          • Determine the next state
      • The parser actions can be precisely determined from the table

  33. The LR Parsing Table
      • Rows: states. Columns: terminals (action table: next action and next state) and non-terminals (goto table: next state)
      • Algorithm: look at the entry for current state S and input terminal C
        • If Table[S,C] = s(S') then shift:
          • push(C), push(S')
        • If Table[S,C] = X → α then reduce:
          • pop(2*|α|), S' = top(), push(X), push(Table[S',X])

  34. LR Parsing Table Example
      State   (          )          id         ,          $          S     L
      1       s3                    s2                               g4
      2       S → id     S → id     S → id     S → id     S → id
      3       s3                    s2                               g7    g5
      4                                                   accept
      5                  s6                    s8
      6       S → (L)    S → (L)    S → (L)    S → (L)    S → (L)
      7       L → S      L → S      L → S      L → S      L → S
      8       s3                    s2                               g9
      9       L → L,S    L → L,S    L → L,S    L → L,S    L → L,S
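
The engine from slide 33 can be run against this table in a few lines. The sketch below assumes the grammar behind the table is S → id | ( L ), L → S | L , S, as its reduce entries indicate; the class name `LrEngineSketch`, the string encoding of table cells, and the helper names `row` and `reduceRow` are hypothetical choices for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LrEngineSketch {
    static final List<String> TERMINALS = List.of("(", ")", "id", ",");
    static final Map<String, String> TABLE = new HashMap<>();   // key: state + " " + symbol

    static void row(int state, String... entries) {             // entries alternate: symbol, cell
        for (int i = 0; i < entries.length; i += 2) TABLE.put(state + " " + entries[i], entries[i + 1]);
    }
    static void reduceRow(int state, String production) {       // same reduce for every terminal
        for (String t : new String[] {"(", ")", "id", ",", "$"}) TABLE.put(state + " " + t, production);
    }
    static {
        row(1, "(", "s3", "id", "s2", "S", "g4");
        reduceRow(2, "S id");                                    // S -> id
        row(3, "(", "s3", "id", "s2", "S", "g7", "L", "g5");
        row(4, "$", "accept");
        row(5, ")", "s6", ",", "s8");
        reduceRow(6, "S ( L )");                                 // S -> ( L )
        reduceRow(7, "L S");                                     // L -> S
        row(8, "(", "s3", "id", "s2", "S", "g9");
        reduceRow(9, "L L , S");                                 // L -> L , S
    }

    /** Tokens must be drawn from ( ) id , and the end marker is added automatically. */
    static boolean parse(String... tokens) {
        for (String t : tokens) if (!TERMINALS.contains(t)) return false;
        Deque<String> stack = new ArrayDeque<>();                // alternating symbols and states
        stack.push("1");                                         // start state
        int pos = 0;
        while (true) {
            String look = pos < tokens.length ? tokens[pos] : "$";
            String cell = TABLE.get(stack.peek() + " " + look);
            if (cell == null) return false;                      // blank entry: syntax error
            if (cell.equals("accept")) return true;
            if (cell.charAt(0) == 's') {                         // shift: push(C), push(S')
                stack.push(look);
                stack.push(cell.substring(1));
                pos++;
            } else {                                             // reduce X -> alpha
                String[] p = cell.split(" ");                    // p[0] = X, p[1..] = alpha
                for (int i = 0; i < 2 * (p.length - 1); i++) stack.pop();   // pop(2*|alpha|)
                String go = TABLE.get(stack.peek() + " " + p[0]);           // goto entry, e.g. "g4"
                stack.push(p[0]);
                stack.push(go.substring(1));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("(", "id", ",", "id", ")"));    // prints true
    }
}
```

Because every reduce row repeats the same production under all terminals, the look-ahead is only ever used to choose between shifting and reporting an error, which is what makes this an LR(0) table.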

  35. LR(k) Grammars
      • LR(k) = Left-to-right scanning, Right-most derivation, k look-ahead symbols
      • Main cases: LR(0), LR(1), and some variations (SLR and LALR(1))
      • Parsers for LR(0) grammars:
        • Determine the actions without any look-ahead symbol

  36. Building LR(0) Parsing Tables
      • To build the parsing table:
        • Define states of the parser
        • Build a DFA to describe the transitions between states
        • Use the DFA to build the parsing table

  37. Summary for bottom-up parsing
      • LR(k) grammars
        • left-to-right scanning
        • rightmost derivation
        • can determine whether to shift or reduce from the next k symbols
        • Can automatically build LR parsing tables
      • Shift-reduce parsers
        • Can be built for LR(k) grammars using automated parser generator tools, e.g. CUP, yacc

  38. Top-down vs. Bottom-up again
      • Top-down: LL(k), recursive descent
      • Bottom-up: LR(k), shift-reduce
      [Figure: two parse-tree outlines, labeled Top-down and Bottom-up, each divided into scanned and unscanned input]
