Compiler Construction: Syntax Analysis (Part 1)

Learn about syntax analysis and parsing in compiler construction. Topics include top-down parsing, bottom-up parsing, lexical analysis, abstract syntax trees, symbol tables, and more.

Presentation Transcript


  1. Winter 2006-2007 • Compiler Construction • T3 – Syntax Analysis (Parsing, part 1 of 2) • Mooly Sagiv and Roman Manevich, School of Computer Science, Tel-Aviv University

  2. Today • [Figure: compiler pipeline] IC language (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST → Symbol table etc. → Inter. Rep. (IR) → Code Generation → Executable code (exe) • Today: • Review • Grammars, parse trees, ambiguity • Top-down parsing • Bottom-up parsing • Next week: • Conflict resolution • Shift/Reduce parsing via JavaCup • (Error handling) • AST intro. • PA2

  3. Goals of parsing • Programming language has syntactic rules • Context-Free Grammars • Decide whether program satisfies syntactic structure • Error detection • Error recovery • Simplification: rules on tokens • Build Abstract Syntax Tree

  4. From text to abstract syntax • Program text: 5 + (7 * x) • Pipeline: program text → Lexical Analyzer → token stream → Parser → parse tree (valid) or syntax error → Abstract syntax tree • Grammar: E → id, E → num, E → E + E, E → E * E, E → ( E ) • [Figure: parse tree and abstract syntax tree for 5 + (7 * x)]
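The lexical-analysis step on this slide can be sketched as follows. This is a minimal illustration, not the course's lexer: the function name `tokenize`, the regular expression, and the `(kind, lexeme)` token shape are all assumptions made for the example.

```python
import re

# Hypothetical token kinds for the slide's grammar: num, id, and the symbols + * ( ).
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<sym>[+*()])")

def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise SyntaxError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1                      # whitespace separates tokens and is dropped
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.group("num") is not None:
            tokens.append(("num", m.group("num")))
        elif m.group("id") is not None:
            tokens.append(("id", m.group("id")))
        else:
            sym = m.group("sym")
            tokens.append((sym, sym))     # symbols are their own token kind
        pos = m.end()
    return tokens

print([kind for kind, _ in tokenize("5 + (7 * x)")])
```

The parser then works on the token kinds only, which is the "rules on tokens" simplification mentioned earlier in the deck.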

  5. Terminology • Symbols: terminals (tokens) + * ( ) id num; non-terminals E • Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3 • Parse tree: E[ E[1] + E[ E[2] * E[3] ] ]

  6. Ambiguity • Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3, with parse tree E[ E[1] + E[ E[2] * E[3] ] ] • Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3, with parse tree E[ E[ E[1] + E[2] ] * E[3] ]
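To make the ambiguity concrete, the sketch below enumerates every parse tree of 1 + 2 * 3 under the rules E → E + E, E → E * E, E → num. It is an illustration only: parentheses and identifiers are left out, and the names `parse_trees` and `evaluate` are hypothetical helpers, not part of the slides.

```python
def parse_trees(tokens):
    """Enumerate every parse of a flat token list under E -> E+E | E*E | num.

    Trees are nested tuples (op, left, right); leaves are ints.
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((op, left, right))
    return trees

def evaluate(tree):
    """Evaluate a tree produced by parse_trees."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

trees = parse_trees([1, "+", 2, "*", 3])
for t in trees:
    print(t, "=", evaluate(t))
```

The two trees found correspond to the two parse trees on the slide, and they evaluate to different numbers, which is why the ambiguity matters in practice.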

  7. Grammar rewriting • Ambiguous grammar: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Non-ambiguous grammar: E → E + T, E → T, T → T * F, T → F, F → id, F → ( E ) • Derivation: E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3 • Parse tree: E[ E[ T[ F[1] ] ] + T[ T[ F[2] ] * F[3] ] ]

  8. Parsing methods • Top-down / predictive / recursive descent without backtracking : LL(1) • “L” – left-to-right scan of input • “L” – leftmost derivation • “1” – predict based on one token look-ahead • For every non-terminal and token predict the next production • Bottom-up : LR(0), SLR(1), LR(1), LALR(1) • “L” – left-to-right scan of input • “R” – rightmost derivation (in the reversed order) • For every potential right hand side and token decide when a production is found

  9. Top-down parsing • Builds parse tree in preorder • LL(1) example • Grammar: S → if E then S else S, S → begin S L, S → print E, L → end, L → ; S L, E → num • Input: if 5 then print 8 else… • Token : rule : sentential form • if : S → if E then S else S : if E then S else S • 5 : E → num : if 5 then S else S • print : S → print E : if 5 then print E else S • …
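A predictive parser for the grammar on this slide can be sketched as one function per non-terminal, each choosing its rule from the next token alone. This is a minimal sketch under assumptions: tokens are plain strings, any all-digit token counts as `num`, and the tuple-shaped tree is a hypothetical representation chosen for the example.

```python
def parse(tokens):
    """Predictive (LL(1)) recursive-descent parser for the slide's grammar.

    Returns a nested-tuple tree or raises SyntaxError.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_S():
        tok = peek()
        if tok == "if":                    # S -> if E then S else S
            expect("if")
            cond = parse_E()
            expect("then")
            then_branch = parse_S()
            expect("else")
            else_branch = parse_S()
            return ("if", cond, then_branch, else_branch)
        if tok == "begin":                 # S -> begin S L
            expect("begin")
            first = parse_S()
            return ("begin", [first] + parse_L())
        if tok == "print":                 # S -> print E
            expect("print")
            return ("print", parse_E())
        raise SyntaxError(f"no rule for S on token {tok!r}")

    def parse_L():
        tok = peek()
        if tok == "end":                   # L -> end
            expect("end")
            return []
        if tok == ";":                     # L -> ; S L
            expect(";")
            stmt = parse_S()
            return [stmt] + parse_L()
        raise SyntaxError(f"no rule for L on token {tok!r}")

    def parse_E():
        nonlocal pos
        tok = peek()
        if tok.isdigit():                  # E -> num
            pos += 1
            return ("num", int(tok))
        raise SyntaxError(f"no rule for E on token {tok!r}")

    tree = parse_S()
    if peek() != "$":
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse("if 5 then print 8 else print 9".split()))
```

Each parent node is created by the call that later returns it, and its children are parsed left to right inside that call, which mirrors the preorder construction described on the slide.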

  10. Problem: left recursion • Arithmetic expressions: E → E + T, E → T, T → T * F, T → F, F → id, F → ( E ) • Left recursion: E → E + T (symbol on left is also first symbol on right) • Predictive parsing fails when two rules can start with the same token: E → E + T, E → T • Rewrite grammar using left-factoring • Nullable, FIRST, FOLLOW sets • Left factored grammar: E → T Ep, Ep → + T Ep, Ep → ε, T → F Tp, Tp → * F Tp, Tp → ε, F → id, F → ( E )
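Once the left recursion is gone, the rewritten grammar maps directly onto recursive-descent functions. The sketch below follows the rules E → T Ep, Ep → + T Ep | ε, T → F Tp, Tp → * F Tp | ε, F → id | ( E ). As assumptions for the example, tokens are strings, any Python identifier counts as `id`, and each `Ep`/`Tp` call receives the tree built so far so that the result stays left-associative.

```python
def parse_expr(tokens):
    """Recursive-descent parser for the left-recursion-free expression grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def advance():
        nonlocal pos
        pos += 1

    def E():                      # E -> T Ep
        return Ep(T())

    def Ep(left):                 # Ep -> + T Ep | epsilon
        if peek() == "+":
            advance()
            return Ep(("+", left, T()))
        return left               # epsilon; a bad token is reported by the caller

    def T():                      # T -> F Tp
        return Tp(F())

    def Tp(left):                 # Tp -> * F Tp | epsilon
        if peek() == "*":
            advance()
            return Tp(("*", left, F()))
        return left

    def F():                      # F -> id | ( E )
        tok = peek()
        if tok == "(":
            advance()
            inner = E()
            if peek() != ")":
                raise SyntaxError(f"expected ')', found {peek()!r}")
            advance()
            return inner
        if tok.isidentifier():
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = E()
    if peek() != "$":
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse_expr(["a", "+", "b", "*", "c"]))
```

A function written directly for E → E + T would call itself before consuming any input and never terminate, which is the failure the slide describes.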

  11. More left recursion • Non-terminal with two rules starting with same prefix • Grammar: S → if E then S else S, S → if E then S • Left factored grammar: S → if E then S X, X → ε, X → else S

  12. Bottom-up parsing • No problem with left recursion • Widely used in practice • LR(0), SLR(1), LR(1), LALR(1) • JavaCup implements LALR(1)

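  13. Bottom-up parsing • Grammar: E → E + (E), E → i • Input: 1 + (2) + (3) • Reductions, from the input back to the start symbol: 1 + (2) + (3), E + (2) + (3), E + (E) + (3), E + (3), E + (E), E • [Figure: parse tree built bottom-up over 1 + ( 2 ) + ( 3 )]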

  14. Shift-reduce parsing • Parser stack: symbols (terminals and non-terminals) + automaton states • Parsing actions: sequence of shift and reduce operations • Action determined by top of stack and k input tokens • Shift: move next token to top of stack • Reduce: for rule X → A B C pop C, B, A then push X • Convention: $ stands for end of file
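The shift and reduce actions can be traced on the small grammar E → E + ( E ), E → i used earlier in the deck. The sketch below is deliberately simplified: it keeps only grammar symbols on the stack (no automaton states) and reduces greedily whenever the top of the stack matches a right-hand side. That greedy policy is an assumption that happens to work for this grammar; a real LR parser consults its table instead.

```python
# Rules of the example grammar; "i" stands for any number token.
RULES = [("E", ["E", "+", "(", "E", ")"]), ("E", ["i"])]

def shift_reduce(tokens):
    """Return the list of actions taken on tokens, or raise SyntaxError."""
    stack, actions = [], []
    tokens = list(tokens) + ["$"]         # $ marks end of input
    pos = 0
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: pop the right-hand side, push the left-hand side.
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if tokens[pos] == "$":
                if stack == ["E"]:
                    actions.append("accept")
                    return actions
                raise SyntaxError(f"input ended with stack {stack}")
            # Shift: move the next token to the top of the stack.
            stack.append(tokens[pos])
            actions.append(f"shift {tokens[pos]}")
            pos += 1

for action in shift_reduce(["i", "+", "(", "i", ")", "+", "(", "i", ")"]):
    print(action)
```

Read from the bottom up, the reduce actions in the trace spell out a rightmost derivation of the input, which is the sense in which bottom-up parsing produces a rightmost derivation in reverse.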

  15. Pushdown automaton • [Figure: input tape u t w $; control unit driven by the parser table; stack holding V above the bottom marker $]

  16. LR parsing table • Rows: states 0, 1, ... • Columns: terminals (shift/reduce actions) and non-terminals (goto part) • sn: shift and move to state n • rk: reduce by rule k • gm: goto state m

  17. Parsing table example • Grammar: S → E $, E → T, E → E + T, T → i, T → ( E ) • [Table: STATE by SYMBOL; entries not preserved in this transcript]

  18. Items • Items indicate the position inside a rule, marked with a dot (·) • LR(0) items are of the form A → α · t β • LR(1) items are of the form A → α · t β, a, where a is a look-ahead token • Grammar: S → E $, E → T, E → E + T, T → i, T → ( E )

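  19. Automaton states • [Figure: LR(0) automaton with states 0 to 9; edges labeled E, T, i, (, ), + and $] • Item sets, using the item numbers printed on the slide, with dot positions restored from the LR(0) construction: {1: S → · E $, 4: E → · T, 6: E → · E + T, 10: T → · i, 12: T → · ( E )}; {2: S → E · $, 7: E → E · + T}; {2: S → E $ ·}; {5: E → T ·}; {11: T → i ·}; {7: E → E + · T, 10: T → · i, 12: T → · ( E )}; {8: E → E + T ·}; {13: T → ( · E ), 4: E → · T, 6: E → · E + T, 10: T → · i, 12: T → · ( E )}; {14: T → ( E · ), 7: E → E · + T}; {15: T → ( E ) ·}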

  20. Identifying handles • Create a finite state automaton over grammar symbols • Sets of LR(0) items • Use automaton to build parser tables • shift: for items A → α · t β on token t • reduce: for items A → α · on every token • Any grammar has • Transition diagram • GOTO table • Not every grammar has deterministic action table • When no conflicts occur use a DPDA which pushes states on the stack
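The construction of the automaton from sets of LR(0) items can be sketched with the usual closure and goto operations, applied to the grammar S → E $, E → T, E → E + T, T → i, T → ( E ) from the preceding slides. This is a sketch under assumptions: an item is encoded as a pair (rule index, dot position), $ is treated as an ordinary terminal, and all function names are hypothetical.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add A -> . gamma for every non-terminal A that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(r, d + 1) for r, d in state
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)

def build_states():
    """Return (states, transitions) of the LR(0) automaton."""
    start = closure({(0, 0)})
    states, work, transitions = [start], [start], {}
    while work:
        state = work.pop()
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions

def has_conflict(state):
    """True if the state mixes a reduce with a shift or with another reduce."""
    complete = [(r, d) for r, d in state if d == len(GRAMMAR[r][1])]
    shifts = any(d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] not in NONTERMINALS
                 for r, d in state)
    return len(complete) > 1 or (bool(complete) and shifts)

states, transitions = build_states()
print(len(states), "states,", len(transitions), "transitions")
print("conflicts:", any(has_conflict(s) for s in states))
```

State numbers produced by this sketch depend on exploration order and need not match the numbering on the slides; only the item sets themselves are determined by the grammar.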

  21. Non-LR(0) grammars • When conflicts occur the grammar is not LR(0) • Parsing table contains non-determinism • shift-reduce conflicts • reduce-reduce conflicts • shift-shift conflicts? • Known cases • Operator precedence • Operator associativity • Dangling if-then-else • Unary minus • Solutions • Develop equivalent non-ambiguous grammar • Patch parsing table to shift/reduce • Precedence and associativity of tokens • Stronger parser algorithm: SLR/LR(1)/LALR(1)

  22. Precedence and associativity • Precedence: E + E is on the stack (rule E → E + E) and the next token is * • Reduce: + precedes * • Shift: * precedes +

  23. Precedence and associativity • Precedence: E + E is on the stack (rule E → E + E) and the next token is * • Reduce: + precedes *, giving the parse tree (1 + 2) * 3 = 9 • Shift: * precedes +, giving the parse tree 1 + (2 * 3) = 7

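  24. Precedence and associativity • Associativity: E + E is on the stack (rule E → E + E) and the next token is + • Shift: + is right-associative • Reduce: + is left-associative • Precedence: E + E is on the stack (rule E → E + E) and the next token is * • Reduce: + precedes * • Shift: * precedes +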
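The shift-or-reduce choice described on the last three slides can be illustrated with a small operator-precedence parser. This is a sketch, not how JavaCup resolves conflicts internally: it uses two stacks instead of one parser stack, and the `prec` and `right_assoc` parameters are hypothetical names for the precedence and associativity declarations.

```python
def parse_with_precedence(tokens, prec, right_assoc=()):
    """Parse operand/operator tokens and return a fully parenthesized string.

    prec maps each operator to a number (higher binds tighter);
    right_assoc lists the operators that are right-associative.
    """
    operands, operators = [], []

    def reduce():
        # Apply the rule E -> E op E to the top of the stacks.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(f"({left}{op}{right})")

    for tok in tokens:
        if tok in prec:
            # Conflict point: E op E is on the stack and tok is the next token.
            while operators:
                top = operators[-1]
                if prec[top] > prec[tok] or (prec[top] == prec[tok]
                                             and top not in right_assoc):
                    reduce()          # the operator on the stack wins
                else:
                    break             # shift: the incoming operator wins
            operators.append(tok)
        else:
            operands.append(str(tok))
    while operators:
        reduce()
    return operands[0]

expr = ["1", "+", "2", "*", "3"]
print(parse_with_precedence(expr, {"+": 1, "*": 2}))   # * precedes +: shift
print(parse_with_precedence(expr, {"+": 2, "*": 1}))   # + precedes *: reduce
print(parse_with_precedence(["1", "+", "2", "+", "3"], {"+": 1}))
print(parse_with_precedence(["1", "+", "2", "+", "3"], {"+": 1}, right_assoc={"+"}))
```

The first two calls reproduce the two trees for 1 + 2 * 3, and the last two show the same decision made by associativity when both operators have equal precedence.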

  25. Dangling else/if-else ambiguity • Grammar: S → if E then S else S, S → if E then S, S → other • if a then if b then e1 else e2: which interpretation should we use? • (1) if a then { if b then e1 else e2 } -- standard interpretation • (2) if a then { if b then e1 } else e2 • shift/reduce conflict • LR(1) items and look-ahead tokens: S → if E then S · (token: else); S → if E then S · else S (token: any)
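Preferring shift in this conflict attaches each else to the nearest unmatched if, which is interpretation (1). The sketch below shows the same choice in a small recursive parser. As assumptions for the example, conditions and `other` statements are single tokens, and the function name `parse_stmt` and the tuple-shaped tree are hypothetical.

```python
KEYWORDS = {"if", "then", "else"}

def parse_stmt(tokens, pos=0):
    """Parse S -> if E then S else S | if E then S | other.

    Returns (tree, next_position); raises SyntaxError on malformed input.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError(f"expected 'then' after condition at position {pos + 1}")
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Prefer shift: if an else follows, consume it now so that it
        # belongs to this (innermost) if rather than to an enclosing one.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    if tokens[pos] in KEYWORDS:
        raise SyntaxError(f"unexpected keyword {tokens[pos]!r} at position {pos}")
    return tokens[pos], pos + 1

tree, end = parse_stmt("if a then if b then e1 else e2".split())
print(tree)
```

Choosing reduce instead would close the inner if before looking at else, leaving the else for the outer if, which is interpretation (2).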

  26. See you next week

  27. Grammar hierarchy • [Figure: nested grammar classes] LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ non-ambiguous CFG; LL(1) ⊂ non-ambiguous CFG
