
Winter 2007-2008 Compiler Construction T3 – Syntax Analysis (Parsing, part 1 of 2)

Mooly Sagiv and Roman Manevich, School of Computer Science, Tel-Aviv University. Today: review; grammars, parse trees, ambiguity; top-down parsing.


Presentation Transcript


  1. Winter 2007-2008, Compiler Construction, T3 – Syntax Analysis (Parsing, part 1 of 2). Mooly Sagiv and Roman Manevich, School of Computer Science, Tel-Aviv University

  2. Today
     Compiler pipeline: IC Language (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST → Symbol Table etc. → Inter. Rep. (IR) → Code Generation → Executable code (exe)
     • Today:
       • Review
       • Grammars, parse trees, ambiguity
       • Top-down parsing
       • Bottom-up parsing
     • Next week:
       • Conflict resolution
       • Shift/Reduce parsing via JavaCup
       • (Error handling)
       • AST intro.
       • PA2

  3. Goals of parsing
     • Programming language has syntactic rules
       • Context-Free Grammars
     • Decide whether program satisfies syntactic structure
       • Error detection
       • Error recovery
       • Simplification: rules on tokens
     • Build Abstract Syntax Tree

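  4. From text to abstract syntax
     Program text: 5 + (7 * x)
     Lexical Analyzer → token stream: num + ( num * id )
     Parser (using the grammar) → syntax error, or, if valid, parse tree → Abstract syntax tree
     Grammar:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )
     Parse tree: E → E + E, where the left E → num and the right E → ( E ), with the inner E → E * E (E → num, E → id)
     Abstract syntax tree: + with children 5 and *, where * has children 7 and x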
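
The first stage of this pipeline, turning program text into a token stream, can be sketched as follows. This is a minimal illustration, not the course's lexer: the names `TOKEN_SPEC` and `tokenize` are hypothetical, and the lexical rules for `num` (digits) and `id` (letters, digits, underscore) are assumptions, since the slide only names the token classes.

```python
import re

# Assumed lexical rules; the slide names only the token classes id and num.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[+*()]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_kind, lexeme) pairs for the given text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = match.lastgroup
        if kind == "op":
            # Operators and parentheses are their own token kind.
            tokens.append((match.group(), match.group()))
        elif kind != "skip":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens
```

For the slide's input, `[kind for kind, _ in tokenize("5 + (7 * x)")]` gives the token stream `num + ( num * id )`.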

  5. From text to abstract syntax
     Note: a parse tree (derivation tree) describes a run of the parser; an abstract syntax tree is the result of a successful run
     [Figure: the same pipeline as on the previous slide: token stream → Parser (with grammar E → id, E → num, E → E + E, E → E * E, E → ( E )) → syntax error, or, if valid, parse tree → Abstract syntax tree]

  6. Parsing terminology
     Symbols: terminals (tokens) + * ( ) id num; non-terminals E
     Grammar rules:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )
     Convention: the non-terminal appearing in the first derivation rule is defined to be the initial non-terminal
     Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
     Each step in a derivation applies one production (grammar rule)
     Parse tree: E → E + E, where the left E is 1 and the right E → E * E with children 2 and 3

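  7. Ambiguity
     Grammar rules:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )
     Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations)
     Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
       Parse tree: E → E + E, where the left E is 1 and the right E → E * E with children 2 and 3
     Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
       Parse tree: E → E * E, where the left E → E + E with children 1 and 2 and the right E is 3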
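
One way to make the ambiguity concrete is to count how many parse trees the grammar admits for a token string. The sketch below does this by dynamic programming over token spans; the function name `count_parse_trees` is hypothetical, and the literals 1, 2, 3 are assumed to arrive as `num` tokens.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under
    E -> id | num | E + E | E * E | ( E )."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] in ("num", "id"):
            total += 1                              # E -> num | id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)            # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):             # E -> E op E, split at k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For `num + num * num` the count is 2, matching the two parse trees on the slide; a count of 1 for every string in the language is what an unambiguous grammar would give.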

  8. Grammar rewriting
     Ambiguous grammar:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )
     Unambiguous grammar:
       E → E + T
       E → T
       T → T * F
       T → F
       F → id
       F → num
       F → ( E )
     Sometimes, a grammar can be changed to an unambiguous grammar (while preserving the same language)
     Derivation: E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3
     Parse tree: E → E + T, where the left E → T → F → 1 and the right T → T * F, with T → F → 2 and F → 3
     Note the difference between a language and a grammar: a grammar represents a language; a language can be represented by many grammars.

  9. Parsing methods
     • Top-down / predictive / recursive descent without backtracking: LL(1)
       • “L” – left-to-right scan of input
       • “L” – leftmost derivation
       • “1” – predict based on one look-ahead token
       • For every non-terminal and token predict the next production
     • Bottom-up: LR(0), SLR(1), LR(1), LALR(1)
       • “L” – left-to-right scan of input
       • “R” – rightmost derivation (in reverse order)
       • For every potential right-hand side and token decide when a production is found

  10. Top-down parsing
     • Builds parse tree in preorder
     • Also called predictive parsing
     • LL(1) example
     Grammar:
       S → if E then S else S
       S → begin S L
       S → print E
       L → end
       L → ; S L
       E → num
     Input: if 5 then print 8 else …
     Token : rule                         Sentential form
                                          S
     if : S → if E then S else S          if E then S else S
     5 : E → num                          if 5 then S else S
     print : S → print E                  if 5 then print E else S
     …
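
The prediction table above can be sketched as a recursive-descent parser with one procedure per non-terminal. This is an illustrative sketch: `predictive_parse` and the `parse_*` helpers are hypothetical names, and number literals such as 5 and 8 are assumed to arrive as the token `num`.

```python
def predictive_parse(tokens):
    """Parse a token list top-down; return the predicted rules in preorder."""
    toks = list(tokens) + ["$"]      # "$" marks end of input
    pos = 0
    rules = []

    def eat(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {toks[pos]!r}")
        pos += 1

    def parse_S():
        tok = toks[pos]              # one look-ahead token selects the rule
        if tok == "if":
            rules.append("S -> if E then S else S")
            eat("if"); parse_E(); eat("then"); parse_S(); eat("else"); parse_S()
        elif tok == "begin":
            rules.append("S -> begin S L")
            eat("begin"); parse_S(); parse_L()
        elif tok == "print":
            rules.append("S -> print E")
            eat("print"); parse_E()
        else:
            raise SyntaxError(f"no rule for S on token {tok!r}")

    def parse_L():
        tok = toks[pos]
        if tok == "end":
            rules.append("L -> end")
            eat("end")
        elif tok == ";":
            rules.append("L -> ; S L")
            eat(";"); parse_S(); parse_L()
        else:
            raise SyntaxError(f"no rule for L on token {tok!r}")

    def parse_E():
        rules.append("E -> num")
        eat("num")

    parse_S()
    eat("$")
    return rules
```

Calling `predictive_parse("if num then print num else print num".split())` returns the rules in the order they were predicted, which is the preorder of the parse tree.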

  11. Problem: left recursion
     Arithmetic expressions:
       E → E + T
       E → T
       T → T * F
       T → F
       F → id
       F → ( E )
     • Left recursion: E → E + T
       • Symbol on left also first symbol on right
     • Predictive parsing fails when two rules can start with same token
       E → E + T
       E → T
     • Rewrite grammar using left-factoring (here, by eliminating the left recursion)
     • Nullable, FIRST, FOLLOW sets
     Left-factored grammar:
       E → T Ep
       Ep → + T Ep
       Ep → ε
       T → F Tp
       Tp → * F Tp
       Tp → ε
       F → id
       F → ( E )
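
The Nullable and FIRST sets mentioned above can be computed for the rewritten grammar by a fixed-point iteration. The sketch below is one standard way to do it; the names `GRAMMAR` and `nullable_and_first` are hypothetical, an empty tuple stands for ε, and FOLLOW is left out to keep the example short.

```python
# Rewritten grammar from the slide; () encodes an epsilon right-hand side.
GRAMMAR = {
    "E": [("T", "Ep")],
    "Ep": [("+", "T", "Ep"), ()],
    "T": [("F", "Tp")],
    "Tp": [("*", "F", "Tp"), ()],
    "F": [("id",), ("(", "E", ")")],
}


def nullable_and_first(grammar):
    """Return (nullable non-terminals, FIRST set per non-terminal)."""
    nullable = set()
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                   # repeat until nothing new is learned
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                # A non-terminal is nullable if some rhs is all-nullable.
                if nt not in nullable and all(s in nullable for s in rhs):
                    nullable.add(nt)
                    changed = True
                # FIRST(nt) gets FIRST of each symbol up to the first
                # non-nullable one.
                for sym in rhs:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[nt]:
                        first[nt] |= new
                        changed = True
                    if sym not in nullable:
                        break
    return nullable, first
```

With these sets, each non-terminal's alternatives start with disjoint tokens (for example `+` versus ε for `Ep`), which is what lets one look-ahead token pick the rule.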

  12. More on left recursion
     • Non-terminal with two rules starting with same prefix
     Grammar:
       S → if E then S else S
       S → if E then S
     Left-factored grammar:
       S → if E then S X
       X → ε
       X → else S
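
The left-factoring step shown here can be sketched as a small function that pulls the longest common prefix out of a non-terminal's alternatives. The name `left_factor` and the caller-supplied fresh non-terminal are hypothetical; the sketch assumes all alternatives passed in share the prefix, as they do on the slide.

```python
def left_factor(nonterminal, alternatives, fresh):
    """Factor the longest common prefix of the alternatives into one rule.

    Returns a dict mapping each non-terminal to its list of right-hand
    sides; an empty tuple stands for epsilon.
    """
    prefix = []
    for symbols in zip(*alternatives):   # walk the alternatives in lockstep
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix:
        return {nonterminal: list(alternatives)}
    n = len(prefix)
    return {
        nonterminal: [tuple(prefix) + (fresh,)],
        fresh: [alt[n:] for alt in alternatives],
    }
```

Applied to the two `if` rules with fresh non-terminal `X`, it produces `S → if E then S X` with `X → else S` and `X → ε`.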

  13. Bottom-up parsing
     • No problem with left recursion
     • Widely used in practice
     • LR(0), SLR(1), LR(1), LALR(1)
     • We will focus only on the theory of LR(0)
     • JavaCup implements LALR(1)

  14. Bottom-up parsing
     Grammar:
       E → E + ( E )
       E → i
     Input: 1 + (2) + (3)
     Reductions, from the input up to the start symbol:
       1 + (2) + (3)
       E + (2) + (3)
       E + (E) + (3)
       E + (3)
       E + (E)
       E

  15. Shift-reduce parsing
     • Parser stack: symbols (terminals and non-terminals) + automaton states
     • Parsing actions: sequence of shift and reduce operations
     • Action determined by top of stack and k input tokens
     • Shift: move next token to top of stack
     • Reduce: for rule X → A B C pop C, B, A then push X
     • Convention: $ stands for end of file
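
The shift and reduce actions can be sketched on the small grammar E → E + ( E ), E → i. This is a deliberately naive illustration rather than a table-driven LR parser: it keeps only grammar symbols on the stack (no automaton states) and reduces whenever the top of the stack matches a right-hand side, which is assumed to be sufficient for this particular grammar. `RULES` and `shift_reduce` are hypothetical names.

```python
RULES = [
    ("E", ("E", "+", "(", "E", ")")),
    ("E", ("i",)),
]


def shift_reduce(tokens):
    """Return (accepted, trace) for a token list; "$" is appended as EOF."""
    tokens = list(tokens) + ["$"]
    stack, trace = [], []
    pos = 0
    while True:
        for lhs, rhs in RULES:
            if tuple(stack[-len(rhs):]) == rhs:
                # Reduce: pop the right-hand side, push the left-hand side.
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append("reduce " + lhs + " -> " + " ".join(rhs))
                break
        else:
            if tokens[pos] == "$":
                break                    # nothing to reduce, input consumed
            # Shift: move the next token to the top of the stack.
            stack.append(tokens[pos])
            trace.append("shift " + tokens[pos])
            pos += 1
    return stack == ["E"], trace
```

The input is accepted when the stack ends up holding only the start symbol. A real LR parser replaces the suffix matching with a parsing table indexed by automaton state and token.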

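  16. Pushdown automaton
     [Figure: a control unit reads the input (u t w $), consults the parser-table, and maintains a stack whose bottom is $ and whose entries are pairs such as (V,5)]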

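  17. LR parsing table
     [Figure: table with one row per state (0, 1, ...); the columns for terminals hold the shift/reduce actions and the columns for non-terminals hold the goto part]
     • sn: shift and move to state n
     • rk: reduce by rule k
     • gm: goto state m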

  18. Parsing table example
     Grammar:
       S → E $
       E → T
       E → E + T
       T → i
       T → ( E )
     [Table with columns STATE and SYMBOL; its entries are not part of the transcript]

  19. Items
     Items indicate the position inside a rule:
     LR(0) items are of the form A → α • β
     (LR(1) items are of the form A → α • β, t, where t is a look-ahead token)
     Grammar:
       S → E $
       E → T
       E → E + T
       T → i
       T → ( E )

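  20. Automaton states
     [Figure: LR(0) automaton for the grammar S → E $, E → T, E → E + T, T → i, T → ( E )]
     State 0 (start): S → • E $, E → • T, E → • E + T, T → • i, T → • ( E )
     State 1: S → E • $, E → E • + T
     State 2: S → E $ •
     State 3: E → E + • T, T → • i, T → • ( E )
     State 4: E → E + T •
     State 5: T → i •
     State 6: E → T •
     State 7: T → ( • E ), E → • T, E → • E + T, T → • i, T → • ( E )
     State 8: T → ( E • ), E → E • + T
     State 9: T → ( E ) •
     Transitions: 0 on E to 1; 0 on T to 6; 0 on i to 5; 0 on ( to 7; 1 on $ to 2; 1 on + to 3; 3 on T to 4; 3 on i to 5; 3 on ( to 7; 7 on E to 8; 7 on T to 6; 7 on i to 5; 7 on ( to 7; 8 on ) to 9; 8 on + to 3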

  21. Automaton states
     [Figure: the same automaton as on the previous slide, annotated]
     • A non-terminal transition corresponds to a goto action in the parse table
     • A terminal transition corresponds to a shift action in the parse table
     • A single reduce item corresponds to a reduce action

  22. Identifying handles
     • Create a finite state automaton over grammar symbols
       • Sets of LR(0) items
     • Use automaton to build parser tables
       • shift: for items A → α • t β, on token t
       • reduce: for items A → α •, on every token
     • Any grammar has
       • Transition diagram
       • GOTO table
     • Not every grammar has deterministic action table
     • When no conflicts occur we can use a DPDA which pushes states on the stack
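
The construction outlined above (sets of LR(0) items, transitions on grammar symbols, then a check for a deterministic action table) can be sketched for the grammar S → E $, E → T, E → E + T, T → i, T → ( E ). All names below (`GRAMMAR`, `closure`, `goto`, `build_states`, `conflicts`) are hypothetical; an item is encoded as a pair (rule index, dot position).

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> . gamma for every non-terminal X that follows a dot."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(r, d + 1) for r, d in state
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def build_states():
    """Return (start state, set of all reachable states)."""
    start = closure({(0, 0)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.add(target)
                work.append(target)
    return start, states


def conflicts(states):
    """List LR(0) conflicts: a reduce item next to a shift or reduce item."""
    found = []
    for state in states:
        reduces = [(r, d) for r, d in state if d == len(GRAMMAR[r][1])]
        shifts = [(r, d) for r, d in state
                  if d < len(GRAMMAR[r][1])
                  and GRAMMAR[r][1][d] not in NONTERMINALS]
        if len(reduces) > 1:
            found.append(("reduce-reduce", state))
        elif reduces and shifts:
            found.append(("shift-reduce", state))
    return found
```

For this grammar `build_states()` yields ten states and `conflicts` returns an empty list, so the action table is deterministic and the grammar is LR(0).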

  23. Non-LR(0) grammars
     • When conflicts occur the grammar is not LR(0)
       • Parsing table contains non-determinism
       • shift-reduce conflicts
       • reduce-reduce conflicts
       • shift-shift conflicts?
     • Known cases
       • Operator precedence
       • Operator associativity
       • Dangling if-then-else
       • Unary minus
     • Solutions
       • Develop equivalent non-ambiguous grammar
       • Patch parsing table to shift/reduce
       • Precedence and associativity of tokens
       • Stronger parser algorithm: SLR/LR(1)/LALR(1)

  24. Precedence and associativity
     Precedence: E + E is on the stack and the next token is *
     • Reduce by E → E + E: + precedes *
     • Shift *: * precedes +

  25. Precedence and associativity
     Precedence on input 1 + 2 * 3, with E + E on the stack and * as the next token:
     • Reduce (+ precedes *): the parse tree groups (1 + 2) * 3 = 9
     • Shift (* precedes +): the parse tree groups 1 + (2 * 3) = 7

  26. Precedence and associativity
     Associativity: E + E is on the stack and the next token is +
     • Shift: + right-associative
     • Reduce: + left-associative
     Precedence: E + E is on the stack and the next token is *
     • Reduce: + precedes *
     • Shift: * precedes +
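
The effect of the shift/reduce choice can be sketched with a small evaluator that resolves each conflict from a precedence table. This is an operator-precedence style illustration, not how JavaCup patches its table; `evaluate` and the `precedence` mapping are hypothetical, and equal precedence is assumed to mean left-associative (reduce).

```python
def evaluate(tokens, precedence):
    """Evaluate ints joined by + and *, choosing shift or reduce by precedence."""
    values, ops = [], []

    def reduce():
        # Reduce E op E to E by applying the operator on top of the stack.
        op = ops.pop()
        right = values.pop()
        left = values.pop()
        values.append(left + right if op == "+" else left * right)

    for tok in tokens:
        if isinstance(tok, int):
            values.append(tok)
        else:
            # Reduce while the operator on the stack precedes the incoming
            # one (or ties with it: left-associative); otherwise shift.
            while ops and precedence[ops[-1]] >= precedence[tok]:
                reduce()
            ops.append(tok)
    while ops:
        reduce()
    return values[0]
```

With `*` above `+`, `evaluate([1, "+", 2, "*", 3], {"+": 1, "*": 2})` shifts `*` and gives 7; with `+` above `*`, the same input reduces first and gives 9.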

  27. Dangling else/if-else ambiguity
     Grammar:
       S → if E then S else S
       S → if E then S
       S → other
     if a then if b then e1 else e2
     which interpretation should we use?
     (1) if a then { if b then e1 else e2 } -- standard interpretation
     (2) if a then { if b then e1 } else e2
     shift/reduce conflict
     LR(1) items:                     token:
       S → if E then S •              else
       S → if E then S • else S       (any)
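
Resolving this conflict in favor of shift attaches each else to the nearest unmatched if, which is interpretation (1). The sketch below shows that choice in a small recursive parser; `parse_stmt` is a hypothetical name, conditions are assumed to be single tokens, and any non-`if` token is treated as an `other` statement.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at pos; return (tree, next position)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Prefer shift: if an else follows, it belongs to this (innermost) if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1
```

On `if a then if b then e1 else e2` the else ends up inside the inner `if`, so the outer `if` has no else branch.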

  28. See you next week

  29. Grammar hierarchy
     [Figure: nested grammar classes inside Non-ambiguous CFG: LR(0) ⊂ SLR(1) ⊂ LALR(1), with LL(1) as a separate class that overlaps them]
