
Parsing


leanna

Presentation Transcript


  1. Parsing

  2. Administration • Groups • Forum • https://forums.cs.tau.ac.il/viewforum.php?f=76

  3. IC compiler
     [Diagram: IC program (.ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, symbol table, etc. → Intermediate Representation (IR) → Code Generation → x86 executable (.exe)]

  4. Parsing
     Input:
     • Sequence of tokens
     • A context-free grammar
     • Actions
     Output:
     • Abstract syntax tree
     • Decision whether the program satisfies the syntactic structure

  5. Parsing • Context Free Grammars (CFG) • Capture program structure (hierarchy) • Allow automatic creation of “efficient” parsers
     Grammar:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )

  6. From text to abstract syntax
     Grammar: E → id | num | E + E | E * E | ( E )
     [Figure: program text "5 + (7 * x)" → Lexical Analyzer → token stream num(5) + ( num(7) * id(x) ) → Parser (driven by the grammar) → parse tree if the syntax is valid, otherwise a syntax error → abstract syntax tree]

  7. From text to abstract syntax
     Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run.
     [Figure: token stream num(5) + ( num(7) * id(x) ) → Parser (grammar E → id | num | E + E | E * E | ( E )) → parse tree if the syntax is valid, otherwise a syntax error → abstract syntax tree]

  8. Parsing terminology
     Symbols:
     • Terminals (tokens): + * ( ) id num
     • Non-terminals: E
     Grammar rules:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )
     Convention: the non-terminal appearing in the first derivation rule is defined to be the initial non-terminal
     Derivation:
       E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
     Parse tree (derivation tree):
             E
           / | \
          E  +  E
          |   / | \
          1  E  *  E
             |     |
             2     3

  9. Ambiguity
     Grammar rules:
       E → id
       E → num
       E → E + E
       E → E * E
       E → ( E )
     Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations)
     Leftmost derivation:
       E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
       Parse tree: + at the root, with children 1 and the subtree 2 * 3
     Rightmost derivation:
       E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
       Parse tree: * at the root, with children the subtree 1 + 2 and 3
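
The effect of the two parse trees can be made concrete with a small Java sketch. It is not part of the slides: the class name AmbiguityDemo and its methods are hypothetical, the input is assumed to be space-separated numbers and operators, and only the rules E → num, E → E + E and E → E * E are covered. It tries every operator as the root of the tree and collects the value of each resulting parse tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names here are not from the slides.
class AmbiguityDemo {
    // tokens alternate operand, operator, operand, ... e.g. ["1","+","2","*","3"].
    // Returns the value of every parse tree that the ambiguous grammar
    // E -> E + E | E * E | num allows for tokens[lo..hi].
    static List<Integer> allValues(String[] tokens, int lo, int hi) {
        List<Integer> values = new ArrayList<>();
        if (lo == hi) {                              // E -> num
            values.add(Integer.parseInt(tokens[lo]));
            return values;
        }
        // Try every operator position as the root of the tree.
        for (int op = lo + 1; op < hi; op += 2) {
            for (int left : allValues(tokens, lo, op - 1)) {
                for (int right : allValues(tokens, op + 1, hi)) {
                    values.add(tokens[op].equals("+") ? left + right : left * right);
                }
            }
        }
        return values;
    }

    static List<Integer> allValues(String input) {
        String[] tokens = input.trim().split("\\s+");
        return allValues(tokens, 0, tokens.length - 1);
    }

    public static void main(String[] args) {
        // One value per parse tree: + at the root, then * at the root.
        System.out.println(allValues("1 + 2 * 3")); // prints [7, 9]
    }
}
```

Two different values for one input is exactly why an ambiguous grammar cannot be used as-is to assign meaning to a program.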

  10. Grammar rewriting
      Ambiguous grammar:
        E → id
        E → num
        E → E + E
        E → E * E
        E → ( E )
      Unambiguous grammar:
        E → E + T
        E → T
        T → T * F
        T → F
        F → id
        F → num
        F → ( E )
      Derivation (the step from E + T to 1 + T abbreviates E ⇒ T ⇒ F ⇒ 1):
        E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3
      Parse tree:
              E
            / | \
           E  +  T
           |   / | \
           T  T  *  F
           |  |     |
           F  F     3
           |  |
           1  2
      Note the difference between a language and a grammar: a grammar represents a language; a language can be represented by many grammars.
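
As a rough illustration of how the layered E / T / F grammar fixes the grouping, here is a hand-written Java evaluator. It is a sketch under assumptions, not the course's parser: the class LayeredExpr is hypothetical, the left-recursive rules are written as loops (a top-down technique, unlike the LALR parser CUP generates), and the id alternative is omitted because the sketch has no variables.

```java
// Hypothetical hand-written evaluator; class and method names are illustrative.
class LayeredExpr {
    private final String src;
    private int pos = 0;

    LayeredExpr(String src) { this.src = src.replaceAll("\\s+", ""); }

    static int eval(String src) {
        LayeredExpr p = new LayeredExpr(src);
        int v = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return v;
    }

    // E -> E + T | T, written as a loop: T (+ T)*
    private int expr() {
        int v = term();
        while (peek() == '+') { pos++; v += term(); }
        return v;
    }

    // T -> T * F | F, written as a loop: F (* F)*
    private int term() {
        int v = factor();
        while (peek() == '*') { pos++; v *= factor(); }
        return v;
    }

    // F -> num | ( E )   (id is omitted in this sketch)
    private int factor() {
        if (peek() == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return v;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // prints 7: * is grouped below +
        System.out.println(eval("(1 + 2) * 3")); // prints 9: parentheses override
    }
}
```

Because a product can only appear inside a T, the input 1 + 2 * 3 has a single reading, unlike with the ambiguous grammar.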

  11. Parsing methods – Top Down • Starts with the start symbol • Tries to transform it to the input
      Grammar:
        S → if E then S else S
        S → begin S L
        S → print E
        L → end
        L → ; S L
        E → num
      Input: if 5 then print 8 else…
      Token : rule applied              Sentential form
                                        S
      if : S → if E then S else S       if E then S else S
      5 : E → num                       if 5 then S else S
      print : S → print E               if 5 then print E else S
      …
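
The top-down trace above can be mimicked by a recursive-descent recognizer, with one method per non-terminal and the next token choosing the rule. The Java below is a minimal sketch, not the slides' code: the class TopDownDemo is hypothetical, and tokens are assumed to be separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer; names are illustrative, not from the slides.
class TopDownDemo {
    private final List<String> tokens;
    private int pos = 0;

    TopDownDemo(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    // True if the whole input can be derived from S.
    static boolean accepts(String input) {
        TopDownDemo p = new TopDownDemo(input);
        try {
            p.parseS();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException ex) {
            return false;
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but saw " + peek());
        }
        pos++;
    }

    // S -> if E then S else S | begin S L | print E
    private void parseS() {
        switch (peek()) {
            case "if":
                expect("if"); parseE(); expect("then"); parseS(); expect("else"); parseS();
                break;
            case "begin":
                expect("begin"); parseS(); parseL();
                break;
            case "print":
                expect("print"); parseE();
                break;
            default:
                throw new IllegalStateException("unexpected token " + peek());
        }
    }

    // L -> end | ; S L
    private void parseL() {
        if (peek().equals("end")) {
            expect("end");
        } else {
            expect(";"); parseS(); parseL();
        }
    }

    // E -> num
    private void parseE() {
        if (!peek().matches("\\d+")) {
            throw new IllegalStateException("expected num but saw " + peek());
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts("if 5 then print 8 else print 9")); // prints true
        System.out.println(accepts("if 5 then print 8"));              // prints false
    }
}
```

This works here because the first token of each alternative is distinct, so one token of lookahead is enough to pick the rule.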

  12. Parsing methods – Bottom Up • Starts with the input • Attempts to rewrite it to the start symbol • Widely used in practice • LR(0), SLR(1), LR(1), LALR(1) • JavaCup implements LALR(1)

  13. Bottom Up – parsing
      Grammar:
        E → E + ( E )
        E → i
      Input: 1 + (2) + (3)
      Reductions:
        1 + (2) + (3)
        E + (2) + (3)
        E + (E) + (3)
        E + (3)
        E + (E)
        E
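
The reduction sequence for this grammar can be reproduced with a naive shift-reduce loop in Java. This is a deliberately simplified sketch and not an LR parser: it has no parse table or states, it simply reduces whenever the top of the stack matches a rule, which happens to be enough for this particular grammar. The class ShiftReduceDemo is hypothetical and tokens are assumed to be separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of naive shift-reduce parsing for
//   E -> E + ( E )
//   E -> i          (i is any number here)
class ShiftReduceDemo {
    // Returns the sentential form after each reduction,
    // or throws if the input cannot be reduced to E.
    static List<String> reductions(String input) {
        String[] in = input.trim().split("\\s+");
        List<String> stack = new ArrayList<>();
        List<String> trace = new ArrayList<>();
        for (int i = 0; i < in.length; i++) {
            stack.add(in[i]);                         // shift
            if (reduce(stack)) {
                // Sentential form = stack followed by the unread input.
                List<String> form = new ArrayList<>(stack);
                form.addAll(Arrays.asList(in).subList(i + 1, in.length));
                trace.add(String.join(" ", form));
            }
        }
        if (!stack.equals(Arrays.asList("E"))) {
            throw new IllegalArgumentException("syntax error");
        }
        return trace;
    }

    // Applies one rule to the top of the stack if a right-hand side matches.
    private static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (stack.get(n - 1).matches("\\d+")) {       // E -> i
            stack.set(n - 1, "E");
            return true;
        }
        if (n >= 5 && stack.subList(n - 5, n)
                .equals(Arrays.asList("E", "+", "(", "E", ")"))) {
            stack.subList(n - 5, n).clear();          // E -> E + ( E )
            stack.add("E");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String form : reductions("1 + ( 2 ) + ( 3 )")) {
            System.out.println(form);
        }
    }
}
```

Run on 1 + ( 2 ) + ( 3 ), the sketch prints the five forms E + ( 2 ) + ( 3 ), E + ( E ) + ( 3 ), E + ( 3 ), E + ( E ) and E, matching the reductions listed on the slide.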

  14. Problems • Ambiguity
      Grammar:
        E → E + E
        E → i
      Is 1 + 2 + 3 parsed as (1 + 2) + 3 or as 1 + (2 + 3)?

  15. Cup • Constructor of Useful Parsers • Automatic LALR(1) parser generator • Input: cup spec file • Output: Syntax analyzer in Java
      [Diagram: parser spec → JavaCup → .java → javac → Parser; the Parser reads tokens and produces an AST]

  16. Expression calculator
        terminal Integer NUMBER;
        terminal PLUS, MINUS, MULT, DIV;
        terminal LPAREN, RPAREN;
        non terminal Integer expr;

        expr ::= expr PLUS expr
               | expr MINUS expr
               | expr MULT expr
               | expr DIV expr
               | MINUS expr
               | LPAREN expr RPAREN
               | NUMBER
               ;

  17. Ambiguities
      [Figure: two candidate parse trees for each input]
      • a * b + c: (a * b) + c or a * (b + c)
      • a + b + c: (a + b) + c or a + (b + c)

  18. Expression calculator
        terminal Integer NUMBER;
        terminal PLUS, MINUS, MULT, DIV;
        terminal LPAREN, RPAREN;
        terminal UMINUS;
        non terminal Integer expr;

        precedence left PLUS, MINUS;
        precedence left DIV, MULT;
        precedence left UMINUS;

        expr ::= expr PLUS expr
               | expr MINUS expr
               | expr MULT expr
               | expr DIV expr
               | MINUS expr %prec UMINUS
               | LPAREN expr RPAREN
               | NUMBER
               ;
      • The precedence declarations are listed in order of increasing precedence
      • %prec UMINUS gives the rule a contextual precedence

  19. Resolving ambiguity
      precedence left PLUS
      [Figure: the two candidate trees for a + b + c, namely (a + b) + c and a + (b + c); left associativity selects (a + b) + c]

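  20. Resolving ambiguity
      precedence left PLUS
      precedence left MULT
      [Figure: the two candidate trees for a * b + c, namely (a * b) + c and a * (b + c); MULT has the higher precedence, so (a * b) + c is selected]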

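  21. Resolving ambiguity
      precedence left PLUS
      precedence left MULT
      [Figure: the two candidate trees for a + b * c, namely a + (b * c) and (a + b) * c; MULT has the higher precedence, so a + (b * c) is selected]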

  22. Resolving ambiguity
      precedence left PLUS
      precedence left MULT
      [Figure: the two candidate trees for - a * b, namely (- a) * b and - (a * b)]

  23. Resolving ambiguity
        terminal Integer NUMBER;
        terminal PLUS, MINUS, MULT, DIV;
        terminal LPAREN, RPAREN;
        terminal UMINUS;

        precedence left PLUS, MINUS;
        precedence left DIV, MULT;
        precedence left UMINUS;

        expr ::= expr PLUS expr
               | expr MINUS expr
               | expr MULT expr
               | expr DIV expr
               | MINUS expr %prec UMINUS
               | LPAREN expr RPAREN
               | NUMBER
               ;
      • UMINUS is never returned by the scanner (it is used only to define precedence)
      • The rule MINUS expr %prec UMINUS has the precedence of UMINUS

  24. Disambiguation
      • Each terminal is assigned a precedence
        • By default all terminals have lowest precedence
        • The user can assign their own precedence
      • CUP assigns each production a precedence
        • Precedence of the last terminal in the production, e.g. expr MINUS expr
        • User-specified contextual precedence, e.g. MINUS expr %prec UMINUS
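
How these precedences are used can be sketched as a small decision function. The Java below illustrates the usual yacc-style rule for a shift/reduce conflict, which the CUP manual describes as well: compare the precedence of the production that could be reduced with the precedence of the lookahead terminal, and fall back to the terminal's associativity on a tie. The class PrecedenceDemo, its enums and the numeric levels are hypothetical, and the sketch assumes both the production and the terminal have a declared precedence.

```java
// Hypothetical sketch of precedence-based conflict resolution.
class PrecedenceDemo {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Action { SHIFT, REDUCE, ERROR }

    // Levels: a larger number means a later "precedence" declaration,
    // i.e. a higher precedence.
    static Action resolve(int productionLevel, int lookaheadLevel, Assoc lookaheadAssoc) {
        if (productionLevel > lookaheadLevel) return Action.REDUCE;
        if (productionLevel < lookaheadLevel) return Action.SHIFT;
        switch (lookaheadAssoc) {                 // equal precedence
            case LEFT:  return Action.REDUCE;     // group to the left
            case RIGHT: return Action.SHIFT;      // group to the right
            default:    return Action.ERROR;      // nonassoc: chaining is an error
        }
    }

    public static void main(String[] args) {
        final int PLUS = 1, MULT = 2;             // precedence left PLUS; precedence left MULT;
        // Stack: expr PLUS expr, lookahead MULT (a + b * c)
        System.out.println(resolve(PLUS, MULT, Assoc.LEFT)); // prints SHIFT
        // Stack: expr MULT expr, lookahead PLUS (a * b + c)
        System.out.println(resolve(MULT, PLUS, Assoc.LEFT)); // prints REDUCE
        // Stack: expr PLUS expr, lookahead PLUS (a + b + c)
        System.out.println(resolve(PLUS, PLUS, Assoc.LEFT)); // prints REDUCE
    }
}
```

With MINUS expr %prec UMINUS, the production level is that of UMINUS, the highest declared, so a following MULT leads to a reduce and - a * b groups as (- a) * b.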

  25. More CUP directives
      • precedence nonassoc NEQ
        • Non-associative operators: < > == != etc.
        • 1<2<3 identified as an error
        • 6 == 7 == 8 == 9
      • start with non-terminal
        • Specifies a start non-terminal other than the first non-terminal
        • Can be changed to test parts of the grammar
      • Getting the internal representation
        • Command line options: -dump_grammar, -dump_states, -dump_tables, -dump

  26. Scanner integration
        import java_cup.runtime.*;
        %%
        %cup
        %eofval{
          return new Symbol(sym.EOF);
        %eofval}
        NUMBER=[0-9]+
        %%
        <YYINITIAL>"+" { return new Symbol(sym.PLUS); }
        <YYINITIAL>"-" { return new Symbol(sym.MINUS); }
        <YYINITIAL>"*" { return new Symbol(sym.MULT); }
        <YYINITIAL>"/" { return new Symbol(sym.DIV); }
        <YYINITIAL>"(" { return new Symbol(sym.LPAREN); }
        <YYINITIAL>")" { return new Symbol(sym.RPAREN); }
        <YYINITIAL>{NUMBER} {
          return new Symbol(sym.NUMBER, new Integer(yytext()));
        }
        <YYINITIAL>\n { }
        <YYINITIAL>. { }
      • The sym constants are generated from the token declarations in the .cup file
      • The parser gets terminals from the scanner

  27. Assigning meaning • So far, only validation • Add Java code implementing semantic actions
        expr ::= expr PLUS expr
               | expr MINUS expr
               | expr MULT expr
               | expr DIV expr
               | MINUS expr %prec UMINUS
               | LPAREN expr RPAREN
               | NUMBER
               ;

  28. Assigning meaning
        expr ::= expr:e1 PLUS expr:e2
                 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
               | expr:e1 MINUS expr:e2
                 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
               | expr:e1 MULT expr:e2
                 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
               | expr:e1 DIV expr:e2
                 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
               | MINUS expr:e1
                 {: RESULT = new Integer(0 - e1.intValue()); :} %prec UMINUS
               | LPAREN expr:e1 RPAREN
                 {: RESULT = e1; :}
               | NUMBER:n
                 {: RESULT = n; :}
               ;
      • Symbol labels are used to name variables
      • RESULT names the left-hand side symbol

  29. Building an AST • More useful representation of syntax tree • Less clutter • Actual level of detail depends on your design • Basis for semantic analysis • Later annotated with various information • Type information • Computed values

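  30. Parse tree vs. AST
      [Figure: for the input 1 + ( 2 ) + ( 3 ), the parse tree contains an expr node for every reduction together with the parentheses tokens; the AST keeps only the two + nodes and the leaves 1, 2, 3]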

  31. AST construction • AST Nodes constructed during parsing • Bottom-up parser • Grammar rules annotated with actions for AST construction • When node is constructed all children available (already constructed)

  32. AST construction
        expr ::= expr:e1 PLUS expr:e2
                 {: RESULT = new plus(e1,e2); :}
               | LPAREN expr:e RPAREN
                 {: RESULT = e; :}
               | INT_CONST:i
                 {: RESULT = new int_const(…, i); :}
      Reductions for 1 + (2) + (3):
        1 + (2) + (3)
        expr + (2) + (3)
        expr + (expr) + (3)
        expr + (3)
        expr + (expr)
        expr
      [Figure: resulting AST: a plus node whose e1 is a plus node (e1 = int_const with val = 1, e2 = int_const with val = 2) and whose e2 is int_const with val = 3]
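
The node classes behind such actions might look roughly like the Java sketch below. It is an assumption-laden illustration, not the course's AST: the classes Expr, IntConst, Plus and AstDemo are hypothetical stand-ins for the slide's plus and int_const, and the constructor arguments elided on the slide are not reproduced. The build method creates the nodes in the order a bottom-up parser would reduce 1 + (2) + (3), so every child exists before its parent is constructed.

```java
// Hypothetical AST node classes; names and fields are illustrative only.
abstract class Expr {
    abstract int eval();
}

class IntConst extends Expr {
    final int val;
    IntConst(int val) { this.val = val; }
    @Override int eval() { return val; }
    @Override public String toString() { return "int_const(" + val + ")"; }
}

class Plus extends Expr {
    final Expr e1, e2;
    Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    @Override int eval() { return e1.eval() + e2.eval(); }
    @Override public String toString() { return "plus(" + e1 + ", " + e2 + ")"; }
}

class AstDemo {
    // Mirrors the order of reductions for 1 + (2) + (3).
    static Expr build() {
        Expr one = new IntConst(1);        // INT_CONST:i
        Expr two = new IntConst(2);        // INT_CONST:i, then LPAREN expr RPAREN passes it up
        Expr inner = new Plus(one, two);   // expr PLUS expr
        Expr three = new IntConst(3);      // INT_CONST:i, then LPAREN expr RPAREN
        return new Plus(inner, three);     // expr PLUS expr
    }

    public static void main(String[] args) {
        Expr root = build();
        System.out.println(root);          // prints plus(plus(int_const(1), int_const(2)), int_const(3))
        System.out.println(root.eval());   // prints 6
    }
}
```

The parentheses leave no trace in the tree: the LPAREN expr RPAREN action only forwards the inner node, which is the "less clutter" that distinguishes an AST from a parse tree.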
