
Semantic Analysis (Generating An AST)

CS 471, September 26, 2007.

nairi

Presentation Transcript


  1. Semantic Analysis (Generating An AST) CS 471 September 26, 2007

  2. Semantic Analysis
     Source code → Lexical Analysis → tokens → Parsing → AST → Semantic Analysis → Valid programs: decorated AST
     • Lexical Analysis reports lexical errors
     • Parsing reports syntax errors
     • Semantic Analysis reports semantic errors

  3. Goals of a Semantic Analyzer
     • Compiler must do more than recognize whether a sentence belongs to the language…
     • Find all possible remaining errors that would make the program invalid
       • undefined variables, types
       • type errors that can be caught statically
     • Figure out useful information for later phases
       • types of all expressions
       • data layout

  4. Semantic Actions • Can do useful things with the parsed phrases • Each terminal and nonterminal may be associated with a type, e.g. for exp: INT the type is int • For a rule A → B C D: • The type of the action's result must match the type of A • The value can be built from the values of B, C and D

  5. Semantic Actions Semantic action executed when grammar production is reduced • Recursive-descent parser: semantic code interspersed with control flow • Yacc: fragments of C code attached to a grammar production

  6. Interpreter • Could develop an interpreter that executes the program as part of the semantic actions! • Example Grammar: • E → id • E → E + E • E → E - E • E → E * E • E → -E
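To make the idea concrete, here is a minimal sketch of an interpreter whose semantic actions compute values while parsing. It is written as a recursive-descent parser in C rather than Yacc, so the actions appear interleaved with the control flow. The stratified grammar (`expr`, `term`, `factor`), the use of integer literals in place of `id`, and all function names are assumptions made for this illustration; they are not taken from the slides.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical toy grammar (an unambiguous variant of the slide's grammar):
 *   expr   -> term (('+' | '-') term)*
 *   term   -> factor ('*' factor)*
 *   factor -> INT | '-' factor | '(' expr ')'
 * Each parse function returns the attribute (an int) of its nonterminal. */

static const char *cursor;  /* current input position */

static void skip_spaces(void) {
    while (*cursor == ' ') cursor++;
}

static int parse_expr(void);

static int parse_factor(void) {
    skip_spaces();
    if (*cursor == '-') {                 /* factor -> '-' factor */
        cursor++;
        return -parse_factor();           /* semantic action: negate */
    }
    if (*cursor == '(') {                 /* factor -> '(' expr ')' */
        cursor++;
        int value = parse_expr();
        skip_spaces();
        assert(*cursor == ')');
        cursor++;
        return value;
    }
    assert(isdigit((unsigned char)*cursor));
    int value = 0;                        /* factor -> INT */
    while (isdigit((unsigned char)*cursor)) {
        value = value * 10 + (*cursor - '0');
        cursor++;
    }
    return value;
}

static int parse_term(void) {
    int value = parse_factor();
    skip_spaces();
    while (*cursor == '*') {
        cursor++;
        value *= parse_factor();          /* semantic action: multiply */
        skip_spaces();
    }
    return value;
}

static int parse_expr(void) {
    int value = parse_term();
    skip_spaces();
    while (*cursor == '+' || *cursor == '-') {
        char op = *cursor++;
        int rhs = parse_term();
        value = (op == '+') ? value + rhs : value - rhs;  /* semantic action */
        skip_spaces();
    }
    return value;
}

/* Entry point: parse and evaluate a whole input string. */
int evaluate(const char *source) {
    cursor = source;
    int value = parse_expr();
    skip_spaces();
    assert(*cursor == '\0');              /* all input must be consumed */
    return value;
}
```

Calling `evaluate("1 + 2 * 3")` parses and evaluates in a single pass; no tree is ever built, which is exactly what makes this style compact but hard to extend.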

  7. Unions in Yacc
     • %union allows us to declare a union datatype
     • used to package the types/attributes of symbols
       %union {
         int pos;
         int ival;
         string sval;
         struct {
           int intval;
           enum Types valtype;
         } constantval;
         A_exp exp;
       }
     • Exported as YYSTYPE

  8. Types in Yacc • Using the values of union structs, tell Yacc the types • Terminals • %token <sval> ID STRING • %token <ival> INT • %token <pos> COMMA SEMI LBRACE RBRACE … • And Nonterminals (use %type) • %type <exp> expression program type LHS of production

  9. Symbols in Yacc
     • The symbol $n (n > 0) refers to the attribute of the nth symbol on the RHS
     • The symbol $$ refers to the attribute of the LHS
     • The symbol $n (n ≤ 0) refers to contextual information
     • Note: actions in the middle of a rule count as a symbol!
     • In expr : expr1 PLUS expr2, $$ is expr, $1 is expr1, and $3 is expr2

  10. Interpreter in Yacc
      %{ declarations of yylex and yyerror %}
      %union {int num; string id;}
      %token <num> INT
      %token <id> ID
      %type <num> exp
      %start exp
      %left PLUS MINUS
      %left TIMES
      %left UMINUS
      %%
      [please fill in solution]
      • Grammar: E → id, E → E + E, E → E - E, E → E * E, E → -E
      • Recall: in expr : expr1 PLUS expr2, $$ is expr, $1 is expr1, and $3 is expr2

  11. Internally: A Semantic Stack
      • Implemented using a stack parallel to the state stack
        Stack                           Input          Action
                                        1 + 2 * 3 $    shift
        INT: 1                          + 2 * 3 $      reduce
        exp: 1                          + 2 * 3 $      shift
        exp: 1  +:                      2 * 3 $        shift
        exp: 1  +:  INT: 2              * 3 $          reduce
        exp: 1  +:  exp: 2              * 3 $          shift
        exp: 1  +:  exp: 2  *:          3 $            shift
        exp: 1  +:  exp: 2  *:  INT: 3  $              reduce
        exp: 1  +:  exp: 2  *:  exp: 3  $              reduce
        exp: 1  +:  exp: 6              $              reduce
        exp: 7                          $              accept
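The trace above can be replayed by hand to see how the value stack evolves. The following sketch, in C, hard-codes the shift and reduce steps for `1 + 2 * 3` instead of driving them from a parse table, and it omits the state stack entirely. The names `shift`, `reduce_int`, `reduce_binary` and `replay_trace` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

#define STACK_MAX 32

/* Hypothetical semantic (value) stack. A real LR parser keeps a state
 * stack of the same depth alongside it; that part is omitted here. */
static int values[STACK_MAX];
static int depth = 0;

/* shift: push the attribute of the token just consumed.
 * Operator tokens carry no useful value, so callers push 0 for them. */
static void shift(int attribute) {
    assert(depth < STACK_MAX);
    values[depth++] = attribute;
}

/* reduce exp -> INT: $$ = $1, so the top slot keeps its value. */
static void reduce_int(void) {
    assert(depth >= 1);
}

/* reduce exp -> exp op exp: pop three slots, push $$ = $1 op $3.
 * Only '+' and '*' are handled in this sketch. */
static void reduce_binary(char op) {
    assert(depth >= 3);
    assert(op == '+' || op == '*');
    int lhs = values[depth - 3];
    int rhs = values[depth - 1];
    depth -= 3;
    values[depth++] = (op == '+') ? lhs + rhs : lhs * rhs;
}

/* Replays the trace for the input 1 + 2 * 3 and returns the final value. */
int replay_trace(void) {
    depth = 0;
    shift(1); reduce_int();   /* INT: 1  becomes  exp: 1 */
    shift(0);                 /* +                       */
    shift(2); reduce_int();   /* INT: 2  becomes  exp: 2 */
    shift(0);                 /* *                       */
    shift(3); reduce_int();   /* INT: 3  becomes  exp: 3 */
    reduce_binary('*');       /* exp: 2 * exp: 3  ->  exp: 6 */
    reduce_binary('+');       /* exp: 1 + exp: 6  ->  exp: 7 */
    assert(depth == 1);
    return values[0];
}
```

The multiplication is reduced before the addition because the parser shifts `*` instead of reducing `exp + exp`, which is what the `%left` precedence declarations arrange in Yacc.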

  12. Inlined TypeChecker and CodeGen
      • You can even type check and generate code:
        expr : expr PLUS expr {
            if ($1.type == $3.type &&
                ($1.type == IntType ||
                 $1.type == RealType)) $$.type = $1.type;
            else error("+ applied on wrong type!");
            GenerateAdd($1, $3, $$);
        }
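The typing rule buried in that action can be pulled out into an ordinary function, which is one small step toward separating type checking from parsing. This is only a sketch in C: `IntType` and `RealType` come from the slide, while `StringType`, `ErrorType` and the function name `check_plus` are assumptions added for illustration.

```c
#include <assert.h>

/* IntType and RealType appear on the slide; StringType and ErrorType
 * are hypothetical additions so the rule has something to reject. */
typedef enum { IntType, RealType, StringType, ErrorType } Type;

/* Result type of `lhs + rhs`: both operands must have the same type,
 * and that type must be IntType or RealType. Otherwise ErrorType. */
Type check_plus(Type lhs, Type rhs) {
    if (lhs == rhs && (lhs == IntType || lhs == RealType))
        return lhs;
    return ErrorType;
}
```

Unlike the inlined version, a function like this can be called from any phase and tested without running the parser.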

  13. Problems • Difficult to read • Difficult to maintain • Compiler must analyze program in order parsed • Instead … we split up tasks

  14. Compiler ‘main program’
      void Compile() {
          TokenStream l = Lexer(input);
          AST tree = Parser(l);
          if (TypeCheck(tree)) {
              IR ir = genIntermediateCode(tree);
              emitCode(ir);
          }
      }

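  15. Thread of control • compile calls parse on the Parser, which returns the AST • Parser calls getToken on the Lexer, which returns tokens • Lexer calls readStream on the Input Stream, which returns characters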

  16. Producing the Parse Tree • Separates issues of syntax (parsing) from issues of semantics (type checking, translation to machine code) • One leaf for every token • One internal node for every reduction during parsing • Concrete parse tree represents concrete syntax • But … parse tree has problems • Punctuation tokens redundant • Structure of the tree conveys this info • Enter the Abstract Syntax Tree

  17. AST • Abstract Syntax Tree is a tree representation of the program. Used for • semantic analysis (type checking) • some optimization (e.g. constant folding) • intermediate code generation (sometimes intermediate code = AST with somewhat different set of nodes) • Compiler phases = recursive tree traversals
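As a rough illustration of "compiler phases = recursive tree traversals", the sketch below defines a tiny expression AST in C and two independent traversals over it. The node layout and every name (`Node`, `int_node`, `eval`, `count_nodes`, and so on) are hypothetical; this is far simpler than the Tiger abstract syntax and is not taken from it.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_INT, NODE_ADD, NODE_MUL, NODE_NEG } NodeKind;

/* Hypothetical AST node for a small expression language. */
typedef struct Node {
    NodeKind kind;
    int value;            /* used by NODE_INT only */
    struct Node *left;    /* NODE_NEG uses left only */
    struct Node *right;
} Node;

static Node *new_node(NodeKind kind, int value, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

Node *int_node(int value)        { return new_node(NODE_INT, value, NULL, NULL); }
Node *add_node(Node *l, Node *r) { return new_node(NODE_ADD, 0, l, r); }
Node *mul_node(Node *l, Node *r) { return new_node(NODE_MUL, 0, l, r); }
Node *neg_node(Node *e)          { return new_node(NODE_NEG, 0, e, NULL); }

/* One phase: a post-order traversal that evaluates the expression. */
int eval(const Node *n) {
    switch (n->kind) {
    case NODE_INT: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    case NODE_MUL: return eval(n->left) * eval(n->right);
    case NODE_NEG: return -eval(n->left);
    }
    assert(0 && "unknown node kind");
    return 0;
}

/* A second, unrelated phase over the same tree: count its nodes. */
int count_nodes(const Node *n) {
    if (n == NULL) return 0;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}
```

Because the tree outlives the parse, each phase can visit nodes in whatever order it needs, and new phases can be added without touching the grammar.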

  18. Do We Need An AST?
      • Old-style compilers: semantic actions generate code during parsing
        expr ::= expr PLUS expr {: emitCode(add); :}
      • (Diagram: input → parser, with its stack → code)
      • Problems:
        • hard to maintain
        • limits language features
        • not modular!

  19. Interesting Detour • Old compilers didn’t create ASTs … not enough memory to store entire program • Can also see reasons for C requiring forward declarations - avoids an extra compilation pass

  20. Positions • In a one-pass compiler, errors are reported using the current position of the lexer (a global variable) as an approximation • Abstract syntax data structures must have pos fields • Line number • Char number • Line number is unambiguous • Char number is a matter of style

  21. Abstract Syntax for Tiger
      /* absyn.h */
      typedef struct A_var_ * A_var;
      struct A_var_
        { enum {A_simpleVar, A_fieldVar, A_subscriptVar} kind;
          A_pos pos;
          union {S_symbol simple;
                 struct {A_var var;
                         S_symbol sym;} field;
                 struct {A_var var;
                         A_exp exp;} subscript;
                } u;
        };

  22. More Syntax (Constructors…p.98) • A_var A_SimpleVar(A_pos pos, S_symbol sym); • … • A_exp A_WhileExp(A_pos pos, A_exp test, A_exp body); • … • A_expList A_ExpList(A_exp head, A_expList tail);
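The prototypes above only declare the constructors. A plausible implementation allocates a node, sets the `kind` tag and `pos`, and fills in the matching union member. The sketch below does this in C for `A_SimpleVar`, plus an `A_FieldVar` constructor whose signature is assumed here by analogy with the `field` member of the struct. The stand-in definitions of `A_pos` and `S_symbol`, and the omission of the subscript variant, are assumptions made so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in types (assumptions): the real project defines these in its
 * own headers, and S_symbol is not a plain string there. */
typedef int A_pos;
typedef const char *S_symbol;
typedef struct A_var_ *A_var;

struct A_var_ {
    enum { A_simpleVar, A_fieldVar, A_subscriptVar } kind;
    A_pos pos;
    union {
        S_symbol simple;
        struct { A_var var; S_symbol sym; } field;
        /* subscript variant omitted in this sketch: it needs A_exp */
    } u;
};

/* Builds a node for a plain variable reference such as `a`. */
A_var A_SimpleVar(A_pos pos, S_symbol sym) {
    A_var v = malloc(sizeof *v);
    assert(v != NULL);
    v->kind = A_simpleVar;
    v->pos = pos;
    v->u.simple = sym;
    return v;
}

/* Builds a node for a field access such as `a.x` (assumed signature). */
A_var A_FieldVar(A_pos pos, A_var var, S_symbol sym) {
    A_var v = malloc(sizeof *v);
    assert(v != NULL);
    v->kind = A_fieldVar;
    v->pos = pos;
    v->u.field.var = var;
    v->u.field.sym = sym;
    return v;
}
```

Code that consumes these nodes should switch on `kind` before reading `u`, since only the member matching the tag holds valid data.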

  23. Tiger Program
      • (a := 5; a+1) translates to:
        A_SeqExp(2,
          A_ExpList(A_AssignExp(4,
                      A_SimpleVar(2,
                        S_Symbol("a")), A_IntExp(7,5)),
          A_ExpList((A_OpExp(11,A_plusOp,
                       A_VarExp(A_SimpleVar(10,
                         S_Symbol("a"))),A_IntExp(12,1))),
          NULL)))
      • AssignExp: the column of ":=" is chosen for pos
      • OpExp: the column of "+" is chosen for pos

  24. Some Odd Tiger Features
      • Tiger allows mutually recursive declarations:
        let var a := 5
            function f() : int = g(a)
            function g(i: int) = f()
        in f()
        end
      • Thus: FunctionDec constructor takes a list of functions

  25. Correlation to Yacc (and your project) • (Demo) • Checklist • Detailed look at the Tiger AST (absyn.h) • Edit tiger.grm • The Tiger Language Manual • PA3 and PA4 make heavy use of it • Follow the structure to generate your yacc file
