Elaboration or: Semantic Analysis


  1. Elaboration or: Semantic Analysis
      Compiler
      Baojian Hua
      bjhua@ustc.edu.cn

  2. Front End
      source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

  3. Elaboration
      • Also known as type-checking, or semantic analysis
      • Context-sensitive analysis
      • Checks the well-formedness of programs:
        • every variable is declared before use
        • every expression has a proper type
        • function calls conform to definitions
        • all other context-sensitive information (highly language-dependent)
        • …
      • Translates the AST into intermediate or machine code

  4. Elaboration Example
        void f (int *p)
        {
          x += 4;
          p (23);
          "hello" + "world";
        }
        int main ()
        {
          f () + 5;
        }
      What errors can be detected here?

  5. Terminology
      • Scope
      • Lifetime
      • Storage class
      • Name space

  6. Terminology: Scope
        int x;
        int f ()
        {
          if (4) {
            int x;
            x = 6;
          }
          else {
            int x;
            x = 5;
          }
          x = 8;
        }

  7. Terminology: Lifetime
        static int x;
        int f ()
        {
          int x, *p;
          x = 6;
          p = malloc (sizeof (*p));
          if (3) {
            static int x;
            x = 5;
          }
        }

  8. Terminology: Storage class
        extern int x;
        int f ()
        {
          extern int x;
          x = 6;
          if (3) {
            extern int x;
            x = 5;
          }
        }

  9. Terminology: Name space
        struct list {
          int x;
          struct list *list;
        } *list;
        void walk (struct list *list)
        {
        list:
          printf ("%d\n", list->x);
          if (list = list->list)
            goto list;
        }

  10. Moral
      • For the purpose of elaboration, we must take care of all of these TOGETHER:
        • Scope
        • Lifetime
        • Storage class
        • Name space
        • …
      • All these details are handled by symbol tables!

  11. Symbol Tables
      • To keep track of types and other information, we maintain a finite map from program symbols to their information
        • symbols: variables, function names, etc.
      • Such a mapping is called a symbol table, or sometimes an environment
      • Notation: {x1: t1, x2: t2, …, xn: tn}
        • where each xi: ti (1 ≤ i ≤ n) is called a binding

  12. Scope
      • How do we handle lexical scope?
      • It's easy: we insert and remove bindings during elaboration as we enter and leave a local scope

  13. Scope
        int x;                 σ = {x:int}
        int f ()               σ1 = σ + {f:…} = {x:int, f:…}
        {
          if (4) {
            int x;             σ2 = σ1 + {x:int} = {x:…, f:…, x:…}
            x = 6;
          }                    σ1
          else {
            int x;             σ4 = σ1 + {x:int} = {x:…, f:…, x:…}
            x = 5;
          }                    σ1
          x = 8;
        }                      σ1
      Shadowing: "+" is not commutative! The rightmost binding of a name wins, so in σ2 the inner x hides the outer one.
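To make shadowing concrete, here is a minimal sketch in C of a functional environment implemented as an immutable linked list. σ + {x:t} allocates one new cell in front of σ, and lookup returns the newest binding for a name. The names `Env`, `env_extend` and `env_lookup` and the `Ty` tags are hypothetical. For brevity, the sketch never frees memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags for this sketch; a real elaborator has richer types. */
typedef enum { TY_INT, TY_CHAR, TY_DOUBLE, TY_FUN } Ty;

/* An environment is an immutable list of bindings, newest binding first.
   NULL is the empty environment. */
typedef struct Env {
    const char *name;
    Ty ty;
    const struct Env *rest;
} Env;

/* sigma + {name: ty}: allocate one new cell and leave sigma untouched. */
const Env *env_extend(const Env *sigma, const char *name, Ty ty) {
    Env *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->name = name;
    e->ty = ty;
    e->rest = sigma;
    return e;
}

/* Store the type of the newest binding for name in *out and return 1,
   or return 0 if name is unbound. The first match shadows later ones. */
int env_lookup(const Env *sigma, const char *name, Ty *out) {
    for (; sigma != NULL; sigma = sigma->rest)
        if (strcmp(sigma->name, name) == 0) {
            *out = sigma->ty;
            return 1;
        }
    return 0;
}
```

Suppose σ1 maps x to int, and σ1 is extended with {x: char}. Lookups in the new table then see char, while σ1 itself still maps x to int. Because lookup stops at the first match, σ1 + {x:t} and {x:t} + σ1 can disagree about x, which is why "+" is not commutative.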

  14. Implementation
      • Must be efficient!
        • lots of variables, functions, etc.
      • Two basic approaches:
        • Functional
          • the symbol table is implemented as a functional data structure (e.g., a red-black tree), and no table is ever destroyed or modified
        • Imperative
          • a single table, modified for every binding added or removed
      • This choice is largely independent of the implementation language

  15. Functional Symbol Table
      • Basic idea:
        • when implementing σ2 = σ1 + {x:t}
          • create a new table σ2 instead of modifying σ1
        • when leaving the scope, go back to the old table σ1, which is still intact
      • A good data structure for this is a binary search tree (BST), or a balanced one such as a red-black tree

  16. BST Symbol Table
      [Figure: BST symbol tables whose nodes hold bindings such as c: int, e: int, a: char, b: double]

  17. Possible Functional Interface
        signature SYMBOL_TABLE =
        sig
          type 'a t
          type key
          val empty:  'a t
          val insert: 'a t * key * 'a -> 'a t
          val lookup: 'a t * key -> 'a option
        end
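One way to realize this signature in C is a persistent binary search tree with path copying. `insert` copies only the nodes on the path from the root to the key and shares every other subtree with the old table, so σ1 survives unchanged when σ2 is built from it.

This sketch makes several simplifying assumptions:

- Keys are strings, and values are `int` (standing in for `'a`).
- Lookup returns a flag instead of an option.
- The tree is not rebalanced; a red-black tree would add that.
- The names `st_empty`, `st_insert` and `st_lookup` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Persistent BST node; nodes are never modified after creation. */
typedef struct Node {
    const char *key;
    int value;
    const struct Node *left, *right;
} Node;

typedef const Node *Table;          /* NULL is the empty table */

Table st_empty(void) { return NULL; }

static const Node *mk(const char *k, int v, const Node *l, const Node *r) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = k;
    n->value = v;
    n->left = l;
    n->right = r;
    return n;
}

/* Return a new table with key bound to value. Only the nodes on the
   search path are copied; all other subtrees are shared with t. */
Table st_insert(Table t, const char *key, int value) {
    if (t == NULL) return mk(key, value, NULL, NULL);
    int c = strcmp(key, t->key);
    if (c < 0) return mk(t->key, t->value, st_insert(t->left, key, value), t->right);
    if (c > 0) return mk(t->key, t->value, t->left, st_insert(t->right, key, value));
    return mk(key, value, t->left, t->right);   /* new binding replaces old in the new table */
}

/* Like SOME v: store the value and return 1. Like NONE: return 0. */
int st_lookup(Table t, const char *key, int *out) {
    while (t != NULL) {
        int c = strcmp(key, t->key);
        if (c == 0) { *out = t->value; return 1; }
        t = c < 0 ? t->left : t->right;
    }
    return 0;
}
```

Each insert allocates one new node per level on the search path. With a balanced tree this is O(log n) per binding, and leaving a scope costs nothing: the elaborator simply keeps using the older table.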

  18. Imperative Symbol Tables
      • The imperative approach almost always involves the use of hash tables
      • Need to delete entries to revert to the previous environment
        • made simpler because deletes follow a stack discipline
        • can maintain a stack of entered symbols, so that they can later be popped and removed from the hash table

  19. Possible Imperative Interface
        signature SYMBOL_TABLE =
        sig
          type 'a t
          type key
          val insert:     'a t * key * 'a -> unit
          val lookup:     'a t * key -> 'a option
          val delete:     'a t * key -> unit
          val beginScope: unit -> unit
          val endScope:   unit -> unit
        end
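Here is a minimal sketch of the imperative approach in C. It assumes a single global hash table (consistent with beginScope/endScope taking only unit), string keys and int values. The sketch works as follows:

- Each bucket chain keeps its newest binding first, so inserting a name shadows any older binding of it.
- An undo stack records each inserted key, plus a NULL marker for each scope.
- `st_end_scope` pops back to the marker, deleting each key and thereby uncovering any shadowed binding.

The function names, table size and hash function are illustrative choices, not part of the slide's interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 109
#define MAXSYMS 1024

/* A bucket chain; the head is the newest binding in that bucket. */
typedef struct Bucket {
    const char *key;
    int value;
    struct Bucket *next;
} Bucket;

static Bucket *table[NBUCKETS];
/* Keys inserted so far, in order; NULL entries mark scope boundaries. */
static const char *undo_stack[MAXSYMS];
static int undo_top = 0;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 65599u + (unsigned char)*s++;
    return h % NBUCKETS;
}

void st_insert(const char *key, int value) {
    assert(undo_top < MAXSYMS);
    unsigned i = hash(key);
    Bucket *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->key = key;
    b->value = value;
    b->next = table[i];          /* push in front: shadows older bindings */
    table[i] = b;
    undo_stack[undo_top++] = key;
}

int st_lookup(const char *key, int *out) {
    for (Bucket *b = table[hash(key)]; b != NULL; b = b->next)
        if (strcmp(b->key, key) == 0) { *out = b->value; return 1; }
    return 0;
}

/* Remove the newest binding of key, uncovering any shadowed one. */
void st_delete(const char *key) {
    Bucket **p = &table[hash(key)];
    while (*p != NULL && strcmp((*p)->key, key) != 0) p = &(*p)->next;
    if (*p != NULL) {
        Bucket *dead = *p;
        *p = dead->next;
        free(dead);
    }
}

void st_begin_scope(void) {
    assert(undo_top < MAXSYMS);
    undo_stack[undo_top++] = NULL;   /* scope marker */
}

/* Pop and delete every key entered since the matching st_begin_scope. */
void st_end_scope(void) {
    while (undo_top > 0 && undo_stack[undo_top - 1] != NULL)
        st_delete(undo_stack[--undo_top]);
    if (undo_top > 0) undo_top--;    /* pop the marker itself */
}
```

Scopes nest, so the key being popped is always the newest entry in its bucket. Deletion therefore takes constant time in practice; the search in `st_delete` only keeps the function correct if it is called out of stack order.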

  20. Name Space
      • Handling name spaces is simple:
        • one symbol table for each name space
      • Take C as an example:
        • several different name spaces
          • labels
          • tags
          • variables
      • So …
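Here is a sketch in C of "one symbol table per name space", using the identifier `list` from slide 9. That identifier is simultaneously a struct tag, a variable and a label. The elaborator picks the table from syntactic context:

- after `struct`, `list` is a tag;
- before `:` or after `goto`, it is a label;
- otherwise, it is an ordinary identifier.

Flat arrays with newest-first search keep the example short. The names `ns_bind` and `ns_lookup` are hypothetical; in practice, each table would be a full scoped symbol table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three C name spaces listed on the slide. (C also gives each
   struct or union its own member name space, omitted here.) */
typedef enum { NS_LABEL, NS_TAG, NS_VAR, NS_COUNT } NameSpace;

#define MAXBIND 64

typedef struct { const char *name; const char *info; } Binding;

/* One independent table per name space. */
static Binding spaces[NS_COUNT][MAXBIND];
static int counts[NS_COUNT];

void ns_bind(NameSpace ns, const char *name, const char *info) {
    assert(counts[ns] < MAXBIND);
    spaces[ns][counts[ns]].name = name;
    spaces[ns][counts[ns]].info = info;
    counts[ns]++;
}

/* Search only the requested name space, newest binding first. */
const char *ns_lookup(NameSpace ns, const char *name) {
    for (int i = counts[ns] - 1; i >= 0; i--)
        if (strcmp(spaces[ns][i].name, name) == 0) return spaces[ns][i].info;
    return NULL;
}
```

Because the tables are disjoint, binding `list` as a variable never disturbs its binding as a tag or a label.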

  21. Implementation of Symbols
      • For several reasons, it will be useful at some point to represent symbols as elements of a small, densely packed set of identities (e.g., integers 0, 1, 2, …):
        • fast comparisons (equality)
        • for dataflow analysis, we will want sets of variables and fast set operations
      • It will be critically important to use bit strings to represent these sets
        • for example, in your liveness analysis algorithm
      • More on this later
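Here is a minimal sketch in C of both ideas. Interning maps each distinct name to a dense integer, so symbol equality becomes an integer comparison. A set of symbols then becomes a bit string with one bit per symbol. As an illustration, the set operations compute the standard liveness transfer function in = use ∪ (out − def) with word-wise operations. The capacity of 256 symbols, the linear-scan interning (a real compiler would use a hash table) and all names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAXSYMS 256
#define WORDS (MAXSYMS / 64)

/* Interning table: symbol i is names[i]. */
static const char *names[MAXSYMS];
static int nsyms = 0;

/* Return the dense id of name, assigning the next free id on first use. */
int symbol(const char *name) {
    for (int i = 0; i < nsyms; i++)          /* linear scan for brevity */
        if (strcmp(names[i], name) == 0) return i;
    assert(nsyms < MAXSYMS);
    names[nsyms] = name;
    return nsyms++;
}

/* A set of symbols as a bit string: bit i is set iff symbol i is a member. */
typedef struct { uint64_t w[WORDS]; } SymSet;

void set_add(SymSet *s, int sym) {
    s->w[sym / 64] |= (uint64_t)1 << (sym % 64);
}

int set_member(const SymSet *s, int sym) {
    return (int)((s->w[sym / 64] >> (sym % 64)) & 1u);
}

/* dst = a union b, one OR per 64 symbols. */
void set_union(SymSet *dst, const SymSet *a, const SymSet *b) {
    for (int i = 0; i < WORDS; i++) dst->w[i] = a->w[i] | b->w[i];
}

/* dst = a minus b, one AND-NOT per 64 symbols. */
void set_diff(SymSet *dst, const SymSet *a, const SymSet *b) {
    for (int i = 0; i < WORDS; i++) dst->w[i] = a->w[i] & ~b->w[i];
}
```

For example, with use = {x}, out = {y, z} and def = {z}, computing out − def and then taking its union with use gives in = {x, y}.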

  22. Types
      • The representation of types is highly language-dependent
      • Some key considerations:
        • name vs. structural equivalence
        • mutually recursive type definitions
        • dealing with errors

  23. Name vs. Structural Equivalence
        struct A { int i; } x;
        struct B { int i; } y;
        x = y;
      • In a language with structural equivalence, this program is legal
      • But not in a language with name equivalence (e.g., C)
      • For name equivalence, we can generate a unique symbol for each defined type
      • For structural equivalence, we need to compare the types recursively
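The two equivalences can be sketched in C over a tiny, hypothetical type representation:

- **Name equivalence** is modeled by object identity. Each struct definition elaborates to exactly one `Type` object, which plays the role of the unique symbol.
- **Structural equivalence** recurses over the fields. Whether field names must match varies by language; this sketch assumes they must.

Recursive types would also need cycle detection, which is omitted here.

```c
#include <string.h>

/* Hypothetical type representation: int, or a struct with named fields. */
typedef enum { K_INT, K_STRUCT } Kind;

typedef struct Type Type;
typedef struct { const char *name; const Type *type; } Field;

struct Type {
    Kind kind;
    const char *tag;        /* struct tag such as "A"; unused for int */
    int nfields;
    const Field *fields;
};

/* Name equivalence: two struct types match only if they come from the
   very same definition, i.e. they are the same object. */
int equal_by_name(const Type *a, const Type *b) {
    if (a->kind == K_INT && b->kind == K_INT) return 1;
    return a == b;
}

/* Structural equivalence: compare shapes recursively, ignoring tags.
   Assumes non-recursive types (no cycle detection). */
int equal_by_structure(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind == K_INT) return 1;
    if (a->nfields != b->nfields) return 0;
    for (int i = 0; i < a->nfields; i++) {
        if (strcmp(a->fields[i].name, b->fields[i].name) != 0) return 0;
        if (!equal_by_structure(a->fields[i].type, b->fields[i].type)) return 0;
    }
    return 1;
}
```

Suppose A and B from the slide are built as two separate objects, each with one int field `i`. Then `equal_by_structure` accepts the assignment `x = y`, while `equal_by_name` rejects it, as C does.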

  24. Mutually Recursive Type Definitions
      • To process recursive and mutually recursive type definitions, we need a placeholder
        • in ML, an option ref
        • in C, a pointer
        • in Java, a bind method (see Appel)
        struct A {
          int data;
          struct A *next;
          struct B *b;
        };
        struct B {…};

  25. Error Diagnostics
      • To recover from errors, it is useful to have an "any" type
        • it makes it possible to continue type-checking after an error without a cascade of follow-on errors
        • in practice, one can use "int" or guess a plausible type
      • Similarly, a "void" type can be used for expressions that return no value
      • Source locations are recorded in the AST, so errors can be reported at the right place!
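Here is a minimal sketch in C of error recovery with an "any" type, here called `TY_ANY`. When a check fails, the checker reports the error with its source line and returns `TY_ANY`. Because `TY_ANY` is accepted wherever a type is expected, a single mistake does not trigger a cascade of further messages. The type tags, `check_plus` and the rule that + needs two int operands are assumptions made for illustration.

```c
#include <stdio.h>

/* TY_ANY is assigned to an expression that already produced an error. */
typedef enum { TY_INT, TY_STRING, TY_ANY } Ty;

static int nerrors = 0;

/* Report an error at a source line and return TY_ANY so checking can go on. */
static Ty type_error(int line, const char *msg) {
    fprintf(stderr, "line %d: %s\n", line, msg);
    nerrors++;
    return TY_ANY;
}

/* TY_ANY is compatible with every expected type, so it never causes a new error. */
static int compatible(Ty actual, Ty expected) {
    return actual == TY_ANY || actual == expected;
}

/* Type of e1 + e2 given the operand types; line comes from the AST node. */
Ty check_plus(int line, Ty t1, Ty t2) {
    if (!compatible(t1, TY_INT)) return type_error(line, "left operand of + should be int");
    if (!compatible(t2, TY_INT)) return type_error(line, "right operand of + should be int");
    return TY_INT;
}
```

For slide 4's `"hello" + "world"`, a first call reports one error and yields `TY_ANY`. If that erroneous result were then used in a further `+ 5`, no second message would be produced.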

  26. Organization of the Elaborator
      • Module structure:
          elabProg: Ast.Program.t -> unit
          elabStm:  Ast.Stm.t * tenv * venv -> unit
          elabDec:  Ast.Dec.t * venv * tenv -> tenv * venv
          elabTy:   Ast.Type.t * tenv -> ty
          elabExp:  Ast.Exp.t * venv -> ty
          elabLVal: Ast.Lval.t * venv -> ty
      • It will be extended to also do translation.
      • For now, let's concentrate on type-checking

  27. Elaborate Expressions
      • Checks that expressions are correctly typed
      • Valid expressions are defined in the C specification
      • e: t means that e is a valid expression of type t
      • venv is a symbol table (environment)

  28. venv| e1: int venv| e2: int venv| e1+e2: int Elaborate Expressions fun elabExp (e, venv) = case e of BinaryExp (PLUS, e1, e2) => let val t1 = elabExp (e1, env) val t2 = elabExp (e2, env) in case (t1, t2) of (Int, Int) => Int | (Int, _) => error (“e2 should be int”) | (_, Int) => error (“e1 should be int”) | _ => error (“should both be int”) end

  29. Elaborate Types
      • Elaborating types is straightforward, except for recursive types
      • Need to do "knot-tying":
        • extend tenv with bindings for all of the new type names
        • bind the new names to "dummy" bodies
        • process each definition, replacing the dummy bodies with the real definitions
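The three knot-tying steps can be sketched in C for a group of struct declarations, using the placeholder idea from slide 24:

1. Each new name gets a `T_NAME` node in tenv.
2. That node's `actual` field starts out NULL, which serves as the dummy body.
3. Each body is elaborated, with references to group members resolving to those placeholders. The placeholder's `actual` field is then set to the real struct type.

This sketch makes several simplifying assumptions:

- The type representation is hypothetical.
- Fields are restricted to `int` or pointer-to-struct, with at most four fields.
- `assert` stands in for proper error reporting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type representation. A T_NAME node is the placeholder:
   its actual field is NULL (the dummy body) until the knot is tied. */
typedef enum { T_INT, T_PTR, T_STRUCT, T_NAME } Tag;
typedef struct Type {
    Tag tag;
    const char *name;          /* T_NAME */
    struct Type *actual;       /* T_NAME: real definition once tied */
    struct Type *to;           /* T_PTR */
    int nfields;               /* T_STRUCT (field names omitted) */
    struct Type *fields[4];    /* T_STRUCT */
} Type;

/* Source-level declarations: a field is int (ptr_to == NULL) or a
   pointer to the named struct. */
typedef struct { const char *ptr_to; } FieldDecl;
typedef struct { const char *name; int nfields; FieldDecl fields[4]; } StructDecl;

static Type *new_type(Tag tag) {
    Type *t = calloc(1, sizeof *t);
    if (t == NULL) abort();
    t->tag = tag;
    return t;
}

static Type *lookup(Type **tenv, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(tenv[i]->name, name) == 0) return tenv[i];
    return NULL;
}

/* Elaborate n mutually recursive struct declarations into tenv[0..n-1]. */
void elab_type_group(const StructDecl *decls, int n, Type **tenv) {
    /* Extend tenv with every new name, bound to a dummy body. */
    for (int i = 0; i < n; i++) {
        tenv[i] = new_type(T_NAME);
        tenv[i]->name = decls[i].name;
    }
    /* Process each definition, then replace the dummy with the real body. */
    for (int i = 0; i < n; i++) {
        Type *body = new_type(T_STRUCT);
        assert(decls[i].nfields <= 4);
        body->nfields = decls[i].nfields;
        for (int j = 0; j < decls[i].nfields; j++) {
            const char *target = decls[i].fields[j].ptr_to;
            if (target == NULL) {
                body->fields[j] = new_type(T_INT);
            } else {
                Type *p = new_type(T_PTR);
                p->to = lookup(tenv, n, target);
                assert(p->to != NULL);   /* a real elaborator reports an error */
                body->fields[j] = p;
            }
        }
        tenv[i]->actual = body;
    }
}
```

Consider slide 24's `struct A` together with a hypothetical `struct B { struct A *a; }` (the slide elides B's body). Elaborating this group yields a cyclic graph in which A's `next` field points back to A's own placeholder.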

  30. Elaborate Declarations
      • elabDec extends the symbol tables with new bindings; for example,
          int a;
        adds {a: int} to the environment
      • Remember that environments have to take the scope of variables into account!
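Here is a minimal sketch in C of elabDec for a declaration `ty name;`. It assumes a functional venv in which each binding records the scope depth at which it was declared. A second declaration of the same name in the same scope is reported as an error. A declaration in an inner scope is accepted and shadows the outer one. The depth field, the `Ty` tags and the name `elab_dec` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TY_INT, TY_CHAR } Ty;

/* A venv binding records the scope depth of its declaration. */
typedef struct Venv {
    const char *name;
    Ty ty;
    int depth;
    const struct Venv *rest;
} Venv;

/* Elaborate `ty name;` at scope depth `depth`. Return venv + {name: ty},
   or venv unchanged (after reporting) if name is already declared in the
   same scope. Declarations in inner scopes shadow outer ones. */
const Venv *elab_dec(const Venv *venv, const char *name, Ty ty, int depth) {
    for (const Venv *v = venv; v != NULL && v->depth == depth; v = v->rest)
        if (strcmp(v->name, name) == 0) {
            fprintf(stderr, "error: redeclaration of '%s'\n", name);
            return venv;
        }
    Venv *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->name = name;
    b->ty = ty;
    b->depth = depth;
    b->rest = venv;
    return b;
}
```

The same-scope check only needs to scan the front of the list. With functional environments, every binding from the current scope sits ahead of all bindings from enclosing scopes.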

  31. Elaborate Statements, Lvals, Programs
      • All follow the same structure as expressions and types
      • elabProg calls the other functions to type-check each component of the program (declarations, statements, expressions, …)

  32. Labs
      • For lab #4, your job is to implement an elaborator for C--
        • you may do it in two steps:
          • first, type-checking
          • then, generating target code
      • At every step, check the output carefully to make sure your compiler works correctly

  33. Summary
      • Elaboration checks the well-formedness of programs
        • it must take care of the semantics of source programs
        • and may translate them into lower-level forms
      • It is usually the largest (and most complex) part of a compiler!
