CSE P501 – Compiler Construction

  1. CSE P501 – Compiler Construction
     Semantic Checks, Attribute Grammars, Symbol Tables, Types
     Disclaimer: more here than needed for the MiniJava project
     Jim Hogg - UW - CSE - P501

  2. What must we check to know this program is legal?
         class C {
             int a;
             C(int v) { a = v; }
             void setA(int v) { a = v; }
         }
         class Main {
             public static void main() {
                 C c = new C(17);
                 c.setA(42);
             }
         }

  3. Beyond Syntax
     There is a level of correctness not captured by a CFG:
     • Has a variable been declared before it is used?
     • Are types consistent in an expression?
     • In the assignment x=y, is y assignable to x?
     • Does a method call have the right number and types of parameters?
     • In a selector p.q, is q a method or field of object p?
     • Is variable x guaranteed to be initialized before it is used?
     • Could p be null when p.q is executed?
     • Etc

  4. What else do we need to know to generate code?
     • Where are fields allocated in an object?
     • How big are objects? (ie, how much storage needs to be allocated by new)
     • Where are local variables stored when a method is called?
     • Which methods are associated with an object/class?
     • How do we figure out which method to call based on the run-time type of an object?

  5. Semantic Analysis
     • Main tasks:
       • Extract types and other information from the program
       • Check language rules that go beyond the context-free grammar
       • Resolve names – connect declarations and uses
       • “Understand” the program – last phase of front end
       • ... so the program is "correct" for hand-off to the backend
     • Key data structures: symbol tables
       • For each identifier in the program, record its attributes (kind, type, etc)
       • Later: assign storage locations (stack frame offsets) for variables; add other annotations

  6. Some Kinds of Semantic Information

  7. Semantic Checks
     • Grammar = BNF
       • Short: eg, Java in a couple of pages
     • Semantics = Language Reference Manual
       • Long: eg, Java SE8 = 760 pages
     • For each language construct we want to know
       • What semantic rules should be checked
       • For an expression, what is its type (is expression legal?)
       • For a declaration, what info to capture for use elsewhere?

  8. A Sampling of Semantic Checks: 1
     • Appearance of a name: id
       • id has been declared and is in-scope
       • Inferred type of id == its declared type
       • Memory location assigned by compiler
     • Constant: v
       • Inferred type and value are explicit

  9. A Sampling of Semantic Checks: 2
     • Binary operator: e1 op e2
       • e1 and e2 have compatible types
         • Identical, or
         • well-defined conversion to appropriate types (not in MiniJava)
       • Inferred type is a function of the operator and operand types
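A minimal Java sketch of the last bullet, assuming MiniJava's binary operators are exactly &&, <, +, - and * and that its only base types are int and boolean. The names BaseKind and BinaryOpTyping are hypothetical, not part of any course starter code.

```java
// Hypothetical sketch: the inferred type of e1 op e2 as a function of the
// operator and the operand types, assuming MiniJava's binary operators.
enum BaseKind { INT, BOOLEAN, ERROR }

final class BinaryOpTyping {
    // Returns the inferred type, or ERROR when the operand types are not
    // acceptable for op (MiniJava has no implicit conversions).
    static BaseKind typeOf(String op, BaseKind left, BaseKind right) {
        switch (op) {
            case "+":
            case "-":
            case "*":
                // arithmetic: int op int -> int
                return (left == BaseKind.INT && right == BaseKind.INT) ? BaseKind.INT : BaseKind.ERROR;
            case "<":
                // comparison: int < int -> boolean
                return (left == BaseKind.INT && right == BaseKind.INT) ? BaseKind.BOOLEAN : BaseKind.ERROR;
            case "&&":
                // logical: boolean && boolean -> boolean
                return (left == BaseKind.BOOLEAN && right == BaseKind.BOOLEAN) ? BaseKind.BOOLEAN : BaseKind.ERROR;
            default:
                throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

Returning an ERROR type rather than throwing lets the checker keep going after a bad operand.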

  10. A Sampling of Semantic Checks: 3
     • Assignment: e1 = e2
       • e1 is assignable (not a constant or a computed expression)
       • e1 and e2 have compatible types
         • Identical, or
         • e2 can be converted to e1 (eg: char to int), but not in MiniJava, or
         • Type of e2 is a subclass of type of e1 (can be decided at compile time)
       • Inferred type is type of e1
       • Compiler assigns the location where the value is stored
         class D extends B { D next; }
         d.next = b;     // ok?
         d.next = d;     // ok?
         x + y = 42      // pointers?
         1 = 2           // never good
         x[3] = true     // in MiniJava?
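The subclass rule can be checked by walking the superclass chain of the right-hand side's type. In this hypothetical sketch, types are plain strings ("int", "boolean", "int[]" or a class name) and a map records each class's superclass. It assumes the class hierarchy has already been checked for cycles; otherwise the loop would not terminate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: may a value of type `from` be assigned to a location of type `to`?
final class Assignability {
    // class name -> superclass name (absent for classes with no extends clause)
    private final Map<String, String> superOf = new HashMap<>();

    void declareClass(String name, String superName) {
        if (superName != null) superOf.put(name, superName);
    }

    // Identical types are compatible; otherwise `from` must be a subclass of `to`.
    // Walks from's superclass chain; assumes the hierarchy is acyclic.
    boolean isAssignable(String to, String from) {
        for (String t = from; t != null; t = superOf.get(t)) {
            if (t.equals(to)) return true;
        }
        return false;
    }
}
```

With class D extends B, isAssignable("D", "B") is false (so d.next = b is rejected), while isAssignable("D", "D") and isAssignable("B", "D") are both true.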

  11. A Sampling of Semantic Checks: 4
     • Cast: (e1) e2   [not in MiniJava]
       • e1 must be a type
       • e2 is such that:
         • Same type as e1
         • Can be converted to type e1 (eg: int to double)
         • e1 is a superclass of e2's type - upcast
         • e1 is a subclass of e2's type - downcast (needs runtime check)
       • Inferred type is e1
         (int) x         // char x? double x?
         (42) x          // never good
         (boolean) x     // int x?
         (int[]) x       // ?
         class D extends B ...
         (D) new B();    // downcast
         (B) new D();    // upcast

  12. A Sampling of Semantic Checks: 5
     • Field reference: exp.f
       • exp is a reference type (not a value type)
       • The class of exp has a field named f
       • Inferred type is declared type of f
     • Reference Type
       • after x = y, x refers to the same object as y
       • eg: C y = new C(); x = y;
     • Value Type
       • after x = y, x receives its own copy of y's value
       • eg: int y = 42; x = y;

  13. A Sampling of Semantic Checks: 6
     • Method call: exp.m(e1, e2, …, en)
       • exp must be a reference type
       • The class of exp has a method named m
       • The method has n parameters
       • Each argument must be assignment-compatible with the corresponding parameter
       • Inferred type is given by the method declaration (or is void)
     Method Overloading
       Method defs:       m(int, int) {...}   m(double, double) {...}
       Method calls:      m(1, 2);   m(1.0, 2.0);   m(1.0, 2);   m(1, 2.0);
       More method defs:  m(int, int) {...}   m(int, double) {...}   m(double, int) {...}
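Once the declaration of m has been found in the class symbol table, the arity and argument checks are a simple loop. In this sketch MethodSig and CallChecker are hypothetical names, types are strings, and the assignment-compatibility test is passed in as a BiPredicate (for example, a subclass-aware check). It does not attempt overload resolution.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical record of a method declaration, as stored in a class symbol table.
final class MethodSig {
    final String name;
    final List<String> paramTypes;   // in declaration order
    final String returnType;         // "void" for methods that return nothing

    MethodSig(String name, List<String> paramTypes, String returnType) {
        this.name = name;
        this.paramTypes = paramTypes;
        this.returnType = returnType;
    }
}

final class CallChecker {
    // Appends any errors to `errors`; returns the call's inferred type,
    // which is the declared return type of m.
    // isAssignable.test(paramType, argType) decides assignment compatibility.
    static String check(MethodSig m, List<String> argTypes,
                        BiPredicate<String, String> isAssignable, List<String> errors) {
        if (argTypes.size() != m.paramTypes.size()) {
            errors.add(m.name + ": expected " + m.paramTypes.size()
                       + " arguments but got " + argTypes.size());
        } else {
            for (int i = 0; i < argTypes.size(); i++) {
                // Each argument must be assignment-compatible with its parameter.
                if (!isAssignable.test(m.paramTypes.get(i), argTypes.get(i))) {
                    errors.add(m.name + ": argument " + (i + 1) + " has type "
                               + argTypes.get(i) + ", expected " + m.paramTypes.get(i));
                }
            }
        }
        return m.returnType;
    }
}
```

For setA(int v) from the earlier example, c.setA(42) yields type void with no errors, while c.setA(1, 2) or c.setA(true) each add one error.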

  14. A Sampling of Semantic Checks: 7
     Return statement
     • return exp;   // exp must be assignment-compatible with method's return type
     • return;       // better be a void method!

  15. Semantic Analysis
     • Parser builds AST
     • Now check semantic constraints
       • Can partly be done during the parse, but often easier to organize as separate phases - eg: visitor pattern over AST
       • And some things can’t be done on-the-fly. eg: info about identifiers that are used before they are declared (fields, classes) [ cf: declare-before-use, as in Pascal ]
     • Information stored in Symbol Tables
       • Generated by semantic analysis, used there and later
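Forward references to classes are the usual reason for a separate pass. This simplified sketch (hypothetical ClassDecl and TwoPassChecker types, not a real MiniJava AST) first records every class name and only then checks uses, so class A { B b; } class B { } is accepted even though B is declared after its first use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified AST node: a class name, an optional
// superclass name, and the declared types of its fields.
final class ClassDecl {
    final String name;
    final String superName;          // null when there is no extends clause
    final List<String> fieldTypes;

    ClassDecl(String name, String superName, List<String> fieldTypes) {
        this.name = name;
        this.superName = superName;
        this.fieldTypes = fieldTypes;
    }
}

final class TwoPassChecker {
    private static boolean isBuiltin(String t) {
        return t.equals("int") || t.equals("boolean") || t.equals("int[]");
    }

    static List<String> check(List<ClassDecl> program) {
        // Pass 1: record every class name before examining any use of one.
        Set<String> classes = new HashSet<>();
        for (ClassDecl c : program) classes.add(c.name);

        // Pass 2: every referenced class must now be known, even if its
        // declaration appears later in the file.
        List<String> errors = new ArrayList<>();
        for (ClassDecl c : program) {
            if (c.superName != null && !classes.contains(c.superName))
                errors.add(c.name + " extends undeclared class " + c.superName);
            for (String t : c.fieldTypes)
                if (!isBuiltin(t) && !classes.contains(t))
                    errors.add(c.name + " has a field of undeclared type " + t);
        }
        return errors;
    }
}
```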

  16. Attribute Grammars
     • We can specify Java micro-syntax with a few dozen regexes
       • Then find a tool (JFlex) to create a scanner from these regexes
     • We can specify Java syntax with a few pages of BNF
       • Then find a tool (CUP) to create a parser from that BNF
     • What about the huge collection of constraint checks? (760 pages in the Java Language Reference Manual)
       • Attribute Grammars?
       • Then find a tool (???) to create a semantic checker for that Attribute Grammar?

  17. Attribute Example
     • Give each AST node a .val attribute to hold its computed value
     • AST and attribution for (1+2) * (6 / 2):
         *
           +
             1
             2
           /
             6
             2
     • This example is much simplified
       • Attributes should really be attached to nodes in the Parse Tree
       • Attribute equations should really be attached to each BNF production

  18. Attribute Example
     • Give each node a .val attribute to hold the computed value of that node
     • AST and attribution for (1+2) * (6 / 2), each node carrying a .val attribute:
         *        .val
           +      .val
             1    .val
             2    .val
           /      .val
             6    .val
             2    .val

  19. Attribute Example
     • Give each node a .val attribute to hold the computed value of that node
     • AST and attribution for (1+2) * (6 / 2):
         *        .val=9
           +      .val=3
             1    .val=1
             2    .val=2
           /      .val=3
             6    .val=6
             2    .val=2
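The synthesized .val attribute can be computed by a post-order walk: each node evaluates its children first and then applies its own equation. The class names Exp, Num and BinOp are illustrative only.

```java
// Hypothetical sketch: a synthesized .val attribute, computed bottom-up.
abstract class Exp {
    int val;                   // the .val attribute of this node
    abstract int evaluate();   // sets and returns .val
}

final class Num extends Exp {
    Num(int value) { val = value; }          // a leaf's .val is its literal value
    @Override int evaluate() { return val; }
}

final class BinOp extends Exp {
    final char op;
    final Exp left, right;

    BinOp(char op, Exp left, Exp right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    // Equation for this node: val = left.val op right.val, children first.
    @Override int evaluate() {
        int l = left.evaluate();
        int r = right.evaluate();
        switch (op) {
            case '+': val = l + r; break;
            case '-': val = l - r; break;
            case '*': val = l * r; break;
            case '/': val = l / r; break;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
        return val;
    }
}
```

Building (1+2) * (6/2) as new BinOp('*', new BinOp('+', new Num(1), new Num(2)), new BinOp('/', new Num(6), new Num(2))) and calling evaluate() on the root leaves .val = 9 at the root and 3 at both inner nodes, matching the annotated tree.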

  20. Attribute Grammars
     • Idea: associate attributes with each node in AST
     • Eg:
       • Type info (int, boolean, int[], class for MiniJava)
       • Storage location (eg: byte-offset 28 from frame-pointer)
       • Assignable (eg: constant vs variable)
       • Numeric value (if node represents a constant)
       • etc
     • Notation: X.a if a is an attribute of node X

  21. Inherited and Synthesized Attributes
     Given a production A → Y1 Y2 … Yn
     • A synthesized attribute A.a is a function of attributes of the Yi (bottom-up)
     • An inherited attribute Yi.b is a function of attributes of A and of the other Yj (top-down and sideways)
       • Sometimes restricted, eg: only the Yj to the left of Yi

  22. Attribute Equations
     • For each kind of node we give a set of equations relating that node's attributes to those of its children
       Eg: plus.val = e1.val + e2.val
     • or, relating that node's attributes to those of its parent
     • Attribution (evaluation) means finding a solution that satisfies all of the equations in the tree

  23. Informal Example of Attribute Rules: 1
     • Grammar for a trivial language:
         prog → decl stm
         decl → int id ;
         stm  → exp = exp ;
         exp  → id | exp + exp | 1
     • What attributes would we create in order to check types and assignability?

  24. Informal Example of Attribute Rules: 1
     • For stm → exp = exp ; we need to check that:
       • LHS exp is assignable - not a constant, not an arithmetic expression
       • RHS exp has a type that is assignment-compatible with LHS

  25. Informal Example of Attribute Rules: 2
     Attributes
     • .env
       • "environment"
       • link to a Symbol Table entry
       • synthesized by decl, inherited by stm
       • each entry maps a name to its type and value
     • .type
       • expression type (int, boolean, int[], class)
       • synthesized
     • .kind
       • variable versus value (lvalue versus rvalue, in C-speak)
       • synthesized

  26. Attributes for Declarations
         decl      .env
           int
           id
           ;
     Note - not all node types have, or need, attributes

  27. Attributes for Programs
         prog
           decl    .env
           stm     .env

  28. Attributes for Constants
         exp       .type  .kind
           1

  29. Attributes for Expressions
         exp       .type  .kind
           id      .type  .kind

  30. Attributes for Addition
         exp       .env  .type  .kind
           exp1    .env  .type  .kind
           +
           exp2    .env  .type  .kind

  31. Attributes for Assignment
         stm       .env  .type  .kind
           exp1    .env  .type  .kind
           =
           exp2    .env  .type  .kind

  32. Example
     Program: int x; x = x + 1;
         prog
           decl
             int  id (x)  ;
           stm
             exp
               id (x)
             =
             exp
               exp
                 id (x)
               +
               exp
                 1
             ;
     Attributes on the nodes: .env, .type, .kind
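The same attribute rules can be written as ordinary recursive functions: the inherited .env becomes a parameter and the synthesized .type and .kind become return values. This is a hypothetical sketch for the trivial language, in which the environment records only each name's type (not its value) and errors are reported by throwing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the attribute rules for the trivial language:
//   prog -> decl stm
//   decl -> int id ;
//   stm  -> exp = exp ;
//   exp  -> id | exp + exp | 1
// .env is inherited (passed down); .type and .kind are synthesized (returned up).
enum Kind { VARIABLE, VALUE }   // lvalue vs rvalue

final class Attrs {
    final String type;
    final Kind kind;
    Attrs(String type, Kind kind) { this.type = type; this.kind = kind; }
}

interface Expr {
    Attrs attribute(Map<String, String> env);
}

final class Id implements Expr {
    final String name;
    Id(String name) { this.name = name; }
    public Attrs attribute(Map<String, String> env) {
        String t = env.get(name);
        if (t == null) throw new IllegalStateException("undeclared identifier: " + name);
        return new Attrs(t, Kind.VARIABLE);      // an id denotes a storage location
    }
}

final class One implements Expr {
    public Attrs attribute(Map<String, String> env) {
        return new Attrs("int", Kind.VALUE);     // a constant is not assignable
    }
}

final class Plus implements Expr {
    final Expr e1, e2;
    Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    public Attrs attribute(Map<String, String> env) {
        Attrs a = e1.attribute(env);             // env is inherited by both children
        Attrs b = e2.attribute(env);
        if (!a.type.equals("int") || !b.type.equals("int"))
            throw new IllegalStateException("operands of + must be int");
        return new Attrs("int", Kind.VALUE);     // a sum is a value, not a location
    }
}

final class TinyChecker {
    // decl -> int id ;  synthesizes the environment { id : int }
    static Map<String, String> decl(String id) {
        Map<String, String> env = new HashMap<>();
        env.put(id, "int");
        return env;
    }

    // stm -> exp1 = exp2 ;  inherits env, then checks assignability and types
    static void stm(Map<String, String> env, Expr lhs, Expr rhs) {
        Attrs l = lhs.attribute(env);
        Attrs r = rhs.attribute(env);
        if (l.kind != Kind.VARIABLE) throw new IllegalStateException("left side is not assignable");
        if (!l.type.equals(r.type)) throw new IllegalStateException("type mismatch in assignment");
    }
}
```

For int x; x = x + 1;, decl("x") yields the environment and stm(env, new Id("x"), new Plus(new Id("x"), new One())) succeeds. By contrast, stm(env, new One(), new Id("x")), that is 1 = x;, is rejected because the left side has kind VALUE.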

  33. Extensions
     • Can be extended to handle sequences of declarations and statements
       • Sequence of declarations builds up a combined environment – each decl synthesizes a new environment from the previous one, augmented with a new binding
       • Full environment is passed down to statements and expressions

  34. Observations
     • These are equational computations - no sequential modification of state (think functional programming - no side-effects)
     • Issue: deciding whether a given set of attribute equations will actually converge
     • Can be automated, provided the attribute equations are non-circular
     • Problems
       • Non-local computation
       • Can’t afford to literally pass around copies of large, aggregate structures like environments

  35. In Practice
     • Attribute Grammars give us a way of thinking about how to structure semantic checks
     • Use Symbol Tables to hold environment information
     • Add fields to nodes to refer to appropriate attributes
       • symbol table entries for identifiers
       • types for expressions
       • insert into appropriate places in AST class hierarchy; eg, most statements don’t need types
     • But, commercial compilers don't use Attribute Grammars
       • Instead? - "death by a thousand if's"

  36. Symbol Tables
     A table that maps id → <type, kind, location, ...>
     API
     • lookup(id) → info = <type, kind, location, ...>
     • enter(id, info)   // updates table
     • open()            // opens new scope
     • close()           // closes scope
     Use
     • Build table from declarations during (or before) AST walk
     • Use info to check semantic rules (eg: declare before use)
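One simple way to implement this API, sketched under the assumption that each scope is its own HashMap linked to its enclosing scope. The class name SymbolTable and the generic Info payload are hypothetical. lookup walks outward, so an inner declaration shadows an outer one; info values must be non-null because null signals "not found".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lookup/enter/open/close with linked scopes.
final class SymbolTable<Info> {
    private static final class Scope<I> {
        final Map<String, I> entries = new HashMap<>();
        final Scope<I> parent;                 // enclosing scope, null for outermost
        Scope(Scope<I> parent) { this.parent = parent; }
    }

    private Scope<Info> current = new Scope<>(null);   // outermost (global) scope

    void open() { current = new Scope<>(current); }

    void close() {
        if (current.parent == null) throw new IllegalStateException("cannot close the outermost scope");
        current = current.parent;
    }

    // Returns false (and changes nothing) if id is already declared in the current scope.
    boolean enter(String id, Info info) {
        if (current.entries.containsKey(id)) return false;
        current.entries.put(id, info);
        return true;
    }

    // Innermost declaration wins; null if id is not declared in any enclosing scope.
    Info lookup(String id) {
        for (Scope<Info> s = current; s != null; s = s.parent) {
            Info i = s.entries.get(id);
            if (i != null) return i;
        }
        return null;
    }
}
```

The false result from enter gives the checker a natural place to report a duplicate declaration in the same scope.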

  37. Aside: Implementing Symbol Tables
     • Formerly: big topic in classical compiler courses: implementing a hashed symbol table - hash function, table size, collision chains, etc (The C standard library doesn't provide any)
     • Now: just use the collection classes provided with the implementation language (Java, C#, C++, ML, Haskell, ...)
     • Then tune & optimize if it really matters
       • In production compilers, it really matters!
     • For Java:
       • Map<K,V>
       • ArrayList for ordered lists (eg: parameters)

  38. Symbol Tables for MiniJava: Global
     • A MiniJava program = 1 file = multiple classes (no separate compilation)
     • One Global table per program
       • Maps class name to Class symbol table
       • Created in a pass over ClassDecl AST nodes
       • Used to check field/method names and extract their info
     • Global Symbol Table lives throughout the compilation
     • In real Java, Symbol Table info is persisted into .class files
     • In C#, Symbol Table info is persisted as 'metadata' into output assembly

  39. Symbol Tables for MiniJava: Class
     • One Class symbol table for each class
       • 1 entry for each field in class
         • name, type, (public|private), offset-in-class
       • 1 entry for each method in class
         • List of parameters: name, type, ordinal (or ordered)
     In full Java, need some way to handle namespaces
     Ie: same identifier can be both a method and a field in a class

  40. Symbol Tables for MiniJava: Locals
     • One Locals table for each method
       • One entry for each parameter
         • Contents = type, memory location
       • One entry for each local variable
         • Contents = type, memory location
     • Needed only while compiling the method
     • Can discard when done (after first, or final, pass)
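A hypothetical sketch of how the three MiniJava levels might fit together, and of one way to resolve a name used inside a method body: parameters and locals first, then fields of the enclosing class, then fields inherited from superclasses. Keeping fields and methods in separate maps also sidesteps the namespace issue noted for full Java. All class names here are illustrative assumptions, and memory locations are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Locals table: one per method, needed only while compiling that method.
final class MethodTable {
    final List<String> paramTypes = new ArrayList<>();      // ordered, for call checks
    final Map<String, String> locals = new HashMap<>();     // params + locals: name -> type
}

// Class table: fields and methods kept in separate maps (separate namespaces).
final class ClassTable {
    final String name;
    final String superName;                                 // null if no extends clause
    final Map<String, String> fields = new HashMap<>();     // field name -> type
    final Map<String, MethodTable> methods = new HashMap<>();

    ClassTable(String name, String superName) {
        this.name = name;
        this.superName = superName;
    }
}

// Global table: one per program, class name -> class table.
final class GlobalTable {
    final Map<String, ClassTable> classes = new HashMap<>();

    // Type of variable `id` used inside method m of class cls, or null if undeclared.
    String resolveVariable(String cls, MethodTable m, String id) {
        String t = m.locals.get(id);                        // 1. params and locals
        if (t != null) return t;
        ClassTable c = classes.get(cls);
        while (c != null) {                                 // 2. fields, then superclass fields
            t = c.fields.get(id);
            if (t != null) return t;
            c = (c.superName == null) ? null : classes.get(c.superName);
        }
        return null;
    }
}
```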

  41. Beyond MiniJava
     • We don't deal with:
       • Class static fields or methods
       • Accessibility - public, protected, private
       • Inner classes
       • Nested scopes in methods – re-use of identifiers, nested functions (ML, Pascal, …)
     • Basic idea: new symbol tables for inner scopes, linked to surrounding scope’s table
       • Look for identifier in inner scope; if not found look in surrounding scope (recursively)
       • Pop back up on scope exit

  42. Engineering Issues
     • In practice, want to retain O(1) lookup
       • Use hash tables with additional information to get the scope nesting right
       • Scope entry/exit operations
     • In multipass compilers, Symbol Table info needs to persist after analysis of inner scopes for use on later passes
     • See a compiler textbook for ideas & details
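One common way to keep lookup O(1) regardless of nesting depth is sketched here as an assumption, not as the course's prescribed design. A single hash table maps each name to a stack of its visible bindings (innermost on top), and each open scope remembers which names it declared so close() can pop exactly those. The class FlatScopedTable is hypothetical; it omits duplicate-declaration checks and requires non-null info.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one hash table plus per-name binding stacks.
final class FlatScopedTable<Info> {
    // name -> stack of bindings, innermost on top
    private final Map<String, Deque<Info>> bindings = new HashMap<>();
    // stack of scopes; each records the names it declared, for undo on close()
    private final Deque<List<String>> scopes = new ArrayDeque<>();

    FlatScopedTable() { open(); }   // outermost scope

    void open() { scopes.push(new ArrayList<>()); }

    void enter(String id, Info info) {
        bindings.computeIfAbsent(id, k -> new ArrayDeque<>()).push(info);
        scopes.peek().add(id);
    }

    // One hash probe plus a peek: expected O(1), independent of nesting depth.
    Info lookup(String id) {
        Deque<Info> stack = bindings.get(id);
        return (stack == null) ? null : stack.peek();
    }

    // Cost is proportional to the number of names declared in the closing scope.
    void close() {
        for (String id : scopes.pop()) {
            Deque<Info> stack = bindings.get(id);
            stack.pop();
            if (stack.isEmpty()) bindings.remove(id);
        }
    }
}
```

Compared with a chain of per-scope tables, this trades a slightly more expensive close() for lookups whose cost does not grow with the number of enclosing scopes.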

  43. Error Recovery
     • What to do when compiler finds an undeclared identifier?
       • eg: x = y + 1 and there is no entry for y in Symbol Table
     • Only complain once (Why?)
     • Create an entry for y in Symbol Table, to suppress future error messages
     • Assign the forged entry for y to have a type of “unknown”
     • “Unknown” is the type of all malformed expressions and is compatible with all other types
       • Can avoid redundant error messages (how?)
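A minimal sketch of this recovery strategy, with hypothetical names and types as strings. The first use of an undeclared name is reported and a forged entry of type "unknown" is entered. The compatibility test treats "unknown" as matching everything, so later uses of the name, and expressions built from it, produce no further messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "complain once" recovery using an unknown type.
final class RecoveringChecker {
    static final String UNKNOWN = "unknown";
    final Map<String, String> symbols = new HashMap<>();   // name -> type
    final List<String> errors = new ArrayList<>();

    String typeOfId(String id) {
        String t = symbols.get(id);
        if (t == null) {
            errors.add("undeclared identifier: " + id);
            symbols.put(id, UNKNOWN);     // forged entry suppresses repeat messages
            t = UNKNOWN;
        }
        return t;
    }

    // "unknown" is compatible with every type.
    static boolean compatible(String a, String b) {
        return a.equals(UNKNOWN) || b.equals(UNKNOWN) || a.equals(b);
    }

    // Type of e1 + e2. Reports an error only when an operand is known to be
    // the wrong type; an expression involving "unknown" is itself "unknown".
    String typeOfPlus(String t1, String t2) {
        if (!compatible(t1, "int") || !compatible(t2, "int")) {
            errors.add("operands of + must be int");
            return UNKNOWN;
        }
        return (t1.equals(UNKNOWN) || t2.equals(UNKNOWN)) ? UNKNOWN : "int";
    }
}
```

For x = y + 1 followed by another use of y, exactly one error is recorded: the assignment accepts the unknown right-hand side, and the second lookup of y finds the forged entry.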

  44. “Predefined” Things
     • Many languages have some “predefined” items
       • classes, functions, constants (eg: "maxint")
       • "standard library" or "prelude"
     • Write initialization code to inject predefined info in Symbol Table
     • Preferably, import a file including "standard prelude". Tradeoffs?
     • Rest of compiler doesn’t need to know the difference between “predefined” items and ones found in the user program

  45. Types
     • Classical roles of types in programming languages
       • Compile-time error detection (find errors ASAP)
       • Improved expressiveness (eg: method & operator overloading)
       • Provide information to optimizer
       • Runtime safety
     • Eg: Haskell - if your program type-checks, it's likely correct
     • Eg: Even FORTRAN had INTEGER and REAL - different bit layouts
     • Can we ensure programs are correct with enough testing?

  46. Terminology
     Static vs dynamic typing
     • static: checking done prior to execution (eg, at compile-time)
     • dynamic: checking during execution
     Strong vs weak typing
     • strong: guarantees no illegal operations performed
     • weak: can’t make guarantees
     Caveats:
     • Hybrids common
     • Inconsistent usage common
     • “untyped” or “typeless” could mean dynamic or weak

  47. Type Systems
     • Base Types
       • Fundamental, atomic types
       • Eg: int, double, char, bool
     • Compound or Constructed Types
       • Built up from other types (recursively)
       • Type-Constructors include:
         • arrays
         • records/structs/classes
         • pointers
         • enumerations
         • functions
         • modules (eg: ML)

  48. How to Represent Types in a Compiler?
     • Create a shallow class hierarchy. Eg:
         abstract class Type {...}
         class ClassType extends Type {...}
         class BaseType extends Type {...}
     • Should not need too many of these

  49. Types vs ASTs
     • Types are not AST nodes! (eg: IntType != INT != ILIT)
     • AST = abstract representation of source program (including source program type info)
     • Types = abstract representation of types for semantics checks, inference, etc.
       • Can include information not explicitly represented in the source code, or may describe types in ways more convenient for processing
     • Be sure you have a separate “type” class hierarchy in your compiler distinct from the AST

  50. Base Types
     • For each base type (int, bool, etc), create a single object to represent it
       • Symbol table entries and AST nodes for expressions refer to these to represent type info
       • Usually create at compiler startup
     • Useful to create a type void object to tag functions that return no value
     • Also useful to create a type unknown object for errors
     • void and unknown types reduce need for special case code
       • ie, pass these types around - no need to check everywhere
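The following hypothetical sketch shows a type hierarchy kept separate from the AST, in the spirit of the last few slides. It uses one shared object per base type, so base types compare with ==, and treats VOID and UNKNOWN as ordinary type objects. Constructed types such as arrays and classes get their own subclasses. The names and the sameAs/isSubclassOf methods are illustrative, not a prescribed design.

```java
// Hypothetical sketch of a compiler's type representation (distinct from AST nodes).
abstract class Type {
    // By default two types are the same only if they are the same object;
    // this is correct for base types, which are singletons.
    boolean sameAs(Type other) { return this == other; }
}

final class BaseType extends Type {
    final String name;
    private BaseType(String name) { this.name = name; }

    // One shared object per base type, created once at startup.
    static final BaseType INT = new BaseType("int");
    static final BaseType BOOLEAN = new BaseType("boolean");
    static final BaseType VOID = new BaseType("void");         // methods returning nothing
    static final BaseType UNKNOWN = new BaseType("unknown");   // malformed expressions

    @Override public String toString() { return name; }
}

// Constructed type: an array of some element type; compared structurally.
final class ArrayType extends Type {
    final Type element;
    ArrayType(Type element) { this.element = element; }

    @Override boolean sameAs(Type other) {
        return other instanceof ArrayType && element.sameAs(((ArrayType) other).element);
    }

    @Override public String toString() { return element + "[]"; }
}

// Constructed type: a class, linked to its superclass for subtype checks.
final class ClassType extends Type {
    final String name;
    final ClassType superclass;   // null if the class has no extends clause

    ClassType(String name, ClassType superclass) {
        this.name = name;
        this.superclass = superclass;
    }

    // True if this class is `other` or inherits from it.
    boolean isSubclassOf(ClassType other) {
        for (ClassType c = this; c != null; c = c.superclass) {
            if (c == other) return true;
        }
        return false;
    }

    @Override public String toString() { return name; }
}
```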
