cse p501 compiler construction n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
CSE P501 – Compiler Construction PowerPoint Presentation
Download Presentation
CSE P501 – Compiler Construction

Loading in 2 Seconds...

play fullscreen
1 / 62

CSE P501 – Compiler Construction - PowerPoint PPT Presentation


  • 113 Views
  • Uploaded on

CSE P501 – Compiler Construction. Semantic Checks Attribute Grammars Symbol Tables Types Disclaimer: more here than needed for the MiniJava project. What to check the program is legal?. class C { int a; C( int v) { a = v ; } void setA ( int v) { a = v; } }

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'CSE P501 – Compiler Construction' - romeo


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
cse p501 compiler construction

CSE P501 – Compiler Construction

Semantic Checks

Attribute Grammars

Symbol Tables

Types

Disclaimer: more here than needed for the MiniJava project

Jim Hogg - UW - CSE - P501

what to check the program is legal
What to check the program is legal?

class C {

int a;

C(int v) { a = v; }

void setA(int v) { a = v; }

}

class Main {

public static void main() {

C c = new C(17);

c.setA(42);

}

}

Jim Hogg - UW - CSE - P501

beyond syntax
Beyond Syntax

There is a level of correctness not captured by a CFG:

  • Has a variable been declared before it is used?
  • Are types consistent in an expression?
  • In the assignment x=y, is y assignable to x?
  • Does a method call have right number and types of parameters?
  • In a selector p.q, is q a method or field of object p?
  • Is variable x guaranteed to be initialized before it is used?
  • Could p be null when p.q is executed?
  • Etc

Jim Hogg - UW - CSE - P501

what else to know to generate code
What else to know to generate code?
  • Where are fields allocated in an object?
  • How big are objects? (ie, how much storage needs to be allocated by new)
  • Where are local variables stored when a method is called?
  • Which methods are associated with an object/class?
    • How do we figure out which method to call based on the run-time type of an object?

Jim Hogg - UW - CSE - P501

semantic analysis
Semantic Analysis
  • Main tasks:
    • Extract types and other information from the program
    • Check language rules that go beyond the context-free grammar
    • Resolve names – connect declarations and uses
    • “Understand” the program – last phase of front end
    • ... so the program is "correct" for hand-off to the backend
  • Key data structures: symbol tables
    • For each identifier in the program, record its attributes (kind, type, etc)
    • Later: assign storage locations (stack frame offsets) for variables; add other annotations

Jim Hogg - UW - CSE - P501

some kinds of semantic information
Some Kinds of Semantic Information

Jim Hogg - UW - CSE - P501

semantic checks
Semantic Checks
  • Grammar = BNF
    • Short: eg, Java in a couple of pages
  • Semantics = Language Reference Manual
    • Long: eg, Java SE8 = 760 pages
  • For each language construct we want to know
    • What semantic rules should be checked
    • For an expression, what is its type (is expression legal?)
    • For a declaration, what info to capture for use elsewhere?

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 1
A Sampling of Semantic Checks: 1
  • Appearance of a name: id
    • id has been declared and is in-scope
    • Inferred type of id == its declared type
    • Memory location assigned by compiler
  • Constant: v
    • Inferred type and value are explicit

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 2
A Sampling of Semantic Checks: 2
  • Binary operator: e1 op e2
    • e1 and e2 have compatible types
      • Identical, or
      • well-defined conversion to appropriate types (not in MiniJava)
    • Inferred type is a function of the operator and operand types

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 3
A Sampling of Semantic Checks: 3
  • Assignment: e1 = e2
    • e1 is assignable (not constant or expression)
    • e1 and e2have compatible types
      • Identical, or
      • e2 can be converted to e1 (eg:char to int), but not in MiniJava, or
      • Type of e2 is a subclass of type of e1 (can be decided at compile time)
    • Inferred type is type of e1
    • Location where value stored assigned by compiler

class D extends B { D next; }

d.next = b; // ok?

d.next = d; // ok?

B

x + y = 42 // pointers?

1 = 2 // never good

x[3] = true // in MiniJava?

D

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 4
A Sampling of Semantic Checks: 4
  • Cast: (e1) e2 [not in MiniJava]
    • e1must bea type
    • e2is such that:
      • Same type as e1
      • Can be converted to type e1 (eg: int to double)
      • e1 is a superclass - upcast
      • e1 is a subclass - downcast (needs runtime check)
    • Inferred type is e1

(int) x // char x? double x?

(42) x // never good

(boolean) x // int x?

(int[]) x // ?

class D extends B ...

(D) new B(); // downcast

(B) new D(); // upcast

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 5
A Sampling of Semantic Checks: 5
  • Field reference: exp.f
    • expis a reference type (not a valuetype)
    • The class of exphas a field named f
    • Inferred type is declared type of f

y

C

  • Reference Type
    • x = y, then x points at y
    • eg: C y = new C(); x = y;
  • Value Type
    • x = y, then x receives a copy of y
    • eg: int y = 42; x = y;

x

y

42

x

42

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 6
A Sampling of Semantic Checks: 6
  • Method call exp.m(e1, e2, …, en)
    • expmust bea reference type
    • The class of exphas a method named m
    • The method has n parameters
    • Each argument must be assignment-compat with corresponding parameter
    • Inferred type is given by method declaration (or is void)

Method Overloading

Method Defs

Method Calls

More Method Defs

m(int, int) {...

m(double, double) {...

m(1, 2);

m(1.0, 2.0);

m(1.0, 2);

m(1, 2.0);

m(int, int) {...

m(int, double) {...

m(double, int) {...

Jim Hogg - UW - CSE - P501

a sampling of semantic checks 7
A Sampling of Semantic Checks: 7

Return statement

  • return exp; // exp must be assignment-compatible with method's return type
  • return; // better be a void method!

Jim Hogg - UW - CSE - P501

semantic analysis1
Semantic Analysis
  • Parser builds AST
  • Now check semantic constraints
    • Can partly be done during the parse, but often easier to organize as separate phases - eg: visitor pattern over AST
    • And some things can’t be done on-the-fly. eg: info about identifiers that are used before they are declared (fields, classes) [ cf: declare-before-use, as in Pascal ]
  • Information stored in Symbol Tables
    • Generated by semantic analysis, used there and later

Jim Hogg - UW - CSE - P501

attribute grammars
Attribute Grammars
  • We can specify Java micro-syntax with a few dozen regex
    • Then find a tool (JFlex) to create a scanner from these regex
  • We can specify Java syntax with a few pages of BNF
    • Then find a tool (CUP) to create a parser from that BNF
  • What about the huge collection of constraint checks? (760 pages in the Java Language Reference Manual)
  • Attribute Grammars?
    • Then find a tool (???) to create a semantic checker for that Attribute Grammar?

Jim Hogg - UW - CSE - P501

attribute example
Attribute Example
  • Give each AST node a .val attribute to hold its computed value
  • AST and attribution for (1+2) * (6 / 2)

*

+

/

6

1

2

2

  • This example is much simplified
  • Attributes should really be attached to nodes in the Parse Tree
  • Attribute equations should really be attached to each BNF production

Jim Hogg - UW - CSE - P501

attribute example1
Attribute Example
  • Give each node has a .val attribute to hold the computed value of that node
  • AST and attribution for (1+2) * (6 / 2)

.val

*

.val

.val

+

/

.val

.val

.val

.val

.val

6

1

2

2

Jim Hogg - UW - CSE - P501

attribute example2
Attribute Example
  • Give each node has a .val attribute to hold the computed value of that node
  • AST and attribution for (1+2) * (6 / 2)

.val=9

*

.val=3

.val=3

+

/

.val=2

.val=6

.val=2

.val=1

6

1

2

2

Jim Hogg - UW - CSE - P501

attribute grammars1
Attribute Grammars
  • Idea: associate attributes with each node in AST
  • Eg:
    • Type info (int, boolean, int[], class for MiniJava)
    • Storage location (eg: byte-offset 28 from frame-pointer)
    • Assignable (eg: constant vs variable)
    • Numeric value (if node represents a constant)
    • etc
  • Notation: X.a if a is an attribute of node X

Jim Hogg - UW - CSE - P501

inherited and synthesized attributes
Inherited and Synthesized Attributes

Given a production A  Y1 Y2 … Yn

A synthesized attribute A.a is a function of Y’s (bottom-up)

An inherited attribute Yi.b is a function of X.a and other Yj.c

(top-down and sideways)

Sometimes restricted, eg: Y’s to the left

Jim Hogg - UW - CSE - P501

attribute equations
Attribute Equations
  • For each kind of node we give a set of equations relating that node's attributes and its children

Eg: plus.val = e1.val + e2.val

  • or, relating that node's attributes and its parent
  • Attribution (evaluation) means finding a solution that satisfies all of the equations in the tree

Jim Hogg - UW - CSE - P501

informal example of attribute rules 1
Informal Example of Attribute Rules: 1
  • Grammar for a trivial language:

progdeclstmt

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

  • What attributes would we create in order to check types and assignability?

Jim Hogg - UW - CSE - P501

informal example of attribute rules 11
Informal Example of Attribute Rules: 1
  • Grammar for a trivial language:

progdeclstmt

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

  • For stm exp = exp; need to check that:
  • LHS exp is assignable - not a constant, not an arithmetic expression
  • RHS exphas a type that is assignment-compatible with LHS

Jim Hogg - UW - CSE - P501

informal example of attribute rules 2
Informal Example of Attribute Rules: 2

Attributes

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

  • .env
    • "environment"
    • link to a Symbol Table entry
    • synthesized by decl, inherited by stm
    • each entry maps a name to its type and value
  • .type
    • expression type (int, Boolean, int[], class)
    • synthesized
  • .kind
    • variable versus value (lvalueversusrvalue, in C-speak)
    • synthesized

Jim Hogg - UW - CSE - P501

attributes for declarations
Attributes for Declarations

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

.env

decl

int

id

;

Note - not all node types have, or need, attributes

Jim Hogg - UW - CSE - P501

attributes for programs
Attributes for Programs

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

prog

.env

.env

decl

stm

Jim Hogg - UW - CSE - P501

attributes for constants
Attributes for Constants

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

.type

.kind

exp

1

Jim Hogg - UW - CSE - P501

attributes for expressions
Attributes for Expressions

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

.type

.kind

exp

.type

.kind

id

Jim Hogg - UW - CSE - P501

attributes for addition
Attributes for Addition

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

.env

.type

.kind

.env

.type

.kind

exp

.env

.type

.kind

exp1

+

exp2

Jim Hogg - UW - CSE - P501

attributes for assignment
Attributes for Assignment

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

.env

.type

.kind

stm

.env

.type

.kind

.env

.type

.kind

exp1

=

exp2

Jim Hogg - UW - CSE - P501

example
Example

progdeclstm

decl int id;

stm exp = exp ;

exp id | exp + exp | 1

prog

int x; x = x + 1;

decl

stm

.env

.type

.kind

exp

=

exp

int

id

+

exp

exp

id

x

x

1

id

x

Jim Hogg - UW - CSE - P501

extensions
Extensions
  • Can be extended to handle sequences of declarations and statements
    • Sequence of declarations builds up a combined environment – each decl synthesizes a new environment from previous, augmented with new binding
    • Full environment is passed down to statements and expressions

Jim Hogg - UW - CSE - P501

observations
Observations
  • These are equational computations - no sequential modification of state (think functional programming - no side-effects)
  • Issues on deciding whether a given set of attribute equations will actually converge
  • Can be automated, provided the attribute equations are non-circular
  • Problems
    • Non-local computation
    • Can’t afford to literally pass around copies of large, aggregate structures like environments

Jim Hogg - UW - CSE - P501

in practice
In Practice
  • Attribute Grammars give us a way of thinking how to structure semantic checks
  • Use Symbol Tables to hold environment information
  • Add fields to nodes to refer to appropriate attributes
    • symbol table entries for identifiers
    • types for expressions
    • insert into appropriate places in AST class hierarchy; eg, most statements don’t need types
  • But, commercial compilers don't use Attribute Grammars
    • Instead? - "death by a thousand if's"

Jim Hogg - UW - CSE - P501

symbol tables
Symbol Tables

A table that maps id  <type, kind, location, ...>

API

  • lookup(id)  info = <type, kind, location, ...>
  • enter(id, info) // updates table
  • open() // opens new scope
  • close() // closes scope

Use

  • Build table from declarations during (or before) AST walk
  • Use info to check semantic rules (eg: declare before use)

Jim Hogg - UW - CSE - P501

aside implementing symbol tables
Aside: Implementing Symbol Tables
  • Formerly: big topic in classical compiler courses: implementing a hashed symbol table - hash function, table size, collisions chains, etc (The C standard library doesn't provide any)
  • Now: just use the collection classes provided with the implementation language (Java, C#, C++, ML, Haskell, ...)
    • Then tune & optimize if it really matters
    • In production compilers, it really matters!
  • For Java:
    • Map<K,V>
    • ArrayListfor ordered lists (eg: parameters)

Jim Hogg - UW - CSE - P501

symbol tables for minijava global
Symbol Tables for MiniJava: Global
  • A MiniJava program = 1 file = multiple classes (no separate compilation)
  • One Global table per program
  • Maps class name to Class symbol table
    • Created in a pass over ClassDeclAST nodes
    • Used to check field/method names and extract their info
  • Global Symbol Table lives throughout the compilation
    • In real Java, Symbol Table info is persisted into .class files
    • In C#, Symbol Table info is persisted as 'metadata' into output assembly

Jim Hogg - UW - CSE - P501

symbol tables for minijava class
Symbol Tables for MiniJava: Class
  • One Class symbol table for each class
    • 1 entry for each field in class
      • name, type, (public|private), offset-in-class
    • 1 entry for each method in class
      • List of parameters: name, type, ordinal (or ordered)

In full Java, need some way to handle namespaces

Ie: same identifier can be both a method and a field in a class

Jim Hogg - UW - CSE - P501

symbol tables for minijava locals
Symbol Tables for MiniJava: Locals
  • One Locals table for each method
    • One entry for each parameter
      • Contents = type, memory location
    • One entry for each local variable
      • Contents = type, memory location
  • Needed only while compiling the method
  • Can discard when done (after first, or final, pass)

Jim Hogg - UW - CSE - P501

beyond minijava
Beyond MiniJava
  • We don't deal with:
    • Class static fields or methods
    • Accessibility - public, protected, private
    • Inner classes
    • Nested scopes in methods – re-use of identifiers, nested functions (ML, Pascal, …)
  • Basic idea: new symbol tables for inner scopes, linked to surrounding scope’s table
    • Look for identifier in inner scope; if not found look in surrounding scope (recursively)
    • Pop back up on scope exit

Jim Hogg - UW - CSE - P501

engineering issues
Engineering Issues
  • In practice, want to retain O(1) lookup
    • Use hash tables with additional information to get the scope nesting right
      • Scope entry/exit operations
  • In multipass compilers, Symbol Table info needs to persist after analysis of inner scopes for use on later passes
    • See a compiler textbook for ideas & details

Jim Hogg - UW - CSE - P501

error recovery
Error Recovery
  • What to do when compiler finds an undeclared identifier?
    • eg: x = y + 1 and there is no entry for y in Symbol Table
    • Only complain once (Why?)
    • Create an entry for y in Symbol Table, to suppress future error messages
    • Assign the forged entry for y to have a type of “unknown”
    • “Unknown” is the type of all malformed expressions and is compatible with all other types
      • Can avoid redundant error messages (how?)

Jim Hogg - UW - CSE - P501

predefined things
“Predefined” Things
  • Many languages have some “predefined” items
    • classes, functions (eg: "maxint")
    • "standard library" or "prelude"
  • Write initialization code to inject predefined info in Symbol Table
  • Preferably, import a file including "standard prelude". Tradeoffs?
  • Rest of compiler doesn’t need to know the difference between “predefined” items and ones found in the user program

Jim Hogg - UW - CSE - P501

types
Types
  • Classical roles of types in programming languages
    • Compile-time error detection (find errors ASAP)
    • Improved expressiveness (eg: method & operator overloading)
    • Provide information to optimizer
    • Runtime safety
    • Eg: Haskell - if your program type-checks, it's like correct
    • Eg: Even FORTRAN had INTEGER and REAL - different bit layouts
    • Can we ensure programs are correct with enough testing?

Jim Hogg - UW - CSE - P501

terminology
Terminology

Static vs dynamic typing

• static: checking done prior to execution (eg, at compile-time)

• dynamic: checking during execution

Strong vsweak typing

• strong: guarantees no illegal operations performed

• weak: can’t make guarantees

Caveats:

  • Hybrids common
  • Inconsistent usage common
  • “untyped” or “typeless”

could mean dynamic or weak

Jim Hogg - UW - CSE - P501

type systems
Type Systems
  • Base Types
    • Fundamental, atomic types
    • Eg: int, double, char, bool
  • Compound or Constructed Types
    • Built up from other types (recursively)
    • Type-Constructors include:
      • arrays
      • records/structs/classes
      • pointers
      • enumerations
      • functions
      • modules (eg: ML)

Jim Hogg - UW - CSE - P501

how to represent types in a compiler
How to Represent Types in a Compiler?
  • Create a shallow class hierarchy. Eg:

abstract class Type {...}

class ClassType extends Type {...}

class BaseType extends Type {...}

    • Should not need too many of these

Jim Hogg - UW - CSE - P501

types vs asts
Types vs ASTs
  • Types are not AST nodes! (eg: IntType != INT != ILIT)
  • AST = abstract representation of source program (including source program type info)
  • Types = abstract representation of types for semantics checks, inference, etc.
    • Can include information not explicitly represented in the source code, or may describe types in ways more convenient for processing
  • Be sure you have a separate “type” class hierarchy in your compiler distinct from the AST

Jim Hogg - UW - CSE - P501

base types
Base Types
  • For each base type (int, bool, etc), create a single object to represent it
    • Symbol table entries and AST nodes for expressions refer to these to represent type info
    • Usually create at compiler startup
  • Useful to create a type void object to tag functions that return no value
  • Also useful to create a type unknown object for errors
    • void and unknown types reduce need for special case code
    • ie, pass these types around - no need to check everywhere

Jim Hogg - UW - CSE - P501

compound types
Compound Types
  • Basic idea: use appropriate “type constructor” object that refers to component types
    • Limited number – correspond directly to type constructors in the language (record/struct, class, array, function, …)
    • A compound type is represented as a graph

Jim Hogg - UW - CSE - P501

class types
Class Types
  • Type for: class Id { fields and methods }

class ClassType extends Type {

Type baseClassType; // ref to base class

ArrayList fields; // type info for fields

ArrayList methods;// type info for methods

}

    • (Note: may not want to do this literally depending on how class symbol tables are represented; ie, class Symbol Tables might be useful as the representation of the class type)

Jim Hogg - UW - CSE - P501

array types
Array Types
  • For regular Java this is simple: only possibility is # of dimensions and element type

class ArrayType extends Type {

intnDims; // eg: 1

Type elementType; // eg: IntType

}

Jim Hogg - UW - CSE - P501

array types for pascal c
Array Types for Pascal &c.
  • Pascal allows arrays to be indexed by any discrete type
    • array[indexType] of elementType where indexType might be an enum such as {Red, Green, Blue}; or [3..7]
  • Element type can be any other type, including an array (ie, 2-D array = 1-D array of 1-D arrays)

class GeneralArrayType extends Type {

Type indexType;

Type elementType;

}

Jim Hogg - UW - CSE - P501

methods functions
Methods/Functions
  • Type of a method is its result type plus an ordered list of parameter types

class MethodType extends Type {

Type resultType; // type or “void”

ArrayListparameterTypes;

}

  • Sometime called the method "signature"

Jim Hogg - UW - CSE - P501

type equivalence
Type Equivalence
  • For base types this is simple
    • Types are the same if they are identical
    • Use pointer comparison in the type checker (to singleton Type object)
    • Normally there are well defined rules for coercions between arithmetic types
      • Compiler inserts these automatically or when requested by programmer (casts) – often requires inserting cast/conversion AST nodes
    • Not mandatory: some languages provide no silent type coercion or promotion

Jim Hogg - UW - CSE - P501

type equivalence for compound types
Type Equivalence for Compound Types
  • Two basic strategies
    • Structural equivalence: two types are the same if they are the same kind of type and their component types are equivalent, recursively
    • eg: Complex and Point?
    • eg: Point (Cartesian) and Point (Polar)
    • Name equivalence: two types are the same only if they have the same name, even if their structures match
  • Different language design philosophies

Jim Hogg - UW - CSE - P501

type equivalence and inheritance
Type Equivalence and Inheritance
  • Suppose we have

class Base { … }

class Derived extends Base { … }

  • During execution, a variable of type Base may refer to an object of class Base or any of its sub-classes, such as Derived
    • Base b; // b has compile-time type Base
    • b = new Derived(); // b has runtime type Derived
    • Derived d;// d has compile-time type Derived
    • d = new Base();// d has runtime type Base - unsafe

Jim Hogg - UW - CSE - P501

notions of equivalence
Notions of Equivalence
  • Several relations on types to deal with:
    • “is the same as”
    • “is assignable to”
    • “is same or a subclass of”
    • “is convertible to”
  • Be sure to check for the right one(s)

Jim Hogg - UW - CSE - P501

compiler helper methods
Compiler Helper Methods
  • Create helper methods to decide type compatibility:
    • Types are identical
    • Type t1 is assignment-compatibile with t2
    • Parameter list is compatible with types of expressions in the call
  • Usual modularity reasons: isolates these decisions in one place and hides actual type representation from the rest of the compiler
  • Probably belongs in the same package as the type representation classes

Jim Hogg - UW - CSE - P501

implementing type checking for minijava
Implementing Type Checking for MiniJava
  • Create multiple visitors for the AST
  • First passe(s): gather information
    • Collect global type information for classes
    • Could do this in one pass, or might want to do one pass to collect class information, then a second one to collect per-class information about fields, methods
  • Next set of passes: go through method bodies to check types, and other semantic constraints

Jim Hogg - UW - CSE - P501

slide62
Next
  • x86 overview (as a target for simple compilers)
  • Runtime representation of classes, objects, data, and method stack frames
  • Assembly language code for higher-level language statements

Jim Hogg - UW - CSE - P501