Lecture #7, April 24, 2007

• More about IR1
• Library functions
• Canonicalization
• OO runtime issues
• Object size
• Initialization

Assignments
• Project #1 due Thursday, May 3, 2007
• Recall: Midterm Exam on Tuesday, May 1, 2007. In class, 1.5 hours, two days before Project 1 is due.

IR1 simplifications
• As we move into the backend of the compiler, we are making some simplifications.
• We have only integer and Boolean values (no more floating-point values).
• All values require 32 bits to store (including Booleans).
• Values and pointers (addresses) take up the same amount of space (32 bits).
• Every value takes up exactly wdSize bytes (where wdSize = 4).

Semantics of Exp

    datatype EXP
      = BINOP of ProgramTypes.BINOP * EXP * EXP
      | RELOP of ProgramTypes.RELOP * EXP * EXP
      | CALL of EXP * EXP list
      | MEM of EXP
      | NAME of string        (* method names *)
      | TEMP of int           (* registers *)
      | PARAM of int          (* method parameters *)
      | MEMBER of EXP * int   (* instance variables *)
      | VAR of int            (* local vars of methods *)
      | CONST of string
      | STRING of string
      | ESEQ of STMT list * EXP

Expressions represent values, but some values (mostly new array and new object) require actions to complete. ESEQ allows us to embed actions in expressions. We will discuss some of the highlights next.

BINOP and RELOP
• These are straightforward translations of their ProgramTypes.sml counterparts.
• Binops translate directly.
• Relops (LT, GT, etc.) translate as if they were ordinary binary operators, just like ADD, TIMES, etc.
• The binops AND and OR generally appear in the tests of While and If statements. We use short-circuit translations for these.
• If AND or OR appears in an expression (rather than in a statement test), we can still use short-circuit evaluation by using the ESEQ expression: allocate a local temp and generate statements (inside the ESEQ) that move either true or false into it. (This is already done in the template code handed out.)
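A minimal sketch of the expression-context short-circuit translation of AND, in SML like the rest of the course code. The STMT constructors LABEL and CJUMP, the counters behind newTemp and newLabel, and the encoding of true/false as CONST "1"/CONST "0" are assumptions for illustration, not the template code's actual definitions.

```sml
(* Hypothetical IR1 subset; the real STMT datatype is not shown on these slides. *)
datatype EXP =
    TEMP of int
  | CONST of string
  | ESEQ of STMT list * EXP
and STMT =
    MOVE of EXP * EXP
  | LABEL of string
  | CJUMP of EXP * string * string  (* if e is true (nonzero) go to the first label, else the second *)

(* Fresh temps and labels from simple counters (assumed helpers). *)
val tempCount = ref 99
fun newTemp () = (tempCount := !tempCount + 1; TEMP (!tempCount))
val labelCount = ref 0
fun newLabel () = (labelCount := !labelCount + 1; "L" ^ Int.toString (!labelCount))

(* a AND b used as a value: b is evaluated only when a is true. *)
fun andExp (a, b) =
  let
    val t = newTemp ()
    val testB = newLabel ()
    val setTrue = newLabel ()
    val done = newLabel ()
  in
    ESEQ ([ MOVE (t, CONST "0")          (* assume the result is false *)
          , CJUMP (a, testB, done)       (* a false: skip b entirely *)
          , LABEL testB
          , CJUMP (b, setTrue, done)     (* b false: result stays false *)
          , LABEL setTrue
          , MOVE (t, CONST "1")          (* both operands were true *)
          , LABEL done ],
          t)
  end
```

OR is the mirror image: start the temp at true and evaluate the right operand only when the left one is false.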

Library functions
• Several missing operations can be translated into library functions.
• A library function is a function supplied by the runtime environment. Possible library functions include Boolean negation, unary minus, malloc, coerce, etc.
• A library function translates to a call:
    CALL(NAME "unary_minus", [VAR 1])
    CALL(NAME "negate", [VAR 3])
• We use the NAME expression to name library functions as well as the functions we generate to implement methods.
• We will develop other library functions as we go along.
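A sketch of how a translator might map unary operators onto library calls. The UNOP datatype and the names transUnop and libName are hypothetical; the routine names come from the examples above, assuming "negate" is the Boolean-negation routine.

```sml
(* Hypothetical IR1 subset for illustration. *)
datatype EXP =
    NAME of string
  | CALL of EXP * EXP list
  | VAR of int
  | CONST of string

(* Hypothetical source-level unary operators that IR1 has no BINOP for. *)
datatype UNOP = NEG | NOT

(* Runtime routine implementing each operator. *)
fun libName NEG = "unary_minus"
  | libName NOT = "negate"

(* A unary operation becomes a call to its library function. *)
fun transUnop (uop, e) = CALL (NAME (libName uop), [e])
```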

MEM
• MEM has no direct counterpart in ProgramTypes.sml.
• Its meaning is to fetch the contents of a memory location.
• Its value is the value stored at that memory location.
• It is always a 32-bit value.
• Several other expression constructors have memory locations as their values. These constructors are appropriate arguments to MEM.
• They include TEMP, PARAM, MEMBER, and VAR.

Addresses
• Variables, methods, members, and parameters all have their values stored in memory locations.
• Most of these addresses are fixed offsets from some known address, i.e., the current object pointer or the activation record pointer.
• The runtime system will define these known addresses.

(PARAM n)
• This means the address of the nth parameter.
• There will be some fixed location for parameters.
• We will need to add the correct offset for the nth parameter to this address.
• Under the assumption that all values take up wdSize bytes, the offset = n * wdSize, but we leave this abstract at this stage.

(VAR n)
• This means the address of the nth local variable of the current method.
• There will be some fixed location for local variables.
• We will need to add the correct offset for the nth local to this address.
• Under the assumption that all values take up wdSize bytes, the offset = n * wdSize, but we again leave this abstract at this stage.

(MEMBER(X,n))
• This means the value of the nth member (instance variable) of the object stored at address X.
• We will need to add the correct offset for the nth member to this address. Note that X is itself an address.
• MEMBER(X,n) = MEM(X + wdSize * n), under the assumption that all instance variables take up wdSize bytes.
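One way to make PARAM, VAR and MEMBER concrete later in the backend, assuming every value takes wdSize = 4 bytes and assuming two hypothetical registers (TEMP 0 and TEMP 1) hold the base addresses of the parameter area and the local-variable area. This is a sketch of the offset arithmetic described above, not the course's offset-finalization pass; the name lower is hypothetical.

```sml
(* Hypothetical IR1 subset; PLUS stands in for addition. *)
datatype binop = PLUS

datatype EXP =
    BINOP of binop * EXP * EXP
  | MEM of EXP
  | TEMP of int
  | CONST of string
  | PARAM of int
  | VAR of int
  | MEMBER of EXP * int

val wdSize = 4

(* Assumed registers holding the base addresses of the parameter area and the
   local-variable area of the current activation record. *)
val paramBase = TEMP 0
val localBase = TEMP 1

(* Byte offset of slot n when every value takes exactly wdSize bytes. *)
fun offset n = CONST (Int.toString (n * wdSize))

(* PARAM and VAR denote addresses; MEMBER denotes the value of a field,
   i.e. MEMBER(x,n) = MEM(x + wdSize * n). *)
fun lower (PARAM n) = BINOP (PLUS, paramBase, offset n)
  | lower (VAR n) = BINOP (PLUS, localBase, offset n)
  | lower (MEMBER (x, n)) = MEM (BINOP (PLUS, lower x, offset n))
  | lower (MEM e) = MEM (lower e)
  | lower (BINOP (b, l, r)) = BINOP (b, lower l, lower r)
  | lower e = e
```

For example, the value of member 2 of the current object, MEMBER(MEM(PARAM 0), 2), lowers to a fetch 8 bytes past the object pointer that is itself fetched from parameter slot 0.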

this.x
• Recall that instance variables without an object prefix are really this.x.
• What is the address of this?
• this refers to the current object: the object whose method is being executed.
• In x.f(1,3), the object x is passed as an implicit 0th parameter, so inside f, this is parameter 0 and the explicit parameters are numbered from 1.
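A sketch of translating a method call under static binding (dynamic binding is a later topic): the receiver is prepended to the argument list so that it arrives as parameter 0, and the callee name follows the Classname_methodname convention used for FUNCs. transCall and thisExp are hypothetical names.

```sml
(* Hypothetical IR1 subset for illustration. *)
datatype EXP =
    NAME of string
  | CALL of EXP * EXP list
  | CONST of string
  | MEM of EXP
  | PARAM of int

(* obj.m(args), where className is the static class of obj: the receiver is
   passed first, so the callee sees it as parameter 0. Static binding only. *)
fun transCall (className : string, obj : EXP, methodName : string, args : EXP list) : EXP =
  CALL (NAME (className ^ "_" ^ methodName), obj :: args)

(* Inside any method, this is the value stored in parameter slot 0. *)
val thisExp = MEM (PARAM 0)
```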

Printing
• Printing is handled by library functions.
• We will need one library function for each kind of object we can print.
• In general, we'd need a basic type tag in the ProgramTypes PrintE constructor to support this.
• Let's assume we print only integer values. Then we need only two library functions: one for printing literal strings (PrintT), called (NAME "prStr"), and one for PrintE, called (NAME "prInt").

Translating Methods
• Each method is translated into a FUNC.
• To translate, you need to:
  • Create a proper name: Classname_methodname. You may need to track the current class so the class name is available.
  • Translate the method's variable declarations into a (possibly empty) STMT list.
  • Translate the method's body into a STMT list.
  • Merge the two STMT lists, putting the variable one first.
  • As you do this, prepare the correct environment that tracks the Vkind of variables.
  • Return a FUNC object. Be sure to get the (ProgramTypes.Type list) right in the FUNC node, as these will be needed in the second phase.

Translating Methods

    fun pass1M className env (MetDecl(loc,rng,name,params,vars,body)) =
      let fun paramTypes (Formal(typ,nm)) = typ
          (* explicit parameters are numbered from 1; parameter 0 is the object (this) *)
          fun paramEnv count [] = []
            | paramEnv count ((Formal(typ,nm))::xs) =
                (nm,Vparam count)::(paramEnv (count+1) xs)
          fun varTypes (VarDecl(loc,typ,nm,init)) = typ
          fun varEnv count [] = []
            | varEnv count ((VarDecl(loc,typ,nm,init))::xs) =
                (nm,Vlocal count)::(varEnv (count+1) xs)
          val initEnv = (paramEnv 1 params) @ env
          (* one MOVE per initialized local; initializers see parameters and env, not other locals *)
          fun initIR count [] = []
            | initIR count (VarDecl(loc,typ,nm,SOME init) :: vs) =
                (MOVE(VAR count,pass1E initEnv init)) :: initIR (count+1) vs
            | initIR count (VarDecl(loc,typ,nm,NONE)::xs) = initIR (count+1) xs
          val varIR = initIR 1 vars
          val bodyEnv = (varEnv 1 vars) @ initEnv
          val bodyIR = pass1S bodyEnv (Block body)
      in (FUNC(className^"_"^name
              ,map paramTypes params
              ,map varTypes vars
              ,varIR @ bodyIR))
      end

Canonicalization
• Usually we like to think of expressions as being side-effect free.
• Because of ESEQ, this is clearly not true.
• We need to evaluate expressions in a canonical order to make sure we always get effects in the same order.
• Consider: x.f(new person, new int[3])
• Both arguments translate to side-effecting code. Which effects should happen first? Maybe in this case it doesn't matter, but we should have a fixed evaluation order.

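Consider f(g(5),7)
• Translation into x86 may require the use of specific registers.
• When calls are nested inside calls, if we're not careful, the register usage can get mixed up (for example, every call returns its result in the same register, so a value left there by one call can be overwritten by the next).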

General fix part 1
• Always use a new temporary register to name the value of a method call.
• Load this value immediately after the method returns.
• Use only this new register name in subsequent code.
• Example: CALL(NAME "f",[CONST "1"]) becomes
    ESEQ([MOVE(TEMP 100, CALL(NAME "f",[CONST "1"]))], TEMP 100)

General fix part 2
• Translate every expression into a pair.
• The first part of the pair is the statements in the expression; the second part is a pure expression (with no embedded ESEQ).
• Example, from the source, through fix part 1, to the final pair:

    2 + f(1) + g(3)
    2 + ESEQ([temp100 := f(1)],temp100) + ESEQ([temp101 := g(3)],temp101)
    ( [temp100 := f(1), temp101 := g(3)] , 2+temp100+temp101 )

Canonicalization of Statements
• Since statements can contain expressions, we need to canonicalize statements as well:

    MOVE(f(3), g(5))
    becomes
    [t1 := f(3), t2 := g(5), MOVE(t1,t2)]

• The general case is to translate any statement into a list of statements. Statements lifted out of embedded expressions are incorporated into the resulting list of statements.

ML code

    fun canonicalE x =
      case x of
        BINOP(m,a,b) =>
          let val (sa,ea) = canonicalE a
              val (sb,eb) = canonicalE b
          in (sa@sb,BINOP(m,ea,eb)) end
      | CALL(f,xs) =>
          let val temp = newTemp()
              val (sf,ef) = canonicalE f
              val (xsStmt,xsL) = canonicalL xs
          in (sf @ xsStmt @ [ MOVE(temp,CALL(ef,xsL)) ],temp) end

See the next slide for canonicalL.

canonicalL
• How do we canonicalize a list of expressions?

    and canonicalL [] = ([],[])
      | canonicalL (x :: xs) =
          let val (xStmt,xL) = canonicalE x
              val (xsStmt,xsL) = canonicalL xs
          in (xStmt @ xsStmt, xL :: xsL) end

Statements

    and canonicalS x =
      case x of
        MOVE(a,b) =>
          let val (sa,ea) = canonicalE a
              val (sb,eb) = canonicalE b
          in sa @ sb @ [MOVE(ea,eb)] end
      | STMTlist xs => List.concat (map canonicalS xs)

List.concat has type ('a list) list -> 'a list.
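Putting the three fragments together, here is a self-contained sketch over a small hypothetical subset of IR1, with the missing MEM and ESEQ cases filled in. It also adds one refinement that the slide code omits, along the lines of the reorder step in Appel's Modern Compiler Implementation: when the right operand's statements are hoisted in front of the left operand's pure expression, that expression is first saved in a fresh temp unless it clearly cannot be affected, since a call could otherwise change a memory location the left operand reads. The commutes test, the seq helper, and the special case for MOVE(MEM _, _) are assumptions of this sketch.

```sml
(* Hypothetical IR1 subset; PLUS stands in for ProgramTypes.BINOP. *)
datatype binop = PLUS

datatype EXP =
    BINOP of binop * EXP * EXP
  | CALL of EXP * EXP list
  | MEM of EXP
  | NAME of string
  | TEMP of int
  | CONST of string
  | ESEQ of STMT list * EXP
and STMT =
    MOVE of EXP * EXP
  | STMTlist of STMT list

val tempCount = ref 99
fun newTemp () = (tempCount := !tempCount + 1; TEMP (!tempCount))

(* Can e be evaluated after arbitrary statements without changing its value?
   Assumption: compiler temps are only written by the ESEQ that created them. *)
fun commutes (CONST _) = true
  | commutes (NAME _) = true
  | commutes (TEMP _) = true
  | commutes (BINOP (_, a, b)) = commutes a andalso commutes b
  | commutes _ = false

(* Join two canonical pieces evaluated left to right. If the right piece has
   statements that could change the left value, save that value in a temp. *)
fun seq ((sa, ea), (sb, eb)) =
  if null sb orelse commutes ea
  then (sa @ sb, ea, eb)
  else
    let val t = newTemp ()
    in (sa @ [MOVE (t, ea)] @ sb, t, eb) end

fun canonicalE x =
  case x of
      BINOP (m, a, b) =>
        let val (s, ea, eb) = seq (canonicalE a, canonicalE b)
        in (s, BINOP (m, ea, eb)) end
    | CALL (f, xs) =>
        let
          val temp = newTemp ()
          val (s, ef, exs) = seq (canonicalE f, canonicalL xs)
        in
          (s @ [MOVE (temp, CALL (ef, exs))], temp)
        end
    | MEM e =>
        let val (s, e') = canonicalE e in (s, MEM e') end
    | ESEQ (ss, e) =>
        let
          val s1 = List.concat (map canonicalS ss)
          val (s2, e') = canonicalE e
        in
          (s1 @ s2, e')
        end
    | _ => ([], x)

and canonicalL [] = ([], [])
  | canonicalL (x :: xs) =
      let val (s, ex, exs) = seq (canonicalE x, canonicalL xs)
      in (s, ex :: exs) end

and canonicalS x =
  case x of
      MOVE (MEM addr, b) =>
        (* evaluate the destination address before the source value *)
        let val (s, ea, eb) = seq (canonicalE addr, canonicalE b)
        in s @ [MOVE (MEM ea, eb)] end
    | MOVE (a, b) =>
        let
          val (sa, ea) = canonicalE a
          val (sb, eb) = canonicalE b
        in
          sa @ sb @ [MOVE (ea, eb)]
        end
    | STMTlist xs => List.concat (map canonicalS xs)
```

With a fresh counter, 2 + f(1) + g(3) canonicalizes to ([TEMP 100 := f(1), TEMP 101 := g(3)], 2 + TEMP 100 + TEMP 101), the same pair as in the fix part 2 example, while MEM(x) + g() saves MEM(x) in a temp before calling g.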

Fair Warning
• Project #2, to be assigned on May 3rd, includes, in part, the completion of canonicalization.
• It will also include the finalization of offsets.
• And it will include some simple optimization.

Run-time issues
• So far, in IR1, we have glossed over all the important run-time issues of an OO language.
• Thanks to Jenke Li for these notes.
• Classes and objects
  • storage allocation
  • static class variables
  • dynamic class variables
• Method invocations
  • static methods
  • static binding methods
  • dynamic binding methods
  • mini Java's method invocation
• Others
  • local variables and parameters
  • non-local (class) variables
  • activation record size
  • this pointer

Storage for Class objects
• Observations
  • Static class variables are established per class. They should be allocated once, in a single static place.
  • Dynamic class variables are cloned once for every new object. They are allocated space inside the object.
• General strategies
  • A class descriptor for each class:
    • pointer to parent class
    • pointers to (local) methods
    • storage for static variables
  • An object record for each object (instance of a class):
    • pointer to class descriptor
    • storage for local class (dynamic) variables
    • storage for inherited variables

Object Record Layout
• Objects contain space not only for variables that belong to their own class, but also for those of all ancestor classes.
• How should we lay out variables so that the offsets of all of them can be determined statically by the compiler?
• For single inheritance, we can use the prefixing method.

Prefix Method
• When a class B extends a class A, those variables of B that are inherited from A are laid out in a record implementing B in the same order they appear in the record implementing A.
• The compiler can assign a fixed offset for every variable in the object record.
• The offset will be the same for a class and for all its subclasses.
• Compiled methods can access variables by their offset, not their name.

An Example

    class A { int i=1, j=2; }
    class B extends A { int m=3, n=4; }
    class C extends A { int k=5; }
    class D extends C { int l=6; }
    class Test { A a = new A; B b = new B; C c = new C; D d = new D; … }

    A's record   B's record   C's record   D's record
    i            i            i            i
    j            j            j            j
                 m            k            k
                 n                         l
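A sketch of the prefix method in SML: each class's record layout is its parent's layout followed by its own variables, so inherited variables keep the same position in every subclass. The classDecl record type and the layouts function are hypothetical, and the sketch assumes parents are listed before their subclasses; out-of-order declarations need the topological sort discussed on a later slide.

```sml
(* Hypothetical class declaration: name, optional parent, own instance variables. *)
type classDecl = { name : string, parent : string option, vars : string list }

(* Record layout of every class, computed with the prefix method.
   Assumes each parent appears earlier in the list than its subclasses. *)
fun layouts (decls : classDecl list) : (string * string list) list =
  let
    fun lookup env n =
      case List.find (fn (m, _) => m = n) env of
          SOME (_, fields) => fields
        | NONE => raise Fail ("unknown class " ^ n)
    fun step ({ name, parent, vars } : classDecl, env) =
      let
        val inherited = case parent of NONE => [] | SOME p => lookup env p
      in
        env @ [(name, inherited @ vars)]   (* inherited variables come first *)
      end
  in
    foldl step [] decls
  end
```

For the four classes in the example this yields the columns of the table: A is [i, j], B is [i, j, m, n], C is [i, j, k], and D is [i, j, k, l].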

Deciding Object size
• To decide an object's size, inherited class information must be known.

    class A { int i=1, j=2; }             // 2*wdSize
    class B extends A { int m=3, n=4; }   // A's size + 2*wdSize
    class C extends A { int k=5; }        // A's size + 1*wdSize
    class D extends C { int l=6; }        // C's size + 1*wdSize
    class Test {
      A a = new A;   // allocate 2*wdSize
      B b = new B;   // allocate 4*wdSize
      C c = new C;   // allocate 3*wdSize
      D d = new D;   // allocate 4*wdSize
      …
    }
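The size rule above can be written as a recurrence, where k_C (notation introduced here) is the number of instance variables declared in C itself. Unfolding it for D reproduces the 4*wdSize on the slide, i.e. 16 bytes with wdSize = 4.

```latex
\mathrm{size}(C) =
  \begin{cases}
    k_C \cdot wdSize & \text{if } C \text{ has no parent} \\
    \mathrm{size}(\mathrm{parent}(C)) + k_C \cdot wdSize & \text{otherwise}
  \end{cases}

\mathrm{size}(D) = \mathrm{size}(C) + 1 \cdot wdSize
                 = (\mathrm{size}(A) + 1 \cdot wdSize) + 1 \cdot wdSize
                 = (2 + 1 + 1) \cdot wdSize = 16 \text{ bytes}
```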

What if class declarations appear out of order?

    class D extends C { int l=6; }
    class C extends A { int k=5; }
    class A { int i=1, j=2; }
    class B extends A { int m=3, n=4; }

Solution: perform a topological sort on the class declarations based upon the inheritance relationship, then collect the size information.
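A sketch of the suggested solution in SML: a depth-first topological sort over the extends edges, followed by a left-to-right pass that computes object sizes in bytes with wdSize = 4. The classDecl type, topoSort and sizes are hypothetical names; the sketch also reports cyclic inheritance, which a real front end would reject anyway.

```sml
(* Hypothetical class declaration: name, optional parent, number of own
   instance variables. *)
type classDecl = { name : string, parent : string option, nVars : int }

val wdSize = 4

(* Depth-first topological sort: each class is emitted after its ancestors.
   With single inheritance every class has at most one "extends" edge. *)
fun topoSort (decls : classDecl list) : classDecl list =
  let
    fun find n =
      case List.find (fn (d : classDecl) => #name d = n) decls of
          SOME d => d
        | NONE => raise Fail ("undeclared class " ^ n)
    (* path: classes on the current extends-chain (cycle detection);
       done: names already emitted; out: emitted declarations, in order *)
    fun visit path (d : classDecl) (done, out) =
      if List.exists (fn n => n = #name d) done then (done, out)
      else if List.exists (fn n => n = #name d) path
      then raise Fail ("cyclic inheritance at " ^ #name d)
      else
        let
          val (done', out') =
            case #parent d of
                NONE => (done, out)
              | SOME p => visit (#name d :: path) (find p) (done, out)
        in
          (#name d :: done', out' @ [d])
        end
  in
    #2 (foldl (fn (d, acc) => visit [] d acc) ([], []) decls)
  end

(* Object size in bytes: the parent's size plus wdSize per own variable. *)
fun sizes (decls : classDecl list) : (string * int) list =
  let
    fun step (d : classDecl, env) =
      let
        val parentSize =
          case #parent d of
              NONE => 0
            | SOME p => #2 (valOf (List.find (fn (n, _) => n = p) env))
      in
        env @ [(#name d, parentSize + #nVars d * wdSize)]
      end
  in
    foldl step [] (topoSort decls)
  end
```

For the out-of-order declarations above (D, C, A, B), the sorted order is A, C, D, B and the sizes are 8, 12, 16 and 16 bytes, i.e. 2, 3, 4 and 4 words, matching the previous slide.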

Deciding Class Variable Offsets
• Once object sizes are known (computed), variable offsets can be computed easily.
• Rule: the offset of a subclass's first instance variable is the parent class's object size.
• Example

    class A { int i=1                // 0
                , j=2; }             // 1*wdSize
    class B extends A { int m=3      // 2*wdSize
                          , n=4; }   // 3*wdSize
    class C extends A { int k=5; }   // 2*wdSize
    class D extends C { int l=6; }   // 3*wdSize
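The rule extends to every variable of a class. Writing k_C for the number of variables declared in C itself and taking the size of a missing parent as 0 (notation introduced here), the j-th own variable of C, counting from 0, sits at the offset below; the two worked cases reproduce the offsets of n and l in the example.

```latex
\mathrm{offset}(C, j) = \mathrm{size}(\mathrm{parent}(C)) + j \cdot wdSize,
  \qquad j = 0, 1, \ldots, k_C - 1

\mathrm{offset}(B, 1) = \mathrm{size}(A) + 1 \cdot wdSize = 2 \cdot wdSize + wdSize = 3 \cdot wdSize
\qquad
\mathrm{offset}(D, 0) = \mathrm{size}(C) + 0 \cdot wdSize = 3 \cdot wdSize
```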

Class Variable Initialization
• Class variables must always be initialized,
  • either by the compiler
  • or by the user.
• Initialization happens at object creation time.
• User-provided initialization code is in the class declaration.
• Solution: collect initialization info while processing class decls. Propagate the initialization expressions downwards (from a class to its subclasses). Use this info when generating code for new expressions.

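Example (offset, initial value)

    class A { int i=1                // offset 0,          initial value 1
                , j=2; }             // offset 1*wdSize,   initial value 2
    class B extends A { int m=3      // offset 2*wdSize,   initial value 3
                          , n=4; }   // offset 3*wdSize,   initial value 4
    class C extends A { int k=5; }   // offset 2*wdSize,   initial value 5
    class D extends C { int l=6; }   // offset 3*wdSize,   initial value 6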

New Objects: new A
• Object size
• Variable offsets
• Variable initialization
• Canonicalize the initialization statements (stored in the symbol table) to get a list of statements and expressions: (initS, initExps).

    ESEQ( initS @ [ t1 := #vars * wdSize
                  , t2 := malloc(t1)
                  , t3 := 0
                  , t2[t3] := (get 0 initExps)
                  , t3 := t3 + wdSize
                  , t2[t3] := (get 1 initExps)
                  . . . ]
        , t2)
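A sketch of the new-object sequence in SML. It follows the slide's steps but, as a simplifying variant, stores each initial value at a constant offset k * wdSize instead of incrementing a running temp t3. The IR subset, the newTemp counter and newObject are hypothetical; malloc is assumed to take a size in bytes and return the address of the new record.

```sml
(* Hypothetical IR1 subset; PLUS stands in for addition. *)
datatype binop = PLUS

datatype EXP =
    BINOP of binop * EXP * EXP
  | CALL of EXP * EXP list
  | MEM of EXP
  | NAME of string
  | TEMP of int
  | CONST of string
  | ESEQ of STMT list * EXP
and STMT = MOVE of EXP * EXP

val wdSize = 4
val tempCount = ref 0
fun newTemp () = (tempCount := !tempCount + 1; TEMP (!tempCount))

(* IR for "new C", given the canonicalized initializer statements and one pure
   initial-value expression per instance variable (inherited ones first). *)
fun newObject (initS : STMT list, initExps : EXP list) : EXP =
  let
    val t1 = newTemp ()
    val t2 = newTemp ()
    val bytes = length initExps * wdSize
    (* store the k-th initial value at byte offset k * wdSize *)
    fun stores (_, []) = []
      | stores (k, e :: es) =
          MOVE (MEM (BINOP (PLUS, t2, CONST (Int.toString (k * wdSize)))), e)
          :: stores (k + 1, es)
  in
    ESEQ (initS
          @ [ MOVE (t1, CONST (Int.toString bytes))
            , MOVE (t2, CALL (NAME "malloc", [t1])) ]
          @ stores (0, initExps),
          t2)
  end
```

Because the result of malloc is moved into a fresh temp immediately, this also follows general fix part 1 from the canonicalization slides.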

Accessing an Object's instance variables
• O.x
  • Obtain the object's address by translating O.
  • Obtain the variable's offset.
  • Add the two together (and fetch with MEM to get the variable's value).
• Recall that any value, parameter, etc. which is an object can be treated as an address.
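A sketch of O.x in SML: look up the variable's position in the class's prefix-method layout, turn it into a byte offset, add it to the object's address, and fetch with MEM. fieldOffset, fieldAccess and the IR subset are hypothetical.

```sml
(* Hypothetical IR1 subset; PLUS stands in for addition. *)
datatype binop = PLUS

datatype EXP =
    BINOP of binop * EXP * EXP
  | MEM of EXP
  | CONST of string
  | PARAM of int

val wdSize = 4

(* Byte offset of variable x in a record laid out as fields (prefix-method order). *)
fun fieldOffset (fields : string list, x : string) : int =
  let
    fun go (_, []) = raise Fail ("no variable " ^ x)
      | go (k, f :: fs) = if f = x then k * wdSize else go (k + 1, fs)
  in
    go (0, fields)
  end

(* O.x: the object's value is its address; add the offset and fetch. *)
fun fieldAccess (objExp : EXP, fields : string list, x : string) : EXP =
  MEM (BINOP (PLUS, objExp, CONST (Int.toString (fieldOffset (fields, x)))))
```

With D's layout [i, j, k, l], the variable l is at offset 12 = 3*wdSize, as on the offsets slide.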

Next time
• Next time we will make more concrete assumptions about translating mini-Java.
• We will begin to define a “semantics” for IR1 by making an interpreter for it.
