
Code Generation

Compiler

Baojian Hua

[email protected]


Middle and Back End

AST -> (translation) -> IR1 -> (translation) -> IR2 -> (other IR and translation) -> asm


Back-end Structure

  • instruction selector: IR -> Assem

  • register allocator: produces a TempMap

  • instruction scheduler: produces Assem


Recap

  • What about “CODE”?

CODE: Procedures, Control Flow, Statements, Data Access

DATA: Global Static Variables, Global Dynamic Data, Local Variables, Temporaries, Parameter Passing, Read-only Data


A Simpler Target ISA

  • To simplify the discussion, let’s start with a much simpler ISA: a stack machine

  • Stack machines were once very popular

    • but not today, because of their low speed

    • we still discuss them because:

      • generating code for a stack machine is simpler

      • many (virtual) stack machines are in wide use today

        • Pascal P code

        • Java byte code

        • Postscript


Code Generation for Stack Machines


Stack Machine

  • Stack-based

    • no registers

    • the ALU operates on the stack and memory

    • the stack is used for expression evaluation and function calls (it is called the operand stack on the JVM)

(Diagram: memory, the stack, the ALU and control.)


Stack Machine ISA

// ISA syntax
s -> push NUM     // stack operations
   | pop x
   | unwind n
   | load x       // memory access
   | store x
   | add          // arithmetic
   | sub
   | mult
   | div
   | call f       // function call and return
   | ret

A subset of the Java virtual machine language (JVML)!


Frame and Stack

Each function comes with two storage areas: a frame and a stack

  • frame: holds arguments, locals and control information

  • stack: used for computation

(Diagram: a frame holding x and y, next to an operand stack.)


ISA Semantics: push

push NUM:
  top++;
  stack[top] = NUM;

Example (push 3): before, the stack is empty; after, 3 is on the stack top. The frame (x, y) is unchanged.


ISA Semantics: pop

pop x:
  x = stack[top];
  top--;

Example: before, 3 is on the stack top; after, x holds 3 and the 3 has been popped.


ISA Semantics: unwind

unwind n:
  top -= n;

Example (unwind 3): before, the stack holds three values v; after, all three have been popped.


ISA Semantics: load

load x:
  top++;
  stack[top] = x;

Example: after “load x”, a copy of the value of x is on the stack top; the frame is unchanged.


ISA Semantics: store

store x:
  x = stack[top];
  top--;

Example: before, v is on the stack top; after, x holds v and v has been popped.


ISA Semantics: add

add:
  temp = stack[top-1] + stack[top];
  top -= 2;
  push temp;

Example: before, the stack holds 5 and 1 (1 on top); after, it holds 6.


ISA Semantics: sub

sub:
  temp = stack[top-1] - stack[top];
  top -= 2;
  push temp;

Example: before, the stack holds 5 and 1 (1 on top); after, it holds 4.


ISA Semantics: mult

mult:
  temp = stack[top-1] * stack[top];
  top -= 2;
  push temp;

Example: before, the stack holds 5 and 2 (2 on top); after, it holds 10.


ISA Semantics: call

call f:
  // create a new frame for f
  // pop all arguments into f’s frame

Example: before the call, the caller’s frame holds x and y and its stack holds 5 and 2; the new frame for f holds the parameters m and n, and f’s own stack starts empty. After the call, 5 and 2 have been popped from the caller’s stack into f’s frame.


ISA Semantics: ret

ret:
  // pop the callee’s value and
  // push it onto the caller’s stack top

Example: before, the stack of the callee f holds v; after, f’s stack is empty and v is on the caller’s stack top.
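The instruction semantics above can be collected into a small interpreter. The following Python sketch is only an illustration under stated assumptions: instructions are encoded as tuples, a program is a dictionary from function names to (parameters, code), “print” is treated as a built-in that takes one argument, and “div” is integer division. None of these encodings come from the slides; `run`, `ARITH` and `PROGRAM` are hypothetical names.

```python
import operator

# Hypothetical encoding: each arithmetic instruction maps to a Python operator.
ARITH = {"add": operator.add, "sub": operator.sub,
         "mult": operator.mul, "div": operator.floordiv}

def run(funcs, name, args, out):
    """Run function `name`; return the value passed back by `ret` (or None)."""
    params, code = funcs[name]
    frame = dict(zip(params, args))      # frame: arguments and locals
    stack = []                           # operand stack: computation
    for op, *rest in code:               # straight-line code, so no pc is needed
        if op == "push":
            stack.append(rest[0])
        elif op in ("pop", "store"):     # both move the stack top into a variable
            frame[rest[0]] = stack.pop()
        elif op == "unwind":
            del stack[len(stack) - rest[0]:]
        elif op == "load":
            stack.append(frame[rest[0]])
        elif op in ARITH:
            right = stack.pop()          # stack[top]
            left = stack.pop()           # stack[top-1]
            stack.append(ARITH[op](left, right))
        elif op == "call":
            if rest[0] == "print":       # assumption: print is a one-argument built-in
                out.append(stack.pop())
            else:
                n = len(funcs[rest[0]][0])
                callee_args = stack[len(stack) - n:]
                del stack[len(stack) - n:]           # pop arguments into the new frame
                stack.append(run(funcs, rest[0], callee_args, out))
        elif op == "ret":
            return stack.pop()           # the caller pushes this value
    return None

# The main/plus program used in the examples of this deck.
PROGRAM = {
    "main": ([], [("push", 10), ("store", "m"), ("push", 5), ("store", "n"),
                  ("load", "m"), ("load", "n"), ("call", "plus"),
                  ("store", "z"), ("load", "z"), ("call", "print")]),
    "plus": (["x", "y"], [("load", "x"), ("load", "y"), ("add",),
                          ("store", "t"), ("load", "t"), ("ret",)]),
}
```

Calling `run(PROGRAM, "main", [], out)` with an empty list `out` collects the printed values in `out`. Each Python call to `run` plays the role of one frame plus one operand stack.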


Extended SLP

// Extending SLP with functions (* is the Kleene closure)

prog -> func*

func -> id (x1, …, xn){ s }

s -> s; s

| x := e

| print (es)

| return e

e -> n | x | e+e | e-e | e*e | e/e | f(es)

es-> e, es | \eps


Sample Programs

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


Recursive Descent Code Generation

// Invariant: expression’s value is on stack top

gen_s (s1; s2) = gen_s (s1); gen_s (s2);

gen_s (x := e) = gen_e (e); “store x”

gen_s (print (es)) = gen_es (es); “call print”

gen_s (return e) = gen_e (e); “ret”

gen_e (n) = “push n”

gen_e (x) = “load x”

gen_e (e1+e2) = gen_e (e1); gen_e (e2); “add”

gen_e (…) // similar for -, *, /

gen_e (f(es)) = gen_es(es); “call f”

gen_es (e, es) = gen_e (e); gen_es (es)

gen_es (\eps) = // emits nothing
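The rules above translate almost line for line into a recursive function over the syntax tree. The sketch below is one possible rendering in Python, assuming a hypothetical tuple encoding of the extended SLP (for example `("assign", x, e)` and `("call", f, [args])`); the encoding and the names `MAIN`, `gen_s`, `gen_e` and `gen_es` as Python functions are illustrative choices, not part of the slides.

```python
def gen_e(e):
    """Emit code that leaves the value of expression e on the stack top."""
    tag = e[0]
    if tag == "num":
        return [f"push {e[1]}"]
    if tag == "var":
        return [f"load {e[1]}"]
    if tag in ("add", "sub", "mult", "div"):
        return gen_e(e[1]) + gen_e(e[2]) + [tag]
    if tag == "call":
        return gen_es(e[2]) + [f"call {e[1]}"]
    raise ValueError(f"unknown expression: {e!r}")

def gen_es(es):
    """Emit code for an argument list, left to right."""
    code = []
    for e in es:
        code += gen_e(e)
    return code

def gen_s(s):
    """Emit code for a statement."""
    tag = s[0]
    if tag == "seq":
        return gen_s(s[1]) + gen_s(s[2])
    if tag == "assign":
        return gen_e(s[2]) + [f"store {s[1]}"]
    if tag == "print":
        return gen_es(s[1]) + ["call print"]
    if tag == "return":
        return gen_e(s[1]) + ["ret"]
    raise ValueError(f"unknown statement: {s!r}")

# Body of main from the sample program: m := 10; n := 5; z := plus(m, n); print(z)
MAIN = ("seq", ("assign", "m", ("num", 10)),
        ("seq", ("assign", "n", ("num", 5)),
         ("seq", ("assign", "z", ("call", "plus", [("var", "m"), ("var", "n")])),
          ("print", [("var", "z")]))))
```

Because every `gen_e` case leaves exactly one value on the stack, the invariant holds by induction on the expression, which is why the rules compose without any bookkeeping.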


Example

0: push 10 // <- main

1: store m

2: push 5

3: store n

4: load m

5: load n

6: call plus

7: store z

8: load z

9: call print

10: load x // <- plus

11: load y

12: add

13: store t

14: load t

15: ret

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


Example

Initial state, with pc at instruction 0:

frame for main: m, n, z (all unset)

operand stack: (empty)


Example

After “0: push 10”:

frame for main: m, n, z (all unset)

operand stack: 10




Example

After “1: store m”:

frame for main: m = 10; n, z unset

operand stack: (empty)


Example

After “2: push 5”:

frame for main: m = 10; n, z unset

operand stack: 5




Example

After “3: store n”:

frame for main: m = 10, n = 5; z unset

operand stack: (empty)


Example

After “4: load m”:

frame for main: m = 10, n = 5; z unset

operand stack: 10


Example

After “5: load n”:

frame for main: m = 10, n = 5; z unset

operand stack: 10, 5 (5 on top)


Example

During “6: call plus”, a new frame is created for plus:

frame for main: m = 10, n = 5; z unset

operand stack (main): 10, 5

frame for plus: x, y, t (all unset)

operand stack (plus): (empty)


Example

After the call has popped the arguments into the new frame, and after “10: load x”:

frame for main: m = 10, n = 5; z unset

operand stack (main): (empty)

frame for plus: x = 10, y = 5; t unset

operand stack (plus): 10


Example

After “11: load y”:

frame for main: m = 10, n = 5; z unset

operand stack (main): (empty)

frame for plus: x = 10, y = 5; t unset

operand stack (plus): 10, 5 (5 on top)


Example

During “12: add”, the operands 10 and 5 are popped and their sum 15 is computed:

frame for main: m = 10, n = 5; z unset

operand stack (main): (empty)

frame for plus: x = 10, y = 5; t unset

operand stack (plus): 10, 5 -> 15


Example

After “12: add”:

frame for main: m = 10, n = 5; z unset

operand stack (main): (empty)

frame for plus: x = 10, y = 5; t unset

operand stack (plus): 15


Example

After “13: store t” and “14: load t”:

frame for main: m = 10, n = 5; z unset

operand stack (main): (empty)

frame for plus: x = 10, y = 5, t = 15

operand stack (plus): 15




Example

After “15: ret”, the return value is on the caller’s stack:

frame for main: m = 10, n = 5; z unset

operand stack (main): 15

frame for plus: x = 10, y = 5, t = 15

operand stack (plus): (empty)


Example

After “7: store z” and “8: load z”:

frame for main: m = 10, n = 5, z = 15

operand stack (main): 15

frame for plus: x = 10, y = 5, t = 15

operand stack (plus): (empty)




Run the Stack machine code

  • Run the code on a real stack machine

    • if one is lucky enough to buy one…

  • Write an interpreter (virtual machine)

    • just like the JVM

  • Mimic a stack machine on non-stack machines:

    • E.g., use the call stack on x86 as the operand stack and the function frame

    • Or we may create a customized software stack


Mimic stack machine on x86

// gen_s as before

gen_e (n) = “pushl $n”

gen_e (x) = “pushl x”

gen_e (e1+e2) = gen_e (e1)

gen_e (e2)

“addl 0(%esp), 4(%esp)”

“addl $4, %esp”

correct?


Mimic stack machine on x86

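// gen_s as before

// x86 arithmetic instructions accept at most one memory operand,

// so the top operand is first popped into %edx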

gen_e (n) = “pushl $n”

gen_e (x) = “pushl x”

gen_e (e1+e2) = gen_e (e1)

gen_e (e2)

“popl %edx”

“addl %edx, 0(%esp)”
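To see that the popl/addl scheme keeps the stack-top invariant, it can help to generate the instructions and replay them. The Python sketch below assumes a hypothetical expression encoding (an int, a variable name, or a tuple `(op, e1, e2)` with op `"+"` or `"-"`) and a toy simulator that understands only the four instruction forms it emits; it is not a model of x86 in general, and the `subl` case is an assumed extension of the `addl` rule on the slide.

```python
def gen_e(e):
    """Emit AT&T-syntax x86 that leaves the value of e on top of the call stack."""
    if isinstance(e, int):
        return [f"pushl ${e}"]
    if isinstance(e, str):
        return [f"pushl {e}"]
    op, e1, e2 = e
    mnemonic = {"+": "addl", "-": "subl"}[op]
    # e1 ends up below e2; pop e2 into %edx, then combine it into e1's slot.
    return gen_e(e1) + gen_e(e2) + ["popl %edx", f"{mnemonic} %edx, 0(%esp)"]

def simulate(code, env):
    """Toy replay of the emitted forms only (hypothetical helper)."""
    stack, edx = [], 0
    for line in code:
        op, args = line.split(None, 1)
        if op == "pushl":
            stack.append(int(args[1:]) if args.startswith("$") else env[args])
        elif op == "popl":
            edx = stack.pop()
        elif op == "addl":
            stack[-1] += edx
        elif op == "subl":
            stack[-1] -= edx
    return stack
```

For instance, `simulate(gen_e(("-", ("+", "x", 1), "y")), {"x": 10, "y": 4})` replays seven instructions and ends with a single value on the simulated stack.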


Better code generation

  • Generating stack machine code for x86 reveals a serious defect:

    • the generated code may be too slow

    • this will be more severe on RISC

      • whose instructions do not operate on memory directly, so there may be many loads and stores

  • A better idea is to introduce some registers into the stack machine

    • and some more instructions


Stack Machine with one Register

  • Stack-based

    • but with one register: r

(Diagram: memory, the stack, the ALU and control, plus the single register r.)


Revised Stack Machine ISA

// ISA syntax
v -> NUM | x | r
s -> push v
   | pop v
   | unwind n
   | load v
   | store v
   | add
   | sub
   | mult
   | div
   | call f
   | ret
   | mov v, v

// ISA semantics (sample)
add:
  r = stack[top] + r;
  top--;

Example: before, the stack top and r hold 2 and 1; after “add”, r holds 3 and the stack top has been popped.


Recursive Descent Code Generation (revised)

// Invariant: expression value is in register “r”

gen_s (s1; s2) = gen_s (s1); gen_s (s2);

gen_s (x := e) = gen_e (e); “mov r, x”

gen_s (print (es)) = gen_es (es); “call print”

gen_s (return e) = gen_e(e); “ret”

gen_e (n) = “mov n, r”

gen_e (x) = “mov x, r”

gen_e (e1+e2) = gen_e (e1)

“push r”

gen_e (e2)

“add”

gen_e (…) // similar for -, *, /

gen_e (s, e) = gen_s (s); gen_e(e)

gen_es (e, es) = gen_e (e); “push r”; gen_es (es)
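The revised expression rules can be checked with a short sketch. The Python below assumes a hypothetical encoding (expressions are ints, variable names, or tuples `(op, e1, e2)`; instructions are tuples) and extends the sample “add” semantics to the other operators as `r = stack[top] op r`, which is an assumption rather than something the slide states.

```python
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b,
       "mult": lambda a, b: a * b, "div": lambda a, b: a // b}

def gen_e(e):
    """Invariant: after the emitted code runs, the value of e is in r."""
    if isinstance(e, (int, str)):
        return [("mov", e, "r")]
    op, e1, e2 = e
    # Save e1 on the stack while e2 is computed into r, then combine.
    return gen_e(e1) + [("push", "r")] + gen_e(e2) + [(op,)]

def run(code, env):
    """Toy one-register machine; returns (r, stack) after running code."""
    r, stack = None, []
    for ins in code:
        if ins[0] == "mov":
            src = ins[1]
            r = env[src] if isinstance(src, str) else src
        elif ins[0] == "push":
            stack.append(r)
        else:
            r = OPS[ins[0]](stack.pop(), r)   # r = stack[top] op r; top--
    return r, stack
```

Only one stack slot is used per pending left operand, so a flat expression such as `x + 1` touches the stack once instead of three times.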


Example

0: mov 10, r // <- main

1: mov r, m

2: mov 5, r

3: mov r, n

4: load m

5: load n

6: call plus

7: mov r, z

8: load z

9: call print

10: mov x, r // <- plus

11: push r

12: mov y, r

13: add

14: mov r, t

15: mov t, r

16: ret

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


More registers?

  • Can we put all intermediate results in registers?

    • thus do not need a stack

    • for instance, if we have two extra registers: r1 and r2, is the following code generation scheme right?

      gen_e (e1+e2) = gen_e (e1)

      “mov r, r1”

      gen_e (e2)

      “mov r, r2”

      “add r1, r2, r”
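One way to probe this question is to run the proposed scheme on a nested expression. The Python sketch below is a toy model under assumed encodings (ints and `("add", e1, e2)` tuples for expressions, tuples for instructions); it suggests that the fixed pair r1/r2 is not enough, because the inner addition overwrites r1 while the outer addition still needs it.

```python
def gen_e(e):
    """The two-register scheme from the slide, for ints and additions only."""
    if isinstance(e, int):
        return [("mov", e, "r")]
    _, e1, e2 = e
    return (gen_e(e1) + [("mov", "r", "r1")] +
            gen_e(e2) + [("mov", "r", "r2"), ("add", "r1", "r2", "r")])

def run(code):
    """Toy register machine with registers r, r1, r2; returns the final r."""
    regs = {"r": 0, "r1": 0, "r2": 0}
    for ins in code:
        if ins[0] == "mov":
            src = ins[1]
            regs[ins[2]] = regs[src] if isinstance(src, str) else src
        else:                               # ("add", a, b, dst): dst = a + b
            regs[ins[3]] = regs[ins[1]] + regs[ins[2]]
    return regs["r"]
```

With this model, `1 + 2` and `(1 + 2) + 3` come out right, but `1 + (2 + 3)` does not: while the right operand is evaluated, r1 is reused and the saved 1 is lost. In general an unbounded number of intermediate results may be live, which is what temporaries and register allocation address.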


Code Generation for Register-based Machines


Register Machine

  • Register-based

    • a set of registers

      • some 16, typically 32

    • the ALU operates on registers

    • load/store instructions access memory

    • registers hold all local variables, arguments, and temporaries

(Diagram: memory, the register file r1 … rn, the ALU and control.)


Better code generator

  • Recursive-descent code generation is a relatively old technique

    • efficient and easy to implement

    • you’ll do this in lab3

  • Most modern compilers generate code for some register machines (IRs)

  • Next, we discuss a widely-used IR: the 3-address code

    • a register-based IR


3-address-code

v -> NUM | id

s -> x = v1⊕v2 // arith

| x = v // move

| x[v1] = v2 // store

| x = y[v] // load

| x = f (v1, …, vn) // call

| Cjmp (v1, L1, L2) // conditional

| Jmp L // uncond. jump

| Label L // label

| Return v // return


Recursive Descent Code Generation

// Invariant: gen_e returns the register that holds the expression’s value

gen_s (s1; s2) = gen_s (s1); gen_s (s2);

gen_s (x := e) = r = gen_e (e); “x = r”

gen_s (print (es)) = (r1, …, rn) = gen_es (es);

“print(r1, …, rn)”

gen_s (return e) = r = gen_e (e); “ret r”

gen_e (n) = “r = n”, r

gen_e (x) = “r = x”, r

gen_e (e1+e2) = r1 = gen_e (e1);

r2 = gen_e (e2);

“r3 = r1+r2”, r3

gen_e (…) // similar for -, *, /

gen_e (f(es)) = (r1, …, rn) = gen_es(es);

“r = f(r1, …, rn)”, r

gen_es (e, es) = r = gen_e (e); rs = gen_es (es); (r, rs)
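The key difference from the stack-machine generator is that each `gen_e` call returns the name of the temporary holding its result. A minimal Python sketch, assuming a hypothetical encoding where expressions are ints, variable names, or tuples `(op, e1, e2)` with op written as `"+"`, `"-"`, `"*"` or `"/"`; the class name `Gen` and its methods are illustrative only.

```python
import itertools

class Gen:
    """Collects 3-address instructions and hands out fresh temporaries."""

    def __init__(self):
        self.code = []
        self._ids = itertools.count(1)

    def fresh(self):
        return f"r{next(self._ids)}"

    def gen_e(self, e):
        """Emit code for e and return the temporary that holds its value."""
        if isinstance(e, (int, str)):
            r = self.fresh()
            self.code.append(f"{r} = {e}")
            return r
        op, e1, e2 = e
        r1 = self.gen_e(e1)
        r2 = self.gen_e(e2)
        r3 = self.fresh()
        self.code.append(f"{r3} = {r1}{op}{r2}")
        return r3

    def gen_assign(self, x, e):
        """x := e"""
        self.code.append(f"{x} = {self.gen_e(e)}")
```

Since a fresh temporary is created for every subexpression, nothing is ever overwritten; deciding which temporaries can share a physical register is left to register allocation.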


Example

0: r1 = 10 // <- main

1: m = r1

2: r2 = 5

3: n = r2

4: z = plus(m, n)

5: call print(z)

6: r3 = x // <- plus

7: r4 = y

8: r5 = r3+r4

9: t = r5

10: ret t

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


Tree pattern matching

  • Consider this statement:

    • z = x + y

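movl x, t
movl y, s
addl s, t
movl t, z

(Tree: the root “=” has children z and “+”; the operands x and y of “+” are first copied into the temporaries t and s.)

However, this is not optimal at all!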


Tree pattern matching

  • Consider this statement:

    • z = x + y

movl x, t
addl y, t
movl t, z

(Tree: the root “=” has children z and “+”; “+” has children x and y.)


Or better

  • Consider this statement:

    • z = x + y

movl x, z
addl y, z

(Tree: the root “=” has children z and “+”; “+” has children x and y.)


Best tiling?

  • In practice, many different tilings exist

  • We want a tiling with “minimal cost”:

    • usually the smallest code size

    • can also take account of cost of instructions, etc.

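  • Optimum tiling: the tiling whose total cost is the lowest possible

  • Optimal tiling: no two adjacent tiles can be combined into a single tile of lower cost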


Optimal tilings

  • Optimal tiling is easy

    • a simple greedy algorithm suffices

    • the well-understood algorithm is maximal munch

      • start at the root

      • use the “biggest” match (in # of tree nodes)
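The greedy idea can be sketched for the `z = x + y` example. The Python below is a deliberately small illustration, not a full tile matcher: it assumes expressions are variable names or tuples `(op, left, right)` with op `"+"` or `"-"`, it treats variables as pseudo-registers (machine constraints are ignored), and it assumes the destination does not occur in the right operand. The function names are hypothetical.

```python
import itertools

MNEMONIC = {"+": "addl", "-": "subl"}

def munch_expr(e, code, temps):
    """Tile an expression; return the name that holds its value."""
    if isinstance(e, str):
        return e                                  # a leaf needs no instruction
    op, left, right = e
    a = munch_expr(left, code, temps)
    b = munch_expr(right, code, temps)
    t = f"t{next(temps)}"
    code.append(f"movl {a}, {t}")
    code.append(f"{MNEMONIC[op]} {b}, {t}")
    return t

def munch_stmt(dst, e):
    """Tile the assignment dst = e, trying the biggest tile at the root first."""
    code, temps = [], itertools.count(1)
    if isinstance(e, tuple):
        # Biggest tile: "=" together with the operator below it,
        # so the result is computed directly in dst.
        op, left, right = e
        a = munch_expr(left, code, temps)
        b = munch_expr(right, code, temps)
        code.append(f"movl {a}, {dst}")
        code.append(f"{MNEMONIC[op]} {b}, {dst}")
    else:
        code.append(f"movl {e}, {dst}")           # smaller tile: plain move
    return code
```

For `munch_stmt("z", ("+", "x", "y"))` the root tile covers both the assignment and the addition, so no temporary is introduced, matching the two-instruction version shown earlier.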


Optimum tiling

  • Optimum tiling is hard

    • a dynamic programming problem

      • start from the leaves, bottom up

      • carefully calculate some cost
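The bottom-up cost calculation can be illustrated with a tiny tile set. In the Python sketch below everything concrete is an assumption made for illustration: the tree encoding, the four tiles (load a variable, load a constant, add two registers, add an immediate) and the unit cost of each tile.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best(e):
    """Minimal total tile cost for the tree e, computed from the leaves up.

    e is ("var", name), ("const", n) or ("+", left, right).
    """
    kind = e[0]
    if kind in ("var", "const"):
        return 1                                  # one load instruction
    _, left, right = e
    options = [1 + best(left) + best(right)]      # tile: add reg, reg
    if right[0] == "const":
        options.append(1 + best(left))            # tile: add immediate, covers the const
    return min(options)
```

Each node's cost is the cheapest tile that matches at that node plus the best costs of the subtrees left uncovered; memoization via `lru_cache` makes shared subtrees cost nothing extra to revisit. A full implementation would also record which tile won at each node so that code can be emitted afterwards.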


Maximal munch rules (sample)

z = x + y  =>  movl x, z
               addl y, z

z = x - y  =>  movl x, z
               subl y, z

z = x * y  =>  movl x, z
               mult y, z

z = x / y  =>  movl x, z
               divl y, z

But one must take the machine constraints into account!

  • What if both y and z are in memory? Solution: decide the memory layout before code generation!

  • Multiplication and division make special use of registers. Solution: treat these instructions in an ad hoc way.


Example

int f (int x, int y)
{
  int a;
  int b;
  int c;
  int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = b / 8;
  return 0;
}

After instruction selection (x is at 8(%ebp), y is at 12(%ebp); positions for a, b, c, d cannot be decided yet):

int f (int x, int y){
  int a,b,c,d;
  int t1, t2;
  // Prolog
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), t1
  movl 12(%ebp), t2
  movl t1, a
  addl t2, a
  movl a, b
  addl $4, b
  movl b, %eax
  imult $2
  movl %eax, c
  movl b, %eax
  cltd
  idivl $8
  movl %eax, d
  movl $0, %eax
  // Epilog
  leave
  ret
}


Register allocation

  • After instruction selection, some variables still have no assigned location

    • put as many of them as possible into registers (speed!)

    • and the rest in memory (spilling)

  • This requires liveness analysis

  • All these will be discussed later


Register Allocation

For the instruction-selected code of f, register allocation determines that:

a => ecx
b => ecx
c => eax
d => eax
t1 => ecx
t2 => eax


Rewriting

Rewriting the code with the allocated registers gives:

.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %ecx
  movl 12(%ebp), %eax
  movl %ecx, %ecx
  addl %eax, %ecx
  movl %ecx, %ecx
  addl $4, %ecx
  movl %ecx, %eax
  imult $2
  movl %eax, %eax
  movl %ecx, %eax
  cltd
  idivl $8
  movl %eax, %eax
  movl $0, %eax
  leave
  ret


Peephole Optimization

The rewritten code contains self-moves such as “movl %ecx, %ecx” and “movl %eax, %eax”. They have no effect, so peephole optimization deletes them.


After Optimization

.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %ecx
  movl 12(%ebp), %eax
  addl %eax, %ecx
  addl $4, %ecx
  movl %ecx, %eax
  imult $2
  movl %ecx, %eax
  cltd
  idivl $8
  movl $0, %eax
  leave
  ret
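The only difference between the rewritten code and the optimized code is that moves from a register to itself have disappeared. A minimal Python sketch of such a peephole pass, assuming the assembly is given as a list of text lines in AT&T syntax; the function name and the regular expression are illustrative, and real peephole optimizers handle many more patterns.

```python
import re

# Matches "movl %reg, %reg" where both operands are the same register.
SELF_MOVE = re.compile(r"^movl\s+(%\w+),\s*\1$")

def peephole(lines):
    """Drop moves from a register to itself; keep every other line as is."""
    return [line for line in lines if not SELF_MOVE.match(line.strip())]
```

Applied to the body of f after rewriting, this pass removes four instructions and leaves the memory loads such as `movl 8(%ebp), %ecx` untouched, because their source operand is not a bare register.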

