Code Generation

Compiler

Baojian Hua

[email protected]


Middle and Back End

AST --translation--> IR1 --translation--> IR2 --other IRs and translations--> asm


Back-end Structure

IR -> [instruction selector] -> Assem -> [register allocator] -> TempMap -> [instruction scheduler] -> Assem


Recap

  • What about “CODE”?

CODE: Procedures, Control Flow, Statements, Data Access

DATA: Global Static Variables, Global Dynamic Data, Local Variables, Temporaries, Parameter Passing, Read-only Data


A Simpler Target ISA

  • To simplify the discussion, let’s start with a much simpler ISA: a stack machine

  • Stack machines were once very popular

    • they are rare as hardware today because of their low speed

    • but they are still worth discussing:

      • generating code for a stack machine is simpler

      • many (virtual) stack machines are in wide use today

        • Pascal P-code

        • Java bytecode

        • PostScript


Code Generation for Stack Machines


Stack Machine

  • Stack-based

    • no registers

    • the ALU operates on the stack and the memory

    • the stack is used for expression evaluation and function calls (it is called the operand stack on the JVM)

[Figure: Memory, Stack, ALU and Control units, with the stack highlighted]


Stack Machine ISA

// ISA syntax
s -> push NUM     // stack operations
   | pop x
   | unwind n
   | load x       // memory access
   | store x
   | add          // arithmetic
   | sub
   | mult
   | div
   | call f       // function call and return
   | ret

A subset of the Java virtual machine language (JVML)!


Frame and Stack

Each function comes with two storage areas: a frame and a stack

  • frame: holds arguments, locals and control information

  • stack: used for computation

[Figure: a frame with slots x and y, and a stack that is empty before and holds 3 after]


ISA Semantics: push

push NUM:
  top++;
  stack[top] = NUM;

[Figure: frame with x and y; the stack is empty before and holds 3 after]


ISA Semantics: pop

pop x:
  x = stack[top];
  top--;

[Figure: frame with x and y; the value 3 is on the stack before and has been popped into x after]


ISA Semantics: unwind

unwind n:
  top -= n;

[Figure: the stack holds three values v before; they have been removed after]


ISA Semantics: load

load x:
  top++;
  stack[top] = x;

[Figure: frame with x and y; after the load, the value of x is on the stack top]


ISA Semantics: store

store x:
  x = stack[top];
  top--;

[Figure: the stack holds v before; after the store, v is in x and the stack is empty]


ISA Semantics: add

add:
  temp = stack[top-1] + stack[top];
  top -= 2;
  push temp;

[Figure: the stack holds 5 and 1 before, and 6 after]


ISA Semantics: sub

sub:
  temp = stack[top-1] - stack[top];
  top -= 2;
  push temp;

[Figure: the stack holds 5 and 1 before, and 4 after]


ISA Semantics: mult

mult:
  temp = stack[top-1] * stack[top];
  top -= 2;
  push temp;

[Figure: the stack holds 5 and 2 before, and 10 after]


ISA Semantics: call

call f:
  // create a new frame for f
  // pop all arguments to f’s frame

[Figure: the caller’s frame with x and y, and a new frame for f with m and n; the caller’s stack holds 5 and 2 before the call, and f’s own stack starts empty]


ISA Semantics: ret

ret:
  // pop callee’s value and push it onto the caller’s stack top

[Figure: the callee’s stack holds v before and is empty after; v ends up on the caller’s stack]


Extended SLP

// Extending SLP with functions (* is the Kleene closure)
prog -> func*
func -> id (x1, …, xn){ s }
s    -> s; s
      | x := e
      | print (es)
      | return e
e    -> n | x | e+e | e-e | e*e | e/e | f(es)
es   -> e, es | \eps


Sample Programs

main (){
  m := 10;
  n := 5;
  z := plus (m, n);
  print (z);
}

plus (x, y){
  t := x+y;
  return t;
}


Recursive Descent Code Generation

// Invariant: expression’s value is on stack top
gen_s (s1; s2) = gen_s (s1); gen_s (s2);
gen_s (x := e) = gen_e (e); “store x”
gen_s (print (es)) = gen_es (es); “call print”
gen_s (return e) = gen_e (e); “ret”

gen_e (n) = “push n”
gen_e (x) = “load x”
gen_e (e1+e2) = gen_e (e1); gen_e (e2); “add”
gen_e (…) // similar for -, *, /
gen_e (f(es)) = gen_es (es); “call f”

gen_es (e, es) = gen_e (e); gen_es (es)
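The rules above translate almost line for line into a small program. The following is only a sketch: the tuple encoding of the syntax tree and the names BINOPS and main_body are assumptions made for illustration and do not come from the slides.

```python
# Hypothetical AST encoding (not from the slides), using nested tuples:
#   e: ("num", n) | ("var", x) | ("bin", op, e1, e2) | ("call", f, [e, ...])
#   s: ("seq", s1, s2) | ("assign", x, e) | ("print", [e, ...]) | ("return", e)

BINOPS = {"+": "add", "-": "sub", "*": "mult", "/": "div"}


def gen_e(e):
    """Emit code that leaves the value of expression e on the stack top."""
    kind = e[0]
    if kind == "num":
        return [f"push {e[1]}"]
    if kind == "var":
        return [f"load {e[1]}"]
    if kind == "bin":
        _, op, e1, e2 = e
        return gen_e(e1) + gen_e(e2) + [BINOPS[op]]
    if kind == "call":
        _, f, args = e
        return gen_es(args) + [f"call {f}"]
    raise ValueError(f"unknown expression: {e!r}")


def gen_es(es):
    """Emit code for each argument in order, left to right."""
    code = []
    for e in es:
        code += gen_e(e)
    return code


def gen_s(s):
    """Emit code for statement s."""
    kind = s[0]
    if kind == "seq":
        return gen_s(s[1]) + gen_s(s[2])
    if kind == "assign":
        return gen_e(s[2]) + [f"store {s[1]}"]
    if kind == "print":
        return gen_es(s[1]) + ["call print"]
    if kind == "return":
        return gen_e(s[1]) + ["ret"]
    raise ValueError(f"unknown statement: {s!r}")


# Body of main from the sample program.
main_body = (
    "seq", ("assign", "m", ("num", 10)),
    ("seq", ("assign", "n", ("num", 5)),
     ("seq", ("assign", "z", ("call", "plus", [("var", "m"), ("var", "n")])),
      ("print", [("var", "z")]))))

print("\n".join(gen_s(main_body)))
```

Each function returns a list of instructions, so the invariant (the value of an expression ends up on the stack top) is maintained simply by concatenating the lists in the order the rules give.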


Example

main (){
  m := 10;
  n := 5;
  z := plus (m, n);
  print (z);
}

plus (x, y){
  t := x+y;
  return t;
}

0: push 10 // <- main
1: store m
2: push 5
3: store n
4: load m
5: load n
6: call plus
7: store z
8: load z
9: call print
10: load x // <- plus
11: load y
12: add
13: store t
14: load t
15: ret


Example (execution trace)

Running the code above step by step. Each row shows the state after the instruction at pc has executed; stacks are written bottom to top, and “-” marks a slot that has not been set yet.

pc   instruction   frame for main        stack (main)   frame for plus        stack (plus)
     (start)       m=-,  n=-, z=-        (empty)
 0   push 10       m=-,  n=-, z=-        10
 1   store m       m=10, n=-, z=-        (empty)
 2   push 5        m=10, n=-, z=-        5
 3   store n       m=10, n=5, z=-        (empty)
 4   load m        m=10, n=5, z=-        10
 5   load n        m=10, n=5, z=-        10 5
 6   call plus     m=10, n=5, z=-        (empty)        x=10, y=5, t=-        (empty)
10   load x        m=10, n=5, z=-        (empty)        x=10, y=5, t=-        10
11   load y        m=10, n=5, z=-        (empty)        x=10, y=5, t=-        10 5
12   add           m=10, n=5, z=-        (empty)        x=10, y=5, t=-        15
13   store t       m=10, n=5, z=-        (empty)        x=10, y=5, t=15       (empty)
14   load t        m=10, n=5, z=-        (empty)        x=10, y=5, t=15       15
15   ret           m=10, n=5, z=-        15
 7   store z       m=10, n=5, z=15       (empty)
 8   load z        m=10, n=5, z=15       15
 9   call print    m=10, n=5, z=15       (empty)

Run the Stack Machine Code

  • Run the code on a real stack machine

    • if one is lucky enough to buy one…

  • Write an interpreter (virtual machine)

    • just like the JVM

  • Mimic a stack machine on a non-stack machine:

    • e.g., use the call stack on x86 as both the operand stack and the function frame

    • or create a customized software stack
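The interpreter option can be sketched in a few lines. This is a minimal sketch, not a definitive virtual machine: the funcs table (start index, end index and parameter names per function), the treatment of “call print” as a built-in that pops one value, and the use of floor division for div are all assumptions introduced here.

```python
import operator

# Arithmetic instructions; floor division for "div" is an assumption.
ARITH = {"add": operator.add, "sub": operator.sub,
         "mult": operator.mul, "div": operator.floordiv}


def run(code, funcs, entry="main"):
    """Interpret stack-machine code and return the list of printed values.

    code:  list of instruction strings, e.g. "push 10"
    funcs: name -> (start index, end index, parameter names)  [hypothetical]
    """
    output = []
    frame, stack = {}, []        # frame and operand stack of the current function
    saved = []                   # (return pc, caller frame, caller stack)
    pc, end, _ = funcs[entry]
    while saved or pc != end:    # stop when the entry function runs off its end
        op, *arg = code[pc].split()
        pc += 1
        if op == "push":
            stack.append(int(arg[0]))
        elif op in ("pop", "store"):
            frame[arg[0]] = stack.pop()
        elif op == "unwind":
            del stack[len(stack) - int(arg[0]):]
        elif op == "load":
            stack.append(frame[arg[0]])
        elif op in ARITH:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[op](left, right))
        elif op == "call" and arg[0] == "print":
            output.append(stack.pop())      # built-in: print one value
        elif op == "call":
            start, _, params = funcs[arg[0]]
            args = [stack.pop() for _ in params][::-1]
            saved.append((pc, frame, stack))
            frame, stack, pc = dict(zip(params, args)), [], start
        elif op == "ret":
            value = stack.pop()
            if not saved:                   # return from the entry function
                break
            pc, frame, stack = saved.pop()
            stack.append(value)
        else:
            raise ValueError(f"unknown instruction: {code[pc - 1]!r}")
    return output


code = [
    "push 10", "store m", "push 5", "store n", "load m", "load n",
    "call plus", "store z", "load z", "call print",      # main: 0..9
    "load x", "load y", "add", "store t", "load t", "ret",  # plus: 10..15
]
funcs = {"main": (0, 10, []), "plus": (10, 16, ["x", "y"])}
result = run(code, funcs)
print(result)
```

Giving every call its own frame dictionary and its own operand stack mirrors the “frame and stack” picture used in the ISA semantics.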


Mimic stack machine on x86

// gen_s as before
gen_e (n) = “pushl $n”
gen_e (x) = “pushl x”
gen_e (e1+e2) = gen_e (e1)
                gen_e (e2)
                “addl 0(%esp), 4(%esp)”
                “addl $4, %esp”

correct?

Mimic stack machine on x86

// gen_s as before
// addl cannot take two memory operands, so the top operand is popped into %edx first
gen_e (n) = “pushl $n”
gen_e (x) = “pushl x”
gen_e (e1+e2) = gen_e (e1)
                gen_e (e2)
                “popl %edx”
                “addl %edx, 0(%esp)”


Better code generation

  • Generating stack machine code for x86 reveals a serious defect:

    • the generated code may be too slow

    • this is more severe on RISC machines

      • which do not operate on memory directly, so there may be many “load” and “store” instructions

  • A better idea is to introduce some registers into the stack machine

    • and some more instructions


Stack Machine with One Register

  • Stack-based

    • but with one register: r

[Figure: Memory, Stack, ALU and Control units, plus the register r]


Revised Stack Machine ISA

// ISA syntax
v -> NUM | x | r
s -> push v
   | pop v
   | unwind n
   | load v
   | store v
   | add
   | sub
   | mult
   | div
   | call f
   | ret
   | mov v, v

// ISA semantics (sample)
add:
  r = stack[top]+r;
  top--;

[Figure: before, the operands 2 and 1 (one on the stack top, one in r); after “add”, r holds 3]


Recursive Descent Code Generation (revised)

// Invariant: expression value is in register “r”
gen_s (s1; s2) = gen_s (s1); gen_s (s2);
gen_s (x := e) = gen_e (e); “mov r, x”
gen_s (print (es)) = gen_es (es); “call print”
gen_s (return e) = gen_e (e); “ret”

gen_e (n) = “mov n, r”
gen_e (x) = “mov x, r”
gen_e (e1+e2) = gen_e (e1)
                “push r”
                gen_e (e2)
                “add”
gen_e (…) // similar for -, *, /
gen_e (s, e) = gen_s (s); gen_e (e)

gen_es (e, es) = gen_e (e); “push r”; gen_es (es)
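As a rough illustration, the revised rules can be coded the same way as the pure stack version. The tuple encoding of expressions and the helper name gen_assign are assumptions made for this sketch; the rule for calls simply combines the revised gen_es with “call f”, which the slide does not spell out.

```python
# Hypothetical expression encoding (not from the slides):
#   ("num", n) | ("var", x) | ("bin", op, e1, e2) | ("call", f, [e, ...])

OPS = {"+": "add", "-": "sub", "*": "mult", "/": "div"}


def gen_e(e):
    """Emit code that leaves the value of e in register r."""
    kind = e[0]
    if kind in ("num", "var"):
        return [f"mov {e[1]}, r"]
    if kind == "bin":
        _, op, e1, e2 = e
        # e1 is saved on the stack while e2 is computed into r.
        return gen_e(e1) + ["push r"] + gen_e(e2) + [OPS[op]]
    if kind == "call":
        _, f, args = e
        code = []
        for arg in args:
            code += gen_e(arg) + ["push r"]   # revised gen_es: push each argument
        return code + [f"call {f}"]
    raise ValueError(f"unknown expression: {e!r}")


def gen_assign(x, e):
    """gen_s (x := e) = gen_e (e); "mov r, x"."""
    return gen_e(e) + [f"mov r, {x}"]


print("\n".join(gen_assign("t", ("bin", "+", ("var", "x"), ("var", "y")))))
```

Only the left operand of a binary operation is ever pushed, so the stack traffic is roughly halved compared with the pure stack scheme.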


Example

main (){
  m := 10;
  n := 5;
  z := plus (m, n);
  print (z);
}

plus (x, y){
  t := x+y;
  return t;
}

0: mov 10, r // <- main
1: mov r, m
2: mov 5, r
3: mov r, n
4: load m
5: load n
6: call plus
7: mov r, z
8: load z
9: call print
10: mov x, r // <- plus
11: push r
12: mov y, r
13: add
14: mov r, t
15: load t
16: ret


More registers?

  • Can we put all intermediate results in registers?

    • then no stack would be needed

    • for instance, if we have two extra registers, r1 and r2, is the following code generation scheme right?

      gen_e (e1+e2) = gen_e (e1)
                      “mov r, r1”
                      gen_e (e2)
                      “mov r, r2”
                      “add r1, r2, r”
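One way to explore the question is to run the scheme on a nested expression. The sketch below is an experiment under simplifying assumptions, not part of the slides: expressions are either integers or pairs (e1, e2) standing for e1+e2, and instructions are plain tuples executed by a tiny register simulator.

```python
def gen_e(e):
    """Apply the proposed scheme; a pair (e1, e2) stands for e1 + e2."""
    if isinstance(e, int):
        return [("mov", e, "r")]
    e1, e2 = e
    return (gen_e(e1) + [("mov", "r", "r1")]
            + gen_e(e2) + [("mov", "r", "r2"), ("add", "r1", "r2", "r")])


def run(code):
    """Execute the generated code on three registers and return r."""
    reg = {"r": 0, "r1": 0, "r2": 0}
    for ins in code:
        if ins[0] == "mov":
            _, src, dst = ins
            reg[dst] = reg[src] if isinstance(src, str) else src
        else:  # ("add", a, b, dst)
            _, a, b, dst = ins
            reg[dst] = reg[a] + reg[b]
    return reg["r"]


flat = (1, 2)                  # 1 + 2
nested = ((1, 2), (10, 20))    # (1 + 2) + (10 + 20)
print(run(gen_e(flat)), run(gen_e(nested)))
```

The flat sum comes out right, but the nested one does not: while e2 is being evaluated, its own “mov r, r1” overwrites the saved value of e1. This suggests that a fixed pair of registers is not enough in general, which is one motivation for either keeping a stack or allocating a fresh temporary per intermediate result.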


Code Generation for Register-based Machines


Register Machine

  • Register-based

    • a set of registers

      • 16 on some machines, typically 32

    • the ALU operates on registers

    • memory is accessed through load/store

    • registers hold all local variables, arguments, and temporaries

[Figure: Memory, Register, ALU and Control units; the register file holds r1 … rn]


Better code generator

  • Recursive descent code generation is a relatively old technique

    • efficient and easy to implement

    • you’ll do this in lab3

  • Most modern compilers generate code for some register machine (IR)

  • Next, we discuss a widely used IR: 3-address code

    • a register-based IR


3-address code

v -> NUM | id
s -> x = v1 ⊕ v2          // arith
   | x = v                // move
   | x[v1] = v2           // store
   | x = y[v]             // load
   | x = f (v1, …, vn)    // call
   | Cjmp (v1, L1, L2)    // conditional
   | Jmp L                // uncond. jump
   | Label L              // label
   | Return v             // return


Recursive Descent Code Generation

// Invariant: gen_e returns the register that holds the expression’s value
gen_s (s1; s2) = gen_s (s1); gen_s (s2);
gen_s (x := e) = r = gen_e (e); “x = r”
gen_s (print (es)) = (r1, …, rn) = gen_es (es);
                     “print(r1, …, rn)”
gen_s (return e) = r = gen_e (e); “ret r”

gen_e (n) = “r = n”, r
gen_e (x) = “r = x”, r
gen_e (e1+e2) = r1 = gen_e (e1);
                r2 = gen_e (e2);
                “r3 = r1+r2”, r3
gen_e (…) // similar for -, *, /
gen_e (f(es)) = (r1, …, rn) = gen_es (es);
                “r = f(r1, …, rn)”, r

gen_es (e, es) = gen_e (e), gen_es (es)
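The key difference from the stack version is that gen_e now returns a register name along with emitting code. A minimal sketch, assuming a tuple-encoded syntax tree and a hypothetical class Gen with an unbounded supply of fresh registers r1, r2, …:

```python
class Gen:
    """Generates 3-address code; every intermediate value gets a fresh register."""

    def __init__(self):
        self.code = []
        self.count = 0

    def fresh(self):
        self.count += 1
        return f"r{self.count}"

    def gen_e(self, e):
        """Emit code for e and return the register holding its value."""
        kind = e[0]
        if kind in ("num", "var"):
            r = self.fresh()
            self.code.append(f"{r} = {e[1]}")
            return r
        if kind == "bin":
            _, op, e1, e2 = e
            r1 = self.gen_e(e1)
            r2 = self.gen_e(e2)
            r3 = self.fresh()
            self.code.append(f"{r3} = {r1}{op}{r2}")
            return r3
        if kind == "call":
            _, f, args = e
            regs = [self.gen_e(a) for a in args]
            r = self.fresh()
            self.code.append(f"{r} = {f}({', '.join(regs)})")
            return r
        raise ValueError(f"unknown expression: {e!r}")

    def gen_s(self, s):
        kind = s[0]
        if kind == "seq":
            self.gen_s(s[1])
            self.gen_s(s[2])
        elif kind == "assign":
            r = self.gen_e(s[2])
            self.code.append(f"{s[1]} = {r}")
        elif kind == "return":
            r = self.gen_e(s[1])
            self.code.append(f"ret {r}")
        else:
            raise ValueError(f"unknown statement: {s!r}")


# plus (x, y){ t := x+y; return t; }
g = Gen()
g.gen_s(("seq",
         ("assign", "t", ("bin", "+", ("var", "x"), ("var", "y"))),
         ("return", ("var", "t"))))
print("\n".join(g.code))
```

Because every value gets its own register, nothing is ever overwritten by accident; reducing the number of registers is left to a later register allocation phase.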


Example

main (){
  m := 10;
  n := 5;
  z := plus (m, n);
  print (z);
}

plus (x, y){
  t := x+y;
  return t;
}

0: r1 = 10 // <- main
1: m = r1
2: r2 = 5
3: n = r2
4: z = plus(m, n)
5: call print(z)
6: r3 = x // <- plus
7: r4 = y
8: r5 = r3+r4
9: t = r5
10: ret t


Tree pattern matching

  • Consider this statement:

    • z = x + y

[Figure: the tree for z = x + y, tiled node by node with temporaries t and s]

movl x, t
movl y, s
addl s, t
movl t, z

However, this is not optimal at all!

Tree pattern matching

[Figure: the same tree covered with larger tiles]

movl x, t
addl y, t
movl t, z

Or better

[Figure: the same tree covered with the largest tiles]

movl x, z
addl y, z


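Best tiling?

  • In practice, many different tilings exist

  • We want a tiling with “minimal cost”:

    • usually the smallest code size

    • can also take the cost of each instruction into account, etc.

  • Optimum tiling: the total cost of the tiles is the lowest possible

  • Optimal tiling: no two adjacent tiles can be combined into a single tile of lower cost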


Optimal tilings

  • Optimal tiling is easy

    • a simple greedy algorithm suffices

    • a well-understood algorithm is maximal munch

      • start at the root

      • use the “biggest” match (in number of tree nodes)

      • repeat for each subtree left uncovered
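A small sketch of maximal munch for statements of the form z = e, where e is built from variables, constants and +. The tree encoding, the tile set and the temporary names t1, t2, … are assumptions chosen to reproduce the movl/addl sequences shown earlier; the largest tile also assumes that z is a different variable from y.

```python
import itertools


def munch(stmt):
    """Tile ("move", z, e) greedily from the root, largest tile first."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))

    def munch_expr(e):
        """Emit code for e and return the temporary holding its value."""
        if e[0] == "var":
            t = next(temps)
            code.append(f"movl {e[1]}, {t}")
            return t
        if e[0] == "const":
            t = next(temps)
            code.append(f"movl ${e[1]}, {t}")
            return t
        _, left, right = e                    # e[0] == "+"
        t = munch_expr(left)
        if right[0] == "const":               # tile covers "+" and the constant
            code.append(f"addl ${right[1]}, {t}")
        elif right[0] == "var":               # tile covers "+" and the variable
            code.append(f"addl {right[1]}, {t}")
        else:                                 # smallest tile: "+" alone
            s = munch_expr(right)
            code.append(f"addl {s}, {t}")
        return t

    _, z, e = stmt
    if e[0] == "+" and e[1][0] == "var" and e[2][0] == "var":
        # Biggest tile: the whole tree z = x + y (assumes z is not y).
        code.append(f"movl {e[1][1]}, {z}")
        code.append(f"addl {e[2][1]}, {z}")
    else:
        t = munch_expr(e)
        code.append(f"movl {t}, {z}")
    return code


print(munch(("move", "z", ("+", ("var", "x"), ("var", "y")))))
print(munch(("move", "z", ("+", ("+", ("var", "x"), ("var", "y")), ("const", 4)))))
```

The order of the tests in each function is what makes the algorithm greedy: the pattern covering the most nodes is tried first, and only the subtrees it leaves uncovered are munched recursively.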


Optimum tiling

  • Optimum tiling is hard

    • a dynamic programming problem

      • start from the leaves and work bottom up

      • for each node, compute the minimum cost of tiling its subtree
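The dynamic programming idea can be sketched with memoized recursion, which fills in the same per-node costs as a bottom-up pass over the tree. Everything specific here is an assumption for illustration: the three tiles, their unit costs, and the tuple encoding of trees.

```python
from functools import lru_cache

# A tile matcher returns the subtrees the tile leaves uncovered, or None.


def leaf(node):
    return [] if node[0] in ("var", "const") else None


def add(node):
    return [node[1], node[2]] if node[0] == "+" else None


def add_imm(node):
    if node[0] == "+" and node[2][0] == "const":
        return [node[1]]          # the constant is covered by the tile itself
    return None


# (name, cost, matcher); the costs are made up for this sketch.
TILES = [("load", 1, leaf), ("add", 1, add), ("add_imm", 1, add_imm)]


@lru_cache(maxsize=None)
def best(node):
    """Return (minimum cost, tiles used) for tiling the subtree at node."""
    options = []
    for name, cost, match in TILES:
        rest = match(node)
        if rest is None:
            continue
        total, used = cost, (name,)
        for sub in rest:
            sub_cost, sub_tiles = best(sub)
            total += sub_cost
            used += sub_tiles
        options.append((total, used))
    return min(options, key=lambda option: option[0])


tree = ("+", ("+", ("var", "x"), ("const", 4)), ("const", 8))
print(best(tree))
```

Unlike maximal munch, every matching tile is tried at every node and the cheapest total is kept, so the result is the lowest-cost tiling for the given tile set and cost model.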


Maximal munch rules (sample)

statement      instructions
z = x + y      movl x, z; addl y, z
z = x - y      movl x, z; subl y, z
z = x * y      movl x, z; mult y, z
z = x / y      movl x, z; divl y, z

But one must take the machine constraints into account!

  • What if both y and z are in memory?

    • Solution: decide the memory layout before code generation!

  • Multiplication and division make special use of registers.

    • Solution: treat these instructions in an ad-hoc way.


Example

int f (int x, int y)
{
  int a;
  int b;
  int c;
  int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = b / 8;
  return 0;
}

After instruction selection (x: 8(%ebp), y: 12(%ebp)):

int f (int x, int y){
  int a,b,c,d;
  int t1, t2;
  // Prolog
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), t1
  movl 12(%ebp), t2
  movl t1, a
  addl t2, a
  movl a, b
  addl $4, b
  movl b, %eax
  imult $2
  movl %eax, c
  movl b, %eax
  cltd
  idivl $8
  movl %eax, d
  movl $0, %eax
  // Epilog
  leave
  ret }

Positions for a, b, c, d cannot be decided now.


Register allocation

  • After instruction selection, some variables still have no location

    • put as many of them as possible into registers (speed!)

    • and the rest in memory (spilling)

  • This requires liveness analysis

  • All of this will be discussed later


Register Allocation

For the code selected for f above, register allocation determines that:

a => ecx
b => ecx
c => eax
d => eax
t1 => ecx
t2 => eax


Rewriting

.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %ecx
  movl 12(%ebp), %eax
  movl %ecx, %ecx
  addl %eax, %ecx
  movl %ecx, %ecx
  addl $4, %ecx
  movl %ecx, %eax
  imult $2
  movl %eax, %eax
  movl %ecx, %eax
  cltd
  idivl $8
  movl %eax, %eax
  movl $0, %eax
  leave
  ret


Peep-hole Optimization

In the rewritten code, moves from a register to itself (“movl %ecx, %ecx”, “movl %eax, %eax”) do nothing and can be removed.


After Optimization

.globl f
f:
  pushl %ebp
  movl %esp, %ebp
  movl 8(%ebp), %ecx
  movl 12(%ebp), %eax
  addl %eax, %ecx
  addl $4, %ecx
  movl %ecx, %eax
  imult $2
  movl %ecx, %eax
  cltd
  idivl $8
  movl $0, %eax
  leave
  ret

int f (int x, int y)
{
  int a;
  int b;
  int c;
  int d;
  a = x + y;
  b = a + 4;
  c = b * 2;
  d = b / 8;
  return 0;
}
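The step from the rewritten code to the optimized code only deletes moves from a register to itself. A minimal sketch of that single peephole rule follows; the regular expression and the function name peephole are assumptions, and a real peephole optimizer would apply many more patterns.

```python
import re

# Matches "movl %reg, %reg" where both operands are the same register.
SELF_MOVE = re.compile(r"^\s*movl\s+(%\w+)\s*,\s*\1\s*$")


def peephole(lines):
    """Drop every instruction that moves a register to itself."""
    return [line for line in lines if not SELF_MOVE.match(line)]


rewritten = [
    "pushl %ebp", "movl %esp, %ebp",
    "movl 8(%ebp), %ecx", "movl 12(%ebp), %eax",
    "movl %ecx, %ecx", "addl %eax, %ecx",
    "movl %ecx, %ecx", "addl $4, %ecx",
    "movl %ecx, %eax", "imult $2", "movl %eax, %eax",
    "movl %ecx, %eax", "cltd", "idivl $8", "movl %eax, %eax",
    "movl $0, %eax", "leave", "ret",
]
optimized = peephole(rewritten)
print("\n".join(optimized))
```

On the rewritten listing for f this removes four instructions and leaves the sequence shown under “After Optimization”.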

