Code generation

Code generation

Code Generation

Compiler

Baojian Hua

[email protected]


Middle and back end

Middle and Back End

translation

AST

IR1

translation

IR2

other IR and translation

asm


Back end structure

Back-end Structure

instruction selector

IR

Assem

register allocator

TempMap

instruction scheduler

Assem


Recap

Recap

  • What about “CODE”?

CODE

DATA

Procedures

Global Static Variables

Global Dynamic Data

Control Flow

Local Variables

Temporaries

Statements

Parameter Passing

Data Access

Read-only Data


A simpler target isa

A Simpler Target ISA

  • To simplify the discussion, let’s start with a much simpler ISA: a stack machine

  • Stack machines were once very popular

    • they are less common as hardware today because of their low speed

    • but they are still worth discussing:

      • generating code for a stack machine is simpler

      • many (virtual) stack machines are in wide use today

        • Pascal P code

        • Java byte code

        • Postscript


Code generation

Code Generation for

Stack Machines


Stack machine

Stack Machine

  • Stack-based

    • no registers

    • ALU operates on the stack and the memory

    • stack for expression calculation and function call (also called operand stack on JVM)

[Diagram: the machine consists of Memory, the Stack, an ALU, and Control]


Stack machine isa

Stack Machine ISA

// ISA syntax

s -> push NUM // stack operations

| pop x

| unwind n

| load x // memory access

| store x

| add // arithmetic

| sub

| mult

| div

| call f // function call and return

| ret

A subset of the Java virtual machine language (JVML)!

[Diagram: Memory, Stack, ALU, Control]


Frame and stack

Frame and Stack

Each function comes with two storage areas: a frame and a stack

  • frame: holding arguments, locals and control

  • stack: computation

[Diagram: the frame holds the variables x and y; the operand stack is shown before and after an instruction that leaves 3 on it]


Isa semantics push

ISA Semantics: push

push NUM:

top++;

stack[top] = NUM;

Example: push 3

before: the stack is empty

after: the stack holds 3


Isa semantics pop

ISA Semantics: pop

pop x:

x = stack[top];

top--;

Example: pop x

before: the stack holds 3

after: x holds 3 in the frame and the stack is empty


Isa semantics unwind

ISA Semantics: unwind

unwind n:

top -= n;

Example: unwind 3

before: the stack holds three values v, v, v

after: the stack is empty


Isa semantics load

ISA Semantics: load

load x:

top++;

stack[top] = x;

Example: load x

before: the stack is empty

after: the stack holds the value of x; the frame is unchanged


Isa semantics store

ISA Semantics: store

store x:

x = stack[top];

top--;

Example: store x

before: the stack holds v

after: x holds v in the frame and the stack is empty


Isa semantics add

ISA Semantics: add

add:

temp = stack[top-1] + stack[top];

top -= 2;

push temp;

Example: add

before: the stack holds 5, 1

after: the stack holds 6


Isa semantics sub

ISA Semantics: sub

sub:

temp = stack[top-1] - stack[top];

top -= 2;

push temp;

Example: sub

before: the stack holds 5, 1

after: the stack holds 4


Isa semantics mult

ISA Semantics: mult

mult:

temp = stack[top-1] * stack[top];

top -= 2;

push temp;

Example: mult

before: the stack holds 5, 2

after: the stack holds 10


Isa semantics call

ISA Semantics: call

call f:

// create a new frame for f

// pop all arguments to f’s

// frame

Example: call f, where the caller’s frame holds x and y, and f has parameters m and n

before: the caller’s stack holds 5, 2; the stack of f is empty

after: the frame for f holds m = 5 and n = 2; the arguments are gone from the caller’s stack


Isa semantics ret

ISA Semantics: ret

ret:

// pop callee’s value and

// push it onto the

// caller’s stack top

Example: ret from f (frame m, n) to a caller (frame x, y)

before: the stack of f holds v

after: v is on the caller’s stack top; the stack of f is empty


Extended slp

Extended SLP

// Extending SLP with functions (* is the Kleene closure)

prog -> func*

func -> id (x1, …, xn){ s }

s -> s; s

| x := e

| print (es)

| return e

e -> n | x | e+e | e-e | e*e | e/e | f(es)

es-> e, es | \eps


Sample programs

Sample Programs

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


Recursive descent code generation

Recursive Descent Code Generation

// Invariant: expression’s value is on stack top

gen_s (s1; s2) = gen_s (s1); gen_s (s2);

gen_s (x := e) = gen_e (e); “store x”

gen_s (print (es)) = gen_es (es); “call print”

gen_s (return e) = gen_e (e); “ret”

gen_e (n) = “push n”

gen_e (x) = “load x”

gen_e (e1+e2) = gen_e (e1); gen_e (e2); “add”

gen_e (…) // similar for -, *, /

gen_e (f(es)) = gen_es(es); “call f”

gen_es (e, es) = gen_e (e); gen_es (es)
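The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple encoding of the AST (`("num", n)`, `("var", x)`, `("call", f, args)`, and so on) and the helper `gen_body` are hypothetical and do not come from the slides. The instruction names are the ones in the stack machine ISA.

```python
# Hypothetical AST encoding (an assumption, not from the slides):
#   expressions: ("num", n) | ("var", x) | (op, e1, e2) | ("call", f, [e, ...])
#   statements:  ("assign", x, e) | ("print", [e, ...]) | ("return", e)
ARITH = {"+": "add", "-": "sub", "*": "mult", "/": "div"}

def gen_e(e):
    """Emit code that leaves the value of expression e on the stack top."""
    kind = e[0]
    if kind == "num":
        return [f"push {e[1]}"]
    if kind == "var":
        return [f"load {e[1]}"]
    if kind in ARITH:
        return gen_e(e[1]) + gen_e(e[2]) + [ARITH[kind]]
    if kind == "call":
        return gen_es(e[2]) + [f"call {e[1]}"]
    raise ValueError(f"unknown expression: {e!r}")

def gen_es(es):
    """Emit code for a list of expressions, left to right."""
    code = []
    for e in es:
        code += gen_e(e)
    return code

def gen_s(s):
    """Emit code for one statement."""
    kind = s[0]
    if kind == "assign":
        return gen_e(s[2]) + [f"store {s[1]}"]
    if kind == "print":
        return gen_es(s[1]) + ["call print"]
    if kind == "return":
        return gen_e(s[1]) + ["ret"]
    raise ValueError(f"unknown statement: {s!r}")

def gen_body(stmts):
    """Emit code for a statement sequence s1; s2; ..."""
    code = []
    for s in stmts:
        code += gen_s(s)
    return code

# The sample program: main and plus.
main_code = gen_body([
    ("assign", "m", ("num", 10)),
    ("assign", "n", ("num", 5)),
    ("assign", "z", ("call", "plus", [("var", "m"), ("var", "n")])),
    ("print", [("var", "z")]),
])
plus_code = gen_body([
    ("assign", "t", ("+", ("var", "x"), ("var", "y"))),
    ("return", ("var", "t")),
])
```

Each `gen_*` function returns a list of instructions rather than printing them, so the results for sub-expressions can simply be concatenated in the order the rules prescribe.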


Example

Example

0: push 10 // <- main

1: store m

2: push 5

3: store n

4: load m

5: load n

6: call plus

7: store z

8: load z

9: call print

10: load x // <- plus

11: load y

12: add

13: store t

14: load t

15: ret

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


Example: execution trace

Example

Each row shows the machine state after the instruction at pc has executed. “call plus” creates the frame for plus and pops the arguments 10 and 5 into x and y; “ret” pushes the result onto the stack of main.

pc   instruction   frame for main     stack (main)   frame for plus      stack (plus)

 0   push 10       m=-  n=-  z=-      10

 1   store m       m=10 n=-  z=-      (empty)

 2   push 5        m=10 n=-  z=-      5

 3   store n       m=10 n=5  z=-      (empty)

 4   load m        m=10 n=5  z=-      10

 5   load n        m=10 n=5  z=-      10 5

 6   call plus     m=10 n=5  z=-      (empty)        x=10 y=5 t=-        (empty)

10   load x        m=10 n=5  z=-      (empty)        x=10 y=5 t=-        10

11   load y        m=10 n=5  z=-      (empty)        x=10 y=5 t=-        10 5

12   add           m=10 n=5  z=-      (empty)        x=10 y=5 t=-        15

13   store t       m=10 n=5  z=-      (empty)        x=10 y=5 t=15       (empty)

14   load t        m=10 n=5  z=-      (empty)        x=10 y=5 t=15       15

15   ret           m=10 n=5  z=-      15             x=10 y=5 t=15       (empty)

 7   store z       m=10 n=5  z=15     (empty)

 8   load z        m=10 n=5  z=15     15

 9   call print    m=10 n=5  z=15     (empty)


Run the stack machine code

Run the Stack machine code

  • Run the code on a real stack machine

    • if one is lucky to buy one…

  • Write an interpreter (virtual machine)

    • just like the JVM

  • Mimic a stack machine on non-stack machines:

    • E.g., use the call stack on x86 as the operand stack and the function frame

    • Or we may create a customized software stack
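The interpreter option can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not in the slides: each function is stored as a pair of parameter names and an instruction list (the `funcs` table), `print` is treated as a built-in that pops one argument, and `div` is taken to be integer division. The host language's own call stack stands in for the chain of frames.

```python
import operator

# Arithmetic instructions; floor division for "div" is an assumption.
ARITH = {"add": operator.add, "sub": operator.sub,
         "mult": operator.mul, "div": operator.floordiv}

def run(funcs, name, args, out):
    """Run function `name` with `args`; values printed are appended to `out`."""
    params, code = funcs[name]
    frame = dict(zip(params, args))   # frame: arguments and locals
    stack = []                        # operand stack of this activation
    for instr in code:
        op, *operand = instr.split()
        if op == "push":
            stack.append(int(operand[0]))
        elif op == "load":
            stack.append(frame[operand[0]])
        elif op in ("pop", "store"):
            frame[operand[0]] = stack.pop()
        elif op == "unwind":
            del stack[len(stack) - int(operand[0]):]
        elif op in ARITH:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[op](left, right))
        elif op == "call" and operand[0] == "print":
            out.append(stack.pop())   # built-in: pops one argument
        elif op == "call":
            arity = len(funcs[operand[0]][0])
            call_args = stack[len(stack) - arity:]   # pop all arguments
            del stack[len(stack) - arity:]
            result = run(funcs, operand[0], call_args, out)
            if result is not None:
                stack.append(result)  # callee's value lands on caller's stack top
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return None

funcs = {
    "main": ([], ["push 10", "store m", "push 5", "store n", "load m",
                  "load n", "call plus", "store z", "load z", "call print"]),
    "plus": (["x", "y"], ["load x", "load y", "add", "store t", "load t", "ret"]),
}
out = []
run(funcs, "main", [], out)
```

Because the ISA has no jumps, a plain loop over the instruction list is enough; a machine with branches would need an explicit program counter.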


Mimic stack machine on x86

Mimic stack machine on x86

// gen_s as before

gen_e (n) = “pushl $n”

gen_e (x) = “pushl x”

gen_e (e1+e2) = gen_e (e1)

gen_e (e2)

“addl 0(%esp), 4(%esp)”

“addl $4, %esp”

correct?


Mimic stack machine on x861

Mimic stack machine on x86

// gen_s as before

gen_e (n) = “pushl $n”

gen_e (x) = “pushl x”

gen_e (e1+e2) = gen_e (e1)

gen_e (e2)

“popl %edx”

“addl %edx, 0(%esp)”


Better code generation

Better code generation

  • Generating stack-machine-style code for x86 reveals a serious defect:

    • the generated code may be too slow

    • this is more severe on RISC

      • which does not operate on memory directly, so there may be a lot of “load” and “store” instructions

  • A better idea is to introduce some registers into the stack machine

    • and some more instructions


Stack machine with one register

Stack Machine with one Register

  • Stack-based

    • but with one register: r

Memory

Stack

ALU

the stack:

r

Control


Revised stack machine isa

Revised Stack Machine ISA

// ISA semantics (sample)

add:

r = stack[top]+r;

top--;

// ISA syntax

v -> NUM | x | r

s -> push v

| pop v

| unwind n

| load v

| store v

| add

| sub

| mult

| div

| call f

| ret

| mov v, v

Example: add

before: the stack top and r hold 2 and 1

after “add”: r holds 3 and the stack entry has been popped


Recursive descent code generation revised

Recursive Descent Code Generation (revised)

// Invariant: expression value is in register “r”

gen_s (s1; s2) = gen_s (s1); gen_s (s2);

gen_s (x := e) = gen_e (e); “mov r, x”

gen_s (print (es)) = gen_es (es); “call print”

gen_s (return e) = gen_e(e); “ret”

gen_e (n) = “mov n, r”

gen_e (x) = “mov x, r”

gen_e (e1+e2) = gen_e (e1)

“push r”

gen_e (e2)

“add”

gen_e (…) // similar for -, *, /

gen_e (s, e) = gen_s (s); gen_e(e)

gen_es (e, es) = gen_e (e); “push r”; gen_es (es)
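A minimal Python sketch of the revised scheme for expressions and assignment, together with a tiny simulator to check the invariant that the value ends up in `r`. The tuple AST encoding and the `run` helper are hypothetical. The slides give the semantics only for `add`; the sketch assumes the other arithmetic instructions follow the same shape, with the stack top as the left operand.

```python
import operator

ARITH = {"+": "add", "-": "sub", "*": "mult", "/": "div"}

def gen_e(e):
    """Emit code that leaves the value of e in register r."""
    kind = e[0]
    if kind in ("num", "var"):
        return [f"mov {e[1]}, r"]
    if kind in ARITH:
        # Left operand is saved on the stack while the right one is computed.
        return gen_e(e[1]) + ["push r"] + gen_e(e[2]) + [ARITH[kind]]
    raise ValueError(f"unknown expression: {e!r}")

def gen_assign(x, e):
    return gen_e(e) + [f"mov r, {x}"]

def run(code, env):
    """Simulate the generated code; env maps variable names to values."""
    ops = {"add": operator.add, "sub": operator.sub,
           "mult": operator.mul, "div": operator.floordiv}
    stack, r = [], 0
    for instr in code:
        op, _, operands = instr.partition(" ")
        if op == "mov":
            src, dst = [s.strip() for s in operands.split(",")]
            value = r if src == "r" else env[src] if src in env else int(src)
            if dst == "r":
                r = value
            else:
                env[dst] = value
        elif op == "push":
            stack.append(r)               # only "push r" is generated here
        else:
            r = ops[op](stack.pop(), r)   # assumed: r = stack[top] op r; top--
    return env

sum_code = gen_assign("t", ("+", ("var", "x"), ("var", "y")))
nested_code = gen_assign("z", ("*", ("-", ("var", "x"), ("num", 2)), ("var", "y")))
env = run(sum_code + nested_code, {"x": 10, "y": 5})
```

Only the left operand of each binary operator touches the stack, so the number of stack accesses drops compared with the pure stack machine.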


Example20

Example

0: mov 10, r // <- main

1: mov r, m

2: mov 5, r

3: mov r, n

4: load m

5: load n

6: call plus

7: mov r, z

8: load z

9: call print

10: mov x, r // <- plus

11: push r

12: mov y, r

13: add

14: mov r, t

15: mov t, r

16: ret

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


More registers

More registers?

  • Can we put all intermediate results in registers?

    • thus do not need a stack

    • for instance, if we have two extra registers: r1 and r2, is the following code generation scheme right?

      gen_e (e1+e2) = gen_e (e1)

      “mov r, r1”

      gen_e (e2)

      “mov r, r2”

      “add r1, r2, r”
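One way to probe the question is to simulate the proposed scheme. The sketch below uses a hypothetical encoding in which an expression is either an integer literal or a pair `(e1, e2)` standing for `e1 + e2`; the instruction tuples are likewise only illustrative. With a fixed pair of registers, evaluating `e2` may overwrite `r1` before it is used.

```python
def gen_e(e):
    """Apply the fixed-register scheme: e is an int or a pair (e1, e2) meaning e1 + e2."""
    if isinstance(e, int):
        return [("mov", e, "r")]
    e1, e2 = e
    return (gen_e(e1) + [("mov", "r", "r1")]
            + gen_e(e2) + [("mov", "r", "r2"), ("add", "r1", "r2", "r")])

def run(code):
    """Simulate the three registers and return the final value of r."""
    regs = {"r": 0, "r1": 0, "r2": 0}
    for instr in code:
        if instr[0] == "mov":
            _, src, dst = instr
            regs[dst] = regs[src] if isinstance(src, str) else src
        else:
            _, a, b, dst = instr
            regs[dst] = regs[a] + regs[b]
    return regs["r"]

flat = run(gen_e((1, 2)))                 # 1 + 2
left_nested = run(gen_e(((1, 2), 3)))     # (1 + 2) + 3
right_nested = run(gen_e((1, (2, 3))))    # 1 + (2 + 3)
```

In the right-nested case the inner addition stores 2 into `r1`, destroying the saved 1, so the scheme yields 7 instead of 6. Nested expressions therefore need a fresh register for each intermediate result, or a stack.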


Code generation

Code Generation for

Register-based Machines


Register machine

Register Machine

  • Register-based

    • a set of registers

      • some 16, typically 32

    • ALU operates on registers

    • memory is accessed through load/store

    • registers hold all local variables, arguments, and temporaries

[Diagram: Memory, register file (r1 … rn), ALU, Control]


Better code generator

Better code generator

  • Recursive descent code generation is a relatively old technique

    • efficient and easy to implement

    • you’ll do this in lab3

  • Most modern compilers generate code for some register machines (IRs)

  • Next, we discuss a widely-used IR: the 3-address code

    • a register-based IR


3 address code

3-address-code

v -> NUM | id

s -> x = v1⊕v2 // arith

| x = v // move

| x[v1] = v2 // store

| x = y[v] // load

| x = f (v1, …, vn) // call

| Cjmp (v1, L1, L2) // conditional

| Jmp L // uncond. jump

| Label L // label

| Return v // return


Recursive descent code generation

Recursive Descent Code Generation

// Invariant: gen_e returns the register (temporary) that holds the expression’s value

gen_s (s1; s2) = gen_s (s1); gen_s (s2);

gen_s (x := e) = r = gen_e (e); “x = r”

gen_s (print (es)) = (r1, …, rn) = gen_es (es);

“print(r1, …, rn)”

gen_s (return e) = r = gen_e (e); “ret r”

gen_e (n) = “r = n”, r

gen_e (x) = “r = x”, r

gen_e (e1+e2) = r1 = gen_e (e1);

r2 = gen_e (e2);

“r3 = r1+r2”, r3

gen_e (…) // similar for -, *, /

gen_e (f(es)) = (r1, …, rn) = gen_es(es);

“r = f(r1, …, rn)”, r

gen_es (e, es) = r = gen_e (e); rs = gen_es (es); (r, rs)
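The scheme above can be sketched in Python with a generator that hands out fresh temporaries. The class name `ThreeAddressGen`, the tuple AST encoding, and the textual instruction format are assumptions made for illustration, not part of the slides.

```python
from itertools import count

class ThreeAddressGen:
    """Recursive descent generator: gen_e returns the temporary holding the value."""

    def __init__(self):
        self.code = []
        self._ids = count(1)

    def fresh(self):
        return f"r{next(self._ids)}"

    def gen_e(self, e):
        kind = e[0]
        if kind in ("num", "var"):
            r = self.fresh()
            self.code.append(f"{r} = {e[1]}")
            return r
        if kind in ("+", "-", "*", "/"):
            r1 = self.gen_e(e[1])
            r2 = self.gen_e(e[2])
            r3 = self.fresh()
            self.code.append(f"{r3} = {r1} {kind} {r2}")
            return r3
        if kind == "call":
            args = [self.gen_e(a) for a in e[2]]
            r = self.fresh()
            self.code.append(f"{r} = {e[1]}({', '.join(args)})")
            return r
        raise ValueError(f"unknown expression: {e!r}")

    def gen_s(self, s):
        kind = s[0]
        if kind == "assign":
            r = self.gen_e(s[2])
            self.code.append(f"{s[1]} = {r}")
        elif kind == "print":
            args = [self.gen_e(a) for a in s[1]]
            self.code.append(f"print({', '.join(args)})")
        elif kind == "return":
            r = self.gen_e(s[1])
            self.code.append(f"ret {r}")
        else:
            raise ValueError(f"unknown statement: {s!r}")

gen = ThreeAddressGen()
gen.gen_s(("assign", "t", ("+", ("var", "x"), ("var", "y"))))
gen.gen_s(("return", ("var", "t")))
```

Since every intermediate result gets its own temporary, no value is overwritten while it is still needed; mapping these temporaries onto a limited set of machine registers is left to register allocation.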


Example21

Example

0: r1 = 10 // <- main

1: m = r1

2: r2 = 5

3: n = r2

4: z = plus(m, n)

5: call print(z)

6: r3 = x // <- plus

7: r4 = y

8: r5 = r3+r4

9: t = r5

10: ret t

main (){

m := 10;

n := 5;

z := plus (m, n);

print (z);

}

plus (x, y){

t := x+y;

return t;

}


Tree pattern matching

Tree pattern matching

  • Consider this statement:

    • z = x + y

movl x, t

movl y, s

addl s, t

movl t, z

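[Tree diagram: = with children z and +; + has children x and y; x is first moved into temporary t and y into temporary s]

However, this is not optimal at all!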


Tree pattern matching1

Tree pattern matching

  • Consider this statement:

    • z = x + y

movl x, t

addl y, t

movl t, z

[Tree diagram: = with children z and +; + has children x and y]


Or better

Or better

  • Consider this statement:

    • z = x + y

movl x, z

addl y, z

[Tree diagram: = with children z and +; + has children x and y]


Best tiling

Best tiling?

  • In practice, many different tilings exist

  • We want a tiling with “minimal cost”:

    • usually the smallest code size

    • can also take account of cost of instructions, etc.

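  • Optimum tiling: the total cost of the tiles is the lowest possible

  • Optimal tiling: no two adjacent tiles can be combined into a single tile of lower cost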


Optimal tilings

Optimal tilings

  • Optimal tiling is easy

    • a simple greedy algorithm

    • the well-understood algorithm is maximal munch

      • start at the root

      • use the “biggest” match (in number of tree nodes)

      • repeat for each subtree the match leaves uncovered
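Maximal munch can be sketched as a recursive function that tries the largest tile first at each node. Everything specific here is an assumption for illustration: the tree encoding (`temp`, `const`, `mem`, `+`), the tile set, and the load/store-style instruction names (`li`, `load`, `addi`, `add`) belong to a made-up target, not to x86 or to the slides.

```python
from itertools import count

def munch(e, code, ids):
    """Tile expression tree e, append instructions to code, return the result register."""
    kind = e[0]
    if kind == "temp":                      # a temporary is already in a register
        return e[1]
    if kind == "const":
        rd = f"t{next(ids)}"
        code.append(f"li {rd}, {e[1]}")
        return rd
    if kind == "mem":
        addr = e[1]
        if addr[0] == "+" and addr[2][0] == "const":
            # Biggest tile (3 nodes): mem(+(e, const)) becomes one load with offset.
            rs = munch(addr[1], code, ids)
            offset = addr[2][1]
        else:
            # Fallback tile (1 node): mem(e).
            rs = munch(addr, code, ids)
            offset = 0
        rd = f"t{next(ids)}"
        code.append(f"load {rd}, {offset}({rs})")
        return rd
    if kind == "+":
        if e[2][0] == "const":
            # 2-node tile: +(e, const) becomes an add-immediate.
            rs = munch(e[1], code, ids)
            rd = f"t{next(ids)}"
            code.append(f"addi {rd}, {rs}, {e[2][1]}")
            return rd
        rs1 = munch(e[1], code, ids)        # 1-node tile: +(e1, e2)
        rs2 = munch(e[2], code, ids)
        rd = f"t{next(ids)}"
        code.append(f"add {rd}, {rs1}, {rs2}")
        return rd
    raise ValueError(f"no tile matches {e!r}")

# mem(mem(fp + 8) + 4): seven tree nodes.
tree = ("mem", ("+", ("mem", ("+", ("temp", "fp"), ("const", 8))), ("const", 4)))
code = []
result = munch(tree, code, count(1))
```

The tiles are chosen top-down from the root, but each instruction is emitted only after the code for its uncovered subtrees, so the output is in execution order.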


Optimum tiling

Optimum tiling

  • Optimum tiling is hard

    • a dynamic programming problem

      • start from the leaves, bottom up

      • carefully calculate some cost


Maximal munch rules sample

Maximal munch rules (sample)

z = x + y

movl x, z

addl y, z

z = x - y

movl x, z

subl y, z

z = x * y

movl x, z

mult y, z

z = x / y

movl x, z

divl y, z

But one must take the machine constraints into account!

  • What if both y and z are in memory?

    • Solution: decide the memory layout before code generation.

  • Multiplication and division make special use of registers.

    • Solution: treat these instructions in an ad-hoc way.


Example22

int f (int x, int y){

int a,b,c,d;

int t1, t2;

pushl %ebp

movl %esp, %ebp

movl 8(%ebp), t1

movl 12(%ebp), t2

movl t1, a

addl t2, a

movl a, b

addl $4, b

movl b, %eax

imult $2

movl %eax, c

movl b, %eax

cltd

idivl $8

movl %eax, d

movl $0, %eax

leave

ret }

Example

Prolog

int f (int x, int y)

{

int a;

int b;

int c;

int d;

a = x + y;

b = a + 4;

c = b * 2;

d = b / 8;

return 0;

}

y: 12(%ebp)

x: 8(%ebp)

Positions for a, b, c, d cannot be decided yet.

Epilog


Register allocation

Register allocation

  • After instruction selection, some variables still have no assigned location

    • put as many of them as possible into registers (speed!)

    • and keep the rest in memory (spilling)

  • This requires liveness analysis

  • All of this will be discussed later


Register allocation1

int f (int x, int y){

int a,b,c,d;

int t1, t2;

pushl %ebp

movl %esp, %ebp

movl 8(%ebp), t1

movl 12(%ebp), t2

movl t1, a

addl t2, a

movl a, b

addl $4, b

movl b, %eax

imult $2

movl %eax, c

movl b, %eax

cltd

idivl $8

movl %eax, d

movl $0, %eax

leave

ret }

RegisterAllocation

Register allocation

determines that:

a => ecx

b => ecx

c => eax

d => eax

t1 => ecx

t2 => eax


Rewriting

.globl f

f:

pushl %ebp

movl %esp, %ebp

movl 8(%ebp), %ecx

movl 12(%ebp), %eax

movl %ecx, %ecx

addl %eax, %ecx

movl %ecx, %ecx

addl $4, %ecx

movl %ecx, %eax

imult $2

movl %eax, %eax

movl %ecx, %eax

cltd

idivl $8

movl %eax, %eax

movl $0, %eax

leave

ret

Rewriting

Register allocation

determines that:

a => ecx

b => ecx

c => eax

d => eax

t1 => ecx

t2 => eax


Peep hole optimization

Peep-hole Optimization

The rewritten code above contains redundant self-moves (movl %ecx, %ecx and movl %eax, %eax). Peep-hole optimization deletes them.


After optimization

.globl f

f:

pushl %ebp

movl %esp, %ebp

movl 8(%ebp), %ecx

movl 12(%ebp), %eax

addl %eax, %ecx

addl $4, %ecx

movl %ecx, %eax

imult $2

movl %ecx, %eax

cltd

idivl $8

movl $0, %eax

leave

ret
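The two steps that lead to the code above, rewriting with the register assignment and then deleting self-moves, can be sketched in Python. The helper names `rewrite` and `peephole` are hypothetical, and the instructions are handled as plain text exactly as the slides write them (including the simplified `imult $2` and `idivl $8` forms, which are the slides' notation rather than real x86 encodings).

```python
import re

# Register assignment from the slides.
ALLOCATION = {"a": "%ecx", "b": "%ecx", "c": "%eax",
              "d": "%eax", "t1": "%ecx", "t2": "%eax"}

def rewrite(lines, allocation):
    """Replace each variable operand by its assigned register."""
    names = "|".join(map(re.escape, allocation))
    # Match whole names only: not inside mnemonics, registers, or immediates.
    pattern = re.compile(r"(?<![%\w$])(" + names + r")\b")
    return [pattern.sub(lambda m: allocation[m.group(1)], line) for line in lines]

def peephole(lines):
    """Drop moves whose source and destination are the same."""
    kept = []
    for line in lines:
        m = re.fullmatch(r"movl\s+(\S+),\s*(\S+)", line.strip())
        if m and m.group(1) == m.group(2):
            continue
        kept.append(line)
    return kept

selected = [
    "pushl %ebp", "movl %esp, %ebp",
    "movl 8(%ebp), t1", "movl 12(%ebp), t2",
    "movl t1, a", "addl t2, a",
    "movl a, b", "addl $4, b",
    "movl b, %eax", "imult $2", "movl %eax, c",
    "movl b, %eax", "cltd", "idivl $8", "movl %eax, d",
    "movl $0, %eax", "leave", "ret",
]
rewritten = rewrite(selected, ALLOCATION)
optimized = peephole(rewritten)
```

Looking at one instruction at a time is the smallest possible peephole; a wider window could also catch patterns that span adjacent instructions.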

AfterOptimization

int f (int x, int y)

{

int a;

int b;

int c;

int d;

a = x + y;

b = a + 4;

c = b * 2;

d = b / 8;

return 0;

}

