Dynamic binary optimization

Dynamic Binary Optimization - PowerPoint PPT Presentation


Presentation Transcript
Dynamic Binary Optimization

Presenter: Kim Jin Chul


Contents

1. Overview of Applying Optimization on VMs
2. Dynamic Program Behavior
3. Profiling
4. Optimizing Translation Blocks


Classical Optimizations

Translation from IA-32 to PowerPC code.

IA-32 source:

addl %edx, 4(%eax)
movl 4(%eax), %edx

Translated PowerPC code:

addi r16, r4, 4 ; add 4 to %eax
lwzx r17, r2, r16 ; load operand from memory
add r7, r17, r7 ; perform add of %edx
addi r16, r4, 4 ; add 4 to %eax
stwx r7, r2, r16 ; store %edx value into memory

After applying common subexpression elimination (the second addi recomputes a value that r16 already holds):

addi r16, r4, 4 ; add 4 to %eax
lwzx r17, r2, r16 ; load operand from memory
add r7, r17, r7 ; perform add of %edx
stwx r7, r2, r16 ; store %edx value into memory
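The elimination above can be sketched as a small local pass. This is only an illustration, not the translator's actual algorithm: the tuple encoding `(op, first, a, b)`, the `PURE_OPS` and `NO_DEST_OPS` sets, and the function name `local_cse` are all assumptions made for the example, and the pass only drops a recomputation into the same register.

```python
PURE_OPS = {"addi", "add"}   # register-only arithmetic (assumed set for this sketch)
NO_DEST_OPS = {"stwx"}       # stores write memory, not a register

def local_cse(block):
    """Drop an instruction that recomputes a value its target register already holds."""
    available = {}           # (op, a, b) -> register currently holding that value
    out = []
    for inst in block:
        op, first, a, b = inst
        if op in NO_DEST_OPS:
            out.append(inst)
            continue
        key = (op, a, b)
        if op in PURE_OPS and available.get(key) == first:
            continue         # redundant: same expression, same register, nothing changed
        # Writing `first` invalidates expressions held in it or computed from it.
        available = {k: r for k, r in available.items()
                     if r != first and first not in k[1:]}
        if op in PURE_OPS and first not in (a, b):
            available[key] = first
        out.append(inst)
    return out

translated = [
    ("addi", "r16", "r4", 4),
    ("lwzx", "r17", "r2", "r16"),
    ("add", "r7", "r17", "r7"),
    ("addi", "r16", "r4", 4),
    ("stwx", "r7", "r2", "r16"),
]
optimized = local_cse(translated)
```

With the five-instruction translation as input, the second `addi` is the only instruction removed, which matches the four-instruction sequence on the slide.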


Optimization Based on Profiling

Original code. R1 is used in Basic Block B and redefined in Basic Block C:

Basic Block A
...
R3 ← ...
R7 ← ...
R1 ← R2 + R3
Br L1 if R3 == 0

Basic Block B
...
R6 ← R1 + R6 (use of R1)
...

Basic Block C
L1: R1 ← 0 (def of R1)
...

Step 1. Remove R1 ← R2 + R3 from Basic Block A, since its result is never used on the path to Basic Block C:

Basic Block A
...
R3 ← ...
R7 ← ...
Br L1 if R3 == 0

Step 2. Add compensation code on the path to Basic Block B, which still needs R1:

Compensation code
R1 ← R2 + R3

Basic Block B
...
R6 ← R1 + R6
...

Basic Block C
L1: R1 ← 0
...


Optimization Based on Profiling

Before:

Basic Block A
...
R3 ← ...
R7 ← ...
R1 ← R2 + R3
Br L1 if R3 == 0

Basic Block B
...
R6 ← R1 + R6
...

Basic Block C
L1: R1 ← 0
...

After forming a superblock:

Superblock
...
R3 ← ...
R7 ← ...
Br L2 if R3 != 0
R1 ← 0
...

Compensation code
R1 ← R2 + R3

Basic Block B
L2: ...
R6 ← R1 + R6
...
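A quick way to convince yourself that the superblock version preserves behavior is to model both versions and compare them. The sketch below is an assumption-laden illustration: it models only the instructions shown on the slide, treats each path as ending in Basic Block B or Basic Block C, and the function names `original` and `with_superblock` are made up for the example.

```python
def original(r2, r3, r6):
    # Basic Block A
    r1 = r2 + r3
    if r3 == 0:
        r1 = 0                 # Basic Block C (label L1) redefines R1
        return ("C", r1, r6)
    r6 = r1 + r6               # Basic Block B uses R1
    return ("B", r1, r6)

def with_superblock(r2, r3, r6):
    # Superblock: A followed by C, with the branch condition reversed
    if r3 != 0:
        r1 = r2 + r3           # compensation code on the path to B
        r6 = r1 + r6           # Basic Block B (label L2)
        return ("B", r1, r6)
    r1 = 0
    return ("C", r1, r6)

# Compare the two versions over a small range of inputs.
mismatches = [
    (r2, r3, r6)
    for r2 in range(-2, 3)
    for r3 in range(-2, 3)
    for r6 in range(-2, 3)
    if original(r2, r3, r6) != with_superblock(r2, r3, r6)
]
```

The add is now executed only when R3 is nonzero, so the path through the superblock no longer pays for a value it immediately overwrites.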


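A Staged Optimization System

Stages, in order: interpret, basic translation, optimized blocks, highly optimized blocks

  • Startup: fast at the interpreter end, very slow at the highly optimized end

  • Steady state: slow at the interpreter end, fast at the highly optimized end

  • Profiling: simple at the interpreter end, extensive at the highly optimized end

[Figure: components of a staged optimization system: interpreter, binary memory image, basic block cache, code cache, profile data, optimizer, translator, emulation manager]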
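One way to picture the staging is an emulation manager that promotes a block to the next stage once its execution count crosses a threshold. The following is a minimal sketch under that assumption; the threshold values, the `stage_for` helper, and the `EmulationManager` class are hypothetical and not taken from the slides.

```python
from collections import Counter

STAGES = ["interpret", "basic translation", "optimized", "highly optimized"]
THRESHOLDS = [50, 1_000, 100_000]   # hypothetical promotion points

def stage_for(count):
    """Return the stage a block with this execution count should run in."""
    stage = 0
    for threshold in THRESHOLDS:
        if count >= threshold:
            stage += 1
    return STAGES[stage]

class EmulationManager:
    def __init__(self):
        self.profile = Counter()    # block address -> execution count

    def run_block(self, addr):
        """Record one execution of the block and report the stage used for it."""
        self.profile[addr] += 1
        return stage_for(self.profile[addr])

manager = EmulationManager()
history = [manager.run_block(0x100) for _ in range(1000)]
```

Cold blocks stay in the cheap, fast-starting stages, and only blocks that keep executing pay the higher cost of the later stages.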


Dynamic Program Behavior

  • Dynamic control flow is highly predictable

        .
        .
        R3 ← 100
  loop: R1 ← mem(R2)
        Br found if R1 == –1
        R2 ← R2 + 4
        R3 ← R3 – 1
        Br loop if R3 != 0
        .
        .
 found: .
        .
        .
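To see why this loop is predictable, it helps to count how often each of its two branches is taken. The sketch below simulates the loop over a Python list standing in for memory; the function name `run_search_loop` and the word-addressed list are assumptions made for the illustration.

```python
def run_search_loop(memory, start=0, limit=100):
    """Simulate the loop; return [taken, not taken] counts for each branch."""
    counts = {"found": [0, 0], "loop": [0, 0]}
    r2, r3 = start, limit
    while True:
        r1 = memory[r2 // 4]          # R1 <- mem(R2), 4-byte words
        if r1 == -1:                  # Br found if R1 == -1
            counts["found"][0] += 1
            break
        counts["found"][1] += 1
        r2 += 4
        r3 -= 1
        if r3 != 0:                   # Br loop if R3 != 0
            counts["loop"][0] += 1
            continue
        counts["loop"][1] += 1
        break
    return counts

no_match = run_search_loop([7] * 100)
early_match = run_search_loop([7] * 10 + [-1] + [7] * 89)
```

When no element equals –1, the backward branch is taken on 99 of its 100 executions and the forward branch is never taken, so both are easy to predict.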


Dynamic Program Behavior

  • Distribution of taken conditional branches

    • Predominantly not taken: 28%

    • Predominantly taken: 42%

[Figure: histogram of the fraction of static conditional branches (y-axis, 0% to 50%) against percent taken (x-axis, in bins from 0-10% to >90%)]


Dynamic Program Behavior

  • Consistency of conditional branches

    • The high percentage comes from backward branches

[Figure: dynamic branches decided the same as the previous time (y-axis, 0% to 100%) for the SPEC benchmarks 176.gcc, 181.mcf, 197.parser, 252.eon, 256.bzip2, 171.swim, 173.applu, 177.mesa, 187.facerec, and 189.lucas]


Dynamic Program Behavior

  • The predictability of indirect jumps

    • Some jump destination addresses seldom change

[Figure: percent of indirect jumps (y-axis, 0% to 25%) against the number of different destinations (x-axis, 1 to >9)]


Dynamic Program Behavior

  • The predictability of data values

    • Static: static instructions that always compute the same value

    • Dynamic: dynamic instructions that execute those static instructions

[Figure: fraction with constant value (y-axis, 0 to 0.7) by instruction type (All, Add/Sub, Load, Logic, Shift, Set), with static and dynamic series]


Profiling

  • The process of collecting instruction and data statistics for an executing program

  • Optimization is based on the profile data collected

[Figure: the staged optimization system: interpreter, binary memory image, basic block cache, code cache, profile data, optimizer, translator, emulation manager]


The Role of Profiling

  • Traditional profiling

[Figure: traditional profiling flow. An HLL program passes through the compiler frontend and backend to produce instrumented code. The instrumented code is executed on test data to produce program statistics, which the optimizing compiler uses to produce an optimized binary. A control flow graph of blocks A to F accompanies the flow.]


The Role of Profiling

  • On-the-fly profiling in a dynamic optimizing VM

[Figure: on-the-fly profiling. The interpreter runs the program binary on the program data and produces partial program statistics for the translator/optimizer. A control flow graph of blocks A, B, D, and E accompanies the flow.]


Types of Profiles

  • Several types of profile data

    • How frequently are different code regions executed?

      • This can be used to decide the level of optimization

    • How predictable is control flow?

      • This may be used as the basis for gathering and rearranging basic blocks

      • Rearranged basic blocks can then be merged into a superblock


Types of Profiles

[Figure: two annotated control flow graphs over blocks A to F. Left: a basic block profile, with an execution count on each block. Right: an edge profile, with an execution count on each edge.]
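An edge profile carries at least as much information as a basic block profile, because each block's count is the sum of the counts on its incoming edges. The sketch below shows that derivation; the function name and the edge counts are invented for the illustration and are not a reconstruction of the figure.

```python
from collections import defaultdict

def block_profile_from_edges(edge_counts, entry, entry_count):
    """Derive per-block execution counts from per-edge counts."""
    counts = defaultdict(int)
    counts[entry] += entry_count          # the entry block has no incoming edge here
    for (_src, dst), n in edge_counts.items():
        counts[dst] += n                  # a block runs once per incoming edge traversal
    return dict(counts)

# Illustrative numbers only.
edge_profile = {
    ("A", "B"): 50,
    ("A", "C"): 15,
    ("B", "D"): 50,
    ("C", "D"): 15,
}
block_profile = block_profile_from_edges(edge_profile, entry="A", entry_count=65)
```

The reverse derivation is not possible in general, which is one reason an edge profile is the more useful input for rearranging blocks.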


Collecting Profiles

  • Instrumentation-based profiling

    • Targets specific program-related events and counts all instances of the events being profiled

    • Software-based vs. hardware-based

      • Speed? Support? Flexibility?

  • Sampling-based profiling

    • The program runs in its unmodified form; at intervals it is interrupted and an instance of the event is captured

  • Instrumentation vs. sampling

    • Overhead per data point: Instrumentation < Sampling

      • Sampling causes traps!


Profiling During Interpretation

[Figure: Profile Table for Collecting an Edge Profile During Interpretation. The branch PC is hashed to select a table entry holding the PC, a taken count, and a not-taken count.]

PowerPC Branch Conditional Interpreter Routine (an entry in the instruction function list):

branch_conditional(inst) {
    BO = extract(inst, 25, 5);
    BI = extract(inst, 20, 5);
    displacement = extract(inst, 15, 14) * 4;
    .
    .
    // code to compute whether branch should be taken
    .
    .
    profile_addr = lookup(PC);
    if (branch_taken) {
        profile_cnt(profile_addr, taken);
        PC = PC + displacement;
    } else {
        profile_cnt(profile_addr, nottaken);
        PC = PC + 4;
    }
}
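The profile table used by this routine can be sketched with a dictionary keyed by the branch PC. This is only an illustration: the slide names `lookup` and `profile_cnt` but does not define them, so the class `EdgeProfileTable`, the entry layout, and the helper `interpret_branch` are assumptions.

```python
class EdgeProfileTable:
    """Maps a branch PC to [taken count, not-taken count]."""

    def __init__(self):
        self._table = {}      # Python's dict does the hashing of the PC

    def lookup(self, pc):
        # Create the entry on first use so every branch gets counted.
        return self._table.setdefault(pc, [0, 0])

    @staticmethod
    def profile_cnt(entry, taken):
        entry[0 if taken else 1] += 1

def interpret_branch(table, pc, displacement, branch_taken):
    """Update the profile for one conditional branch and return the next PC."""
    entry = table.lookup(pc)
    table.profile_cnt(entry, branch_taken)
    return pc + displacement if branch_taken else pc + 4

table = EdgeProfileTable()
next_taken = interpret_branch(table, 0x1000, -16, True)
next_fallthrough = interpret_branch(table, 0x1000, -16, False)
```

Because the two counters correspond to the two outgoing edges of the branch, the table as a whole is an edge profile.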


Profiling Translated Code

Stub for the fall-through edge:

increment edge counter (i)
if (counter (i) > trigger) then invoke optimizer
else branch to fall-through basic block

Stub for the taken edge:

increment edge counter (j)
if (counter (j) > trigger) then invoke optimizer
else branch to target basic block

Edge Profiling Code Inserted into Stubs of a Binary Translated Basic Block

Emulation Stages
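The stub logic can be sketched as a small object that counts traversals of one edge and hands control to the optimizer once the count exceeds the trigger. The class name `EdgeStub`, the callback signature, and the trigger value are assumptions made for this illustration.

```python
class EdgeStub:
    """Profiling stub attached to one exit edge of a translated basic block."""

    def __init__(self, target, trigger):
        self.target = target      # block this edge normally branches to
        self.trigger = trigger    # count above which the optimizer is invoked
        self.counter = 0

    def execute(self, invoke_optimizer):
        self.counter += 1                         # increment edge counter
        if self.counter > self.trigger:
            return invoke_optimizer(self.target)  # hot edge: optimize
        return self.target                        # otherwise branch as usual

optimizer_calls = []

def fake_optimizer(target):
    # Stand-in for the real optimizer; records the request and names a new block.
    optimizer_calls.append(target)
    return "optimized_" + target

stub = EdgeStub("block_B", trigger=3)
destinations = [stub.execute(fake_optimizer) for _ in range(5)]
```

A real system would typically patch the stub after optimization so the counter is no longer incremented; this sketch keeps calling the optimizer only to keep the example short.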


Profiling Overhead

  • Profiling during interpretation incurs 10-20% overhead

  • Profiling overheads can be reduced

    • Reduce the number of instrumentation points by selecting a smaller set of key points


Optimizing Translation Blocks

  • Two-part strategy for optimizing

    • Using dominant control flow to enhance memory locality

    • Making translation blocks larger

      • Traces, superblocks, tree groups

  • The two parts of the strategy are relatively independent


Improving Locality

  • Two kinds of memory locality

    • Spatial locality

      • An access to a memory location is soon followed by an access to an adjacent memory location

    • Temporal locality

      • A memory location that is accessed is accessed again in the near future


Improving Locality

[Figure: control flow graph of blocks A to G annotated with edge execution counts]

  • Example code sequence

A

Br cond1 == true

B

Br cond2 == false

C

Br uncond

D

Br cond3 == true

E

Br uncond

F

G

Br cond4 == true


Improving Locality

[Figure: control flow graph of blocks A to G annotated with edge execution counts]

  • Rearrange the blocks in memory

A

Br cond1 == false

D

Br cond3 == true

E

G

Br cond4 == true

Br uncond

B

Br cond2 == false

C

Br uncond

F

Br uncond
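A simple way to obtain a layout like this is to start at the entry block and keep following the most frequently taken edge to a block that has not been placed yet. The sketch below does that greedily; the function name and the edge counts are assumptions chosen to resemble the example, not values read reliably from the figure.

```python
def layout_by_hot_path(edge_counts, program_order):
    """Place blocks so that each block is followed by its hottest unplaced successor."""
    placed, layout = set(), []
    for seed in program_order:
        block = seed
        while block is not None and block not in placed:
            layout.append(block)
            placed.add(block)
            successors = edge_counts.get(block, {})
            candidates = [(n, s) for s, n in successors.items() if s not in placed]
            # Follow the edge with the highest count, if any successor is left.
            block = max(candidates)[1] if candidates else None
    return layout

# Hypothetical edge profile: block -> {successor: count}.
edge_counts = {
    "A": {"B": 30, "D": 70},
    "B": {"C": 29, "E": 1},
    "C": {"G": 29},
    "D": {"E": 68, "F": 2},
    "E": {"G": 69},
    "F": {"G": 2},
    "G": {"A": 97},
}
layout = layout_by_hot_path(edge_counts, ["A", "B", "C", "D", "E", "F", "G"])
```

With these counts the hot path A, D, E, G is laid out contiguously, followed by B and C and finally F, so the common case falls through instead of branching.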


Improving Locality

  • Procedure inlining

  • Positive and negative effects?

[Figure: procedure inlining. Procedure xyz (blocks X, Y, Z, Return) is called between blocks A and B and between blocks K and L; the code is shown before and after the calls are replaced by copies of the procedure's blocks.]


Traces

  • Trace

    • A contiguous sequence of basic blocks

    • May have both side entrances and side exits

[Figure: relations between superblocks and traces. The control flow graph of blocks A to G, annotated with edge execution counts, is divided into Trace 1, Trace 2, and Trace 3.]


Superblocks

  • Superblocks

    • Regions of code with only one entry and one or more exit points

[Figure: the control flow graph of blocks A to G, annotated with edge execution counts, shown before and after superblock formation; block G is duplicated so that no superblock has a side entrance]


Superblocks

Rearranged layout before superblock formation:

A
Br cond1 == false
D
Br cond3 == true
E
G
Br cond4 == true
Br uncond
B
Br cond2 == false
C
Br uncond
F
Br uncond

Layout after superblock formation (G is duplicated after C and after F):

A
Br cond1 == false
D
Br cond3 == true
E
G
Br cond4 == true
Br uncond
B
Br cond2 == false
C
G
Br cond4 == true
Br uncond
F
G
Br cond4 == true
Br uncond
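The duplication of G is an instance of tail duplication: when a trace ends by jumping into the middle of another trace, the tail of that other trace is copied onto the end of the jumping trace, which removes the side entrance. The sketch below handles only this one-level case; the function name, the trace lists, and the `jump_target` map are assumptions made for the illustration.

```python
def tail_duplicate(traces, jump_target):
    """Turn traces into superblocks by copying tails that are entered from the side."""
    position = {}                           # block -> (trace index, offset in trace)
    for index, trace in enumerate(traces):
        for offset, block in enumerate(trace):
            position[block] = (index, offset)

    superblocks = []
    for trace in traces:
        superblock = list(trace)
        target = jump_target.get(trace[-1])
        if target in position:
            index, offset = position[target]
            if offset > 0:                  # target is not a trace head: side entrance
                superblock.extend(traces[index][offset:])
        superblocks.append(superblock)
    return superblocks

traces = [["A", "D", "E", "G"], ["B", "C"], ["F"]]
jump_target = {"C": "G", "F": "G"}          # unconditional jumps that end each trace
superblocks = tail_duplicate(traces, jump_target)
```

The cost of the transformation is code growth: G now exists three times, which is the trade-off to weigh against the larger single-entry regions it creates.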


Tree Groups

  • Tree groups

    • Regions of code with only one entry and one or more exit points

[Figure 4.7: a tree group rooted at block A, branching through blocks D, B, E, F, and C, with block G duplicated at the leaves]


Thank You!


SPEC Benchmarks

  • Integer SPEC benchmarks

    • 176.gcc – GNU Compiler

    • 181.mcf – Combinatorial Optimization

    • 197.parser – Word Processing

    • 252.eon – Computer Visualization

    • 256.bzip2 – Compression

  • Floating-point SPEC benchmarks

    • 171.swim – Shallow Water Modeling

    • 173.applu – Parabolic/Elliptic Partial Differential Equations

    • 187.facerec – Image Processing

    • 189.lucas – Number Theory
