Dynamic Binary Optimization – Part 1

2006. 9.25

Nam, E Hyun


Contents

  • Overview

  • Dynamic program Behavior

  • Profiling

  • Optimizing Translation blocks

Overview : Optimization

  • Optimization

    • Migration of VM consideration from compatibility to performance

  • Goal

    • To close the gap between a guest’s emulated performance and native platform performance

  • Type

    • Translation block chaining

    • Enlarging the translation block

    • Reordering translated instructions

    • Conventional compiler optimization techniques

Overview : Profile

  • Profile

    • Statistics regarding a program’s behavior

    • A guide for making optimization decisions

      • A common optimization strategy is to use profiling to determine the paths that are predominantly followed by control flow

  • Type of profile information

    • Instructions (or basic blocks) that are more heavily executed

    • Sequences in which BBs are most commonly executed

    • Behavior of particular data variables and addresses

Overview : Profile

  • Advantage of profile information

    • Providing information that may not have been available when a program was originally compiled

Overview : BB rearrangement

  • Definition

    • A method of laying out code so that the predominant path has its instructions in consecutive memory locations

  • Advantages

    • Better locality

    • Efficient instruction fetching

  • Type

    • Trace

    • Superblock

    • Tree group

Overview : Staged emulation

  • Relation between emulation and optimization

    • Tightly integrated with emulation

    • Optimization is part of an emulation framework that supports staged emulation

  • Staged emulation

    • Based on tradeoff between start-up time and steady state performance

    • Interpretation → Binary translation → Dynamic binary optimization

Overview : Staged emulation

  • Stages of staged emulation

    • Interpretation

    • BB translation (e.g. chaining)

    • Optimized translation (e.g. superblock)

    • Highly optimized translation

Overview : Spectrum of emulation

Overview : Staged emulation strategy

  • Strategy decision factors

    • Source and target ISA

    • Type of VM being implemented

    • Design objective

    • Tradeoff between the performance gained from optimization and the overhead of optimization and profiling

  • Example

    • Original HP Dynamo system, Digital FX!32

      • Interpretation → optimized, translated code

    • DynamoRIO

      • Simple binary translation → optimization

    • Shade

      • Interpretation → simple binary translation


  • Overview

  • Dynamic program Behavior

  • Profiling

  • Optimizing Translation blocks

Dynamic program behavior

  • Goal

    • Optimization depends on program’s structure and dynamic behavior

    • Through profiling, the optimization system can learn about the program’s structure and dynamic behavior

  • Important characteristics of program

    • High predictability of dynamic control flow

    • Correlation between a branch’s direction on its current execution and on its most recent previous execution

Dynamic program behavior

  • Important characteristics of program

    • Backward branch

      • Is typically taken

    • Predictability of indirect jump

      • Switch statement

      • Return from procedure call

    • Predictability of data value


  • Overview

  • Dynamic program Behavior

  • Profiling

    • Overview

    • Role

    • Type

    • Collecting the profile data

    • Profile during interpretation

    • Profiling translated code

    • Overhead

  • Optimizing Translation blocks

Profiling : Role

  • Definition

    • The process of collecting instruction and data statistics for an executing program

  • Usage

    • Input to code-optimization process

  • Principle of profiling

    • Predictability of program

    • Past behavior often predicts future behavior

Profiling : Role

  • Traditional profiling & optimization procedure

    • Decomposing the source program into a control flow graph

    • Analyzing the graph and inserting probes to collect profile information

    • Running the program with a typical data input

    • Generating profile data

    • Static profile log analysis

    • Generating optimized code

  • Property

    • Fully analyzed

    • Optimal placement of probe

    • Entire program run and complete profile

Profiling : Role

  • Difficulty, requirement and limitation in dynamic optimization

    • Program structure is not known when a program begins

      • Program structure must be discovered in an incremental way

    • Profiling probes are difficult to insert in a globally optimal manner

    • Optimization decision must be made as early as possible

      • Statistics from a partial execution of the program

Profiling : Role

  • Tradeoff between overhead and benefit

    • Overhead : Initial analysis + actual collection of profile data

    • Benefit : execution time reduction due to optimization

  • Static optimization

    • Overhead is paid once

  • Dynamic optimization

    • Overhead is paid every time a guest program runs

    • Benefits must outweigh the overhead
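The tradeoff above can be written as simple arithmetic. The following is a minimal sketch with made-up numbers; the function name and the cost parameters are illustrative assumptions, not measurements from any real system.

```python
import math

def break_even_executions(overhead, time_before, time_after):
    """Smallest number of executions of a code region after which
    optimization has paid for itself.

    overhead    -- one-time cost of profiling and optimizing the region
    time_before -- cost of one execution before optimization
    time_after  -- cost of one execution after optimization
    All three use the same (arbitrary) time unit.
    """
    saving = time_before - time_after
    if saving <= 0:
        return None  # the optimization never pays off
    # Need n * saving > overhead, i.e. n > overhead / saving.
    return math.floor(overhead / saving) + 1

# Hypothetical numbers: 1200 units of overhead, 10 -> 4 units per execution.
n = break_even_executions(1200, 10, 4)
print(n)  # 201
```

A region that runs fewer times than this in one program run is better left unoptimized, which is why a dynamic optimizer tries to spend its effort on hot code only.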

Profiling : Type of profile data

  • Frequency of execution of different code regions

    • Hotspot

    • Interpretation VS binary translation

  • Profile data based on control flow (branch and jump) predictability

    • Can be used for determining aspects of a program’s dynamic execution behavior

    • Used as a basis for gathering and rearranging BBs into larger units

  • Profile data used to guide specific optimizations

    • Address

    • Data

Profiling : Type of profile data

  • Basics

    • Nodes : BBs

    • Edges : flow of control

  • BB profile

    • Numbers are counts of the corresponding BB’s execution

  • Edge profile

    • BB profile can be derived from edge profile

  • Path profile

    • Approximate the path profile using heuristics based on the edge profile
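The statement that a BB profile can be derived from an edge profile can be illustrated as follows. This is a small sketch on a hypothetical control flow graph; the block names, the counts and the function name are assumptions made for the example.

```python
from collections import defaultdict

def bb_counts_from_edges(edge_counts, entry_block, entry_count=1):
    """Derive a basic-block profile from an edge profile.

    edge_counts maps (source_block, destination_block) -> times the edge
    was followed. A block executes once for each time control enters it,
    so its count is the sum of the counts on its incoming edges. The entry
    block also gets entry_count, because it is entered from outside.
    """
    counts = defaultdict(int)
    counts[entry_block] += entry_count
    for (src, dst), n in edge_counts.items():
        counts[src] += 0  # make sure blocks without incoming edges appear
        counts[dst] += n
    return dict(counts)

# Hypothetical edge profile for a small loop: A -> (B or C) -> D -> A ...
edges = {("A", "B"): 30, ("A", "C"): 70, ("B", "D"): 30,
         ("C", "D"): 70, ("D", "A"): 99}
bb = bb_counts_from_edges(edges, entry_block="A")
print(bb)
```

The reverse derivation is not possible in general: the block counts alone do not say how the 100 executions of A were split between B and C.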

Profile : collecting the profile

  • Instrumentation based profiling

    • Target program related events

    • Count all instances of the event being profiled

    • Many different events can be monitored simultaneously

      • Monitoring method : HW, SW

  • Sampling based profiling

    • Program runs in its unmodified form

    • Program is interrupted at intervals and an instance of a program-related event is captured

  • Tradeoff

    • Instrumentation based

      • Slows the program, but collects a given amount of profile data over a much shorter period of time

    • Sampling based

      • Fast, but requires a longer time to collect the same amount of profile information

Profile : collecting the profile

  • Strategy

    • Collection technique depends on emulation spectrum

      • Interpretation

        • SW instrumentation is about the only choice

      • Optimizing binary translation, dynamic optimization system

        • Instrumentation

      • Already well optimized longer running program

        • Sampling

Profile : profiling during interpretation

  • Key points

    • Source instructions are actually accessed as data

      • Profiling code must be added to the interpreter routines

      • Profiling is applied to a specific instruction type rather than to a specific instruction

    • It can be applied to certain classes of instructions rather than to specific instructions

      • E.g. Backward branch

  • Method

    • BB profile

      • Profile code should be added to all control transfer instructions, after the PC has been updated

    • Edge profile

      • Both the PC of the control transfer instruction and the target PC are used to define a specific edge

Profile : profiling during interpretation

  • Profile Table

    • Access method

      • BB profile : Via PC value of control transfer destination

      • Edge profile : PC values that define an edge

      • Hash function

    • Contents of entry

      • Basic block or edge count

      • For conditional branch, taken count and not taken count
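A profile table of this kind might look like the sketch below. It is only an illustration: the class and method names are hypothetical, a Python dict stands in for the hash function and table, and the PC values are invented.

```python
class ProfileTable:
    """Profile table keyed by PC values (the dict does the hashing)."""

    def __init__(self):
        self.bb_count = {}    # destination PC -> basic block count
        self.edge_count = {}  # (branch PC, target PC) -> edge count
        self.branch = {}      # branch PC -> [taken count, not-taken count]

    def record_transfer(self, branch_pc, target_pc, taken=None):
        # Called from the interpreter routine of a control transfer
        # instruction, after the PC has been updated to target_pc.
        self.bb_count[target_pc] = self.bb_count.get(target_pc, 0) + 1
        edge = (branch_pc, target_pc)
        self.edge_count[edge] = self.edge_count.get(edge, 0) + 1
        if taken is not None:  # conditional branch
            counts = self.branch.setdefault(branch_pc, [0, 0])
            counts[0 if taken else 1] += 1

# A loop-closing branch at 0x120: taken 9 times back to 0x100,
# then falls through to 0x124 once.
table = ProfileTable()
for i in range(10):
    taken = i < 9
    table.record_transfer(0x120, 0x100 if taken else 0x124, taken)
print(table.branch[0x120])
```

A real interpreter would keep only the profile type it needs; both BB and edge counts are recorded here just to show the two access keys side by side.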

Profile : profiling during interpretation

Profile : profiling during interpretation

  • Profile Count decaying

    • Problem of profile table

      • A count field overflow

    • Solution

      • Key point

        • Optimization methods focus on relative frequency rather than absolute counts

        • Recent program event history is more valuable than older history

      • Decay process

        • Periodically divide all the profile counts by 2
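The decay step can be sketched in a few lines. The 8-bit counter width and the function names below are assumptions chosen for the example.

```python
def decay(profile_counts):
    """Halve every count in place (integer division).

    Ratios between counts are roughly preserved, old history loses
    weight, and counters are kept away from their maximum value.
    """
    for key in profile_counts:
        profile_counts[key] //= 2

def increment(profile_counts, key, max_count=255):
    """Saturating increment for a counter of limited width (8 bits assumed)."""
    profile_counts[key] = min(profile_counts.get(key, 0) + 1, max_count)

counts = {"A": 200, "B": 50, "C": 3}
decay(counts)
print(counts)  # {'A': 100, 'B': 25, 'C': 1}
```

After the decay, A is still about four times as hot as B, which is the information an optimizer looking at relative frequency needs.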

Profile : profiling during interpretation

  • Profiling Jump Instruction

    • Difficulties of indirect jumps compared with conditional branches

      • Switch statement : the target changes frequently

      • Return from procedure call : many target addresses

    • Solution

      • Key point

        • Profile-driven optimization of indirect jumps tends to focus on those jumps that very frequently have the same target

      • Maintain a profile table with a small number of target addresses and track only the more recently used targets
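One way to keep such a small per-jump table is sketched below. The table size, the least-recently-used eviction policy and the 90% dominance cutoff are all assumptions made for illustration.

```python
from collections import OrderedDict

class JumpTargetProfile:
    """Per-jump table holding only a few recently used targets."""

    def __init__(self, max_targets=2):
        self.max_targets = max_targets
        self.targets = OrderedDict()  # target PC -> count, oldest use first

    def record(self, target_pc):
        if target_pc in self.targets:
            self.targets[target_pc] += 1
            self.targets.move_to_end(target_pc)  # now the most recently used
        else:
            if len(self.targets) >= self.max_targets:
                self.targets.popitem(last=False)  # evict least recently used
            self.targets[target_pc] = 1

    def dominant_target(self, min_fraction=0.9):
        """Target seen in at least min_fraction of tracked jumps, or None."""
        total = sum(self.targets.values())
        if total == 0:
            return None
        pc, count = max(self.targets.items(), key=lambda item: item[1])
        return pc if count / total >= min_fraction else None

profile = JumpTargetProfile()
for target in [0x400] * 19 + [0x500]:
    profile.record(target)
print(hex(profile.dominant_target()))  # 0x400
```

A jump with a dominant target is a candidate for optimization; a jump whose table keeps churning, like a frequently changing switch statement, is not.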

Profile : profiling translated code

  • Instrumenting individual instructions

    • Each individual instruction can have its own custom profiling code

      • = Profiling can be selectively applied

      • = Profile counters can be assigned to each static instruction

    • Profile counters can be directly addressed without hashing

    • Profile code can be easily inserted and removed as needed

Profiling : Overhead

  • Performance overhead

    • Example

      • To access hash table : hash function + 1 load + 1 compare

      • To increment the proper count : 1 load + 1 add + 1 store

    • Profiling during interpretation VS profiling translated code

      • Absolute overhead VS relative overhead

  • Memory overhead

    • Profile table

  • Overhead reduction method

    • Reducing the number of instrumentation points

      • Heuristic + Using collected data

    • Code duplication

      • Attractive for same-ISA optimization ( 4.7 )


  • Overview

  • Dynamic program Behavior

  • Profiling

  • Optimizing Translation blocks

    • Overview

    • Improving locality

    • Traces

    • Superblocks

    • Dynamic superblocks formation

    • Tree group

Optimizing translation blocks : Overview

  • Two strategies

    • Improving locality

    • Optimization on enlarged translation blocks

Optimizing translation blocks : Improving locality

  • Locality

    • Temporal

    • Spatial

  • Problem

    • Cache space

    • Performance

      • Low instruction fetch efficiency


Optimizing translation blocks : Improving locality

  • Rearrange the layout of the blocks in memory

    • Conditional branch tests are reversed

    • Unconditional branches are removed or added

    • Instruction fetch efficiency is improved
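The branch fix-ups that follow a new block layout can be decided mechanically. The sketch below is illustrative only; the function name, the block names and the action strings are hypothetical.

```python
def fix_up_branches(order, cond_branch):
    """Decide how each conditional branch changes under a new block layout.

    order       -- basic blocks in their new memory order
    cond_branch -- block -> (taken_target, fall_through_target)
    """
    actions = {}
    for i, block in enumerate(order):
        if block not in cond_branch:
            continue
        taken, fall_through = cond_branch[block]
        following = order[i + 1] if i + 1 < len(order) else None
        if following == fall_through:
            actions[block] = "keep"
        elif following == taken:
            # Reverse the test so the old taken target becomes the fall-through.
            actions[block] = "reverse test"
        else:
            # Neither successor follows: the fall-through needs an explicit jump.
            actions[block] = "add unconditional branch"
    return actions

# Hot path A -> C -> D is laid out consecutively; cold block B goes last.
order = ["A", "C", "D", "B"]
cond_branch = {"A": ("C", "B"), "D": ("A", "X")}
actions = fix_up_branches(order, cond_branch)
print(actions)
```

Here the branch in A originally jumped to C on the hot path, so its test is reversed and the hot path becomes straight-line fall-through code.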

Optimizing translation blocks : Improving locality

  • Procedure inlining

Optimizing translation blocks : Improving locality

  • Partial procedure inlining

    • In dynamic optimization system

Optimizing translation blocks : Improving locality

  • Pros and Cons of procedure inlining

    • Pros

      • Increase spatial locality

      • Remove overhead

        • Call and return instructions are removed

        • Save/restore instructions are removed

    • Cons

      • Increase code size

      • Increase register “pressure”

        • Inlined code needs more registers than a procedure call

  • Consequently, procedure inlining is typically used only for those procedures that are very frequently called and are very small

Optimizing translation blocks

  • Three ways of rearranging basic blocks according to control flow

    • Trace formation

    • Superblock formation

      • Most widely used in VM implementation

    • Tree group

      • Useful when control flow is difficult to predict

      • Provide wider scope for optimization

Optimizing translation blocks : Traces

  • Traces

    • Chunks of contiguous instructions containing multiple BBs

    • Traces > Superblock

  • Static traces forming step

    • 1. Profile collection using test data

    • 2. Choose a start point

      • The most frequently executed BB that is not already part of a trace

    • 3. Collect BBs along the most common control path, until a stopping condition is met

      • A block already belonging to another trace is reached

      • A procedure call/return boundary is reached

    • 4. Collect the BBs into a trace

      • Reverse branch tests

      • Remove/add unconditional branches

    • 5. If every BB belongs to a trace, stop; otherwise go to step 2

  • In a dynamic environment, traces are not commonly used as translation blocks
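The static trace-forming steps can be sketched as a greedy loop over a complete profile. This is a simplified illustration, not the exact procedure of any particular compiler: the profile data is invented, the branch fix-up of step 4 is not modeled, and `boundary_blocks` is a hypothetical input marking blocks that end at a procedure call or return.

```python
def form_traces(bb_count, edge_count, boundary_blocks=frozenset()):
    """Greedy trace formation from a complete profile.

    bb_count        -- block -> execution count
    edge_count      -- (src, dst) -> times the edge was followed
    boundary_blocks -- blocks ending in a procedure call/return
    """
    in_trace = set()
    traces = []
    # Steps 2 and 5: keep starting from the hottest block not yet in a trace.
    for start in sorted(bb_count, key=bb_count.get, reverse=True):
        if start in in_trace:
            continue
        trace = [start]
        in_trace.add(start)
        current = start
        # Step 3: follow the most common successor until a stop condition.
        while current not in boundary_blocks:
            successors = [(n, dst) for (src, dst), n in edge_count.items()
                          if src == current]
            if not successors:
                break
            _, nxt = max(successors)
            if nxt in in_trace:  # the block already belongs to a trace
                break
            trace.append(nxt)
            in_trace.add(nxt)
            current = nxt
        traces.append(trace)
    return traces

bb_count = {"A": 100, "B": 30, "C": 70, "D": 100}
edge_count = {("A", "B"): 30, ("A", "C"): 70, ("B", "D"): 30,
              ("C", "D"): 70, ("D", "A"): 99}
traces = form_traces(bb_count, edge_count)
print(traces)  # [['A', 'C', 'D'], ['B']]
```

The hot path A, C, D ends up contiguous, while the rarely executed block B is left in a trace of its own.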

Optimizing translation blocks : Traces

Optimizing translation blocks : Superblocks

  • Superblocks VS Traces

    • Superblocks have no side entrances (a single entry at the top, possibly several side exits); traces may have side entrances

  • Problems in forming superblocks

    • Can produce a large number of small superblocks

    • Too small to provide many opportunities for optimization

  • Tail duplication

    • The process of replicating code that appears at the end of a superblock in order to form another superblock

Optimizing translation blocks : Superblocks

Optimizing translation blocks : Dynamic superblock formation : Overview

  • Dynamic

    • Formed incrementally as the source code is being emulated

  • Complication

    • BB replication leads to more choices

  • Key question

    • Starting point

    • Continuation

    • Stopping point

Optimizing translation blocks : Dynamic superblock formation : starting point

  • Heavily used block

    • By using Profile information

  • Method for determining profile points

    • All basic block

    • Heuristics

      • Targets of backward branches are candidate starting points

      • Exit arc from an existing superblock

  • Start threshold

    • When a profiled BB’s execution frequency reaches this value, a new superblock is started

    • Depends on emulation tradeoff

    • A few tens to hundreds of executions is typical
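A start-point profiler along these lines could be sketched as follows. The threshold of 50 is an assumed value within the range the slide gives, and the class and method names are hypothetical.

```python
START_THRESHOLD = 50  # assumed value, within "a few tens to hundreds"

class StartPointProfiler:
    def __init__(self, threshold=START_THRESHOLD):
        self.threshold = threshold
        self.counts = {}               # candidate start PC -> execution count
        self.superblock_starts = set()

    def add_candidate(self, pc):
        """Register an exit target of an existing superblock as a candidate."""
        self.counts.setdefault(pc, 0)

    def on_taken_branch(self, branch_pc, target_pc):
        """Return True when a new superblock should be started at target_pc."""
        if target_pc in self.superblock_starts:
            return False               # a superblock already starts here
        if target_pc <= branch_pc:     # backward branch: target is a candidate
            self.counts.setdefault(target_pc, 0)
        if target_pc not in self.counts:
            return False               # not a profiled point
        self.counts[target_pc] += 1
        if self.counts[target_pc] >= self.threshold:
            self.superblock_starts.add(target_pc)
            del self.counts[target_pc]  # no further profiling needed here
            return True
        return False

profiler = StartPointProfiler(threshold=50)
triggers = [profiler.on_taken_branch(0x120, 0x100) for _ in range(60)]
print(triggers.index(True))  # 49: the 50th execution starts a superblock
```

Only candidate points carry a counter, so forward branch targets that were never registered cost nothing beyond the lookup.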

Optimizing translation blocks : Dynamic superblock formation : Continuation

  • Continuation

    • Which subsequent blocks should be collected and added as the superblock is grown

  • Most frequently used approach

    • Node profile information is used to identify the most likely successor BB

    • Continuation threshold

      • A relatively complete set of profile data must be collected for all BBs

      • Typically half of the start threshold

    • Continuation set

      • At the time superblock formation is to begin, the set of all BBs that have reached the continuation threshold is collected
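The most frequently used approach might be sketched as below. It is a simplified illustration under assumed data: the block names, counts and `max_length` are invented, and interactions with other existing superblocks are ignored.

```python
def grow_most_frequent(start, bb_count, successors, continuation_threshold,
                       max_length=16):
    """Grow a superblock from start by repeatedly adding the most frequently
    executed successor that belongs to the continuation set."""
    # Continuation set: all blocks that have reached the continuation threshold.
    continuation_set = {b for b, n in bb_count.items()
                        if n >= continuation_threshold}
    superblock = [start]
    current = start
    while len(superblock) < max_length:
        candidates = [b for b in successors.get(current, ())
                      if b in continuation_set and b not in superblock]
        if not candidates:
            break
        current = max(candidates, key=lambda b: bb_count[b])
        superblock.append(current)
    return superblock

bb_count = {"A": 100, "B": 30, "C": 70, "D": 100}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
# Start threshold 100 was reached by A; continuation threshold is half of it.
superblock = grow_most_frequent("A", bb_count, successors,
                                continuation_threshold=50)
print(superblock)  # ['A', 'C', 'D']
```

Block B is skipped because its count of 30 is below the continuation threshold, and growth stops at D because its only successor is the start of the same superblock.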

Optimizing translation blocks : Dynamic superblock formation : Continuation

  • Most frequently used procedure

Optimizing translation blocks : Dynamic superblock formation : Continuation

  • Most Recently used approach

    • Edge profile information

    • Algorithm

      • Assumption

        • The very next sequence of blocks following a start point is also likely to be a common path

      • Simply follows the actual dynamic control flow path one edge at a time

    • Advantage

      • Only candidate start points need to be profiled

      • = No need to use profiling for continuation blocks

      • = Profile overhead is substantially reduced

Optimizing translation blocks : Dynamic superblock formation : stopping point

  • Type of heuristics to determine stop condition

    • The start point of the same superblock is reached

    • A start point of some other superblock is reached

    • A superblock has reached some maximum length

      • A BB can be used in more than one superblock → there may be multiple copies of a given BB → explosion of code size

    • When using the most frequently used heuristic, there are no more candidate BBs that have reached the continuation threshold

    • An indirect jump is reached, or there is a procedure call
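Combined with the most recently used approach, the stopping heuristics above could look like the following sketch. The `next_block` callback stands in for whatever the emulator observes as it runs; it, the executed paths and `max_length` are assumptions made for the example.

```python
def grow_most_recent(start, next_block, other_starts=frozenset(),
                     indirect_or_call=frozenset(), max_length=16):
    """Build a superblock by following the path the program actually takes.

    next_block(block) -> the block executed next (observed while emulating)
    other_starts      -- start points of superblocks that already exist
    indirect_or_call  -- blocks ending in an indirect jump or procedure call
    """
    superblock = [start]
    current = start
    while True:
        if current in indirect_or_call:
            break  # an indirect jump or procedure call is reached
        if len(superblock) >= max_length:
            break  # the maximum length is reached
        nxt = next_block(current)
        if nxt is None or nxt == start:
            break  # path ended, or back at this superblock's own start
        if nxt in other_starts:
            break  # the start of some other superblock is reached
        superblock.append(nxt)
        current = nxt
    return superblock

# Assumed executed paths through blocks A..G of a loop headed by A.
first_path = {"A": "D", "D": "E", "E": "G", "G": "A"}
first = grow_most_recent("A", first_path.get)
second_path = {"B": "C", "C": "G", "G": "A"}
second = grow_most_recent("B", second_path.get, other_starts={"A"})
print(first, second)  # ['A', 'D', 'E', 'G'] ['B', 'C', 'G']
```

Note that G appears in both superblocks: following real execution paths replicates shared blocks, which is why a maximum length is needed to bound code growth.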

Optimizing translation blocks : Dynamic superblock formation : Example

  • Most frequently used

Optimizing translation blocks : Dynamic superblock formation : Example

  • Most Recently used

    • The only profile point is A, because A is the target of a backward branch

    • Most likely superblocks

      • ADEG → BCG → FG

  • However

    • There is about a 30% chance of forming instead

      • ABCG → DEG → FG

    • There are cases where the most recently used method may not select superblocks quite as well as the most frequently used method

Optimizing translation blocks : Tree group

  • Background

    • Problems when applying superblocks to branches whose outcomes are split almost evenly

      • Side exits are frequently taken → compensation code overhead

      • Optimizations are typically not done along the side exits → lost opportunities for performance improvement

  • Traces and superblocks vs. tree groups

    • Tree group

      • conditional branch outcomes are more evenly balanced

      • Generalization of superblock

      • Multiple flow of control

    • Superblocks

      • Conditional branches are predominantly decided one way

      • Single flow of control

Optimizing translation blocks : Tree group
