Dynamic Binary Optimization – Part 1


2006. 9.25 Nam, E Hyun

Presentation Transcript
Contents
  • Overview
  • Dynamic program Behavior
  • Profiling
  • Optimizing Translation blocks
Overview : Optimization
  • Optimization
    • Migration of VM consideration from compatibility to performance
  • Goal
    • To close the gap between a guest’s emulated performance and native platform performance
  • Type
    • Translation block chaining
    • Enlarging the translation block
    • Reordering translated instructions
    • Conventional compiler optimization techniques
Overview : Profile
  • Profile
    • Statistics regarding a program’s behavior
    • A guide for making optimization decisions
      • A common optimization strategy is to use profiling to determine the paths that are predominantly followed by control flow
  • Type of profile information
    • Instructions (or basic blocks) that are more heavily executed
    • Sequences in which basic blocks (BBs) are most commonly executed
    • Behavior of particular data variables and addresses
Overview : Profile
  • Advantage of profile information
    • Providing information that may not have been available when a program was originally compiled
Overview : BB rearrangement
  • Definition
    • Rearranging basic blocks so that the instructions on the predominant path occupy consecutive memory locations
  • Advantages
    • Better locality
    • Efficient instruction fetching
  • Type
    • Trace
    • Superblock
    • Tree group
Overview : Staged emulation
  • Relation between emulation and optimization
    • Tightly integrated with emulation
    • Optimization is part of an emulation framework that supports staged emulation
  • Staged emulation
    • Based on the tradeoff between start-up time and steady-state performance
    • Interpretation → Binary translation → Dynamic binary optimization
Overview : Staged emulation
  • Stages of staged emulation
    • Interpretation
    • BB translation( e.g. chaining )
    • Optimized translation( e.g. superblock )
    • Highly optimized translation
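The staging idea can be illustrated with a small sketch. It assumes that a region is promoted to the next stage once its execution count reaches a threshold; the function name `stage_for` and the threshold values are hypothetical and are not taken from the slides.

```python
from bisect import bisect_right

# Stage names follow the slide; the thresholds are hypothetical tuning values.
STAGES = (
    "interpretation",
    "basic-block translation",
    "optimized translation",
    "highly optimized translation",
)

def stage_for(execution_count, thresholds=(10, 50, 1000)):
    """Pick the emulation stage for a code region from its execution count."""
    # bisect_right counts how many thresholds the region has already reached.
    return STAGES[bisect_right(thresholds, execution_count)]
```

Rarely executed code stays in the cheap first stage, so the cost of optimization is only spent on regions that run often enough to repay it.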
Overview : Staged emulation strategy
  • Strategy decision factors
    • Source and target ISA
    • Type of VM being implemented
    • Design objective
    • Tradeoff between the performance gained from optimization and the overhead of optimization and profiling
  • Example
    • Original HP Dynamo system, Digital FX!32
      • Interpretation → optimized, translated code
    • DynamoRIO
      • Simple binary translation → optimization
    • Shade
      • Interpretation → simple binary translation
Contents
  • Overview
  • Dynamic program Behavior
  • Profiling
  • Optimizing Translation blocks
Dynamic program behavior
  • Goal
    • Optimization depends on a program’s structure and dynamic behavior
    • By profiling, the optimization system can learn about the program’s structure and dynamic behavior
  • Important characteristics of program
    • High predictability of dynamic control flow
    • A branch’s direction correlates with its direction on the most recent previous execution
Dynamic program behavior
  • Important characteristics of program
    • Backward branches
      • Are typically taken
    • Predictability of indirect jumps
      • Switch statement
      • Return from procedure call
    • Predictability of data values
Contents
  • Overview
  • Dynamic program Behavior
  • Profiling
    • Overview
    • Role
    • Type
    • Collecting the profile data
    • Profile during interpretation
    • Profiling translated code
    • Overhead
  • Optimizing Translation blocks
Profiling : Role
  • Definition
    • The process of collecting instruction and data statistics for an executing program
  • Usage
    • Input to code-optimization process
  • Principle of profiling
    • Predictability of program
    • Past behavior will often hold for future behavior
Profiling : Role
  • Traditional profiling & optimization procedure
    • Decomposing the source program into a control flow graph
    • Analyzing the graph and inserting probes to collect profile information
    • Running the program with a typical data input
    • Generating profile data
    • Static profile log analysis
    • Generating optimized code
  • Property
    • Program is fully analyzed
    • Optimal placement of probes
    • Entire program run and complete profile
Profiling : Role
  • Difficulties, requirements and limitations in dynamic optimization
    • Program structure is not known when a program begins
      • Program structure must be discovered incrementally
    • Profiling probes cannot be inserted in a globally optimal manner
    • Optimization decisions must be made as early as possible
      • Based on statistics from a partial execution of the program
Profiling : Role
  • Tradeoff between overhead and benefit
    • Overhead : Initial analysis + actual collection of profile data
    • Benefit : execution time reduction due to optimization
  • Static optimization
    • Overheads are paid once
  • Dynamic optimization
    • Overheads are paid every time a guest program runs
    • Benefits must outweigh the overheads
Profiling : Type of profile data
  • Frequency of execution of different code regions
    • Hotspot
    • Interpretation VS binary translation
  • Profile data based on control flow (branch and jump) predictability
    • Can be used for determining aspects of a program’s dynamic execution behavior
    • Used as a basis for gathering and rearranging BBs into larger units
  • Used to guide specific optimization
    • Address
    • Data
Profiling : Type of profile data
  • Basics
    • Nodes : BBs
    • Edges : flow of control
  • BB profile
    • Numbers are counts of the corresponding BB’s execution
  • Edge profile
    • BB profile can be derived from edge profile
  • Path profile
    • The path profile can be approximated using heuristics based on the edge profile
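The claim that a BB profile can be derived from an edge profile can be sketched as follows: a block's count is the sum of the counts on its incoming edges, plus the initial entry for the entry block. The function name, the block names and the sample counts are assumptions made for illustration.

```python
from collections import defaultdict

def bb_counts_from_edges(edge_counts, entry_block, entry_count=1):
    """Derive basic-block counts by summing the counts of incoming edges."""
    counts = defaultdict(int)
    # The entry block is reached once without following any profiled edge.
    counts[entry_block] += entry_count
    for (_src, dst), n in edge_counts.items():
        counts[dst] += n
    return dict(counts)

# Hypothetical edge profile for a small loop: A -> B or C, both -> D, D -> A.
edge_profile = {
    ("A", "B"): 30, ("A", "C"): 70,
    ("B", "D"): 30, ("C", "D"): 70,
    ("D", "A"): 99,
}
bb_profile = bb_counts_from_edges(edge_profile, entry_block="A")
```

The reverse derivation is not possible in general: block counts alone do not say how the executions of A were split between B and C unless the successors are also counted.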
Profile : collecting the profile
  • Instrumentation-based profiling
    • Targets specific program-related events
    • Counts all instances of the event being profiled
    • Many different events can be monitored simultaneously
      • Monitoring method : HW, SW
  • Sampling-based profiling
    • Program runs in its unmodified form
    • Program is interrupted and an instance of a program-related event is captured
  • Tradeoff
    • Instrumentation-based
      • Slow, but can collect a given amount of profile data over a much shorter period of time
    • Sampling-based
      • Fast, but requires a longer time to collect the same amount of profile information
Profile : collecting the profile
  • Strategy
    • Collection technique depends on emulation spectrum
      • Interpretation
        • SW instrumentation is about the only choice
      • Optimizing binary translation, dynamic optimization system
        • Instrumentation
      • Already well-optimized, longer-running programs
        • Sampling
Profile : profiling during interpretation
  • Key points
    • Source instructions are actually accessed as data
      • Profiling code must be added to the interpreter routines
    • Profiling is applied to certain classes of instructions rather than to specific instructions
      • E.g. Backward branch
  • Method
    • BB profile
      • Profile code should be added to all control transfer instructions, after the PC has been updated
    • Edge profile
      • Both the PC of the control transfer instruction and the target PC are used to define a specific edge
Profile : profiling during interpretation
  • Profile Table
    • Access method
      • BB profile : via the PC value of the control transfer destination
      • Edge profile : via the PC values that define an edge
      • Hash function
    • Contents of entry
      • Basic block or edge count
      • For conditional branch, taken count and not taken count
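A minimal sketch of such a profile table, assuming entries are keyed by the PC of a conditional branch and hold a taken count and a not-taken count. The class name, the bucket layout and the sample PC values are hypothetical; a real interpreter would use a fixed-size table in its own memory.

```python
class BranchProfileTable:
    """Hash-indexed table of per-branch taken / not-taken counts."""

    def __init__(self, num_buckets=256):
        self.buckets = [[] for _ in range(num_buckets)]

    def _entry(self, branch_pc):
        # Hash the PC to pick a bucket, then compare stored PCs to find the entry.
        bucket = self.buckets[hash(branch_pc) % len(self.buckets)]
        for entry in bucket:
            if entry["pc"] == branch_pc:
                return entry
        entry = {"pc": branch_pc, "taken": 0, "not_taken": 0}
        bucket.append(entry)
        return entry

    def record(self, branch_pc, taken):
        self._entry(branch_pc)["taken" if taken else "not_taken"] += 1

    def counts(self, branch_pc):
        entry = self._entry(branch_pc)
        return entry["taken"], entry["not_taken"]


table = BranchProfileTable()
for outcome in (True, True, False, True):
    table.record(0x4000, outcome)
```

Each update costs a hash computation, a load and compare to find the entry, and a load, add and store to bump the count, which is the per-event overhead the deck discusses later.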
Profile : profiling during interpretation
  • Profile Count decaying
    • Problem of profile table
      • A count field may overflow
    • Solution
      • Key point
        • Optimization methods focus on relative frequency, not absolute counts
        • Recent program event history is more valuable than older history
      • Decay process
        • Periodically divide all the profile counts by 2
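The decay step is small enough to sketch directly. The dictionary of counts and its values are made up for illustration; how often the decay runs is a tuning choice that the slides do not specify.

```python
def decay(profile_counts):
    """Halve every counter in place; a right shift is an integer divide by 2."""
    for key in profile_counts:
        profile_counts[key] >>= 1

counts = {"A": 200, "B": 50, "C": 3}
decay(counts)
```

Halving keeps the relative ordering of the counters while bounding their size, and it weights recent events more heavily because older increments have been halved more often.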
Profile : profiling during interpretation
  • Profiling jump instructions
    • Difficulties of indirect jumps compared with conditional branches
      • Switch statement : target changes frequently
      • Return from procedure call : many target addresses
    • Solution
      • Key point
        • Profile-driven optimization of indirect jumps tends to focus on those jumps that very frequently have the same target
      • Maintain a profile table with a small number of target addresses and track only the more recently used targets
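One way to keep only a few recently used targets per indirect jump is sketched below. The class name, the capacity of two targets and the least-recently-used eviction rule are assumptions; the slides only say that the table is small and favors recent targets.

```python
from collections import OrderedDict

class JumpTargetProfile:
    """Per-jump profile that remembers only a few recently used targets."""

    def __init__(self, max_targets=2):
        self.max_targets = max_targets
        self.targets = OrderedDict()  # target PC -> count, least recent first

    def record(self, target_pc):
        if target_pc in self.targets:
            self.targets[target_pc] += 1
            self.targets.move_to_end(target_pc)  # mark as most recently used
        else:
            if len(self.targets) >= self.max_targets:
                self.targets.popitem(last=False)  # evict the least recent target
            self.targets[target_pc] = 1

    def dominant_target(self):
        if not self.targets:
            return None
        return max(self.targets, key=self.targets.get)


profile = JumpTargetProfile(max_targets=2)
for pc in (0x10, 0x20, 0x10, 0x30, 0x10):
    profile.record(pc)
```

A jump that keeps returning to one target retains a high count for it, while targets that appear only occasionally are evicted before they accumulate.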
Profile : profiling translated code
  • Instrumenting individual instructions
    • Each individual instruction can have its own custom profiling code
      • = Profiling can be selectively applied
      • = Profile counters can be assigned to each static instruction
    • Profile counters can be directly addressed without hashing
    • Profile code can be easily inserted and removed as needed
Profiling : Overhead
  • Performance overhead
    • Example
      • To access hash table : hash function + 1 load + 1 compare
      • To increment proper count : 1 load + 1 add + 1 store
    • Profiling during interpretation VS profiling translated code
      • Absolute overhead VS relative overhead
  • Memory overhead
    • Profile table
  • Overhead reduction method
    • Reducing the number of instrumentation points
      • Heuristic + Using collected data
    • Code duplication
      • Attractive for same-ISA optimization ( 4.7 )
Contents
  • Overview
  • Dynamic program Behavior
  • Profiling
  • Optimizing Translation blocks
    • Overview
    • Improving locality
    • Traces
    • Superblocks
    • Dynamic superblocks formation
    • Tree group
Optimizing translation blocks : Overview
  • Two strategy
    • Improving locality
    • Optimization on enlarged translation blocks
Optimizing translation blocks : Improving locality
  • Locality
    • Temporal
    • Spatial
  • Problem
    • Cache space
    • Performance
      • Low instruction fetch bandwidth

Optimizing translation blocks : Improving locality
  • Rearrange the layout of the blocks in memory
    • Conditional branch tests are reversed
    • Unconditional branches are removed or added
    • Instruction fetch efficiency is improved
Optimizing translation blocks : Improving locality
  • Partial procedure inlining
    • In dynamic optimization system
Optimizing translation blocks : Improving locality
  • Pros and Cons of procedure inlining
    • Pros
      • Increase spatial locality
      • Remove overhead
        • Call and return instructions are removed
        • Save/restore instructions are removed
    • Cons
      • Increase code size
      • Increase register “pressure”
        • Inlined code needs more registers than a procedure call
  • Consequently, procedure inlining is typically used only for those procedures that are very frequently called and are very small
Optimizing translation blocks
  • Three ways of rearranging basic blocks according to control flow
    • Trace formation
    • Superblock formation
      • Most widely used in VM implementation
    • Tree group
      • Useful when control flow is difficult to predict
      • Provide wider scope for optimization
Optimizing translation blocks : Traces
  • Traces
    • Chunks of contiguous instructions containing multiple BBs
    • Traces > Superblock
  • Static traces forming step
    • 1. Collect a profile using test data
    • 2. Begin with a start point
      • The most frequently executed BB that is not already part of a trace
    • 3. Collect BBs along the most common control path until a stopping condition is met
      • A block already belonging to another trace is reached
      • A procedure call/return boundary is reached
    • 4. Collect the BBs into a trace
      • Reverse branch tests
      • Remove/add unconditional branches
    • 5. Stop if every BB belongs to a trace; otherwise go to step 2
  • In a dynamic environment, traces are not commonly used as translation blocks
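The static trace-forming steps can be sketched as a greedy loop over a profiled control flow graph. The data layout (`bb_counts`, and `successors` mapping a block to `(next_block, edge_count)` pairs) and the sample graph are assumptions. The sketch omits the procedure call/return stopping condition and the branch rewriting of step 4.

```python
def form_traces(bb_counts, successors):
    """Greedy trace formation over a profiled control flow graph."""
    assigned = set()
    traces = []
    # Step 2: repeatedly start from the hottest block not yet in a trace.
    for start in sorted(bb_counts, key=bb_counts.get, reverse=True):
        if start in assigned:
            continue
        trace = [start]
        assigned.add(start)
        block = start
        while successors.get(block):
            # Step 3: follow the most frequently taken outgoing edge.
            nxt, _count = max(successors[block], key=lambda edge: edge[1])
            if nxt in assigned:  # stopping condition: block already in a trace
                break
            trace.append(nxt)
            assigned.add(nxt)
            block = nxt
        traces.append(trace)  # Step 4: the collected blocks form one trace.
    return traces  # Step 5: the loop ends once every block is assigned.


# Hypothetical profile: A branches to D 70% of the time and to B 30%.
bb_counts = {"A": 100, "B": 30, "C": 30, "D": 70, "E": 70, "G": 100}
successors = {
    "A": [("B", 30), ("D", 70)],
    "B": [("C", 30)],
    "C": [("G", 30)],
    "D": [("E", 70)],
    "E": [("G", 70)],
    "G": [("A", 99)],
}
traces = form_traces(bb_counts, successors)
```

Here the hot path A, D, E, G becomes one trace, and the leftover blocks B and C form a second trace that stops when it reaches G, which already belongs to the first.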
Optimizing translation blocks : Superblocks
  • Superblocks VS Traces
    • Side entrances : a trace may have them, a superblock has a single entry and possibly multiple exits
  • Problems in forming superblocks
    • Many small superblocks are formed
    • Too small to provide many opportunities for optimization
  • Tail duplication
    • The process of replicating code that appears at the end of a superblock in order to form other superblocks
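Tail duplication can be illustrated with a toy model in which each superblock is a list of block names. It assumes the control paths have already been chosen, hottest first; the function name and the `'n` suffix used to mark copies are hypothetical.

```python
def form_superblocks(paths):
    """Give every path its own copy of any block an earlier path already used."""
    superblocks = []
    copies = {}  # block name -> number of copies emitted so far
    for path in paths:
        superblock = []
        for block in path:
            n = copies.get(block, 0)
            copies[block] = n + 1
            # The first use keeps the original name; later uses are duplicates.
            superblock.append(block if n == 0 else f"{block}'{n}")
        superblocks.append(superblock)
    return superblocks


superblocks = form_superblocks([["A", "D", "E", "G"], ["B", "C", "G"], ["F", "G"]])
```

Because B-C and F each get a private copy of G, no superblock is entered from the side, at the cost of emitting G three times.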
Optimizing translation blocks : Dynamic superblock formation : Overview
  • Dynamic
    • Formed incrementally as the source code is being emulated
  • Complication
    • BB replication leads to more choices
  • Key question
    • Starting point
    • Continuation
    • Stopping point
Optimizing translation blocks : Dynamic superblock formation : starting point
  • Heavily used block
    • By using Profile information
  • Method for determining profile points
    • All basic block
    • Heuristics
      • Targets of backward branches are candidate starting points
      • Exit arcs from an existing superblock
  • Start threshold
    • When a profiled BB’s execution frequency reaches this value, a new superblock is started
    • Depends on emulation tradeoff
    • A few tens to hundreds of executions is typical
Optimizing translation blocks : Dynamic superblock formation : Continuation
  • Continuation
    • Which subsequent blocks should be collected and added as the superblock grows
  • Most frequently used approach
    • Node profile information is used to identify the most likely successor BB
    • Continuation threshold
      • A relatively complete set of profile data must be collected for all BBs
      • Typically half of the start threshold
    • Continuation set
      • At the time superblock formation is to begin, the set of all BBs that have reached the continuation threshold is collected
Optimizing translation blocks : Dynamic superblock formation : Continuation
  • Most frequently used procedure
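A minimal sketch of the most-frequently-used continuation, assuming node (BB) counts are available for every block. The function name, the graph and the threshold of 25 (half of a hypothetical start threshold of 50) are illustrative, and ties between successors are broken arbitrarily.

```python
def grow_superblock(start, bb_counts, successors, continuation_threshold):
    """Grow a superblock by repeatedly taking the hottest eligible successor."""
    # Continuation set: blocks whose count has reached the continuation threshold.
    continuation = {b for b, n in bb_counts.items() if n >= continuation_threshold}
    superblock = [start]
    block = start
    while True:
        candidates = [s for s in successors.get(block, ())
                      if s in continuation and s not in superblock]
        if not candidates:  # no eligible successor left: stop growing
            break
        block = max(candidates, key=bb_counts.get)
        superblock.append(block)
    return superblock


bb_counts = {"A": 100, "B": 30, "C": 30, "D": 70, "E": 70, "F": 5, "G": 100}
successors = {"A": ["B", "D"], "B": ["C"], "C": ["G"],
              "D": ["E"], "E": ["G"], "G": ["A", "F"]}
hot_path = grow_superblock("A", bb_counts, successors, continuation_threshold=25)
```

Growth stops at G because its successors are either already in the superblock (A) or below the continuation threshold (F).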
Optimizing translation blocks : Dynamic superblock formation : Continuation
  • Most Recently used approach
    • Edge profile information
    • Algorithm
      • Assumption
        • The very next sequence of blocks following a start point is also likely to be a common path
      • Simply follows the actual dynamic control flow path one edge at a time
    • Advantage
      • Only candidate start points need to be profiled
      • = No need to use profiling for continuation blocks
      • = Profile overhead is substantially reduced
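The most-recently-used approach can be sketched as a small recorder that the emulator calls before each basic block executes. The class name, the callback name `on_block`, the default threshold and the maximum length are assumptions. The sketch stops at the superblock's own start, at another superblock's start, or at a length limit, and leaves out the indirect jump and procedure call conditions.

```python
class SuperblockFormer:
    """Records the blocks that actually follow a hot start point."""

    def __init__(self, start_threshold=50, max_length=8):
        self.start_threshold = start_threshold
        self.max_length = max_length
        self.counts = {}        # candidate start block -> execution count
        self.superblocks = {}   # start block -> list of collected blocks
        self.recording = None   # superblock currently being collected

    def on_block(self, block, is_candidate_start):
        """Called by the emulator each time a basic block is about to execute."""
        if self.recording is not None:
            start = self.recording[0]
            stop = (block == start or block in self.superblocks
                    or len(self.recording) >= self.max_length)
            if not stop:
                self.recording.append(block)  # follow the path actually taken
                return
            self.superblocks[start] = self.recording
            self.recording = None
        # Only candidate start points are profiled.
        if is_candidate_start and block not in self.superblocks:
            self.counts[block] = self.counts.get(block, 0) + 1
            if self.counts[block] >= self.start_threshold:
                self.recording = [block]


former = SuperblockFormer(start_threshold=3)
for block in ["A", "D", "E", "G"] * 4:
    former.on_block(block, is_candidate_start=(block == "A"))
```

Only A carries a counter here; D, E and G are collected simply because they were the next blocks executed after A became hot.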
Optimizing translation blocks : Dynamic superblock formation : stopping point
  • Type of heuristics to determine stop condition
    • The start point of the same superblock is reached
    • A start point of some other superblock is reached
    • A superblock has reached some maximum length
      • A BB can be used in more than one superblock → there may be multiple copies of a given BB → explosion of code size
    • When using the most frequently used heuristic, there are no more candidate BBs that have reached the candidate threshold
    • An indirect jump is reached, or there is a procedure call
Optimizing translation blocks : Dynamic superblock formation : Example
  • Most Recently used
    • The only profile point is A, because A is the target of a backward branch
    • Most likely
      • ADEG → BCG → FG
  • However
    • There is about a 30% chance of
      • ABCG → DEG → FG
    • There are cases where the most recently executed method may not select superblocks quite as well as the most frequently executed method
Optimizing translation blocks : Tree group
  • Background
    • Problems when applying superblocks to branches that split their decisions almost evenly
      • Side exits are frequently taken → compensation code overhead
      • Optimizations are typically not done along the side exits → lost opportunities for performance improvement
  • Traces, Superblock VS Tree group
    • Tree group
      • conditional branch outcomes are more evenly balanced
      • Generalization of superblock
      • Multiple flow of control
    • Superblocks
      • Conditional branches are predominantly decided one way
      • Single flow of control