
Dynamic Binary Optimization


Presentation Transcript


  1. Dynamic Binary Optimization. Presenter: Kim Jin Chul

  2. Contents
     1. Overview of Applying Optimization on VMs
     2. Dynamic Program Behavior
     3. Profiling
     4. Optimizing Translation Blocks

  3. Classical Optimizations
     IA-32 source code:
         addl %edx, 4(%eax)
         movl 4(%eax), %edx
     Translation from IA-32 to PowerPC code:
         addi r16, r4, 4      ; add 4 to %eax
         lwzx r17, r2, r16    ; load operand from memory
         add  r7, r17, r7     ; perform add of %edx
         addi r16, r4, 4      ; add 4 to %eax
         stwx r7, r2, r16     ; store %edx value into memory
     After applying common subexpression elimination (the second addi recomputes an address that is already in r16):
         addi r16, r4, 4      ; add 4 to %eax
         lwzx r17, r2, r16    ; load operand from memory
         add  r7, r17, r7     ; perform add of %edx
         stwx r7, r2, r16     ; store %edx value into memory
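The elimination on this slide can be sketched as a single pass over the translated block. This is a minimal illustration, not the presenter's implementation: the tuple encoding of instructions, the `PURE_OPS` set, and the function name are all assumptions, and the pass only removes a recomputation that targets the same register as the earlier one (a general pass would also rewrite later uses).

```python
# Hypothetical encoding: (opcode, destination register or None, source operands).
PURE_OPS = {"addi", "add"}  # assumed side-effect-free opcodes for this sketch

def eliminate_common_subexpressions(block):
    """Drop a pure instruction whose result is already in its destination."""
    available = {}  # (opcode, sources) -> register currently holding the value
    result = []
    for op, dest, srcs in block:
        key = (op, srcs)
        if op in PURE_OPS and available.get(key) == dest:
            continue  # identical value already sits in the same register
        result.append((op, dest, srcs))
        if dest is not None:
            # Writing dest invalidates expressions that read it or live in it.
            available = {k: reg for k, reg in available.items()
                         if reg != dest and dest not in k[1]}
            if op in PURE_OPS and dest not in srcs:
                available[key] = dest
    return result

block = [
    ("addi", "r16", ("r4", "4")),
    ("lwzx", "r17", ("r2", "r16")),
    ("add",  "r7",  ("r17", "r7")),
    ("addi", "r16", ("r4", "4")),          # redundant: r16 already holds r4 + 4
    ("stwx", None,  ("r7", "r2", "r16")),  # a store writes no register
]
optimized = eliminate_common_subexpressions(block)
```

Running it on the five-instruction translation leaves the four instructions shown after elimination.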

  4. Optimization Based on Profiling
     Panel 1 (original code):
         Basic Block A:   ...
                          R3 ← ...
                          R7 ← ...
                          R1 ← R2 + R3
                          Br L1 if R3 == 0
         Basic Block B:   ...
                          R6 ← R1 + R6      (use of R1)
                          ...
         Basic Block C:   L1: R1 ← 0        (def of R1)
                          ...
     Panel 2 (R1 ← R2 + R3 removed from block A):
         Basic Block A:   ...
                          R3 ← ...
                          R7 ← ...
                          Br L1 if R3 == 0
         Basic Blocks B and C unchanged
     Panel 3 (compensation code added on the path from A to B):
         Basic Block A:   ...
                          R3 ← ...
                          R7 ← ...
                          Br L1 if R3 == 0
         Compensation code: R1 ← R2 + R3
         Basic Blocks B and C unchanged

  5. Optimization Based on Profiling
     Original code:
         Basic Block A:   ...
                          R3 ← ...
                          R7 ← ...
                          R1 ← R2 + R3
                          Br L1 if R3 == 0
         Basic Block B:   ...
                          R6 ← R1 + R6
                          ...
         Basic Block C:   L1: R1 ← 0
                          ...
     After forming a superblock from A and C:
         Superblock:      ...
                          R3 ← ...
                          R7 ← ...
                          Br L2 if R3 != 0
                          R1 ← 0
                          ...
         Compensation code: R1 ← R2 + R3
         Basic Block B:   L2: ...
                          R6 ← R1 + R6
                          ...

  6. A staged optimization system
     • Stages: Interpret → Basic translation → Optimized blocks → Highly optimized blocks
     • Startup: from fast (interpretation) to very slow (highly optimized blocks)
     • Steady state: from slow (interpretation) to fast (highly optimized blocks)
     • Profiling: from simple to extensive
     • Components: Interpreter, Binary memory image, Basic block cache, Code cache, Profile data, Optimizer, Translator, Emulation manager

  7. Dynamic Program Behavior
     • Dynamic control flow is highly predictable
                  ...
                  R3 ← 100
           loop:  R1 ← mem(R2)
                  Br found if R1 == -1
                  R2 ← R2 + 4
                  R3 ← R3 - 1
                  Br loop if R3 != 0
                  ...
           found: ...

  8. Dynamic Program Behavior
     • Distribution of taken conditional branches
     [Figure: histogram. X-axis: percent taken, in bins from 0-10% to >90%. Y-axis: fraction of static conditional branches, 0% to 50%.]
     • Predominantly not taken: 28%
     • Predominantly taken: 42%

  9. Dynamic Program Behavior
     • Consistency of conditional branches
     • The high percentage consists of backward branches
     [Figure: bar chart. X-axis: SPEC benchmark (176.gcc, 181.mcf, 197.parser, 252.eon, 256.bzip2, 171.swim, 173.applu, 177.mesa, 187.facerec, 189.lucas). Y-axis: dynamic branches decided the same as the previous time, 0% to 100%.]
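The "decided same as previous time" metric can be tried on the search loop from slide 7. The sketch below is an assumption-laden model rather than a measurement: the function names are hypothetical, memory is a Python list indexed by element instead of by byte address, and the input is an array that never contains -1.

```python
def run_search_loop(memory, limit=100):
    """Model the slide 7 loop and record each branch's taken/not-taken outcome."""
    outcomes = {"found": [], "loop": []}
    index, remaining = 0, limit
    while True:
        hit = memory[index] == -1          # Br found if R1 == -1
        outcomes["found"].append(hit)
        if hit:
            break
        index += 1                         # R2 <- R2 + 4 (one element)
        remaining -= 1                     # R3 <- R3 - 1
        back = remaining != 0              # Br loop if R3 != 0
        outcomes["loop"].append(back)
        if not back:
            break
    return outcomes

def same_as_previous(history):
    """Fraction of executions whose outcome matches the previous execution."""
    if len(history) < 2:
        return 1.0
    same = sum(1 for prev, cur in zip(history, history[1:]) if prev == cur)
    return same / (len(history) - 1)

outcomes = run_search_loop([0] * 100)
found_consistency = same_as_previous(outcomes["found"])
loop_consistency = same_as_previous(outcomes["loop"])
```

In this model the exit test is never taken and the backward branch changes its decision only on the final iteration, which is the pattern the bullet about backward branches describes.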

  10. Dynamic Program Behavior
      • The predictability of indirect jumps
      • Some jump destination addresses seldom change
      [Figure: histogram. X-axis: number of different destinations (1 to 9, >9). Y-axis: percent of indirect jumps, 0% to 25%.]

  11. Dynamic Program Behavior
      • The predictability of data values
      [Figure: bar chart. X-axis: instruction type (All, Add/Sub, Load, Logic, Shift, Set). Y-axis: fraction with constant value, 0 to 0.7.]
      • Static: static instructions that always compute the same value
      • Dynamic: dynamic instructions that are executions of those static instructions

  12. Profiling
      • The process of collecting instruction and data statistics for an executing program
      • Optimization works from the collected profile data
      [Figure: staged emulation system with Interpreter, Binary memory image, Basic block cache, Code cache, Profile data, Optimizer, Translator, Emulation manager]

  13. The Role of Profiling
      • Traditional profiling
      [Figure: HLL Program → Compiler Frontend → Compiler Backend → Instrumented Code → Program Execution (with Test Data) → Program Statistics → Optimizing Compiler → Optimized Binary]

  14. The Role of Profiling
      • On-the-fly profiling in a dynamic optimizing VM
      [Figure elements: Program Binary, Program Data, Interpreter, Partial Program Statistics, Translator/Optimizer]

  15. Types of Profiles
      • Several types of profile data
      • How frequently are different code regions executed?
        • This can be used to decide the level of optimization
      • Is control flow predictable?
        • This may be used as the basis for gathering and rearranging basic blocks
        • Rearranged basic blocks get a chance to be merged into a superblock

  16. Types of Profiles
      [Figure: the same control flow graph of basic blocks A to F, shown twice]
      • A basic block profile: an execution count on each block
      • An edge profile: an execution count on each edge
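One way to see how the two profile types relate: a basic block profile can be derived from an edge profile by summing the counts on each block's incoming edges, while the reverse is not generally possible. The sketch below assumes a single entry block and uses made-up counts, not the values from the slide's figure; the function name is hypothetical.

```python
from collections import defaultdict

def block_profile_from_edges(edge_counts, entry_block, entry_count):
    """Sum incoming edge counts per block; the entry block gets entry_count."""
    block_counts = defaultdict(int)
    block_counts[entry_block] += entry_count
    for (_src, dst), count in edge_counts.items():
        block_counts[dst] += count
    return dict(block_counts)

# Illustrative edge profile (assumed numbers): A branches to B or C, both reach D.
edge_profile = {
    ("A", "B"): 70,
    ("A", "C"): 30,
    ("B", "D"): 70,
    ("C", "D"): 30,
}
block_profile = block_profile_from_edges(edge_profile, entry_block="A", entry_count=100)
```

The block counts alone (A 100, B 70, C 30, D 100) would not say which of D's executions arrived from B and which from C, which is the extra information an edge profile carries.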

  17. Collecting Profiles
      • Instrumentation-based profiling
        • Probes target specific program-related events and count all instances of the events being profiled
        • Software-based vs. hardware-based: speed? support? flexibility?
      • Sampling-based profiling
        • The program runs in its unmodified form; at intervals it is interrupted and the event in progress is captured
      • Instrumentation vs. sampling
        • Overhead per captured event: Instrumentation < Sampling, because each sample causes a trap
        • Sampling captures far fewer events, so its total slowdown is usually lower

  18. Profiling During Interpretation
      PowerPC Branch Conditional Interpreter Routine (reached through the instruction function list):
          branch_conditional(inst) {
              BO = extract(inst, 25, 5);
              BI = extract(inst, 20, 5);
              displacement = extract(inst, 15, 14) * 4;
              ...
              // code to compute whether branch should be taken
              ...
              profile_addr = lookup(PC);
              if (branch_taken) {
                  profile_cnt(profile_addr, taken);
                  PC = PC + displacement;
              } else {
                  profile_cnt(profile_addr, nottaken);
                  PC = PC + 4;
              }
          }
      Profile Table for Collecting an Edge Profile During Interpretation: the branch PC is hashed to select a table entry holding the PC, a taken count, and a not-taken count.
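The profile table and the counting step of the interpreter routine might be modeled as below. This is a sketch under assumptions: a Python dict stands in for the hashed table, the class and function names are hypothetical, and the branch decision is passed in rather than computed from the BO and BI fields.

```python
class EdgeProfile:
    """Per-branch taken / not-taken counters, keyed by the branch PC."""

    def __init__(self):
        self.table = {}  # branch PC -> [taken_count, not_taken_count]

    def record(self, pc, taken):
        counts = self.table.setdefault(pc, [0, 0])
        counts[0 if taken else 1] += 1

def interpret_branch(pc, displacement, taken, profile):
    """Update the profile for this branch, then return the next PC."""
    profile.record(pc, taken)
    return pc + displacement if taken else pc + 4

profile = EdgeProfile()
next_pc_taken = interpret_branch(0x100, -16, True, profile)
next_pc_fallthrough = interpret_branch(0x100, -16, False, profile)
```

After the two calls, the entry for PC 0x100 holds one taken and one not-taken count, which are the two outgoing edges of that branch.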

  19. Profiling Translated Code
      Edge Profiling Code Inserted into Stubs of a Binary Translated Basic Block:
          Fall-through stub:
              increment edge counter (i)
              if (counter (i) > trigger) then invoke optimizer
              else branch to fall-through basic block
          Target stub:
              increment edge counter (j)
              if (counter (j) > trigger) then invoke optimizer
              else branch to target basic block
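The stub logic can be sketched as a small object with one counter per exit edge. Everything named here (`EdgeStub`, `invoke_optimizer`, the trigger value of 3) is an assumption for illustration; a real system would typically patch the stub away once the optimizer has run, which this sketch does not model.

```python
class EdgeStub:
    """Profiling stub for one exit edge of a translated basic block."""

    def __init__(self, target, trigger, invoke_optimizer):
        self.target = target
        self.trigger = trigger
        self.invoke_optimizer = invoke_optimizer
        self.counter = 0

    def execute(self):
        self.counter += 1                      # increment edge counter
        if self.counter > self.trigger:        # hot edge: hand off to optimizer
            return self.invoke_optimizer(self.target)
        return self.target                     # otherwise branch to the block

optimizer_calls = []

def invoke_optimizer(target):
    # Stand-in for the optimizer: just record which edge target became hot.
    optimizer_calls.append(target)
    return "optimized:" + target

stub = EdgeStub("block_B", trigger=3, invoke_optimizer=invoke_optimizer)
results = [stub.execute() for _ in range(5)]
```

With a trigger of 3, the first three executions branch to the block and the fourth is the first to reach the optimizer.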

  20. Profiling Overhead
      • Profiling during interpretation adds 10-20% overhead
      • Profiling overhead can be reduced
        • Reduce the number of instrumentation points by selecting a smaller set of key points

  21. Optimizing Translation Blocks
      • Two-part strategy for optimizing
        • Using dominant control flow to enhance memory locality
        • Making translation blocks larger
          • Traces, Superblocks, Tree groups
      • The two parts of the strategy are relatively independent

  22. Improving Locality
      • Two kinds of memory locality
        • Spatial locality
          • An access to a memory location is soon followed by an access to an adjacent memory location
        • Temporal locality
          • A memory location that has been accessed is accessed again in the near future

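  23. Improving Locality
      • Example code sequence
            A   Br cond1 == true
            B   Br cond2 == false
            C   Br uncond
            D   Br cond3 == true
            E   Br uncond
            F
            G   Br cond4 == true
      [Figure: control flow graph of blocks A to G annotated with edge execution counts; counts as extracted: 3, 30, 70, 1, 29, 68, 2, 29, 68, 1, 97, 1]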

  24. Improving Locality
      • Rearrange the blocks in memory
            A   Br cond1 == false
            D   Br cond3 == true
            E
            G   Br cond4 == true
                Br uncond
            B   Br cond2 == false
            C   Br uncond
            F   Br uncond
      [Figure: the same control flow graph of blocks A to G with edge execution counts]
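A simple greedy heuristic reproduces the layout this slide arrives at: start at the entry, keep appending the most frequently taken successor that has not been placed yet, and start a new chain from the next unplaced block when the current chain ends. The function name and the edge counts below are assumptions loosely modeled on the figure, not values confirmed by the source.

```python
def layout_blocks(blocks, edges, entry):
    """Order blocks so that each hot successor directly follows its predecessor."""
    placed, order = set(), []
    seeds = [entry] + [b for b in blocks if b != entry]
    for seed in seeds:
        current = seed
        while current is not None and current not in placed:
            placed.add(current)
            order.append(current)
            # Candidate successors that still need a position, with their counts.
            candidates = [(count, dst) for (src, dst), count in edges.items()
                          if src == current and dst not in placed]
            current = max(candidates)[1] if candidates else None
    return order

blocks = ["A", "B", "C", "D", "E", "F", "G"]
# Assumed edge execution counts for illustration only.
edges = {
    ("A", "B"): 30, ("A", "D"): 70,
    ("B", "C"): 29, ("B", "F"): 1,
    ("D", "E"): 68, ("D", "F"): 2,
    ("C", "G"): 29, ("E", "G"): 68, ("F", "G"): 3,
    ("G", "A"): 97,
}
order = layout_blocks(blocks, edges, entry="A")
```

The hot path A, D, E, G ends up contiguous, so its branches mostly fall through, and the colder blocks B, C, and F are pushed to the end.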

  25. Improving Locality
      • Procedure inlining
      • Positive and negative effects?
      [Figure: blocks A, B, K, L each containing "Call proc xyz"; procedure xyz consists of blocks X, Y, Z followed by Return; after inlining, copies of X, Y, Z replace the calls]

  26. Traces
      • Trace
        • A contiguous sequence of basic blocks
        • May have both side entrances and side exits
      [Figure: control flow graph of blocks A to G with edge execution counts, divided into Trace 1, Trace 2, and Trace 3; an inset shows the relation between superblocks and traces]

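  27. Superblocks
      • Superblocks
        • Regions of code with only one entry and one or more exit points
      [Figure: the control flow graph of blocks A to G with edge execution counts, and the same graph after block G is duplicated so that no region is entered from the side]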

  28. Superblocks
      Rearranged layout before superblock formation:
            A   Br cond1 == false
            D   Br cond3 == true
            E
            G   Br cond4 == true
                Br uncond
            B   Br cond2 == false
            C   Br uncond
            F   Br uncond
      Layout after superblock formation (G duplicated):
            A   Br cond1 == false
            D   Br cond3 == true
            E
            G   Br cond4 == true
                Br uncond
            B   Br cond2 == false
            C
            G   Br cond4 == true
                Br uncond
            F
            G   Br cond4 == true
                Br uncond
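The grouping on this slide, with one superblock starting at A, one at B, and one at F, each ending in its own copy of G, can be approximated by growing each superblock along its hottest successor and duplicating shared tail blocks. This is a simplified sketch: the function name and edge counts are assumptions, and it does not handle a block that becomes a superblock head after it has already been absorbed into an earlier superblock.

```python
def form_superblocks(edges, entry):
    """Grow single-entry regions along hot edges, duplicating shared tails."""
    successors = {}
    for (src, dst), count in edges.items():
        successors.setdefault(src, []).append((count, dst))
    heads, worklist, superblocks = {entry}, [entry], []
    while worklist:
        block = worklist.pop(0)
        body = []
        while True:
            body.append(block)
            ranked = sorted(successors.get(block, []), reverse=True)
            # Colder successors are side exits; each starts its own superblock.
            for _count, side_exit in ranked[1:]:
                if side_exit not in heads:
                    heads.add(side_exit)
                    worklist.append(side_exit)
            if not ranked:
                break
            hottest = ranked[0][1]
            # Stop at another superblock's entry or when the path would loop.
            if hottest in heads or hottest in body:
                break
            block = hottest
        superblocks.append(body)
    return superblocks

# Assumed edge execution counts for illustration only.
edges = {
    ("A", "B"): 30, ("A", "D"): 70,
    ("B", "C"): 29, ("B", "F"): 1,
    ("D", "E"): 68, ("D", "F"): 2,
    ("C", "G"): 29, ("E", "G"): 68, ("F", "G"): 3,
    ("G", "A"): 97,
}
superblocks = form_superblocks(edges, entry="A")
```

Block G appears in all three results because each path gets a private copy, which is what removes the side entrances at the cost of code expansion.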

  29. Tree Groups
      • Tree groups
        • Regions of code with only one entry and one or more exit points
      [Figure 4.7: tree of blocks rooted at A, with branches through D, B, E, F, C and a separate copy of G at the end of each path]

  30. Thank You !

  31. SPEC benchmarks
      • Integer SPEC benchmarks
        • 176.gcc – GNU Compiler
        • 181.mcf – Combinatorial Optimization
        • 197.parser – Word Processing
        • 252.eon – Computer Visualization
        • 256.bzip2 – Compression
      • Floating-point SPEC benchmarks
        • 171.swim – Shallow Water Modeling
        • 173.applu – Parabolic/Elliptic Partial Differential Equations
        • 187.facerec – Image Processing
        • 189.lucas – Number Theory
