Dynamic Binary Optimization

Presentation Transcript


  1. Dynamic Binary Optimization Kim Sung-moo

  2. Contents • Optimization Framework • Code Reordering • Code Optimizations • Same-ISA Optimization Systems

  3. Optimization Framework
     [Figure: basic blocks A, B, C pass from the original source code, through an intermediate form, to optimized target code with compensation (Comp) blocks]
     • Collect basic blocks using profile information
     • Convert to intermediate form; place in buffer
     • Schedule and optimize
     • Generate target code
     • Add compensation code; place in code cache

  4. Optimization of Trace & Superblock (detailed in Section 4.5.1)
     [Figure: a trace and a superblock, each shown being optimized, with compensation code blocks attached]

  5. Caution during Optimization
     • After optimization, the program's observable state at a trap may no longer match the original execution. (detailed in Section 4.6.2)
     • Example 1: removing a dead assignment also removes its trap.
         Source                        Target
         r4 ← r6 + 1                   r4 ← r6 + 1
         r1 ← r2 + r3    (trap)        r1 ← r4 + r5
         r1 ← r4 + r5                  r6 ← r1 * r7
         r6 ← r1 * r7
     • Example 2: rescheduling changes the register state seen at a trap; the solution is to hold the early result in a saved register (s1).
         Source                  Target (rescheduled)      Target with saved reg. state
         r1 ← r2 + r3            r1 ← r2 + r3              r1 ← r2 + r3
         r9 ← r1 + r5  (trap)    r6 ← r1 * r7              s1 ← r1 * r7
         r6 ← r1 * r7            r9 ← r1 + r5  (trap)      r9 ← r1 + r5  (trap)
         r3 ← r6 + 1             r3 ← r6 + 1               r6 ← s1
                                                           r3 ← r6 + 1

  6. Consistent Register Mapping
     • When one superblock branches to another, or jumps to the interpreter, the source-to-target register mapping must be managed correctly.
     • Solution: Section 4.6.3
     [Figure: superblocks A, B, and C and the interpreter, with register mappings R1 ↔ r1, R2 ↔ r1, R3 ↔ r1]

  7. Code Reordering • Basis of all optimizations • Easy to understand • Deep relation to pipelining

  8. Primitive Instruction Reordering
     • reg: an instruction that produces a register result
       • load, ALU, shift
       • easily undone
     • mem: an instruction that places a value in memory
       • store
       • not easily undone
     • br: a branch instruction
     • join: the point where a branch target enters the code sequence

  9. Move instr. around branch
     • A reg or mem instruction moved from above a branch to below it is also duplicated as compensation code on the branch's exit path.
         Before                        After
         R1 ← mem(R6)                  R1 ← mem(R6)
         R2 ← mem(R6+4)                R2 ← mem(R6+4)
         R3 ← R1 + 1                   R3 ← R1 + 1
         R4 ← R1 << 2                  br exit if R7 == 0
         br exit if R7 == 0            R4 ← R1 << 2
         R7 ← R7 + 1                   R7 ← R7 + 1
         mem(R6) ← R3                  mem(R6) ← R3
     • Compensation code on the exit path: R4 ← R1 << 2

  10. Move instr. from below to above branch
     • reg: the result is computed into a temporary T above the branch and copied back (R ← T) below it.
     • mem: the previous memory value is unrecoverable.
         Before                  After moving            After removing the copy
         R2 ← R1 << 2            R2 ← R1 << 2            R2 ← R1 << 2
         br exit if R8 == 0      T1 ← R7 * R2            T1 ← R7 * R2
         R6 ← R7 * R2            br exit if R8 == 0      br exit if R8 == 0
         mem(R6) ← R3            R6 ← T1                 mem(T1) ← R3
         R6 ← R2 + 2             mem(T1) ← R3            R6 ← R2 + 2
                                 R6 ← R2 + 2

  11. Move instr. above join point
     • A reg or mem instruction moved above a join point is duplicated as compensation code on the joining path.
         Before                        After
         R1 ← R1 + 1                   R1 ← R1 + 1
         (join point)                  R7 ← mem(R6)
         R7 ← mem(R6)                  (join point)
         R7 ← R7 + 1                   R7 ← R7 + 1
     • Compensation code on the joining path: R7 ← mem(R6)

  12. Move instr. in straight-line
     • A reg instruction moved above another reg or a mem instruction writes a temporary T; a copy R ← T stays at the original position.
         Before                        After
         R1 ← R1 * 3                   R1 ← R1 * 3
         mem(R6) ← R1                  T1 ← R7 << 3
         R7 ← R7 << 3                  mem(R6) ← R1
         R9 ← R7 + R2                  R7 ← T1
                                       R9 ← R7 + R2

  13. Implementing a Scheduling Algorithm • Translate to Single-Assignment Form • Form Register Map • Reorder Code • Determine Checkpoints • Assign Register • Add Compensation Code

  14. 1. Translate to Single-Assignment Form
     • Single-assignment form → each register is assigned a value only once.
         Original source code          Single-assignment form
         add %eax, %ebx                t5 ← r1 + r2, set CR0
         bz L1                         bz CR0, L1
         mov %ebx, 4(%eax)             t6 ← mem(t5 + 4)
         mul %ebx, 10                  t7 ← t6 * 10
         add %ebx, 1                   t8 ← t7 + 1
         add %ecx, 1                   t9 ← r3 + 1, set CR0
         bz L2                         bz CR0, L2
         add %ebx, %eax                t10 ← t8 + t5
         br L3                         b L3

  15. 2. Form Register Map
     • Register map (RMAP) → makes it possible to track the values as assigned in the original source code.
         Single-assignment form        RMAP: eax  ebx  ecx  edx
         t5 ← r1 + r2, set CR0               t5   r2   r3   r4
         bz CR0, L1                          t5   r2   r3   r4
         t6 ← mem(t5 + 4)                    t5   t6   r3   r4
         t7 ← t6 * 10                        t5   t7   r3   r4
         t8 ← t7 + 1                         t5   t8   r3   r4
         t9 ← r3 + 1, set CR0                t5   t8   t9   r4
         bz CR0, L2                          t5   t8   t9   r4
         t10 ← t8 + t5                       t5   t10  t9   r4
         b L3                                t5   t10  t9   r4
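
The renaming and the register map can be built in one pass. The following is a minimal Python sketch, not the presenter's implementation: the tuple encoding `(dest, op, src1, src2)`, the function name `to_single_assignment`, and the op name `load` are assumptions made for illustration, and condition-code tracking (`set CR0`) is left out.

```python
# Hypothetical encoding (not from the slides): (dest, op, src1, src2).
# Branches have dest None; immediates and labels pass through unchanged.
SOURCE = [
    ("eax", "add", "eax", "ebx"),
    (None, "bz", "L1", None),
    ("ebx", "load", "eax", 4),      # mov %ebx, 4(%eax)
    ("ebx", "mul", "ebx", 10),
    ("ebx", "add", "ebx", 1),
    ("ecx", "add", "ecx", 1),
    (None, "bz", "L2", None),
    ("ebx", "add", "ebx", "eax"),
    (None, "b", "L3", None),
]
INITIAL_MAP = {"eax": "r1", "ebx": "r2", "ecx": "r3", "edx": "r4"}


def to_single_assignment(source, initial_map, first_temp=5):
    """Return a list of (renamed instruction, RMAP snapshot after it)."""
    rmap = dict(initial_map)
    next_temp = first_temp
    result = []
    for dest, op, a, b in source:
        # Read the operands through the current map before redefining dest.
        a, b = rmap.get(a, a), rmap.get(b, b)
        if dest is not None:
            new_name = f"t{next_temp}"   # every write gets a fresh name
            next_temp += 1
            rmap[dest] = new_name
            dest = new_name
        result.append(((dest, op, a, b), dict(rmap)))
    return result


for instr, rmap in to_single_assignment(SOURCE, INITIAL_MAP):
    print(instr, [rmap[r] for r in ("eax", "ebx", "ecx", "edx")])
```

Each printed row pairs one renamed instruction with the map that holds after it, which is the same layout as the RMAP column on the slide.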

  16. 3. Reorder Code
     • Reorder so that the code runs efficiently, e.g. to reduce pipeline stalls.
         Before scheduling             After scheduling              RMAP: eax  ebx  ecx  edx
         a: t5 ← r1 + r2, set CR0      a: t5 ← r1 + r2, set CR0            t5   r2   r3   r4
         b: bz CR0, L1                 c: t6 ← mem(t5 + 4)                 t5   t6   r3   r4
         c: t6 ← mem(t5 + 4)           b: bz CR0, L1                       t5   r2   r3   r4
         d: t7 ← t6 * 10               d: t7 ← t6 * 10                     t5   t7   r3   r4
         e: t8 ← t7 + 1                f: t9 ← r3 + 1, set CR0             t5   t8   t9   r4
         f: t9 ← r3 + 1, set CR0       g: bz CR0, L2                       t5   t8   t9   r4
         g: bz CR0, L2                 e: t8 ← t7 + 1                      t5   t8   r3   r4
         h: t10 ← t8 + t5              h: t10 ← t8 + t5                    t5   t10  t9   r4
         i: b L3                       i: b L3                             t5   t10  t9   r4
     • The RMAP rows follow the scheduled instructions.
     • Compensation code at L2: t8 ← t7 + 1

  17. 4. Determine Checkpoints
     • Commit: the point at which all preceding instructions in the original code have completed.
     • Checkpoint: the most recently committed instruction.
     • If an instruction traps, its checkpoint is the point to back up to. → precise state recovery.
         After scheduling              RMAP: eax  ebx  ecx  edx    Commit    Checkpoint
         a: t5 ← r1 + r2, set CR0            t5   r2   r3   r4     a         @
         c: t6 ← mem(t5 + 4)                 t5   t6   r3   r4     -         a
         b: bz CR0, L1                       t5   r2   r3   r4     b,c       a
         d: t7 ← t6 * 10                     t5   t7   r3   r4     d         c
         f: t9 ← r3 + 1, set CR0             t5   t8   t9   r4     -         d
         g: bz CR0, L2                       t5   t8   t9   r4     -         d
         e: t8 ← t7 + 1                      t5   t8   r3   r4     e,f,g     d
         h: t10 ← t8 + t5                    t5   t10  t9   r4     h         g
         i: b L3                             t5   t10  t9   r4     i         h
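
The commit and checkpoint columns follow mechanically from the original and the scheduled instruction order. A minimal Python sketch under that reading of the definitions; the function name `commits_and_checkpoints` is hypothetical, and `"@"` is used here to stand for "nothing committed yet".

```python
def commits_and_checkpoints(original, scheduled):
    """For each scheduled instruction, return (instr, newly committed, checkpoint).

    An instruction of `original` is committed once it and every instruction
    before it in `original` have executed. The checkpoint of a scheduled
    instruction is the last instruction committed before it runs.
    """
    executed = set()
    committed = 0      # length of the committed prefix of `original`
    rows = []
    for instr in scheduled:
        checkpoint = original[committed - 1] if committed else "@"
        executed.add(instr)
        newly = []
        # Extend the committed prefix as far as the executed set allows.
        while committed < len(original) and original[committed] in executed:
            newly.append(original[committed])
            committed += 1
        rows.append((instr, newly, checkpoint))
    return rows


for instr, commits, checkpoint in commits_and_checkpoints("abcdefghi", "acbdfgehi"):
    print(instr, ",".join(commits) or "-", checkpoint)
```

Run on the instruction labels of the example (original order a to i, scheduled order a, c, b, d, f, g, e, h, i), this reproduces the slide's pairs: c commits nothing until b has run, and e commits e, f, and g together.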

  18. 5. Assign Register
     • “X”: where a live range has been extended (at a branch or trap).
     [Figure: register live ranges]
         After assignment              RMAP: eax  ebx  ecx  edx
         a: r1 ← r1 + r2, set CR0            r1   r2   r3   r4
         c: r5 ← mem(r1 + 4)                 r1   r5   r3   r4
         b: bz CR0, L1                       r1   r2   r3   r4
         d: r2 ← r5 * 10                     r1   r2   r3   r4
         f: r5 ← r3 + 1, set CR0             r1   r2   r5   r4
         g: bz CR0, L2                       r1   r2   r5   r4
         e: r2 ← r2 + 1                      r1   r2   r3   r4
         h: r2 ← r2 + r1                     r1   r2   r5   r4
         i: b L3                             r1   r2   r5   r4

  19. 6. Add Compensation Code
     • Consider both compensation code and consistent register mapping.
         After assignment              Compensation code added       PowerPC code
         a: r1 ← r1 + r2, set CR0      a: r1 ← r1 + r2, set CR0      a: add. r1, r1, r2
         c: r5 ← mem(r1 + 4)           c: r5 ← mem(r1 + 4)           c: lwz r5, 4(r1)
         b: bz CR0, L1                 b: bz CR0, L1                 b: beq CR0, L1
         d: r2 ← r5 * 10               d: r2 ← r5 * 10               d: muli r2, r5, 10
         f: r5 ← r3 + 1, set CR0       f: r5 ← r3 + 1, set CR0       f: addic. r5, r3, 1
         g: bz CR0, L2                 g: bz CR0, L2’                g: beq CR0, L2’
         e: r2 ← r2 + 1                e: r2 ← r2 + 1                e: addi r2, r2, 1
         h: r2 ← r2 + r1               h: r2 ← r2 + r1               h: add r2, r2, r1
         i: b L3                       i: b L3                       i: b L3
                                          r3 ← r5                       mr r3, r5
                                       L2’: r3 ← r5                  L2’: mr r3, r5
                                            r2 ← r2 + 1                   addi r2, r2, 1

  20. 5a. Assign Register with Condition Codes
     • “y”: where a live range must be extended.
     • The condition code must be materialized.
     [Figure: register live ranges]
         After assignment              RMAP: eax  ebx  ecx  edx
         a: r6 ← r1 + r2, set CR0            r6   r2   r3   r4
         c: r5 ← mem(r6 + 4)                 r6   r5   r3   r4
         b: bz CR0, L1                       r6   r2   r3   r4
         d: r2 ← r5 * 10                     r6   r2   r3   r4
         f: r5 ← r3 + 1, set CR0             r6   r2   r5   r4
         g: bz CR0, L2                       r6   r2   r5   r4
         e: r2 ← r2 + 1                      r6   r2   r3   r4
         h: r2 ← r2 + r6                     r6   r2   r5   r4
         i: b L3                             r6   r2   r5   r4
            r1 ← r6                          r1   r2   r5   r4

  21. Superblocks vs Traces

  22. Basic Optimizations • Constant Propagation & Constant Folding • Strength Reduction • Code Sinking • Dead-Assignment Elimination • Copy Propagation • Common-Subexpression Elimination (CSE) • Hoisting a loop invariant expression out of a loop (Loop Invariant Code Motion)

  23. Basic Optimizations (1)
         Step                       Code
         Original                   R1 ← 6    R5 ← R1 + 2    R6 ← R7 * R5
         Constant propagation       R1 ← 6    R5 ← 6 + 2     R6 ← R7 * R5
         Constant folding           R1 ← 6    R5 ← 8         R6 ← R7 * R5
         Constant propagation       R1 ← 6    R5 ← 8         R6 ← R7 * 8
         Strength reduction         R1 ← 6    R5 ← 8         R6 ← R7 << 3
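
The three steps can be combined into a single forward pass over straight-line code. This is a minimal Python sketch, assuming a hypothetical tuple encoding `(dest, op, a, b)` with a `const` op for loading an immediate; it handles only code without join points and only the `+`, `*`, and `<<` operators.

```python
import operator

# Operators this sketch knows how to fold (an illustrative subset).
FOLD = {"+": operator.add, "*": operator.mul, "<<": operator.lshift}


def optimize(block):
    """Constant propagation, constant folding, and strength reduction."""
    consts = {}    # register -> constant value known at this point
    out = []
    for dest, op, a, b in block:
        # Constant propagation: replace known-constant registers by values.
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "const":
            consts[dest] = a
        elif isinstance(a, int) and isinstance(b, int):
            # Constant folding: both operands are known, so compute now.
            a, b, op = FOLD[op](a, b), None, "const"
            consts[dest] = a
        else:
            consts.pop(dest, None)   # dest no longer holds a known constant
            # Strength reduction: multiply by a power of two becomes a shift.
            if op == "*" and isinstance(b, int) and b > 0 and (b & (b - 1)) == 0:
                op, b = "<<", b.bit_length() - 1
        out.append((dest, op, a, b))
    return out


BLOCK = [("R1", "const", 6, None), ("R5", "+", "R1", 2), ("R6", "*", "R7", "R5")]
for instr in optimize(BLOCK):
    print(instr)
```

On the slide's three instructions this ends with `R6` computed as `R7 << 3`, matching the last row of the table.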

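  24. Basic Optimizations (2)
     • A join point inhibits code optimization: R1 may arrive as 28 or as 6, so it can no longer be propagated as a constant.
         Incoming path 1: R1 ← 28
         Incoming path 2: R1 ← 6
         (join point)
         R5 ← R1 + 2
         R6 ← R7 * R5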

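  25. Basic Optimizations (3)
     • Original: R3 ← R3 + R2 is partially dead (a candidate for code sinking).
         R1 ← 1
         R3 ← R3 + R2          (partially dead)
         Br L1 if R7 != 0
         R3 ← R7 + 1
         Branch target L1:
         L1: R3 ← R3 + 1
     • After code sinking: the copy on the fall-through path is fully dead (can be removed).
         R1 ← 1
         Br L1 if R7 != 0
         R3 ← R3 + R2          (fully dead)
         R3 ← R7 + 1
         Branch target L1:
         L1: R3 ← R3 + R2
             R3 ← R3 + 1
     • After dead-assignment elimination:
         R1 ← 1
         Br L1 if R7 != 0
         R3 ← R7 + 1
         Branch target L1:
         L1: R3 ← R3 + R2
             R3 ← R3 + 1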

  26. Basic Optimizations (4)
         Original              After copy propagation     After dead-assignment elimination
         R1 ← R2 + R3          R1 ← R2 + R3               R1 ← R2 + R3
         R4 ← R1               R4 ← R1                    R5 ← R5 * R1
         R5 ← R5 * R4          R5 ← R5 * R1               ∙ ∙ ∙
         ∙ ∙ ∙                 ∙ ∙ ∙                      R4 ← R7 + R8
         R4 ← R7 + R8          R4 ← R7 + R8
     • R4 ← R1 can be eliminated because R4 no longer appears on a right-hand side before it is reassigned.

  27. Basic Optimizations (5)
         Original              After copy propagation     After common-subexpression elimination (CSE)
         R1 ← R2 + R3          R1 ← R2 + R3               R1 ← R2 + R3
         R5 ← R2               R5 ← R2                    R5 ← R2
         R6 ← R5 + R3          R6 ← R2 + R3               R6 ← R1
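
A minimal Python sketch of the CSE step on straight-line code, taking the copy-propagated form as input. The encoding `(dest, op, srcs)` and the function name are assumptions for illustration; the sketch compares operands literally, so it does not recognize `R3 + R2` as equal to `R2 + R3`.

```python
def eliminate_common_subexpressions(block):
    """Replace a recomputed expression by a copy of the register holding it."""
    available = {}    # (op, srcs) -> register that currently holds the value
    out = []
    for dest, op, srcs in block:
        key = (op, srcs)
        reused = op != "copy" and key in available
        if reused:
            op, srcs = "copy", (available[key],)
        # Writing dest kills expressions that read dest or are held in dest.
        available = {
            k: reg for k, reg in available.items()
            if reg != dest and dest not in k[1]
        }
        # Record a freshly computed expression unless it reads its own dest.
        if not reused and op != "copy" and dest not in srcs:
            available[key] = dest
        out.append((dest, op, srcs))
    return out


BLOCK = [
    ("R1", "+", ("R2", "R3")),
    ("R5", "copy", ("R2",)),
    ("R6", "+", ("R2", "R3")),    # already rewritten by copy propagation
]
for instr in eliminate_common_subexpressions(BLOCK):
    print(instr)
```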

  28. Basic Optimizations (6)
     • Hoisting a loop invariant expression out of a loop (Loop Invariant Code Motion)
         Before                        After
         L1: R1 ← R2 + R3                  R1 ← R2 + R3
             mem(R4) ← R1              L1: mem(R4) ← R1
             R4 ← R4 + 4                   R4 ← R4 + 4
             ∙ ∙ ∙                         ∙ ∙ ∙
             br L1 if R7 != 0              br L1 if R7 != 0

  29. Compatibility Issues
     • Whether an optimization is safe or unsafe is very important.
     • Usually safe optimizations
       • don’t remove trapping instructions
       • ex) copy-propagation, constant-propagation, constant-folding
     • Optimizations that require more care
       • can remove trapping instructions
       • ex) dead-assignment elimination, loop invariant code motion, strength reduction

  30. Inter-superblock Optimizations
     • We want additional optimizations across superblock boundaries.
     • Solutions
       • Use a tree group: Section 4.3.5
       • Remove some of the register copies at exit points
       • Use epilog & prolog side tables
         Superblock 1                  Superblock 2
         r2 ← r7                       L1: r2 ← r3 + 2
         ∙                                 ∙
         br L1 if r4==0                    ∙
         ∙
         r2 ← r1 + 2
     • r2 ← r7 needs to be eliminated: r2 does not appear on a right-hand side after it.

  31. Epilog & Prolog Side Table
                                              r1  r2  r3  …….. rn
         Epilog side table (Superblock 1)     0   1   1   …….. 0
         Prolog side table (Superblock 2)     0   1   0   …….. 1
         AND                                  0   1   0   …….. 0
     • Register r2 is dead along the path from superblock 1 to 2.
     • Instruction r2 ← r7 in superblock 1 can be removed.

  32. Epilog & Prolog Side Table
     • Epilog side table
       • consulted when a superblock is exited
       • keeps a mask indicating the dead registers
     • Prolog side table
       • consulted when a superblock is entered
       • keeps a mask indicating the registers that are written before being read (first encountered on a LHS)
     • When 2 superblocks are linked, AND the bit masks.
     • Any bits that remain set → dead register.
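
The linking check is a single bitwise AND. A minimal Python sketch follows; the four-register file is a hypothetical stand-in for the slide's r1 … rn (with r4 playing the role of rn), and `mask_of` and `dead_on_link` are illustrative names.

```python
REGISTERS = ["r1", "r2", "r3", "r4"]    # hypothetical 4-register machine


def mask_of(names):
    """Build a bit mask with one bit per register (bit 0 is r1)."""
    mask = 0
    for name in names:
        mask |= 1 << REGISTERS.index(name)
    return mask


def dead_on_link(epilog_mask, prolog_mask):
    """Registers whose bit is set in both masks are dead across the link."""
    both = epilog_mask & prolog_mask
    return [name for i, name in enumerate(REGISTERS) if (both >> i) & 1]


# Values from the example: epilog marks r2 and r3, prolog marks r2 and rn.
epilog = mask_of(["r2", "r3"])
prolog = mask_of(["r2", "r4"])
print(dead_on_link(epilog, prolog))
```

Only r2 survives the AND, so only the write to r2 in the exiting superblock is a removal candidate.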

  33. Instruction-Set-Specific Optimizations
     • Why: each instruction set has its own features.
     • Example 1: an ISA that requires aligned accesses has to access unaligned data.
       • Invoking a trap and letting the trap handler use multiple instructions is extremely slow.
       • Use an inlined multi-instruction sequence instead.
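
To show what such an inlined sequence computes, here is a minimal Python sketch that assembles an unaligned 32-bit load from two aligned loads. Big-endian byte order, the 4-byte word size, and the function names are assumptions for illustration; a real translator would emit the target's own load, shift, and OR instructions.

```python
def load_word_aligned(memory, address):
    """Model of a load that only accepts 4-byte-aligned addresses."""
    if address % 4 != 0:
        raise ValueError("unaligned access")
    return int.from_bytes(memory[address:address + 4], "big")


def load_word_unaligned(memory, address):
    """Combine the two aligned words that contain the requested bytes."""
    offset = address % 4
    base = address - offset
    if offset == 0:
        return load_word_aligned(memory, base)
    high = load_word_aligned(memory, base)        # holds the leading bytes
    low = load_word_aligned(memory, base + 4)     # holds the trailing bytes
    shift = 8 * offset
    return ((high << shift) | (low >> (32 - shift))) & 0xFFFFFFFF


memory = bytes(range(16))    # bytes 0x00, 0x01, ..., 0x0f
print(hex(load_word_unaligned(memory, 1)))
print(hex(load_word_unaligned(memory, 6)))
```

The sketch assumes the buffer extends at least to the end of the second aligned word.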

  34. If-conversion (1)
     • Example 2: if-conversion.
     • An instruction set can be enhanced by adding a new instruction.
     • If a conditional move instruction (cmovgt) is added, the hammock region can be removed.
         if (r4 > 0)
           then r5 = r5 + 1        (hammock region)
           else r5 = r5 - 1;
         r6 = r6 + r5

  35. If-conversion (2)
     • Assembly code with branch
               cmpi   cr0, r4, 0        ;compare r4 with zero
               ble    cr0, skip         ;branch to skip if r4<=0
               addi   r5, r5, 1         ;add 1 to r5
               b      next              ;branch to next
         skip: subi   r5, r5, 1         ;sub 1 from r5
         next: add    r6, r6, r5        ;accumulate r5 values in r6
     • Assembly code after if-conversion
               cmpi   cr0, r4, 0        ;compare r4 with zero
               addi   r30, r5, 1        ;add 1 to r5
               subi   r5, r5, 1         ;sub 1 from r5
               cmovgt r5, r30, cr0      ;conditional move r30 to r5 if r4>0
               add    r6, r6, r5        ;accumulate r5 values in r6
     • The hammock region is removed.
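
The equivalence of the two forms can be checked with a small model. This is a Python sketch only: the conditional expression stands in for `cmovgt`, and the function names are hypothetical.

```python
def with_branch(r4, r5, r6):
    """The hammock: one of the two arms executes, then the join."""
    if r4 > 0:
        r5 = r5 + 1
    else:
        r5 = r5 - 1
    return r5, r6 + r5


def if_converted(r4, r5, r6):
    """Both arms are computed; a conditional move selects the result."""
    gt = r4 > 0                 # cmpi   cr0, r4, 0
    r30 = r5 + 1                # addi   r30, r5, 1
    r5 = r5 - 1                 # subi   r5, r5, 1
    r5 = r30 if gt else r5      # cmovgt r5, r30, cr0
    r6 = r6 + r5                # add    r6, r6, r5
    return r5, r6


for r4 in (-3, 0, 7):
    print(r4, with_branch(r4, 10, 100), if_converted(r4, 10, 100))
```

The converted form always executes both the add and the subtract, trading extra work for the removal of the branch.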

  36. Same-ISA Optimization Systems • Easy to perform fast initial emulation of source binary • Dynamic optimization is not a necessity. • Sample-based profiling is more attractive. • No instruction semantic mismatch problems.

  37. Optimization Using Basic Block Cache
     [Figure: the source binary (blocks W, A, B, C, X, D, E, Y, including an indirect jump), a basic block cache with stubs and a map table, and a superblock cache reached through links]

  38. Code Patching
     [Figure: the source binary (blocks W, A, B, C, X, D, E, Y, including an indirect jump) with patches that link into the superblock cache]