1 / 42

Binary Translation

Binary Translation. Jae Wook Kim. Binary Translation. Process of converting the source binary program into a target binary program Enhance performance. Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study. Simulators.

xia
Download Presentation

Binary Translation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Binary Translation Jae Wook Kim

  2. Binary Translation • Process of converting the source binary program into a target binary program • Enhance performance Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  3. Simulators • First generation • 100 host instructions per guest instruction simulated • Second generation • Reduce the expansion factor to about 10 • Third generation • Translate groups of instructions as a unit • Expansion factor of about 4 • Mimic Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  4. Binary Translation from IA-32 to Power PC addl %edx,4(%eax) movl 4(%eax),%eax add %eax,4 Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study r1 points to IA-32 register context block r2 points to IA-32 memory image r3 contains IA-32 ISA PC value lwz r4,0(r1) ;load %eax from register block addi r5,r4,4 ;add 4 to %eax lwzx r5,r2,r5 ;load operand from memory lwz r4,12(r1) ;load %eax from register block add r5,r4,r5 ;perform add stw r5,12(r1) ;put result into %edx addi r3,r3,3 ;update PC (3 bytes) lwz r4,0(r1) ;load %eax from register block addi r5,r4,4 ;add 4 to %eax lwz r4,12(r1) ;load %eax from register block stwx r4,r2,r5 ;store %eax value into memory addi r3,r3,3 ;update PC (3 bytes) lwz r4,0(r1) ;load %eax from register block addi r4,r4,4 ;add immediate stw r4,0(r1) ;place result back into %eax addi r3,r3,3 ;update PC (3 bytes)

  5. Threaded Interpretation vs Binary Translation intermediate code binary translated target code interpreter routines Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study source code source code binary translator predecoder • In both cases, the original source code is converted to another form • In predecoding, interpreter routines are needed • In binary translation, converted code is directly executed

  6. State Mapping from Target to Source Registers Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study Source Register Block r1 point to the memory image and register block of the source ISA r2 Source MemoryImage other portions of the source sate may be held in target registers program counter r3 Reg 1 r4 r5 Reg 2 mapped directly n+3 Reg n

  7. Binary Translation with State Mapping addl %edx,4(%eax) movl 4(%eax),%eax add %eax,4 Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study r1 points to IA-32 register context block r2 points to IA-32 memory image r3 contains IA-32 ISA PC value r4 holds IA-32 register %eax r7 holds IA-32 register %edx addi r16,r4,4 ;add 4 to %eax lwzx r17,r2,r16 ;load operand from memory add r7,r17,r7 ;perform add of %edx addi r16,r4,4 ;add 4 to %eax stwz r7,r2,r16 ;store %edx value into memory addi r4,r4,4 ;increment %eax addi r3,r3,9 ;update PC (9 bytes)

  8. Code Discovery • Static Predecoding/Translation • Predecode or binary translate a program in its entirety before beginning emulation • Difficult or impossible in many situations • target of a jump instruction is held in a register • data is interspersed with code Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  9. inst. 2 inst. 1 inst. 3 jump reg. data inst. 5 inst. 6 uncond. branch pad inst. 8 Discovery Problem |mov %ch,0 ?? 31 c0 | 8b |b5 00 00 03 08 8b bd 00 00 03 00 | movl %esi, 0x08030000(%ebp)?? Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study data in instruction stream pad for instruction alignment jump indirect to ???

  10. Code Location • Translated code is accessed with target program counter (TPC), which is different from source program counter (SPC) • Problem when there is an indirect control transfer (branch or jump) • Destination address of the control transfer is held in a register and is a source code address, even though it occurs in the translated code Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study movl %eax,4(%esp) ;load jump address from memory jmp %eax ;jump indirect through %eax addi r16,r11,4 ;compute IA-32 address lwzx r4,r2,r16 ;get IA-32 jump address from IA-32 memory image mtctr r4 ;move to count register (ctr) bctr ;jump indirect through ctr

  11. Solutions for Code-discovery and Code-location • Simple • Instruction sets with fixed-length instructions that are always aligned on fixed boundaries • Source instruction sets are explicitly designed to be emulated • Java bytecodes • do not allow data to be interspersed with code • restrict control flow instructions (branches and jumps) to enable easy code discovery • Sophisticated • Discussed next Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  12. Incremental Predecoding and Translation • General solution to code discovery • Dynamically: translate the binary while the program is operating on actual input data • Incrementally: predecode or translate new sections of code as the program reaches them • High-level control is provided by an emulation manager (EM) • Translated blocks of code are organized as a code cache • A map table associates the SPC for a block of source code with the TPC for the corresponding block of translated code Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  13. Dynamic Translation System Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study SPC to TPC Map Table Interpreter Translator Miss Emulation Manager Hit

  14. Dynamic Translation Flowchart Start with SPC Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study Look Up SPC<->TPC in Map Table No Hit in Table? Use SPC to Read Instructions from Source Memory Image ----------- Interpret, Translate, and Place into Code Cache Yes Branch to TPC and Execute Translated Block Write New SPC<->TPC Mapping into Map Table Get SPC for Next Block

  15. Basic flow of Mimic Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  16. Basic Block • System translates one block of source code at a time • Natural unit is dynamic basic block • Static basic block • Single entry point and single exit point • Dynamic basic block • Begins at the instruction executed immediately after a branch or jump, • Follows sequential instruction stream, • Ends with next branch or jump • Complication when a branch goes into the middle of a block that is already translated • Additional data structure used to keep track of ranges of translated code blocks • When using dynamic basic blocks are used • new translation is always started when there is a miss in the map table, even if it leads to replicated sections of translated code Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  17. Static Versus Dynamic Basic Blocks Dynamic Basic Blocks Static Basic Blocks Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study add… load… store… loop: load…block 1 add… store brcond skip load… sub…block 2 skip: add… store brcond loop loop: load… add… store block 3 brcond skip skip: add… store block 4 brcond loop … … add… load…block 1 store… loop: load… add…block 2 store brcond skip load…block 3 sub… skip: add… store block 4 brcond loop add… load… store…block 5 jump indirect … …

  18. Flow of Control Involving Translated Blocks translated block Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study translated block Emulation Manager translated block • Incrementally, more of the program is discovered and translated, until eventually only translated code is being executed

  19. Tracking the Source Program Code EmulationManager Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study Code Block Jump and Link to EM Next Source PC Map Table Code Block • The SPC must be available to the EM • Can be placed in a “stub” at the end of the translated block • When the translated block finishes, control is transferred back to the EM using a jump-and-link instruction

  20. Same-ISA Emulation • Emulation manager is always in control of the software being emulated • Can identify details concerning operations to be performed by that specific instruction • Monitor the execution of the source program at any desired level of detail Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  21. Control Transfer Optimizations • Every time a translation block finishes execution, EM must be reentered and an SPC-to-TPC lookup must occur • There are a number of optimizations that can reduce this overhead by eliminating the need to go through the EM between every pair of translation blocks Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  22. Translation Chaining • Instead of branching to the EM at the end of every translated block, the blocks can be linked directly to each other • Blocks are translated one at a time, but they are linked together into chains as they are constructed • Linking is accomplished by replacing what was initially a jump and link back to the EM with a direct branch to the successor translation block Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  23. Chaining of Translation Blocks Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study translated block translated block Emulation Manager translated block translated block

  24. Creating a Link Get next SPC Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study (1) (2) Predecessor Look up successor (3) JAL EM Next SPC (4) Jump TPC Set up link (5) Successor

  25. Software Indirect Jump Prediction • Indirect jumps by map table lookup is expensive in execution time • Hash SPC to form index into map table • Load and compare to find matching table entry • Load and indirect jump to TPC address • In many cases, jump target never or very seldom changes • The most frequent SPC addresses and their matching TPC addresses are encoded into the translated binary code Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  26. Shadow Stack • When a translated code block contains a procedure call to a target binary routine, the SPC value must be saved • When the procedure is finished, it restores the SPC value, accesses the map table, and jumps to translated block of code at the return address • To avoid this overhead, make the target return PC value be directly available • Return value of target code is pushed onto a shadow stack maintained by the EM • In case the source code has changed before the return, check the return address of the source binary against the return address saved in the stack Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  27. Shadow Stack Implementation Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study IA-32 stack Shadow stack IA-32 return address Shadow stack frame PPC return address IA-32 return address Return address if compare succeeds IA-32 stack pointer if compare fails, map table must be used IA-32 stack pointer Compare on return

  28. Instruction Set Issues • Details to be taken care of in translating a complete instruction set Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  29. Register Architectures • General-purpose registers of the target ISA are used to • Hold general-purpose registers of the source ISA • Hold special-purpose registers of the source ISA • Point to the source register context block and memory image • Hold intermediate values used by the emulator • Assign target registers to the most common or performance-critical source resources Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  30. Condition Codes • Special architected bits that characterize instruction results • Little uniformity in how they are used across ISAs • Easiest – neither source nor target ISAs use condition codes • Almost as easy – source ISA does not use condition codes but target ISA does • If target ISA has no condition codes, source condition codes must be emulated • Time consuming Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  31. Condition Codes • Although condition codes are set frequently, they are seldom used • Lazy evaluation can be performed to make condition code emulation more efficient • Operands and operation that set the condition code are saved rather than the condition code itself Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  32. Condition Codes r4 <-> %eax IA-32 to r5 <-> %ebx PowerPC r6 <-> %ecx register mappings … r16 <-> scratch register used by emulation code r25 <-> condition code operand 1 ;registers r26 <-> condition code operand 2 ; used for r27 <-> condition code operation ; lazy condition code emulation r28 <-> jump table base operation lwz r16,0(r4) ;perform memory load for addl mr r25,r16 ;save operands mr r26,r5 ; and opcode for li r27,“addi” ; lazy condition code emulation add r5,r5,r16 ;finish addl mr r25,r6 ;save operands mr r26,r5 ; and opcode for li r27,“add” ; lazy condition code emulation add r6,r6,r5 ;translation of add b label1 … b1 genZF ;branch and link to evaluate genZF code beq cr0,target ;branch on condition flag …. add r29,r28,r27 ;add “opcode” to jump table base address mtctr r29 ;copy to counter register bctr ;branch via jump table … … add r24,r25,2r6 ;perform PowerPC add, set cr0 blr ;return addl %ebx,0(%eax) add %ecx,%ebx jmp label1 . . label1: jz target Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study condition codes set by first add are not used label1: genZF: “sub”: “add”:

  33. Data Formats and Arithmetic • Data formats and arithmetic have become standardized over the years • There may be some differences in the way floating-point arithmetic is performed on different implementations • IA-32 uses 80-bit intermediate results, unlike most other ISAs • Leads to different precisions • Source ISA may require functional capability not available in the target ISA • Simple ISA has sufficient primitives to implement the more complex instructions in the other ISA Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  34. Memory Address Resolution • Most ISAs today address memory to the granularity of individual byes • Emulating a byte-resolution ISA on a machine with word resolution • shift out the low-order byte address bits when performing memory access • use the saved byte offset bits to select the specific bytes being accessed Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  35. Memory Data Alignment • Some ISAs align memory data on “natrual” boundaries • word access must be performed with the 2 low-order address bits being 00 • halfword access must have a 0 for the lowest-order bit • If ISA does not support unaligned data, there are supplementary instructions to simplify the process • Otherwise, ISA specifies a trap if an instruction attempts access with an unaligned address Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  36. Byte Order • Some ISAs order bytes within a word so that the most significant byte is byte 0 (big-endian) • Other ISAs order bytes in little-endian order • To emulate one from the other, complement the low-order two-address bits • 00 -> 11, 01 -> 10, … Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  37. Addressing Architecture • Address space sizes of source and target ISAs may be different • Page sizes may be different • Privilege levels may be different • Solution depends on virtual machine architecture Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  38. Shade • Tool developed for high-performance simulation • Translates ISA being simulated into sequences of target instructions • Dynamically cross-compiles executable code for target machine into executable code that runs directly on the host machine • Caches host code for ruse • Integrates simulation and tracing code • Gives analyzer detailed control over what is traced Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  39. Shade Data Structures Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  40. Shade Data Structures • Source program counter (VPC), base of source memory image (VMEM), and base of the TLB are permanently mapped to target register • During execution, source register values (in Virtual State) are temporarily copied to target registers and results are copied back into VS • Translations fill TC linearly as they are produced • TLB is an array of lists of <target, host> address pairs • Lookup algorithm linearly scans the TLB until it has a match or hits the end of the array • Fast, partial TLB lookup • Slower, full TLB lookup • generate new TC and update TLB Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

  41. Translation Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study translation sample application code • Load contents of application registers r1 and r2 from the appication’s virtual state structure into host scratch registers s1 and s2 • Perform add operation • Write result in host scratch register s3 back to virtual state structure location for application register r3 • Update application’s virtual PC

  42. Conclusion • Memory requirements: high • Size of predecoded memory image is proportional to original source memory image • Can be reduced if blocks of translated code are cached • Start-up performance: very slow • Source memory image must be interpreted to discover control flow before being translated • Steady-state performance: fast • Translated binaries execute directly on hardware • Translated blocks can be linked directly together • Code portability: poor • Code is translated for a specific target ISA Binary Translation Code Discovery & Dynamic Translation Control Transfer Optimizations Instruction Set Issues Case Study

More Related