
Transmeta and Dynamic Code Optimization




  1. Transmeta and Dynamic Code Optimization
     Ashwin Bharambe, Mahim Mishra, Matthew Rosencrantz

  2. Stuff Compilers Don’t (Can’t?) Do
     • Instruction reordering
     • Common-case detection and optimization
     • Branch prediction
     • Traces (pre-fetching)
     • Optimizing traces
     • Why can’t compilers do these optimizations?
        • No runtime statistics
        • Legacy code (inertia to recompile)

  3. Therefore: Dynamic Code Optimization
     • Optimize on the fly (at runtime)
     • Current processors already do this to some extent:
        • Instruction reordering
        • Branch prediction
     • You can do much better…
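Branch prediction is the standard example of hardware using runtime statistics that a static compiler never sees. The following is a minimal sketch that assumes the classic 2-bit saturating-counter scheme. That scheme is a textbook technique; the slides do not say which predictor any particular processor uses. The table size, the indexing by branch address, and the example address are illustrative choices.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0-1 predict not-taken; 2-3 predict taken.
    """

    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        # Every counter starts at 1 ("weakly not-taken").
        self.table = [1] * (1 << table_bits)

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(trace: list[tuple[int, bool]]) -> float:
    """Replay (pc, taken) pairs and return the fraction predicted correctly."""
    predictor = TwoBitPredictor()
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


# A loop branch at a made-up address: taken 9 times, then falls through; run 10 times.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
print(f"{accuracy(loop_trace):.2f}")  # prints 0.89
```

After warming up, the predictor misses only once per loop exit. A dynamic optimizer can go further and use the same kind of statistics to restructure the code itself.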

  4. How Do You Implement This?
     • “Hardware-intensive” approach
        • Pentium Pro
           • Instruction translator: part of the main processor’s critical path
        • I-COP
           • Instruction-block optimizer: off the critical path
     • “Non-hardware-intensive” approach
        • Transmeta, DAISY, Java HotSpot
     • Trade-offs?

  5. I-COP (Instruction Path Coprocessors)
     • What?
        • Add another processor that watches instructions as they retire and can perform operations on them
     • Why?
        • Performance!
     • Principles
        • Keep the optimizations off the critical path
        • Avoid slowdown due to software

  6. Structure
     • Multiple VLIW processor “slices” make the I-COP simple, yet still able to keep up
     • I-COP slices have 10 special instructions for pattern matching in addition to 12 normal RISC-type instructions

  7. Applications of I-COP
     • Trace cache fill
        • Find long strings of instructions that are executed frequently
     • Pre-fetching
        • Find a load whose result is later used as the address of another load
     • Instruction trace optimizations
        • Register move optimization
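For pre-fetching, the task is to spot a load whose result supplies the address of a later load. This is the pointer-chasing pattern you see when walking a linked structure. The following is a minimal sketch over a retired-instruction trace. The `Insn` record, the `find_load_to_load` helper, and the register names are all hypothetical. A real I-COP would do this with its pattern-matching instructions, not with Python dictionaries.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    """Hypothetical retired-instruction record: dst <- op(srcs)."""
    pc: int
    op: str                 # "load", "add", ...
    dst: str
    srcs: tuple[str, ...]   # for a load, srcs[0] is the address register


def find_load_to_load(trace: list[Insn]) -> set[tuple[int, int]]:
    """Return (producer_pc, consumer_pc) pairs in which the value loaded by
    the producer is used as the address of a later load."""
    loaded_by: dict[str, int] = {}  # register -> pc of the load that last wrote it
    pairs: set[tuple[int, int]] = set()
    for insn in trace:
        # Check the address register before recording this insn's own write,
        # so "r1 <- load [r1]" sees the previous value of r1.
        if insn.op == "load" and insn.srcs[0] in loaded_by:
            pairs.add((loaded_by[insn.srcs[0]], insn.pc))
        if insn.op == "load":
            loaded_by[insn.dst] = insn.pc
        else:
            loaded_by.pop(insn.dst, None)  # register no longer holds a loaded value
    return pairs


# One iteration of a linked-list walk: sum += p->value; p = p->next.
one_iteration = [
    Insn(0x10, "load", "r2", ("r1",)),       # r2 <- p->value
    Insn(0x14, "add", "r3", ("r3", "r2")),   # sum += r2
    Insn(0x18, "load", "r1", ("r1",)),       # p <- p->next
]
print(sorted(find_load_to_load(one_iteration * 2)))  # [(24, 16), (24, 24)]
```

Both pairs point back to the `p = p->next` load at 0x18. Its result is the address used by the next iteration's loads, so it is the natural prefetch target.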

  8. The I-COP Processor
     • Multiple VLIW slices allow multi-level, statically scheduled, and explicitly encoded parallelism
     • Predication and delay slots obviate branch prediction
     • 32 integer registers, 8 predicate registers
     • 22 instructions: 12 RISC-type and 10 special
        • Pattern matching, bit manipulation, instrumentation
     • Fill buffer collects instructions for analysis
     • Task queue acts as a FIFO scheduler
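The slide names only a fill buffer and a FIFO task queue. The following is a minimal sketch of how those two pieces might fit together. The `ICopFrontEnd` name, the fixed fill-buffer size, and the round-robin assignment of tasks to slices are all assumptions for illustration.

```python
from collections import deque
from typing import Callable


class ICopFrontEnd:
    """Toy model: retired instructions accumulate in a fill buffer; each full
    buffer becomes one task in a FIFO queue that the slices drain in order."""

    def __init__(self, fill_size: int, num_slices: int) -> None:
        self.fill_size = fill_size
        self.num_slices = num_slices
        self.fill_buffer: list[str] = []
        self.task_queue: deque = deque()

    def retire(self, insn: str) -> None:
        """Observe one retired instruction (in hardware, off the critical path)."""
        self.fill_buffer.append(insn)
        if len(self.fill_buffer) == self.fill_size:
            self.task_queue.append(self.fill_buffer)
            self.fill_buffer = []

    def run(self, analyze: Callable[[list[str]], object]) -> list[tuple[int, object]]:
        """Pop tasks oldest-first and hand them to slices round-robin;
        return (slice_id, result) pairs."""
        results = []
        slice_id = 0
        while self.task_queue:
            block = self.task_queue.popleft()
            results.append((slice_id, analyze(block)))
            slice_id = (slice_id + 1) % self.num_slices
        return results


front_end = ICopFrontEnd(fill_size=4, num_slices=2)
for i in range(10):
    front_end.retire(f"insn{i}")
# Two full buffers became tasks; the last two instructions are still buffered.
print(front_end.run(len))     # [(0, 4), (1, 4)]
print(front_end.fill_buffer)  # ['insn8', 'insn9']
```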

  9. The I-COP Processor Cont.

  10. Examples of Special Instructions
     • SearchReplace
        • Finds a given pattern and replaces it with another given pattern; returns the number of replacements made
     • Subset
        • Tests whether the bits set in one register are a subset of those set in a second register
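The slides describe SearchReplace and Subset only at this level. The functions below are therefore software models of their behavior, not the I-COP encodings. Instructions are modeled as strings, and left-to-right non-overlapping matching is an assumed rule. The usage applies SearchReplace to the register move optimization from slide 7, which is offered here as a plausible workload.

```python
def search_replace(seq: list[str], pattern: list[str],
                   replacement: list[str]) -> tuple[list[str], int]:
    """Replace non-overlapping occurrences of `pattern` in `seq`, scanning
    left to right; return (new_sequence, number_of_replacements)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    out: list[str] = []
    count = 0
    i = 0
    while i < len(seq):
        if seq[i:i + len(pattern)] == pattern:
            out.extend(replacement)
            count += 1
            i += len(pattern)
        else:
            out.append(seq[i])
            i += 1
    return out, count


def subset(a: int, b: int) -> bool:
    """True if every bit set in register value `a` is also set in `b`."""
    return (a & ~b) == 0


# Register move optimization: after "mov r1, r2" (r1 <- r2), "mov r2, r1" is redundant.
trace = ["mov r1, r2", "mov r2, r1", "add r3, r1",
         "mov r1, r2", "mov r2, r1"]
new_trace, n = search_replace(trace, ["mov r1, r2", "mov r2, r1"], ["mov r1, r2"])
print(new_trace, n)  # ['mov r1, r2', 'add r3, r1', 'mov r1, r2'] 2
print(subset(0b0101, 0b1101), subset(0b0011, 0b0101))  # True False
```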

  11. Transmeta Crusoe
     • The best example of a “non-hardware-intensive” approach
     • New (and fast!) 128-bit VLIW processor
     • Aimed at systems where power efficiency is important
        • Mobile systems
        • “Dense” servers
     • Therefore, small gate count
     • BUT, needs x86 compatibility
     • AND at reasonable performance, too

  12. So How Do They Do It?
     • A “Code Morphing” software layer runs on the processor
     • All x86 software (BIOS, OS, apps) runs above this layer
     • CM software translates x86 code at runtime into the VLIW processor’s native ISA
        • It also optimizes the translations!
     • So the processor itself can be fast and simple
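Translating to a VLIW pays off because independent operations can be packed into one wide instruction. The slides do not describe Crusoe’s actual scheduler. The following is a minimal greedy list-scheduling sketch that assumes a hypothetical register-only `Op` format and a fixed bundle width. It respects read-after-write, write-after-read, and write-after-write ordering.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Hypothetical register-only host operation: dst <- f(srcs)."""
    text: str
    dst: str
    srcs: tuple[str, ...]


def pack_bundles(ops: list[Op], width: int) -> list[list[str]]:
    """Greedily pack ops into VLIW bundles of at most `width` operations.

    Each op goes into the first bundle with a free slot that lies strictly
    after every bundle holding an op it must follow (RAW, WAR, or WAW)."""
    bundles: list[list[str]] = []
    last_write: dict[str, int] = {}  # register -> latest bundle writing it
    last_read: dict[str, int] = {}   # register -> latest bundle reading it
    for op in ops:
        earliest = max(
            [last_write.get(r, -1) + 1 for r in op.srcs]      # RAW
            + [last_write.get(op.dst, -1) + 1,                 # WAW
               last_read.get(op.dst, -1) + 1]                  # WAR
        )
        idx = earliest
        while idx < len(bundles) and len(bundles[idx]) >= width:
            idx += 1
        if idx == len(bundles):
            bundles.append([])
        bundles[idx].append(op.text)
        last_write[op.dst] = max(last_write.get(op.dst, -1), idx)
        for r in op.srcs:
            last_read[r] = max(last_read.get(r, -1), idx)
    return bundles


# A sequential block as it might look after decoding guest instructions.
block = [
    Op("ld r1, [r10]", "r1", ("r10",)),
    Op("ld r2, [r11]", "r2", ("r11",)),
    Op("add r3, r1, r2", "r3", ("r1", "r2")),
    Op("addi r10, r10, 4", "r10", ("r10",)),
    Op("addi r11, r11, 4", "r11", ("r11",)),
    Op("mul r4, r3, r3", "r4", ("r3", "r3")),
]
# Six sequential ops fit in three bundles when width=4.
for bundle in pack_bundles(block, width=4):
    print(" | ".join(bundle))
```

In this example the pointer increments move up beside the `add`, because nothing after the first two loads depends on the old pointer values. A narrower bundle width degrades gracefully to one operation per bundle.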

  13. Cheesy Marketing Image

  14. Code Morphing Software
     • Translates an entire basic block at once
     • Also does instruction reordering, branch prediction, and register renaming
     • Translations are stored in a translation cache (part of main memory)
     • Instruments code to help with branch prediction and to detect candidates for heavier optimization
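The following is a minimal sketch of the translation cache combined with instrumentation. It assumes a hypothetical `translate` callback and a fixed hotness threshold; the slides do not give the real policies. Each guest block is translated once on first execution. A per-block counter then triggers a second, more heavily optimized translation once the block becomes hot.

```python
from typing import Callable


class TranslationCache:
    """Toy translation cache: guest block address -> host code.

    A block is translated cheaply on first use; a per-block execution counter
    (standing in for the instrumentation) triggers one re-translation with
    heavier optimization when the block becomes hot."""

    def __init__(self, translate: Callable[[int, bool], str],
                 hot_threshold: int) -> None:
        self.translate = translate          # (guest_pc, optimize) -> host code
        self.hot_threshold = hot_threshold
        self.code: dict[int, str] = {}
        self.exec_count: dict[int, int] = {}

    def lookup(self, guest_pc: int) -> str:
        """Return host code for the block at guest_pc, translating on a miss."""
        if guest_pc not in self.code:
            self.code[guest_pc] = self.translate(guest_pc, False)
            self.exec_count[guest_pc] = 0
        self.exec_count[guest_pc] += 1
        if self.exec_count[guest_pc] == self.hot_threshold:
            # Hot block: replace the quick translation with an optimized one.
            self.code[guest_pc] = self.translate(guest_pc, True)
        return self.code[guest_pc]


calls: list[tuple[int, bool]] = []


def toy_translate(guest_pc: int, optimize: bool) -> str:
    """Hypothetical translator that just records each request."""
    calls.append((guest_pc, optimize))
    return f"{'optimized' if optimize else 'basic'} block @ {guest_pc:#x}"


tc = TranslationCache(toy_translate, hot_threshold=3)
results = [tc.lookup(0x1000) for _ in range(5)]
print(results[0])   # basic block @ 0x1000
print(results[-1])  # optimized block @ 0x1000
print(calls)        # [(4096, False), (4096, True)]
```

Five executions cost only two translations; every later lookup is a dictionary hit. This sketch omits invalidation, which a real system needs when guest code is overwritten.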

  15. Code Morphing Software (cont.)
     • Also gets some help from the hardware
        • Shadowed and working register sets
        • Alias hardware (load-and-protect operations)
        • A “translated” bit in each page table entry
     • Performance of systems with Crusoe: 2-3 times longer battery life, with performance “comparable” to Intel mobile processors
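The shadowed register set and the alias hardware let translated code update registers and hoist loads speculatively, then either commit or roll back. The following is a minimal sketch of that idea. The class and method names are hypothetical, and the overlap test merely stands in for the alias hardware’s check. Memory side effects and the “translated” page bit are not modeled.

```python
class SpeculativeState:
    """Toy model of working vs. shadowed registers plus load protection.

    Translated code writes only `working`; `commit` copies it to `shadow` at
    the end of a translation, and `rollback` restores the last committed
    state (e.g. after a fault or a detected alias) so the guest code can be
    re-executed more conservatively. Memory side effects are not modeled."""

    def __init__(self, regs: dict[str, int]) -> None:
        self.shadow = dict(regs)
        self.working = dict(regs)
        self.protected: list[tuple[int, int]] = []  # (addr, size) of hoisted loads

    def commit(self) -> None:
        self.shadow = dict(self.working)
        self.protected.clear()

    def rollback(self) -> None:
        self.working = dict(self.shadow)
        self.protected.clear()

    def load_and_protect(self, addr: int, size: int) -> None:
        """Remember the range read by a load that was moved above stores."""
        self.protected.append((addr, size))

    def store_aliases(self, addr: int, size: int) -> bool:
        """True if a store to [addr, addr+size) overlaps a protected load."""
        return any(addr < a + s and a < addr + size for a, s in self.protected)


state = SpeculativeState({"eax": 1, "ebx": 2})
state.working["eax"] = 10              # speculative update inside a translation
state.load_and_protect(0x2000, 4)      # load hoisted above a later store
if state.store_aliases(0x2002, 4):     # the store overlaps the loaded bytes
    state.rollback()
print(state.working)                   # {'eax': 1, 'ebx': 2}

state.working["ebx"] = 5
state.commit()
print(state.shadow)                    # {'eax': 1, 'ebx': 5}
```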
