Transmeta and Dynamic Code Optimization

Ashwin Bharambe, Mahim Mishra, Matthew Rosencrantz

Stuff Compilers Don’t (Can’t?) Do
• Instruction reordering
• Common case detection and optimization
• Branch prediction
• Traces (pre-fetching)
• Optimizing traces
• Why can’t compilers do these optimizations?
  • No runtime statistics
  • Legacy code (inertia to recompile)

Therefore – Dynamic Code Optimization
• Optimize on the fly (at runtime)
• Current processors do it to some extent
  • Instruction reordering
  • Branch prediction
• You can do much better…

How Do You Implement This?
• “Hardware Intensive” approach
  • Pentium Pro
    • Instruction Translator – part of the critical path of the main processor
  • I-COP
    • Instruction-block Optimizer – off the critical path
• “Non-Hardware Intensive” approach
  • Transmeta, DAISY, Java HotSpot
• Trade-offs?

I-COP (Instruction Path Coprocessors)
• What?
  • Add another processor that watches instructions retire and can perform operations on them
• Why?
  • Performance!
• Principles
  • Keep the optimizations out of the critical path
  • Avoid slowdown due to software

Structure
• Multiple VLIW processor “slices” keep the I-COP simple while still letting it keep up
• Each I-COP slice has 10 special instructions for pattern matching in addition to 12 normal RISC-type instructions

Applications of I-COP
• Trace cache fill
  • Find long strings of instructions that are executed frequently
• Pre-fetching
  • Find a load whose result is used later as the address of another load
• Instruction trace optimizations
  • Register move optimization
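The pre-fetching application above (spotting a load whose result later serves as the address of another load) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Instr` record, the register names, and the `dependent_load_pairs` helper are hypothetical and do not reflect the I-COP's actual instruction encoding or analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical retired-instruction record (not the real I-COP format)."""
    op: str                 # e.g. "load", "add", "mov"
    dest: str               # register written by the instruction
    srcs: tuple[str, ...]   # registers read; for a load, srcs[0] is the address register


def dependent_load_pairs(trace: list[Instr]) -> list[tuple[int, int]]:
    """Return (producer, consumer) index pairs where a load's result is
    later used, unmodified, as the address register of another load."""
    pairs = []
    last_load_writer: dict[str, int] = {}  # register -> index of the load that last wrote it
    for j, ins in enumerate(trace):
        # Check the address register before recording this instruction's own write,
        # so that "load r1 <- [r1]" (pointer chasing) is matched against the older load.
        if ins.op == "load" and ins.srcs and ins.srcs[0] in last_load_writer:
            pairs.append((last_load_writer[ins.srcs[0]], j))
        if ins.op == "load":
            last_load_writer[ins.dest] = j
        else:
            # Any other write to the register breaks the load-to-load chain.
            last_load_writer.pop(ins.dest, None)
    return pairs


trace = [
    Instr("load", "r1", ("r0",)),        # 0: r1 <- [r0]
    Instr("add",  "r2", ("r2", "r3")),   # 1: unrelated arithmetic
    Instr("load", "r4", ("r1",)),        # 2: r4 <- [r1], address comes from load 0
    Instr("mov",  "r4", ("r5",)),        # 3: overwrites r4
    Instr("load", "r6", ("r4",)),        # 4: r6 <- [r4], but r4 no longer holds a loaded value
]
print(dependent_load_pairs(trace))
```

Each reported pair is a candidate for pre-fetching: once the first load completes, the address of the second is already known.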

The I-COP Processor
• Multiple VLIW slices allow multi-level, statically scheduled, explicitly encoded parallelism
• Predication and delay slots obviate branch prediction
• 32 integer registers, 8 predicate registers
• 22 instructions: 12 RISC-type and 10 special
  • Pattern matching, bit manipulation, instrumentation
• Fill buffer collects instructions for analysis
• Task queue acts as a FIFO scheduler

The I-COP Processor (cont.)

Examples of Special Instructions
• SearchReplace
  • Finds a given pattern, replaces it with another given pattern, and returns the number of replacements made
• Subset
  • Tests whether the bits set in a given register are a subset of those set in a second register
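The semantics of the two special instructions above can be modeled in software. The slides do not give operand formats, so this sketch assumes registers are non-negative integers and that SearchReplace works on a sequence of words with non-overlapping, left-to-right matching; the function names `subset` and `search_replace` are illustrative, not I-COP mnemonics.

```python
def subset(a: int, b: int) -> bool:
    """True if every bit set in register value a is also set in b."""
    # a & ~b keeps exactly the bits of a that are missing from b.
    return (a & ~b) == 0


def search_replace(words: list[int], pattern: list[int],
                   replacement: list[int]) -> tuple[list[int], int]:
    """Replace non-overlapping occurrences of pattern (scanning left to right)
    and return the rewritten sequence plus the number of replacements."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    out: list[int] = []
    count = 0
    i = 0
    n = len(pattern)
    while i < len(words):
        if words[i:i + n] == pattern:
            out.extend(replacement)
            count += 1
            i += n          # skip past the matched span
        else:
            out.append(words[i])
            i += 1
    return out, count


print(subset(0b0101, 0b1101))                         # bits 0 and 2 are both set in 0b1101
print(search_replace([1, 2, 3, 1, 2], [1, 2], [9]))   # two matches of [1, 2]
```

In hardware these would be single instructions; the point of the model is only to show what result each one produces.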

Transmeta Crusoe
• The best example of a “non-hardware-intensive” approach
• New (and fast!) 128-bit VLIW processor
• Aimed at systems where power efficiency is important
  • Mobile systems
  • “Dense” servers
• Therefore, small gate count
• BUT, need x86 compatibility
• AND, at reasonable performance too

So how do they do it?
• Have a “Code-Morphing” software layer that runs on the processor
• All x86 software (BIOS, OS, apps) runs above this
• CM software translates x86 code at runtime into the VLIW processor’s native instruction set
• Also optimizes the translations!
• So the processor itself can be fast and simple

Cheesy Marketing Image

Code-Morphing Software
• Translates an entire basic block at once
• Also does instruction reordering, branch prediction, register renaming
• The translations are stored in a translation cache (part of main memory)
• Instruments code to help with branch prediction and to detect candidates for heavy optimization
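The translate-once, cache, and re-optimize-when-hot flow described above can be sketched as follows. This is a simplified model, not Transmeta's implementation: the `TranslationCache` class, the `translate` and `reoptimize` callbacks, and the `hot_threshold` value are all assumptions made for illustration, and the execution counter stands in for whatever instrumentation the real software inserts.

```python
from typing import Callable


class TranslationCache:
    """Toy model: x86 block address -> translated native code."""

    def __init__(self, translate: Callable[[int], str],
                 reoptimize: Callable[[int], str],
                 hot_threshold: int = 3):   # threshold is an arbitrary illustrative value
        self._translate = translate
        self._reoptimize = reoptimize
        self._hot_threshold = hot_threshold
        self._cache: dict[int, str] = {}     # block address -> translation
        self._counts: dict[int, int] = {}    # block address -> times executed
        self._optimized: set[int] = set()    # blocks already heavily optimized

    def lookup(self, addr: int) -> str:
        """Return the translation for the block at addr, creating it on first use."""
        if addr not in self._cache:
            # First encounter: do a quick translation and remember it.
            self._cache[addr] = self._translate(addr)
        # Instrumentation: count executions to find hot blocks.
        self._counts[addr] = self._counts.get(addr, 0) + 1
        if self._counts[addr] >= self._hot_threshold and addr not in self._optimized:
            # Hot block: spend more effort once, then reuse the better code.
            self._cache[addr] = self._reoptimize(addr)
            self._optimized.add(addr)
        return self._cache[addr]


translate_calls: list[int] = []


def quick_translate(addr: int) -> str:
    translate_calls.append(addr)
    return f"plain@{addr:#x}"


cache = TranslationCache(quick_translate, lambda addr: f"opt@{addr:#x}")
results = [cache.lookup(0x1000) for _ in range(4)]
print(results)
print(translate_calls)
```

The design choice worth noting is that translation cost is paid once per block, while the heavier optimization cost is paid only for blocks that prove to be executed often.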

Code-Morphing Software (cont.)
• Also has some help from the hardware
  • Shadowed and working register sets
  • Alias hardware (load-and-protect operations)
  • “Translated” bit for each page table entry
• Performance of systems with Crusoe: 2-3 times longer battery life, performance “comparable” to Intel mobile processors
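One way to picture the shadowed and working register sets is as a commit/rollback pair: translated code updates the working copy, and the shadow copy holds the last known-good state to fall back on if the translation faults. The slides do not spell out the mechanism, so the sketch below is an assumed model; `ShadowedRegisters`, `run_translation`, and the use of a Python exception to stand in for a hardware fault are all hypothetical.

```python
from typing import Callable


class ShadowedRegisters:
    """Toy model of a working register set backed by a shadow copy."""

    def __init__(self, names: list[str]):
        self.working = {name: 0 for name in names}
        self._shadow = dict(self.working)

    def commit(self) -> None:
        """Make the working state the new known-good state."""
        self._shadow = dict(self.working)

    def rollback(self) -> None:
        """Discard speculative updates and restore the last committed state."""
        self.working = dict(self._shadow)


def run_translation(regs: ShadowedRegisters,
                    body: Callable[[dict[str, int]], None]) -> bool:
    """Run a translated block; commit on success, roll back on a fault.
    An ArithmeticError stands in for a hardware exception here."""
    try:
        body(regs.working)
    except ArithmeticError:
        regs.rollback()
        return False
    regs.commit()
    return True


def good_block(r: dict[str, int]) -> None:
    r["eax"] = 5


def faulting_block(r: dict[str, int]) -> None:
    r["eax"] = 99           # speculative update that must not survive the fault
    r["ebx"] = 1 // 0       # raises ZeroDivisionError, a subclass of ArithmeticError


regs = ShadowedRegisters(["eax", "ebx"])
first = run_translation(regs, good_block)
second = run_translation(regs, faulting_block)
print(first, second, regs.working)
```

After the faulting block, the working registers match the state committed by the first block, so execution could be retried more conservatively from a consistent point.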
