Codesigned Virtual Machines Part <II>

Codesigned Virtual Machines Part <II>. 2006. 10. 18 Yu, Young Jin DCSLAB. Contents. Introduction Case Study (1) Transmeta Crusoe Case Study (2) IBM AS/400. Applying Codesigned VMs. Advantages(performance, power efficiency, flexibility) can be achieved,

Presentation Transcript


  1. Codesigned Virtual Machines Part <II> 2006. 10. 18 Yu, Young Jin DCSLAB

  2. Contents • Introduction • Case Study (1) • Transmeta Crusoe • Case Study (2) • IBM AS/400

  3. Applying Codesigned VMs • Advantages (performance, power efficiency, flexibility) can be achieved: • At the macro level: entirely new ISAs • VLIW: Transmeta Crusoe, IBM Daisy/BOA • OO source ISA: IBM AS/400 • At the micro level • The implementation of specific performance enhancements • Instruction reordering, …

  4. Case Study (1): Transmeta Crusoe

  5. Introduction • In Jan. 2000, Transmeta Corp. introduced the Crusoe processors. • Remarkably low power consumption • Perhaps unexpectedly, the new technology is fundamentally software-based. • The power savings come from replacing large numbers of transistors with software.

  6. The Crusoe Processor • Consists of a hardware engine logically surrounded by a software layer. • H/W: The engine • Is a VLIW CPU capable of executing up to four operations in each clock cycle. • Its native instruction set bears no resemblance to the x86 instruction set. • S/W: Code Morphing Software (CMS) • Dynamically “morphs” x86 instructions into VLIW instructions

  7. The Crusoe Processor

  8. The Crusoe Processor • CMS technology changes the entire approach to designing microprocessors. • Demonstrates that practical microprocessors can be implemented as HW-SW hybrids. • Expands the design space • Development teams may enlist software experts, working in parallel with hardware engineers, to bring products to market faster.

  9. Technology Perspective • Decouples the x86 ISA from the underlying processor hardware. • Each new CPU design only requires a new version of the Code Morphing Software to translate x86 instructions to the new CPU’s native instruction set. • Because the CMS would typically reside in standard Flash ROMs on the motherboard, improved versions can even be downloaded into processors in the field.

  10. x86 vs. Crusoe

  11. Crusoe Processor Fundamentals • VLIW engine • Two integer units, a floating point unit, a memory (load/store) unit, a branch unit • Molecule: a long (64- or 128-bit) instruction word containing up to four RISC-like instructions, called atoms. • All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units. • This greatly simplifies the decode and dispatch hardware.

  12. Crusoe Processor Fundamentals • The integer register file • Has 64 registers, %r0 through %r63 • CMS allocates some registers to hold x86 state while others contain state internal to the system, or can be used as temporary registers.

  13. Crusoe Processor Fundamentals • To keep the processor running at full speed, molecules are packed as fully as possible with atoms.
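The packing idea on slides 11 to 13 can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Atom`, `pack_molecules`, and `UNIT_CAPACITY` are hypothetical, the packer is a simple greedy in-order one, and the only hazards it checks are reads or writes of a register written earlier in the same molecule. The real CMS scheduler is considerably more sophisticated.

```python
from dataclasses import dataclass

# Functional units listed on slide 11; per-molecule capacities are an assumption.
UNIT_CAPACITY = {"int": 2, "fp": 1, "mem": 1, "branch": 1}
MAX_ATOMS = 4  # a molecule holds up to four atoms


@dataclass(frozen=True)
class Atom:
    unit: str          # which functional unit executes the atom
    dest: str = ""     # destination register ("" if none)
    srcs: tuple = ()   # source registers


def pack_molecules(atoms):
    """Greedy in-order packing: close the current molecule when the next atom
    does not fit or depends on a register written inside that molecule."""
    molecules, current = [], []
    for atom in atoms:
        used = sum(1 for a in current if a.unit == atom.unit)
        written = {a.dest for a in current if a.dest}
        conflict = (
            len(current) >= MAX_ATOMS
            or used >= UNIT_CAPACITY[atom.unit]
            or any(src in written for src in atom.srcs)
            or atom.dest in written
        )
        if conflict and current:
            molecules.append(current)
            current = []
        current.append(atom)
    if current:
        molecules.append(current)
    return molecules
```

With this sketch, three independent atoms share one molecule, while an atom that consumes one of their results is pushed into the next molecule.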

  14. Conventional superscalar… • This type of processor hardware is much more complex than the Crusoe processor’s simple VLIW engine.

  15. Code Morphing Software • CMS • Is fundamentally a dynamic translation system • In this case, x86 ISA -> VLIW ISA • “x86 ISA” is the only thing x86 code sees. • The only program written directly for the VLIW engine is the Code Morphing Software itself.

  16. Hierarchy

  17. Hierarchy

  18. Crusoe’s VLIW instr. Scheduling

  19. Code Morphing Software

  20. CMS Memory Layout

  21. CMS: Drawing the HW-SW line • Choosing which functions to implement in HW and which in SW is a major engineering challenge • It involves issues such as cost and complexity, overall performance, and power consumption • For example, the HW-SW line might be drawn differently for a high-end server processor.

  22. CMS: Decoding and Scheduling • Code Morphing can translate an entire group of x86 instructions at once, • Whereas a superscalar x86 processor translates single instructions in isolation. • The Code Morphing approach can amortize the cost of translation over many executions, • Allowing it to use much more sophisticated translation and scheduling algorithms.

  23. CMS: Caching • The translation cache resides in a separate memory space that is inaccessible to x86 code. • As an application executes, • Code Morphing “learns” more about the program and improves it so that it will execute faster and faster. • Consequently, some benchmarks do not accurately predict the performance of the Crusoe processor.
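The amortization argument of slides 22 and 23 can be made concrete with a small sketch. It assumes a hypothetical `translate` callable standing in for the translator and keys the cache by x86 entry address; neither detail comes from the slides.

```python
class TranslationCache:
    """Translate each x86 block once, then reuse the native code."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical: x86 bytes -> native code
        self._cache = {}             # x86 entry address -> native code
        self.translations = 0        # how many times the translator ran

    def lookup(self, x86_addr, x86_bytes):
        native = self._cache.get(x86_addr)
        if native is None:
            # Cache miss: pay the translation cost once.
            native = self._translate(x86_bytes)
            self._cache[x86_addr] = native
            self.translations += 1
        return native
```

If a block runs a thousand times, the translator is invoked only on the first execution, which is why an expensive translation can still pay off for hot code.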

  24. CMS: Filtering • The translation system needs to • Choose carefully how much effort to spend on translating and optimizing a given piece of x86 code. • A wide choice of execution modes • Interpretation only (no translation) • Simple-minded code generation • Highly optimized code generation
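One common way to choose among such execution modes is to count how often each block runs and escalate at thresholds. The sketch below assumes that approach; the threshold values and the `ModeSelector` name are invented for illustration and are not taken from the slides or from CMS.

```python
from collections import Counter

INTERPRET, SIMPLE, OPTIMIZED = "interpret", "simple", "optimized"

# Hypothetical thresholds, chosen only for illustration.
SIMPLE_THRESHOLD = 50
OPTIMIZE_THRESHOLD = 1000


class ModeSelector:
    """Pick an execution mode per block from its execution count."""

    def __init__(self):
        self.counts = Counter()

    def record_and_choose(self, block_addr):
        self.counts[block_addr] += 1
        n = self.counts[block_addr]
        if n >= OPTIMIZE_THRESHOLD:
            return OPTIMIZED
        if n >= SIMPLE_THRESHOLD:
            return SIMPLE
        return INTERPRET
```

Rarely executed code stays in the interpreter, so translation effort is spent only where repeated execution can repay it.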

  25. CMS: Prediction and Path Selection • CMS can gather feedback • Instrumentation profiling • The translator adds code to collect info. • This data can be used later to decide when and what to optimize and translate. • For example, if a given branch is highly biased,…
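As a rough illustration of instrumentation profiling, the sketch below counts branch outcomes and reports whether a branch is highly biased. The `BranchCounter` name, the 0.95 threshold, and the minimum sample count are assumptions; what the translator then does with a biased branch is left open here, as it is on the slide.

```python
from dataclasses import dataclass


@dataclass
class BranchCounter:
    """Counters that instrumentation code would update at run time."""
    taken: int = 0
    not_taken: int = 0

    def record(self, was_taken: bool) -> None:
        if was_taken:
            self.taken += 1
        else:
            self.not_taken += 1

    def bias(self) -> float:
        """Fraction of executions that went in the more common direction."""
        total = self.taken + self.not_taken
        return max(self.taken, self.not_taken) / total if total else 0.0


def is_highly_biased(counter, threshold=0.95, min_samples=100):
    # Both parameters are illustrative assumptions.
    total = counter.taken + counter.not_taken
    return total >= min_samples and counter.bias() >= threshold
```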

  26. CMS: Making a Translation • Front end • Well-known optimizations • Scheduling • The molecules explicitly encode the instruction-level parallelism, hence they can be executed by a simple VLIW engine.

  27. HW Support for Code Morphing • Exceptions • The “precise exception” problem: a reordered operation may trap “too soon” • Solution: use shadow registers

  28. HW Support for Code Morphing • All registers holding x86 state are shadowed. (working/shadow copy) • Normal atoms only update the working copy of the register. • “commit” operation: working -> shadow regs. • “rollback” operation: shadow -> working regs. • Undoing changes to memory • Holding store data in a “gated store buffer” • Commit / rollback
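The commit and rollback behavior described on this slide can be modeled in software as follows. This is a minimal sketch, not the hardware design: the `ShadowedState` class and its method names are hypothetical, and memory is a plain dictionary.

```python
class ShadowedState:
    """Working/shadow register copies plus a gated store buffer."""

    def __init__(self, regs):
        self.working = dict(regs)  # updated by normal atoms
        self.shadow = dict(regs)   # last committed x86 state
        self.memory = {}           # committed memory contents
        self.store_buffer = []     # pending (addr, value) stores

    def write_reg(self, name, value):
        self.working[name] = value

    def store(self, addr, value):
        # Stores are held back until the next commit.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Newest pending store to this address wins over committed memory.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        # working -> shadow, and release gated stores to memory.
        self.shadow = dict(self.working)
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()

    def rollback(self):
        # shadow -> working, and discard gated stores.
        self.working = dict(self.shadow)
        self.store_buffer.clear()
```

After a rollback the state is exactly what it was at the last commit, so the system can re-execute the x86 code conservatively and raise any exception at a precise point.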

  29. HW Support for Code Morphing • Alias Hardware • When the translator moves a load operation ahead of a store operation, • It converts the load into a load-and-protect and the store into a store-under-alias-mask. • With this hardware check in place, it is always safe to reorder memory loads and stores.

  30. HW Support for Code Morphing • Alias Hardware
      <Original Code> St 0(r1), r2; …; Ld r3, 0(r4); …; St 0(r5), r6; …; Ld r7, 0(r8); Add r9, r3, r7
      <Rescheduled Code> - Unsafe: Ld r3, 0(r4); Ld r7, 0(r8); St 0(r1), r2; …; …; St 0(r5), r6; …; Add r9, r3, r7
      <Rescheduled Code> - Protected: Ldp r3, 0(r4); Ldp r7, 0(r8); Stam 0(r1), r2; …; …; Stam 0(r5), r6; …; Add r9, r3, r7
      • The ldp/stam pair is an excellent example of the interplay between the codesigned hardware and software in a codesigned VM.
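The ldp/stam check can be sketched as a small software model. The assumptions are loud ones: alias registers are modeled as a dictionary indexed by small integers, the mask is a set of those indices, and exact address equality stands in for whatever overlap test the hardware performs. `AliasUnit` and `AliasFault` are hypothetical names.

```python
class AliasFault(Exception):
    """Raised when a store hits an address protected by a hoisted load."""


class AliasUnit:
    def __init__(self, memory):
        self.memory = memory
        self.protected = {}  # alias register index -> protected address

    def ldp(self, alias_reg, addr):
        """Load-and-protect: perform the load and remember its address."""
        self.protected[alias_reg] = addr
        return self.memory.get(addr, 0)

    def stam(self, addr, value, mask):
        """Store-under-alias-mask: fault if any selected alias register
        protects this address; otherwise perform the store."""
        for reg in mask:
            if self.protected.get(reg) == addr:
                raise AliasFault(f"store to {addr:#x} aliases register {reg}")
        self.memory[addr] = value
```

In this model a fault means the speculative reordering was wrong, and recovery would fall to the rollback mechanism of slide 28.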

  31. HW Support for Code Morphing • Coping with Self-Modifying Code • x86 instructions in memory get overwritten, either • Because the OS is loading a new program, or • Because an application is using self-modifying code. • When this happens to code that has already been translated, • The CMS needs to be notified to keep it from erroneously executing a translation for the old code.

  32. HW Support for Code Morphing • Coping with Self-Modifying Code • Whenever the system translates a block of x86 code, it write-protects the page. • It does so by setting a dedicated “translated” bit in that page’s entry in the processor’s memory management unit. • That bit is invisible to x86 software. • When a protected page is written to, the simplest remedy is to invalidate the affected translations.
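The “translated” bit and the simplest remedy (invalidate on write) can be sketched as follows. The 4 KiB page size, the `SmcGuard` name, and the assumption that each translation is keyed by a single x86 address within one page are all illustrative choices, not details from the slides.

```python
PAGE_SIZE = 4096  # assumed page size


class SmcGuard:
    """Track write-protected pages and drop stale translations on writes."""

    def __init__(self):
        self.translated_pages = set()  # pages whose "translated" bit is set
        self.translations = {}         # x86 address -> native code

    def add_translation(self, x86_addr, native):
        self.translations[x86_addr] = native
        self.translated_pages.add(x86_addr // PAGE_SIZE)

    def on_write(self, addr):
        """Called for every write; returns how many translations were dropped."""
        page = addr // PAGE_SIZE
        if page not in self.translated_pages:
            return 0  # unprotected page: nothing to do
        stale = [a for a in self.translations if a // PAGE_SIZE == page]
        for a in stale:
            del self.translations[a]
        self.translated_pages.discard(page)  # clear the bit again
        return len(stale)
```

Invalidating the whole page is coarse but always correct; anything finer-grained would need more bookkeeping than this sketch models.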

  33. Example: A complex translation

  34. Case Study (2): IBM AS/400

  35. From IBM’s homepage… • The accelerating rate of change of both hardware and software technologies necessitates that the system you select has been designed with the future in mind. • “We believe that the IBM AS/400 will be the number one choice !”

  36. Introduction • The design of the AS/400 insulates application programs from changing hardware characteristics through a layer of microcode. • The interface: TIMI (Technology Independent Machine Interface) • The microcode layer: LIC (Licensed Internal Code) • In 1995, the AS/400 changed its processor technology (CISC -> 64-bit RISC) • No recompiling or rewriting of applications was required • Not only did existing programs run, but they ran as full 64-bit programs.

  37. AS/400 architecture • The TIMI layer separates the hardware and LIC from the OS • Instructions are translated to a specific hardware instruction set as part of the back end of the compilation process.

  38. AS/400 architecture • TIMI is a virtual instruction set. • All user-mode programs are stored as TIMI instructions. • Conceptually somewhat similar to the VM architecture of programming env such as Smalltalk, Java and .NET • Stored within the final program object • Object-based ISA

  39. Memory Architecture • The TIMI has a memory architecture composed of objects. • The objects are completely isolated from one another and can only be accessed via pointers. • Actual address values contained in pointers are not made visible to SW above TIMI. • The implementation of the object-based memory is done entirely below the TIMI.

  40. Memory Architecture • Protecting the integrity of pointers is an essential part of any object-based system. • The object pointers are encoded in 128 bits. • Upper 64 bits: type info, authorization, … • Lower 64 bits: 64-bit PowerPC virtual address • Significant extensions to the PowerPC memory architecture • Added protection for object pointers • Load/store-pointer instructions • A 65th bit indicating whether the location contains a pointer
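A simplified model of the pointer protection described on this slide is sketched below. It keeps one tag per 128-bit pointer slot, which is a simplification of the per-location tag bit mentioned above, and the names `TaggedMemory` and `PointerTagError` are hypothetical.

```python
MASK64 = (1 << 64) - 1


class PointerTagError(Exception):
    """Raised when a slot is used as a pointer but its tag is not set."""


class TaggedMemory:
    def __init__(self):
        self.slots = {}  # slot address -> (128-bit value, tag)

    def store_data(self, addr, value):
        # Ordinary stores always clear the tag, so data cannot pose as a pointer.
        self.slots[addr] = (value, False)

    def store_pointer(self, addr, type_info, vaddr):
        # Only the store-pointer path sets the tag.
        value = ((type_info & MASK64) << 64) | (vaddr & MASK64)
        self.slots[addr] = (value, True)

    def load_pointer(self, addr):
        value, tag = self.slots.get(addr, (0, False))
        if not tag:
            raise PointerTagError(f"slot {addr:#x} does not hold a valid pointer")
        return value >> 64, value & MASK64  # (type info, virtual address)
```

Overwriting a pointer slot with ordinary data clears the tag, so a forged bit pattern is rejected when it is later loaded as a pointer.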

  41. Instruction Set • TIMI instruction format (figure: fields of 2, 2, 3, 3, 3, and 3 bytes, five of them marked optional) • Multiway conditional branch • This is the “architected representation” • It is translated to an implementation-dependent form, and it does the work of multiple RISC instructions.

  42. Instruction Set (figure: ODT Directory Vector and ODT Entry String) • Instructions such as add numeric and multiply numeric are generic • Entries in the ODT indicate the types of operands and the data flow. • The actual storage locations are determined after the TIMI is translated

  43. Input/Output • The presence of IOPs simplifies the task of pushing the device-dependent aspects out of the central processor.

  44. Input/Output • At the level of TIMI, • There is no secondary (disk) storage; rather, it is part of the unified memory architecture. • All disk management SW, drivers, etc. exist in the implementation-dependent part of the system. • The OS interacts with SW below the TIMI level (and with I/O devices) • Through instructions that operate on the TIMI-level objects.

  45. Input/Output • TIMI-Supported Objects • Access group, Context, … • Authorization List, User Profile, … • Dictionary, Index, … • Queue, Mode descriptor, … • Logical unit descriptor, … • Module, Program, …

  46. Code Translation & Concealment • HLL -> Template(TIMI + ODT) -> Program Object • The contents of the program object cannot be directly observed above the TIMI level. • Materialization • Giving back to the user in the original, machine-independent form • The platform switch is transparent to the user.

  47. Code Translation & Concealment (figure: HLL program -> compiler -> space object holding the <template> (TIMI, ODT) -> translator -> program object holding the <template> (TIMI, ODT) and implementation-dependent executable code; the TIMI level is marked)