Codesigned Virtual Machines Part II. 2006. 10. 18. Yu, Young Jin, DCSLAB.


Codesigned Virtual Machines Part II

2006. 10. 18

Yu, Young Jin


Contents
  • Introduction
  • Case Study (1)
    • Transmeta Crusoe
  • Case Study (2)
    • IBM AS/400
Applying Codesigned VMs
  • Advantages (performance, power efficiency, flexibility) can be achieved:
    • At the macro level: entirely new ISAs
      • VLIW: Transmeta Crusoe, IBM Daisy/BOA
      • OO source ISA: IBM AS/400
    • At the micro level
      • Implementation of specific performance enhancements
      • Instruction reordering, …
  • In January 2000, Transmeta Corp. introduced the Crusoe processors.
    • Remarkably low power consumption
  • Perhaps unexpectedly, the new technology is fundamentally software-based.
    • The power savings come from replacing large numbers of transistors with software.
The Crusoe Processor
  • Consists of a hardware engine logically surrounded by a software layer.
    • H/W: the engine
      • A VLIW CPU capable of executing up to four operations per clock cycle.
      • Bears no resemblance to the x86 instruction set.
    • S/W: Code Morphing Software (CMS)
      • Dynamically “morphs” x86 instructions into VLIW instructions.
The Crusoe Processor
  • CMS technology changes the entire approach to designing microprocessors.
    • Demonstrates that practical microprocessors can be implemented as HW-SW hybrids.
    • Expands the design space.
    • Development teams can enlist software experts who work in parallel with hardware engineers to bring products to market faster.
Technology Perspective
  • CMS decouples the x86 ISA from the underlying processor hardware.
    • Each new CPU design only requires a new version of the Code Morphing Software to translate x86 instructions to the new CPU’s native instruction set.
  • Because the CMS would typically reside in standard Flash ROMs on the motherboard, improved versions can even be downloaded into processors in the field.
Crusoe Processor Fundamentals
  • VLIW engine
    • Two integer units, a floating-point unit, a memory (load/store) unit, and a branch unit
    • Molecule: a long (64- or 128-bit) instruction word containing up to four RISC-like instructions, called atoms.
    • All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units.
      • This greatly simplifies the decode and dispatch hardware.
Crusoe Processor Fundamentals
  • The integer register file
    • Has 64 registers, %r0 through %r63
    • CMS allocates some registers to hold x86 state while others contain state internal to the system, or can be used as temporary registers.
Crusoe Processor Fundamentals
  • To keep the processor running at full speed, molecules are packed as fully as possible with atoms.
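
The packing idea can be illustrated with a minimal sketch. It assumes a simple greedy, in-order packer; the real CMS scheduler is not described in these slides, and the names `Atom`, `UNIT_SLOTS`, and `pack` are hypothetical. The unit counts follow the list of functional units given earlier in this deck.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Functional-unit slots per molecule, taken from the unit list on the slide.
UNIT_SLOTS = {"alu": 2, "fpu": 1, "mem": 1, "br": 1}
MAX_ATOMS = 4  # a molecule holds up to four atoms


@dataclass
class Atom:
    op: str                      # mnemonic, for illustration only
    unit: str                    # which functional unit the atom needs
    dst: Optional[str] = None    # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read


def pack(atoms):
    """Greedy in-order packing (an assumption, not the CMS algorithm).

    A new molecule is started when the current one is full, has no free
    unit of the required kind, or the atom depends on a register written
    earlier in the same molecule.
    """
    molecules = []
    current, used, written = [], {}, set()
    for atom in atoms:
        full = len(current) >= MAX_ATOMS
        no_unit = used.get(atom.unit, 0) >= UNIT_SLOTS[atom.unit]
        depends = any(s in written for s in atom.srcs) or atom.dst in written
        if current and (full or no_unit or depends):
            molecules.append(current)
            current, used, written = [], {}, set()
        current.append(atom)
        used[atom.unit] = used.get(atom.unit, 0) + 1
        if atom.dst is not None:
            written.add(atom.dst)
    if current:
        molecules.append(current)
    return molecules


program = [
    Atom("add", "alu", "r1", ("r2", "r3")),
    Atom("ld", "mem", "r4", ("r5",)),
    Atom("sub", "alu", "r6", ("r1", "r4")),
    Atom("br", "br", None, ("r6",)),
]
molecules = pack(program)
```

In this example the independent `add` and `ld` share one molecule, while `sub` and `br` each start a new one because they depend on earlier results. A real scheduler would also reorder atoms to fill the empty slots.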
Conventional superscalar…
  • This type of processor hardware is much more complex than the Crusoe processor’s simple VLIW engine.
Code Morphing Software
  • CMS
    • Is fundamentally a dynamic translation system
    • In this case, x86 ISA -> VLIW ISA
    • “x86 ISA” is the only thing x86 code sees.
      • The only program written directly for the VLIW engine is the Code Morphing Software itself.
CMS: Drawing the HW-SW line
  • Choosing which functions to implement in HW and which in SW is a major engineering challenge
    • Involving issues such as cost and complexity, overall performance and power consumption
    • For example, the HW-SW line might be drawn differently for a high-end server processor.
CMS: Decoding and Scheduling
  • Code Morphing can translate an entire group of x86 instructions at once,
    • whereas a superscalar x86 processor translates single instructions in isolation.
  • The Code Morphing approach can amortize the cost of translation over many executions.
    • This allows it to use much more sophisticated translation and scheduling algorithms.
CMS: Caching
  • The translation cache resides in a separate memory space that is inaccessible to x86 code.
  • As an application executes,
    • Code Morphing “learns” more about the program and improves it so that it executes faster and faster.
  • Some benchmarks do not accurately predict the performance of the Crusoe processor!
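
A minimal sketch of the translate-once, execute-many idea behind the translation cache. The class `TranslationCache` and the stand-in `fake_translate` are hypothetical; the real cache holds VLIW molecules in memory hidden from x86 code, which a Python dictionary only approximates.

```python
class TranslationCache:
    """Maps an x86 block address to its translation (illustrative only)."""

    def __init__(self, translate):
        self._translate = translate  # expensive translation function
        self._cache = {}             # x86 address -> translated block
        self.translations = 0        # how many times we actually translated

    def lookup(self, x86_addr):
        # Translate on the first execution only; later executions reuse it.
        if x86_addr not in self._cache:
            self._cache[x86_addr] = self._translate(x86_addr)
            self.translations += 1
        return self._cache[x86_addr]


def fake_translate(addr):
    # Stand-in for the real translator: returns a label instead of molecules.
    return f"vliw_block@{addr:#x}"


cache = TranslationCache(fake_translate)
for _ in range(1000):
    block = cache.lookup(0x1000)
```

After 1000 executions of the same block, the translator has run once, which is why the cost of a sophisticated translation can be amortized.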
CMS: Filtering
  • The translation system needs to
    • Choose carefully how much effort to spend on translating and optimizing a given piece of x86 code.
  • A wide choice of execution modes
    • Interpretation only(no translation)
    • Simple-minded code generation
    • Highly-optimized code generation
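
The choice among these modes can be sketched as a threshold filter on execution counts. This is an assumption for illustration: the slides do not give the actual criteria, and the thresholds `simple_after` and `optimize_after` are invented values.

```python
INTERPRET = "interpretation only"
SIMPLE = "simple-minded code generation"
OPTIMIZED = "highly-optimized code generation"


def choose_mode(executions, simple_after=50, optimize_after=1000):
    """Pick an execution mode from how often a block has run.

    The two thresholds are hypothetical; they only illustrate that more
    translation effort is spent on code that runs more often.
    """
    if executions >= optimize_after:
        return OPTIMIZED
    if executions >= simple_after:
        return SIMPLE
    return INTERPRET


modes = [choose_mode(n) for n in (1, 50, 5000)]
```

Rarely executed code never pays the translation cost, while hot code earns the expensive optimizer.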
CMS: Prediction and Path Selection
  • CMS can gather feedback
    • Instrumentation profiling
      • The translator adds code to collect info.
    • This data can be used later to decide when and what to optimize and translate.
      • For example, if a given branch is highly biased,…
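
As a sketch of instrumentation profiling, the counters below stand in for the code a translator might add around a branch. The class `BranchProfile` and the 0.95 bias threshold are hypothetical and not taken from the slides.

```python
from collections import defaultdict


class BranchProfile:
    """Per-branch taken/not-taken counters (illustrative only)."""

    def __init__(self):
        # branch address -> [not-taken count, taken count]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, addr, taken):
        self.counts[addr][1 if taken else 0] += 1

    def bias(self, addr):
        not_taken, taken = self.counts[addr]
        total = not_taken + taken
        return max(not_taken, taken) / total if total else 0.0

    def is_highly_biased(self, addr, threshold=0.95):
        # threshold is an assumed value, not one given in the source
        return self.bias(addr) >= threshold


profile = BranchProfile()
for i in range(100):
    profile.record(0x400, taken=(i != 0))      # taken 99 times out of 100
    profile.record(0x500, taken=(i % 5 < 3))   # taken 60 times out of 100
```

A translator could use such data to lay out the likely path of a biased branch as straight-line code.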
CMS: Making a Translation

[Figure: Front end]

The molecules explicitly encode the instruction-level parallelism, hence they can be executed by a simple VLIW engine.

HW Support for Code Morphing
  • Exceptions
  • “precise exception” problem


“too soon”

* Solution: use shadow registers!

HW Support for Code Morphing
  • All registers holding x86 state are shadowed. (working/shadow copy)
    • Normal atoms only update the working copy of the register.
    • “commit” operation: working -> shadow regs.
    • “rollback” operation: shadow -> working regs.
  • Undoing changes to memory
    • Holding store data in a “gated store buffer”
    • Commit / rollback
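
The commit and rollback operations described above can be modeled in a few lines. This is a software sketch of hardware behavior; the class `ShadowedMachine` and its method names are hypothetical.

```python
class ShadowedMachine:
    """Working/shadow registers plus a gated store buffer (illustrative)."""

    def __init__(self):
        self.working = {}       # registers updated by normal atoms
        self.shadow = {}        # last committed register state
        self.memory = {}        # committed memory contents
        self.store_buffer = []  # stores held back until commit

    def write_reg(self, name, value):
        self.working[name] = value  # only the working copy changes

    def store(self, addr, value):
        self.store_buffer.append((addr, value))  # gated, not yet visible

    def commit(self):
        # working -> shadow, and release the buffered stores to memory
        self.shadow = dict(self.working)
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def rollback(self):
        # shadow -> working, and drop the uncommitted stores
        self.working = dict(self.shadow)
        self.store_buffer.clear()


m = ShadowedMachine()
m.write_reg("eax", 1)
m.store(0x10, 7)
m.commit()
m.write_reg("eax", 2)   # speculative work after the commit point
m.store(0x10, 9)
m.rollback()            # e.g. an exception occurred in the translation
```

After the rollback, `eax` and address `0x10` hold their committed values again, so the x86 state is precise at the last commit point.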
HW Support for Code Morphing
  • Alias Hardware
    • When the translator moves a load operation ahead of a store operation,
    • it converts the load into a load-and-protect and the store into a store-under-alias-mask.
    • With this protection, it is always safe to reorder loads and stores.
HW Support for Code Morphing
  • Alias Hardware

Original order:
  St  0(r1), r2
  Ld  r3, 0(r4)
  St  0(r5), r6
  Ld  r7, 0(r8)
  Add r9, r3, r7

Reordered, unsafe:
  Ld  r3, 0(r4)
  Ld  r7, 0(r8)
  St  0(r1), r2
  St  0(r5), r6
  Add r9, r3, r7

Reordered, protected:
  Ldp  r3, 0(r4)
  Ldp  r7, 0(r8)
  Stam 0(r1), r2
  Stam 0(r5), r6
  Add  r9, r3, r7

* The ldp/stam pair is an excellent example that illustrates the interplay between the codesigned hardware and software in a codesigned VM.
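
A simplified model of the ldp/stam check, assuming that the hardware remembers the addresses of protected loads and faults when a later store hits one of them. The real alias hardware is not detailed in these slides; `AliasUnit`, `AliasFault`, and `run_reordered` are hypothetical names.

```python
class AliasFault(Exception):
    """Raised when a store hits an address protected by an earlier ldp."""


class AliasUnit:
    def __init__(self, memory):
        self.memory = memory
        self.protected = set()  # addresses read by load-and-protect

    def ldp(self, addr):
        # load-and-protect: remember the address, then load as usual
        self.protected.add(addr)
        return self.memory.get(addr, 0)

    def stam(self, addr, value):
        # store-under-alias-mask: fault if the address was protected
        if addr in self.protected:
            raise AliasFault(hex(addr))
        self.memory[addr] = value


def run_reordered(memory, r):
    """Execute the protected listing; r maps register names to values."""
    unit = AliasUnit(memory)
    r3 = unit.ldp(r["r4"])
    r7 = unit.ldp(r["r8"])
    unit.stam(r["r1"], r["r2"])
    unit.stam(r["r5"], r["r6"])
    return r3 + r7  # Add r9, r3, r7


regs = {"r1": 0x30, "r2": 5, "r4": 0x10, "r5": 0x40, "r6": 6, "r8": 0x20}
result = run_reordered({0x10: 1, 0x20: 2}, regs)
```

If `r1` and `r4` held the same address, the first `stam` would raise `AliasFault`. A runtime could then roll back to the last commit point and execute the loads and stores in their original order.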

HW Support for Code Morphing
  • Coping with Self-Modifying Code
    • x86 instructions in memory can get overwritten, either
      • Because OS is loading a new program, or
      • Because an application is using self-modifying code.
    • When this happens to code that has already been translated,
      • The CMS needs to be notified to keep it from erroneously executing a translation for the old code.
HW Support for Code Morphing
  • Coping with Self-Modifying Code
    • Whenever the system translates a block of x86 code, it write-protects the page.
      • It does so by setting a dedicated “translated” bit in that page’s entry in the processor’s memory management unit.
      • That bit is invisible to x86 software.
    • When a protected page is written to, the simplest remedy is to invalidate the affected translations.
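
The page-protection scheme can be sketched as follows. The set `translated_pages` stands in for the per-page “translated” bit, and `on_write` stands in for the fault raised on a write to a protected page. The 4 KB page size and all names are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


class TranslationManager:
    """Tracks which pages hold translated x86 code (illustrative only)."""

    def __init__(self):
        self.translations = {}         # x86 address -> translation
        self.translated_pages = set()  # pages with the "translated" bit set

    def add_translation(self, addr, code):
        self.translations[addr] = code
        self.translated_pages.add(addr // PAGE_SIZE)  # write-protect the page

    def on_write(self, addr):
        """Handle a write; return how many translations were invalidated."""
        page = addr // PAGE_SIZE
        if page not in self.translated_pages:
            return 0  # ordinary write, nothing to do
        stale = [a for a in self.translations if a // PAGE_SIZE == page]
        for a in stale:
            del self.translations[a]
        self.translated_pages.discard(page)  # clear the bit until retranslated
        return len(stale)


tm = TranslationManager()
tm.add_translation(0x1000, "block A")
tm.add_translation(0x1800, "block B")
tm.add_translation(0x5000, "block C")
invalidated = tm.on_write(0x1234)  # write lands in the page of A and B
```

This is the simplest remedy named above: every translation on the written page is discarded, even if the write did not touch its code.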
From IBM’s homepage…
  • The accelerating rate of change of both hardware and software technologies necessitates that the system you select has been designed with the future in mind.
    • “We believe that the IBM AS/400 will be the number one choice!”
  • The design of AS/400 insulates app programs from changing hw characteristics through the layer of microcode.
    • The interface: TIMI
    • The microcode layer: LIC
  • In 1995, the AS/400 changed its processor technology (CISC -> 64-bit RISC).
    • Existing programs needed no recompiling or rewriting.
    • Not only did they run, but they were fully 64-bit programs.
AS/400 architecture

The TIMI layer separates the hardware and LIC from the OS.

Instructions are translated to a specific hw instruction set as part of the backend of the compilation process.

AS/400 architecture
  • TIMI is a virtual instruction set.
    • All user-mode programs are stored as TIMI instructions.
    • Conceptually somewhat similar to the VM architecture of programming environments such as Smalltalk, Java, and .NET
    • Stored within the final program object
    • Object-based ISA
Memory Architecture
  • The TIMI has a memory architecture composed of objects.
    • The objects are completely isolated from one another and can only be accessed via pointers.
    • Actual address values contained in pointers are not made visible to SW above TIMI.
    • The implementation of the object-based memory is done entirely below the TIMI.
Memory Architecture
  • Protecting the integrity of pointers is an essential part of any Object-Based system.
    • Object pointers are encoded in 128 bits.
      • Upper 64 bits: type info, authorization, …
      • Lower 64 bits: 64-bit PowerPC virtual address.
    • This is a significant extension to the PowerPC memory architecture.
      • Protection is added for object pointers:
        • Load-pointer and store-pointer instructions.
        • A 65th bit indicating whether the location contains a pointer.
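
A minimal sketch of how an extra bit per 64-bit word can protect pointer integrity. It assumes that only the store-pointer operation sets the bit and that any ordinary store clears it; the slides do not spell out these rules, and `TaggedMemory` and its methods are hypothetical.

```python
MASK64 = (1 << 64) - 1


class TaggedMemory:
    """64-bit words, each with an extra pointer bit (illustrative only)."""

    def __init__(self):
        self.words = {}  # word index -> (64-bit value, pointer bit)

    def store_word(self, index, value):
        # An ordinary store clears the pointer bit (assumed behavior).
        self.words[index] = (value & MASK64, False)

    def store_pointer(self, index, info, address):
        # A 128-bit pointer occupies two words: upper half, then lower half.
        self.words[index] = (info & MASK64, True)
        self.words[index + 1] = (address & MASK64, True)

    def load_pointer(self, index):
        info, bit_hi = self.words.get(index, (0, False))
        address, bit_lo = self.words.get(index + 1, (0, False))
        if not (bit_hi and bit_lo):
            raise PermissionError("location does not hold a valid pointer")
        return info, address


mem = TaggedMemory()
mem.store_pointer(0, info=0xA, address=0x1000)
pointer = mem.load_pointer(0)
mem.store_word(1, 0x2000)  # an attempt to forge the address half
```

After the ordinary store, `mem.load_pointer(0)` raises `PermissionError`, so software cannot construct a usable pointer from plain data.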
Instruction Set
  • TIMI instruction format
    • Six fields of 2, 2, 3, 3, 3, and 3 bytes; five of them are marked optional.
  • Multiway conditional branch
    • This is the “architected representation”
    • It is translated to an impl-dependent form, and it does the work of multiple RISC instructions.
Instruction Set

[Figure: ODT Direction, ODT Entry]

  • Instructions such as add numeric and multiply numeric are generic.
  • Entries in the ODT indicate the types of operands and the data flow.
  • Actual storage locations are assigned only after the TIMI is translated.
Input/Output
  • The presence of I/O processors (IOPs) simplifies the task of pushing the device-dependent aspects out of the central processor.
Input/Output
  • At the level of TIMI,
    • There is no separate secondary (disk) storage; rather, it is part of the unified memory architecture.
      • All disk management SW, drivers, etc. exist in the impl-dependent part of the system.
  • The OS interacts with SW below the TIMI level(and with I/O devices)
    • through instructions that operate on the TIMI-level objects.
Input/Output
  • TIMI-Supported Objects
    • Access group, Context, …
    • Authorization List, User Profile, …
    • Dictionary, Index, …
    • Queue, Mode descriptor, …
    • Logical unit descriptor, …
    • Module, Program, …
Code Translation & Concealment
  • HLL -> Template(TIMI + ODT) -> Program Object
  • The contents of the program object cannot be directly observed above the TIMI level.
  • Materialization
    • Gives the program back to the user in its original, machine-independent form
    • The platform switch is transparent to the user.
Code Translation & Concealment
[Figure: Space object, Program object]