
Codesigned Virtual Machines Part <II> - PowerPoint PPT Presentation



Codesigned Virtual Machines Part <II>. 2006. 10. 18 Yu, Young Jin DCSLAB. Contents. Introduction Case Study (1) Transmeta Crusoe Case Study (2) IBM AS/400. Applying Codesigned VMs. Advantages(performance, power efficiency, flexibility) can be achieved,


Codesigned Virtual Machines Part <II>

2006. 10. 18

Yu, Young Jin

DCSLAB


Contents

  • Introduction

  • Case Study (1)

    • Transmeta Crusoe

  • Case Study (2)

    • IBM AS/400


Applying Codesigned VMs

  • Advantages (performance, power efficiency, flexibility) can be achieved

    • At the macro level: entirely new ISAs

      • VLIW: Transmeta Crusoe, IBM Daisy/BOA

      • OO source ISA: IBM AS/400

    • At the micro level

      • Implementation of specific performance enhancements

      • Instruction reordering, …



Case Study (1): Transmeta Crusoe


Introduction

  • In January 2000, Transmeta Corp. introduced the Crusoe processors.

    • Remarkably low power consumption

  • Perhaps unexpectedly, the new technology is fundamentally software-based.

    • The power savings come from replacing large numbers of transistors with software.


The Crusoe Processor

  • Consists of a hardware engine logically surrounded by a software layer.

    • H/W: The engine

      • A VLIW CPU capable of executing up to four operations in each clock cycle.

      • Bears no resemblance to the x86 instruction set.

    • S/W: Code Morphing Software (CMS)

      • Dynamically “morphs” x86 instructions into VLIW instructions



The Crusoe Processor

  • CMS technology changes the entire approach to designing microprocessors.

    • Demonstrates that practical microprocessors can be implemented as HW-SW hybrids.

    • Expands the design space.

    • Development teams may enlist software experts, working in parallel with hardware engineers, to bring products to market faster.


Technology Perspective

  • Decouples the x86 ISA from the underlying processor hardware.

    • Each new CPU design only requires a new version of the Code Morphing Software to translate x86 instructions to the new CPU’s native instruction set.

  • Because the CMS would typically reside in standard Flash ROMs on the motherboard, improved versions can even be downloaded into processors in the field.



Crusoe Processor Fundamentals

  • VLIW engine

    • Two integer units, a floating point unit, a memory (load/store) unit, and a branch unit

    • Molecule: a long (64- or 128-bit) instruction word containing up to four RISC-like instructions, called atoms.

    • All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units.

      • This greatly simplifies the decode and dispatch hardware.


Crusoe Processor Fundamentals

  • The integer register file

    • Has 64 registers, %r0 through %r63

    • CMS allocates some registers to hold x86 state while others contain state internal to the system, or can be used as temporary registers.


Crusoe Processor Fundamentals

  • To keep the processor running at full speed, molecules are packed as fully as possible with atoms.
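As a rough illustration of what packing molecules with atoms involves, the sketch below greedily fills molecules in program order. It is not Transmeta's scheduler: the `Atom` fields, the per-unit slot counts in `UNIT_SLOTS`, and the in-order policy are all assumptions made for this example, based only on the unit list and the four-atom limit given in the slides.

```python
from dataclasses import dataclass

# Functional-unit capacity per molecule, taken from the unit list in the
# slides; treating it as a hard per-molecule slot limit is an assumption.
UNIT_SLOTS = {"int": 2, "fp": 1, "mem": 1, "br": 1}
MAX_ATOMS = 4  # a molecule holds up to four atoms


@dataclass(frozen=True)
class Atom:
    name: str
    unit: str          # functional unit that executes the atom
    dst: str = ""      # register written ("" if none)
    srcs: tuple = ()   # registers read


def pack(atoms):
    """Greedy in-order packing of atoms into molecules.

    A new molecule is started when the next atom has no free slot or when it
    reads or rewrites a register written by an atom already in the molecule
    (atoms of one molecule execute in parallel, so they cannot see each
    other's results).
    """
    molecules, current = [], []
    for atom in atoms:
        used = sum(1 for a in current if a.unit == atom.unit)
        written = {a.dst for a in current if a.dst}
        conflict = any(s in written for s in atom.srcs) or (
            bool(atom.dst) and atom.dst in written
        )
        full = len(current) >= MAX_ATOMS or used >= UNIT_SLOTS[atom.unit]
        if current and (full or conflict):
            molecules.append(current)
            current = []
        current.append(atom)
    if current:
        molecules.append(current)
    return molecules


example = [
    Atom("add", "int", "r1", ("r2", "r3")),
    Atom("ld", "mem", "r4", ("r5",)),
    Atom("sub", "int", "r6", ("r1", "r4")),  # needs results of add and ld
    Atom("br", "br", "", ("r6",)),           # needs result of sub
]
packed = [[a.name for a in m] for m in pack(example)]
```

Here the independent `add` and `ld` share a molecule, while the dependent `sub` and `br` each start a new one. A real scheduler would also reorder atoms to fill the empty slots.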


Conventional superscalar…

  • This type of processor hardware is much more complex than the Crusoe processor’s simple VLIW engine.


Code Morphing Software

  • CMS

    • Is fundamentally a dynamic translation system

    • In this case, x86 ISA -> VLIW ISA

    • “x86 ISA” is the only thing x86 code sees.

      • The only program written directly for the VLIW engine is the Code Morphing Software itself.







CMS: Drawing the HW-SW line

  • Choosing which functions to implement in HW and which in SW is a major engineering challenge

    • Involves issues such as cost, complexity, overall performance, and power consumption

    • For example, the HW-SW line might be drawn differently for a high-end server processor.


CMS: Decoding and Scheduling

  • Code Morphing can translate an entire group of x86 instructions at once,

    • whereas a superscalar x86 processor translates single instructions in isolation.

  • The Code Morphing approach can amortize the cost of translation over many executions.

    • This allows it to use much more sophisticated translation and scheduling algorithms.


CMS: Caching

  • The translation cache resides in a separate memory space that is inaccessible to x86 code.

  • As an application executes,

    • Code Morphing “learns” more about the program and improves it so that it will execute faster and faster.

  • Some benchmarks do not accurately predict the performance of the Crusoe processor!
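The amortization argument behind the translation cache can be sketched in a few lines. This is a minimal model, not CMS itself: `TranslationCache`, its `translate` callback, and the fake translator in the usage lines are hypothetical names introduced for illustration.

```python
class TranslationCache:
    """Maps an x86 entry address to its translation, translating on a miss."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical translator callback
        self._cache = {}             # x86 address -> translated code
        self.translations = 0        # how many times the translator ran

    def lookup(self, x86_addr):
        if x86_addr not in self._cache:
            self._cache[x86_addr] = self._translate(x86_addr)
            self.translations += 1
        return self._cache[x86_addr]


# Usage: a loop body at one address is executed 1000 times but is
# translated only once, so the translation cost is spread over all runs.
cache = TranslationCache(lambda addr: f"vliw-code-for-{addr:#x}")
for _ in range(1000):
    code = cache.lookup(0x4000)
```

In this model the cache is an ordinary dictionary; in the slides it lives in a separate memory space that x86 code cannot access.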


CMS: Filtering

  • The translation system needs to

    • Choose carefully how much effort to spend on translating and optimizing a given piece of x86 code.

  • A wide choice of execution modes

    • Interpretation only(no translation)

    • Simple-minded code generation

    • Highly-optimized code generation
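One simple way to picture filtering is a policy that picks an execution mode from how often a piece of code has run. The sketch below assumes such a counter-based policy; the threshold values and the function name `choose_mode` are invented for illustration and are not taken from CMS.

```python
INTERPRET = "interpretation only"
SIMPLE = "simple-minded code generation"
OPTIMIZED = "highly-optimized code generation"

# Assumed thresholds; the slides do not give real values.
SIMPLE_THRESHOLD = 50
OPTIMIZE_THRESHOLD = 1000


def choose_mode(exec_count):
    """Spend more translation effort only on code that runs often."""
    if exec_count >= OPTIMIZE_THRESHOLD:
        return OPTIMIZED
    if exec_count >= SIMPLE_THRESHOLD:
        return SIMPLE
    return INTERPRET
```

Rarely executed code stays in the interpreter, so no translation effort is wasted on it, while hot code eventually earns the expensive optimizer.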


CMS: Prediction and Path Selection

  • CMS can gather feedback

    • Instrumentation profiling

      • The translator adds code to collect info.

    • This data can be used later to decide when and what to optimize and translate.

      • For example, if a given branch is highly biased,…
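The kind of data that instrumentation could collect for a branch can be modelled as a pair of counters. This is only a sketch: the class `BranchProfile`, the 90% bias threshold, and the minimum sample count are assumptions made for the example.

```python
class BranchProfile:
    """Counts outcomes of one branch, as inserted profiling code might."""

    def __init__(self):
        self.taken = 0
        self.not_taken = 0

    def record(self, was_taken):
        if was_taken:
            self.taken += 1
        else:
            self.not_taken += 1

    def is_highly_biased(self, threshold=0.9, min_samples=100):
        """True when one direction dominates (threshold is an assumption)."""
        total = self.taken + self.not_taken
        if total < min_samples:
            return False
        return max(self.taken, self.not_taken) / total >= threshold


profile = BranchProfile()
for i in range(200):
    profile.record(i % 20 != 0)  # taken 190 times out of 200
```

A translator could consult `is_highly_biased()` when deciding which path through the branch to optimize.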


CMS: Making a Translation

[Figure: translation stages: front end -> well-known optimizations -> scheduling]

The molecules explicitly encode the instruction-level parallelism, hence they can be executed by a simple VLIW engine.


HW Support for Code Morphing

  • Exceptions

    • The “precise exception” problem

    • [Figure labels: trap, “too soon”]

* Solution: use shadow registers!


HW Support for Code Morphing

  • All registers holding x86 state are shadowed. (working/shadow copy)

    • Normal atoms only update the working copy of the register.

    • “commit” operation: working -> shadow regs.

    • “rollback” operation: shadow -> working regs.

  • Undoing changes to memory

    • Holding store data in a “gated store buffer”

    • Commit / rollback
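The commit and rollback operations described above can be modelled in software as follows. This is a behavioural sketch, not the hardware design: the class name, the dictionary-based register file, and the forwarding of buffered stores to later loads are assumptions.

```python
class ShadowedMachine:
    """Working/shadow register copies plus a gated store buffer."""

    def __init__(self, regs, memory):
        self.working = dict(regs)   # updated by normal atoms
        self.shadow = dict(regs)    # last committed x86 state
        self.memory = dict(memory)  # committed memory
        self.store_buffer = []      # stores held back until commit

    def write_reg(self, name, value):
        self.working[name] = value

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Assumption: a load sees the youngest buffered store to its address.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        """working -> shadow registers, and release the gated stores."""
        self.shadow = dict(self.working)
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def rollback(self):
        """shadow -> working registers, and drop the gated stores."""
        self.working = dict(self.shadow)
        self.store_buffer.clear()


m = ShadowedMachine({"eax": 1}, {0x10: 7})
m.write_reg("eax", 2)
m.store(0x10, 9)
m.rollback()  # e.g. after an exception inside a translation
state_after_rollback = (m.working["eax"], m.memory[0x10])
```

After the rollback the machine is back at the last committed x86 state, from which the original instructions can be re-executed in order.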


HW Support for Code Morphing

  • Alias Hardware

    • When the translator moves a load operation ahead of a store operation, it converts the load into a load-and-protect and the store into a store-under-alias-mask.

    • With this protection, it is always safe to reorder memory loads and stores.


HW Support for Code Morphing

  • Alias Hardware

<Original Code>

    St   0(r1), r2
    Ld   r3, 0(r4)
    St   0(r5), r6
    Ld   r7, 0(r8)
    Add  r9, r3, r7

<Rescheduled Code> - Unsafe

    Ld   r3, 0(r4)
    Ld   r7, 0(r8)
    St   0(r1), r2
    St   0(r5), r6
    Add  r9, r3, r7

<Rescheduled Code> - Protected

    Ldp  r3, 0(r4)
    Ldp  r7, 0(r8)
    Stam 0(r1), r2
    Stam 0(r5), r6
    Add  r9, r3, r7

* The ldp/stam pair is an excellent example that illustrates the interplay between the codesigned hardware and software in a codesigned VM.
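The behaviour of the protected sequence can be simulated as below. This is a deliberately simplified model: real alias hardware works with alias registers and masks, whereas this sketch just remembers protected addresses in a set, and the names `AliasHardware`, `AliasFault`, and `run_protected` are hypothetical.

```python
class AliasFault(Exception):
    """Raised when a store hits an address protected by an earlier ldp."""


class AliasHardware:
    def __init__(self, memory):
        self.memory = dict(memory)
        self.protected = set()  # addresses read by hoisted loads

    def ldp(self, addr):
        """Load-and-protect: load the value and remember the address."""
        self.protected.add(addr)
        return self.memory.get(addr, 0)

    def stam(self, addr, value):
        """Store-under-alias-mask: fault if the address is protected."""
        if addr in self.protected:
            raise AliasFault(f"store to protected address {addr:#x}")
        self.memory[addr] = value


def run_protected(hw, r):
    """The 'Protected' rescheduled sequence; r maps register names to values."""
    r3 = hw.ldp(r["r4"])
    r7 = hw.ldp(r["r8"])
    hw.stam(r["r1"], r["r2"])
    hw.stam(r["r5"], r["r6"])
    return r3 + r7  # Add r9, r3, r7


regs = {"r1": 0x100, "r2": 1, "r4": 0x200, "r5": 0x300, "r6": 2, "r8": 0x400}
r9 = run_protected(AliasHardware({0x200: 10, 0x400: 20}), regs)
```

When none of the stores alias the hoisted loads, the reordered code produces the same result as the original. If `r1` held the same address as `r4`, `stam` would raise `AliasFault`, and the software could roll back and fall back to the original order.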


HW Support for Code Morphing

  • Coping with Self-Modifying Code

    • x86 instructions in memory get overwritten either

      • because the OS is loading a new program, or

      • Because an application is using self-modifying code.

    • When this happens to code that has already been translated,

      • The CMS needs to be notified to keep it from erroneously executing a translation for the old code.


HW Support for Code Morphing

  • Coping with Self-Modifying Code

    • Whenever the system translates a block of x86 code, it write-protects the page.

      • It does so by setting a dedicated “translated” bit in that page’s entry in the processor’s memory management unit.

      • That bit is invisible to x86 software.

    • When a protected page is written to, the simplest remedy is to invalidate the affected translations.
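The page-level protection scheme can be sketched as a table from pages to the translations made from them. The 4 KB page size, the class `TranslatedPages`, and its method names are assumptions for this example; the real "translated" bit lives in the MMU entry rather than in a dictionary.

```python
PAGE_SIZE = 4096  # assumed page size


class TranslatedPages:
    """Tracks which x86 pages have translations (the 'translated' bit)."""

    def __init__(self):
        self._by_page = {}  # page number -> x86 addresses translated from it

    def add_translation(self, x86_addr):
        # Translating a block marks its page as write-protected.
        self._by_page.setdefault(x86_addr // PAGE_SIZE, set()).add(x86_addr)

    def is_protected(self, addr):
        return addr // PAGE_SIZE in self._by_page

    def on_write(self, addr):
        """Handle a write: invalidate all translations from the written page.

        Returns the set of invalidated x86 addresses (empty if the page
        was not protected).
        """
        return self._by_page.pop(addr // PAGE_SIZE, set())


pages = TranslatedPages()
pages.add_translation(0x1000)
pages.add_translation(0x1040)
pages.add_translation(0x3000)
invalidated = pages.on_write(0x1ff0)  # write lands in the page at 0x1000
```

Invalidating every translation on the page is the simplest remedy named in the slide; it is coarse, since a write to data that merely shares a page with code also discards translations.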




Case Study (2): IBM AS/400


From IBM’s homepage…

  • The accelerating rate of change of both hardware and software technologies necessitates that the system you select has been designed with the future in mind.

    • “We believe that the IBM AS/400 will be the number one choice!”


Introduction

  • The design of the AS/400 insulates application programs from changing hardware characteristics through a layer of microcode.

    • The interface: TIMI (Technology Independent Machine Interface)

    • The microcode layer: LIC (Licensed Internal Code)

  • In 1995, the AS/400 changed its processor technology (CISC -> 64-bit RISC).

    • No recompiling or rewriting of applications was required.

    • Existing programs not only ran, they ran as full 64-bit programs.


AS/400 architecture

The TIMI layer separates the hardware and LIC from the OS.

Instructions are translated to a specific hw instruction set as part of the backend of the compilation process.


AS/400 architecture

  • TIMI is a virtual instruction set.

    • All user-mode programs are stored as TIMI instructions.

    • Conceptually somewhat similar to the VM architecture of programming environments such as Smalltalk, Java, and .NET

    • Stored within the final program object

    • Object-based ISA


Memory Architecture

  • The TIMI has a memory architecture composed of objects.

    • The objects are completely isolated from one another and can only be accessed via pointers.

    • Actual address values contained in pointers are not made visible to SW above TIMI.

    • The implementation of the object-based memory is done entirely below the TIMI.


Memory Architecture

  • Protecting the integrity of pointers is an essential part of any Object-Based system.

    • The object pointers are encoded in 128 bits.

      • Upper 64 bits: type info, authorization, …

      • Lower 64 bits: 64-bit PowerPC virtual address

    • Significant extension to the PowerPC memory architecture

      • Added protection for object pointers

        • Load/store-pointer instructions

        • A 65th bit indicating whether the location contains a pointer
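How an extra bit per word can protect pointer integrity is sketched below. The model is an assumption-laden simplification: it keeps one tag per 64-bit word, sets the tag only through `store_pointer`, clears it on any ordinary store, and refuses to load a pointer unless both halves are still tagged. The class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1


class PointerFault(Exception):
    """Raised when a pointer load finds a word whose tag bit is clear."""


class TaggedMemory:
    def __init__(self):
        self._words = {}  # address -> (64-bit value, tag bit)

    def store(self, addr, value):
        # Ordinary store: writes the data and clears the tag bit.
        self._words[addr] = (value & MASK64, False)

    def load(self, addr):
        return self._words.get(addr, (0, False))[0]

    def store_pointer(self, addr, upper, lower):
        # Store-pointer: writes both 64-bit halves and sets their tags.
        self._words[addr] = (upper & MASK64, True)
        self._words[addr + 8] = (lower & MASK64, True)

    def load_pointer(self, addr):
        upper, tag_hi = self._words.get(addr, (0, False))
        lower, tag_lo = self._words.get(addr + 8, (0, False))
        if not (tag_hi and tag_lo):
            raise PointerFault(hex(addr))
        return upper, lower


mem = TaggedMemory()
mem.store_pointer(0x100, upper=0xA, lower=0x2000)
valid = mem.load_pointer(0x100)
mem.store(0x108, 0x3000)  # forging the address half with an ordinary store
```

After the ordinary store, `mem.load_pointer(0x100)` raises `PointerFault`, so a program cannot manufacture a usable pointer by writing raw bits.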


Instruction Set

  • TIMI instruction format

    • [Figure: six fields of 2, 2, 3, 3, 3, and 3 bytes; five of the fields are marked optional]

  • Multiway conditional branch

    • This is the “architected representation”

    • It is translated to an implementation-dependent form, and it does the work of multiple RISC instructions.


Instruction Set

[Figure: ODT Direction Vector and ODT Entry String]

  • Instructions such as add numeric and multiply numeric are generic.

  • Entries in the ODT indicate the types of operands and the data flow.

  • The actual storage locations are determined only after the TIMI code is translated.
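The idea of a generic instruction whose meaning is fixed by ODT entries can be illustrated with a toy translator. Everything concrete here is hypothetical: the dictionary-shaped ODT, the operand names, and the specialized operation names are invented for the example and do not reflect the real TIMI encoding.

```python
# Hypothetical ODT: operand name -> declared numeric type.
ODT = {"count": "binary", "total": "binary", "price": "packed", "tax": "packed"}

# Hypothetical type-specific operations a translator might select.
SPECIALIZED_ADD = {"binary": "add_binary", "packed": "add_packed_decimal"}


def translate_add_numeric(dst, src1, src2, odt):
    """Resolve a generic 'add numeric' using the operand types in the ODT."""
    types = {odt[dst], odt[src1], odt[src2]}
    if len(types) != 1:
        # Mixed operand types would need conversions; omitted in this sketch.
        raise NotImplementedError("mixed operand types")
    return (SPECIALIZED_ADD[types.pop()], dst, src1, src2)


binary_add = translate_add_numeric("total", "total", "count", ODT)
packed_add = translate_add_numeric("tax", "tax", "price", ODT)
```

The same generic instruction yields different concrete operations depending only on the ODT entries, which is why the instruction itself can stay type-independent.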


Input/Output

  • The presence of IOPs simplifies the task of pushing the device-dependent aspects out of the central processor.


Input/Output

  • At the level of TIMI,

    • There is no secondary (disk) storage; rather, it is part of the unified memory architecture.

      • All disk management software, drivers, etc. exist in the implementation-dependent part of the system.

  • The OS interacts with software below the TIMI level (and with I/O devices)

    • through instructions that operate on the TIMI-level objects.


Input/Output

  • TIMI-Supported Objects

    • Access group, Context, …

    • Authorization List, User Profile, …

    • Dictionary, Index, …

    • Queue, Mode descriptor, …

    • Logical unit descriptor, …

    • Module, Program, …


Code Translation & Concealment

  • HLL -> Template(TIMI + ODT) -> Program Object

  • The contents of the program object cannot be directly observed above the TIMI level.

  • Materialization

    • Returns the program to the user in its original, machine-independent form

    • The platform switch is transparent to the user.
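The relationship between the template, the program object, and materialization can be sketched as follows. This is a toy model under stated assumptions: the `ProgramObject` fields, the string-tagging "translator", and the `migrate` helper are hypothetical and only show why keeping the template inside the program object makes a platform switch transparent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramObject:
    template: tuple  # machine-independent form (TIMI + ODT), kept inside
    native: tuple    # implementation-dependent executable code
    target: str      # platform the native code was generated for


def translate(template, target):
    """Stand-in translator: tags each template op with the target name."""
    native = tuple(f"{target}:{op}" for op in template)
    return ProgramObject(tuple(template), native, target)


def materialize(program):
    """Give back the original, machine-independent form."""
    return program.template


def migrate(program, new_target):
    """Retranslate from the stored template; no source code is needed."""
    return translate(materialize(program), new_target)


old = translate(("add_numeric", "branch"), "cisc")
new = migrate(old, "risc64")
```

The migrated object carries the same template but fresh native code, mirroring the HLL -> Template -> Program Object flow on the slide.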


Code Translation & Concealment

[Figure: HLL program -> compiler -> space object (template: TIMI, ODT) -> translator -> program object (template: TIMI, ODT; implementation-dependent executable code), with the TIMI level marked]

