
Time dilation in RAMP

Zhangxi Tan and David Patterson

Computer Science Division

UC Berkeley

A time machine
  • Using RAMP as a datacenter simulator
    • Vary DC configurations: processors, disks, network, etc.
        • Evaluate different system implementations: MapReduce with a 10 Gbps, 2 ms delay interconnect or a 100 Gbps, 80 ms delay interconnect
        • Explore and predict what would happen if you upgraded the hardware in your cluster: more powerful CPUs, faster/larger disks
        • Try things from the future!

RAMP inside

The problems
  • Emulating fast and many computers in an FPGA
  • What are the problems?
    • First comment, half a year ago at the RadLab retreat: 100 MHz is too slow and can't reflect a GHz machine
    • Targets are becoming more and more complex
      • Implementing them in an FPGA with cycle accuracy is desired
      • How many cores can we put in an FPGA? (Original vision: 16-24 cores per chip. Now: 1 Leon on a V2P30, 2-3 on a V2P70)
Methodologies
  • RDL
    • Target cycle, host cycle, start, stop, channel model…
      • Transfers data between units with extra start/stop control
      • Replaces the original transfer logic with RDL
      • Controls the target clock: if there is no data, still send something to keep the target time "running"
      • A bad control-logic implementation may cause deadlock
      • Units must be RDLized (channels and units built) if they want to talk to each other
        • Compared to porting apps to MicroBlaze?
        • Is RDLizing really obvious and simple?
      • Model: event driven or clock driven?
  • Time dilation
    • Removes target cycle control
      • Is stepping every clock cycle really the way to debug a 1000-node system?
    • Uses a standard data transfer interface
    • Rescales everything to a "virtual wall clock" and "slows down" events accordingly
      • Events: timer interrupts, data sent/received, etc.
Basic Idea
  • "Slow down" the passage of time to make the target appear faster
    • 10 ms of wall clock time = 2 ms of target time
      • Network: shorter time to send a packet -> bandwidth increases, latency decreases
      • Disk: shorter time to read/write
      • CPU: shorter time to compute
    • The virtual wall clock is the time coordinate in the target; the implementation only controls event intervals

[Figure: Without time dilation, a 10 ms wall-clock interval is perceived as a 10 ms event interval. With time dilation, the same 10 ms of wall-clock time maps to 2 ms on the virtual wall clock, so the target perceives a 2 ms event interval.]
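As a minimal sketch (not from the slides), the mapping from wall-clock time to the target's virtual wall clock can be expressed with a time dilation factor (TDF), defined later in the deck as wall clock time divided by emulated time:

```python
# Sketch of the virtual-wall-clock idea: a wall-clock interval is
# perceived by the dilated target as (interval / TDF).  The function
# name is illustrative, not taken from RAMP.

def to_virtual(wall_ms: float, tdf: float) -> float:
    """Target-perceived duration of a wall-clock interval."""
    return wall_ms / tdf

# The slide's example: 10 ms of wall-clock time under TDF = 5
# is perceived by the target as a 2 ms event interval.
print(to_virtual(10.0, 5.0))   # -> 2.0
```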

Real world examples
  • Sending data at the same rate with the same logic

[Figure: Network example. Sending 100 Mb of data between two events: in real time, 1 s elapses, so the perceived bandwidth is 100 Mbps. Under time dilation, the target perceives only 100 ms, so the perceived bandwidth is 1 Gbps.]
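The bandwidth arithmetic in the network example can be sketched as follows (a hedged illustration; the function name is mine, not the authors'):

```python
# Why a fixed transfer looks faster under time dilation: the same
# 100 Mb is sent with the same logic, but the perceived elapsed
# time shrinks by the time dilation factor (TDF).

def perceived_bw_mbps(data_mb: float, wall_s: float, tdf: float) -> float:
    """Bandwidth as seen inside the dilated target, in Mbps."""
    return data_mb / (wall_s / tdf)

print(perceived_bw_mbps(100, 1.0, 1))    # real: 100.0 Mbps
print(perceived_bw_mbps(100, 1.0, 10))   # TDF = 10: 1000.0 Mbps = 1 Gbps
```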

CPU and OS

  • The OS updates its time (jiffies) every 10 ms, in each timer interrupt
  • Reprogram the timer to slow the interrupts down
    • No OS modifications
    • No HW changes
  • This speeds up the processor by 5x

[Figure: Before time dilation, the timer interrupt fires every 10 ms. After time dilation, it fires every 50 ms of wall-clock time, which the target still perceives as a 10 ms interval.]
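The timer trick above reduces to one multiplication; here is a minimal sketch (illustrative names, assuming the 10 ms jiffy and 5x factor from the slide):

```python
# To make the target appear TDF times faster without touching the
# OS, the hardware timer interval is stretched by TDF in wall-clock
# time; the guest still believes it ticks every 10 ms.

def hw_timer_interval_ms(target_jiffy_ms: float, tdf: float) -> float:
    """Wall-clock period to program into the hardware timer."""
    return target_jiffy_ms * tdf

# TDF = 5: fire the interrupt every 50 ms of wall-clock time,
# which the target perceives as its usual 10 ms tick.
print(hw_timer_interval_ms(10.0, 5.0))   # -> 50.0
```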

Experiments
  • HW emulator (FPGA): 32-bit Leon3, 50 MHz, 90 MHz DDR memory, 8 KB L1 cache (4 KB instruction and 4 KB data)
    • Target system: Linux 2.6 kernel, Leon @ 50 MHz / 250 MHz / 500 MHz / 1 GHz / 2 GHz
    • Run the Dhrystone benchmark
    • Tomorrow: HW/SW co-simulation example
  • Concept

Time Dilation Factor (TDF) = wall clock time / emulated clock time
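Under this definition, the five target clock rates in the experiment follow from the 50 MHz host clock; a small sketch (my naming, assuming target frequency = host frequency × TDF):

```python
# The emulated Leon3 runs at a fixed 50 MHz on the FPGA host;
# the time dilation factor scales the frequency the target believes
# it has: target_freq = host_freq * TDF.

HOST_MHZ = 50

def target_mhz(tdf: float) -> float:
    """Apparent target clock rate for a given dilation factor."""
    return HOST_MHZ * tdf

# TDFs of 1, 5, 10, 20 and 40 give the five target speeds on the
# slide: 50 MHz, 250 MHz, 500 MHz, 1 GHz and 2 GHz.
for tdf in (1, 5, 10, 20, 40):
    print(tdf, target_mhz(tdf))
```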

Dhrystone result (w/o memory TD)

How close can we get to a 3 GHz x86 (~8000 Dhrystone MIPS)? The gap depends on memory, cache, and CPI.

Problems
  • Similar to time dilation in VMs
    • "To Infinity and Beyond: Time-Warped Network Emulation", NSDI '06
  • Everything scaled linearly, including memory!
    • VMs are lucky: networking code fits easily in the cache.
    • RAMP has more knobs to tweak.
  • Solution: slow down the memory and redo the experiment
Dhrystone w. Memory TD

Keep the memory access latency constant:
  • 90 MHz DDR DRAM with 200 ns latency in all targets (50 MHz to 2 GHz)
  • The latency is pessimistic, but it reflects the trend
  • RAMP Blue result + time dilation vs. a real system?

Limitation of Naïve time dilation

[Figure: each unit contains its own time dilation counter.]

  • Fixed CPI (memory/CPU) model
  • Next steps
    • Variable time dilation factor: distribution and state (statistical model)
    • Emulate out-of-order execution with time dilation: peek at each instruction and dilate it
    • Going deterministic? No, statistical
  • No extra control between units
  • Reprogram the Time Dilation Counter (TDC) in each unit to get a different target configuration
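A hedged sketch of the per-unit TDC idea (the class and method names are illustrative, not RAMP's): each unit carries its own programmable dilation factor, so reprogramming the counters yields a different target configuration without extra control logic between units.

```python
# Illustrative model: a unit's TDC setting determines how much
# wall-clock time it perceives.  A dilated CPU and an undilated
# memory can coexist, which is how memory latency is held constant.

class Unit:
    def __init__(self, name: str, tdf: float):
        self.name = name
        self.tdf = tdf  # time dilation counter setting

    def perceived_ns(self, wall_ns: float) -> float:
        """Target-perceived duration of a wall-clock interval."""
        return wall_ns / self.tdf

cpu = Unit("cpu", tdf=40)   # 50 MHz host standing in for a 2 GHz CPU
mem = Unit("mem", tdf=1)    # memory latency kept at the wall-clock rate
# 200 ns of wall-clock DRAM latency is perceived as 5 ns by the
# dilated CPU but as the full 200 ns by the undilated memory unit.
print(cpu.perceived_ns(200), mem.perceived_ns(200))   # -> 5.0 200.0
```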

[Figure: proposed model.]