Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation

Presentation Transcript

  1. Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
     Advanced Computer Architecture Laboratory, The University of Michigan
     Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
     Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
     Krisztián Flautner, ARM Ltd.
     December 3rd, 2003

  2. Dynamic Voltage Scaling and Design Uncertainty
     • DVS: adapting voltage/frequency to meet performance demands of workload
     • Lower processor voltage during periods of low utilization
     • Lower voltage is a Good Thing™ for power
     • Minimum voltage is limited by safety margins
     • Error-free operation must be guaranteed!
     • Technology trends are "Maximizing the Minimums"
     • Process and temperature variation
     • Capacitive and inductive noise
     • Key observation: worst-case conditions are also highly improbable
     • Significant gain for circuits optimized for the common case
     • Efficient mechanisms needed to tolerate infrequent worst-case scenarios
     [Figure: Intra-die variations in ILD thickness]

  3. Shaving Voltage Margins with Razor
     • Goal: reduce voltage margins with in-situ error detection and correction for delay failures
     • Proposed approach:
       - Remove safety margins and tolerate occasional errors
       - Tune processor voltage based on error rate
       - Purposely run below critical voltage
     • Data-dependent latency margins
     • Trade-off: voltage power savings vs. overhead of correction
     • Analogous to wireless power modulation
     [Figure: voltage operating points labeled Traditional DVS, Zero margin, Sub-critical]

  4. Razor Timing Error Detection
     • Second sample of logic value used to validate earlier sample
     • Key design issues:
       - Maintaining pipeline forward progress
       - Meta-stable results in main flip-flop
       - Short path impact on shadow-latch
       - Recovering pipeline state after errors
       - Power overhead of error detection and correction
     [Figure: main flip-flops (Main FF) and Shadow Latch around the MEM stage, clocked by clk and clk_del]

  5. Razor Short Path Constraint
     • Second sample of logic value used to validate earlier sample
     • Key design issues:
       - Maintaining pipeline forward progress
       - Meta-stable results in main flip-flop
       - Short path impact on shadow-latch
       - Recovering pipeline state after errors
       - Power overhead of error detection and correction
     [Figure: main flip-flops (Main FF) and Shadow Latch around the MEM stage, clocked by clk and clk_del; annotation: Hold Constraint (~1/2 cycle)]

  6. Centralized Razor Pipeline Error Recovery
     • One-cycle penalty for timing failure
     • Global synchronization may be difficult for fast, complex designs
     [Figure: pipeline PC, IF, ID, EX, MEM, WB (reg/mem) with Razor FFs between stages; each stage raises an error signal and receives a recover signal, gated with the clock; timeline of cycles 0-6 for inst1-inst6]

  7. Distributed Razor Pipeline Error Recovery
     • Multiple-cycle penalty for timing failure
     • Scalable design since all recovery communication is local
     • Builds on existing branch / data speculation recovery framework
     [Figure: pipeline PC, IF, ID, EX, MEM (read-only), WB (reg/mem) with Razor FFs between stages and a Stabilizer FF before WB; each stage passes an error bubble forward and a recover signal, with Flush Control driving flushID backward; timeline of cycles 0-9 for inst1-inst8]
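
The difference between the one-cycle (centralized) and multiple-cycle (distributed) recovery penalties can be put in rough numbers. The following is a minimal sketch, not taken from the slides: it assumes an ideal scalar pipeline with a fixed base IPC, a fixed penalty per timing error, and independent errors. The function name `effective_ipc` and the example penalty of 5 cycles for the distributed scheme are hypothetical.

```python
def effective_ipc(error_rate, penalty_cycles, base_ipc=1.0):
    """Estimate IPC when a fraction of instructions incur a recovery penalty.

    error_rate     -- probability that an instruction causes a timing error (0..1)
    penalty_cycles -- extra cycles spent on recovery per error (assumed constant)
    base_ipc       -- IPC of the pipeline with no timing errors (assumed)
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if penalty_cycles < 0 or base_ipc <= 0:
        raise ValueError("penalty_cycles must be >= 0 and base_ipc > 0")
    # Average cycles per instruction: error-free cost plus expected recovery cost.
    cycles_per_instruction = 1.0 / base_ipc + error_rate * penalty_cycles
    return 1.0 / cycles_per_instruction


if __name__ == "__main__":
    rate = 0.013  # illustrative value only
    print("centralized (1-cycle penalty):", effective_ipc(rate, 1))
    print("distributed (assumed 5-cycle penalty):", effective_ipc(rate, 5))
```

Under these assumptions the throughput loss stays small as long as the error rate is low, which is why a longer but purely local recovery path can still be an acceptable trade.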

  8. Error-Rate Studies – Hardware Measurement

  9. Error-Rate Studies – Empirical Results
     • 35% energy savings with 1.3% error
     • 22% saving once every 20 seconds!

  10. Error-Rate Studies – SPICE-Level Simulations
     • Based on SPICE-level simulations of a Kogge-Stone adder
     [Figure annotation: 200 mV]

  11. Razor I - Prototype Razor Implementation
     • 4-stage 64-bit Alpha pipeline:
       - 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
       - Tunable via software from 50-200 MHz, 1.1-1.8 V
     • Razor applied to combinational logic
     • Razor overhead:
       - Total of 192 Razor flip-flops out of 2408 total (9%)
       - Error-free power overhead: ~3%
     [Figure: 3 mm x 3.3 mm die floorplan with I-Cache, Register File, IF, ID, EX, MEM, WB, D-Cache]

  12. Effects of Razor DVS
     • Total energy: Etotal = Eproc + Erecovery
     • Eproc: energy of processor operations
     • Erecovery: energy of pipeline recovery
     [Figure: Energy and pipeline throughput (IPC) plotted against decreasing supply voltage; curves for Etotal, Eproc, Erecovery, and energy of processor w/o Razor support; the minimum of Etotal is marked Optimal Etotal]
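
The relation Etotal = Eproc + Erecovery can be explored with a toy model. This is a sketch under stated assumptions, not the authors' simulation: Eproc is normalized and taken to scale with the square of the supply voltage, the recovery cost of 18 operations comes from the 18x figure quoted for the EX-stage analysis, and the error-rate curve (`error_rate`, with its `v_crit`, `scale`, and `steepness` parameters) is entirely hypothetical.

```python
import math

RECOVERY_COST = 18.0  # recovery energy in units of one operation (18x an add)


def error_rate(vdd, v_crit=1.5, scale=1e-4, steepness=20.0):
    """Hypothetical error-rate curve: zero at or above v_crit, rising steeply below it."""
    if vdd >= v_crit:
        return 0.0
    return min(1.0, scale * (math.exp(steepness * (v_crit - vdd)) - 1.0))


def e_proc(vdd):
    """Normalized energy of one processor operation (assumed proportional to Vdd^2)."""
    return vdd ** 2


def e_recovery(vdd):
    """Expected recovery energy per operation at this voltage."""
    return error_rate(vdd) * RECOVERY_COST * e_proc(vdd)


def e_total(vdd):
    """Etotal = Eproc + Erecovery."""
    return e_proc(vdd) + e_recovery(vdd)


def energy_optimal_voltage(v_min_mv=1100, v_max_mv=1800, step_mv=10):
    """Sweep the supply voltage in millivolt steps and return the minimum-energy point."""
    candidates = [mv / 1000.0 for mv in range(v_min_mv, v_max_mv + 1, step_mv)]
    return min(candidates, key=e_total)


if __name__ == "__main__":
    v_opt = energy_optimal_voltage()
    print("energy-optimal Vdd in this toy model:", v_opt)
    print("Etotal there vs. at the assumed critical voltage:", e_total(v_opt), e_total(1.5))
```

In this model the minimum of Etotal falls below the assumed critical voltage, because the first few errors cost less energy than the voltage reduction saves.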

  13. EX-Stage Analysis – Optimal Voltage Sweep
     • Recovery cost includes energy to recover entire pipeline (18x an add)

  14. EX-Stage Analysis – Optimal Voltage Sweep

  15. Simulation Analysis – Energy-Optimal Voltage

  16. Simulation Analysis – Razor DVS Execution

  17. Simulation Analysis – Razor DVS Performance

  18. Conclusions
     • In-situ detection/correction of timing errors
     • Eliminate process, temperature, and safety margins
     • Tune processor voltage based on error rate
     • Purposely run below critical voltage to capture data-dependent latency margins
     • Implemented with architecture/circuit support
     • Double-sampling, metastability-tolerant Razor flip-flops validate logic results
     • Pipeline initiates recovery after circuit timing errors, no voltage/clock re-tuning needed
     • Trade-off: supply voltage power savings vs. overhead of correction
     • Running with error is good!
     [Figures: Razor flip-flop (Main Flip-Flop, Shadow Latch, comparator, Error_L, Error, clk, clk_del) and distributed recovery pipeline (PC, IF, ID, EX, MEM (read-only), WB (reg/mem)) with Razor FFs, Stabilizer FF, error bubbles, recover signals, and Flush Control (flushID)]

  19. Future Directions
     • Research opportunities
       - Razor for caches/memory and control logic
       - Voltage control algorithms, especially per-stage tuning
       - Typical-case energy-optimized designs (instead of worst-case latency-optimized)
       - Turnkey application of Razor technology
     • Prototype design, fabrication, evaluation
       - Razor I – Q4 2003 – Razor-ized combinational logic, global tuning
       - Razor II – Q3 2004 – Razor-ized caches and control logic, per-stage tuning
     • Other applications
       - Single-event upset (SEU) protection using Razor error detection/re-execution
       - Over-clocking for performance improvement (large gains among hobbyists)

  20. Questions?

  21. Back-up Slides

  22. Other Approaches to Dynamic Voltage Scaling
     • Traditional DVS
       - Valid voltage / delay combinations "blessed" at design time
       - Approach leaves a significant amount of energy "on the table"
       - Temperature, process, data, and safety margins placed on voltage
     • Other approaches miss some margins
       - Slack detector: automatic tuning
       - ARM's Intelligent Energy Manager (IEM)
       - Processor voltage automatically tuned to external ambient conditions
       - Inverter chain designed to track most restrictive critical path, margin still required
     [Figure: processor floorplan with Mem Control, Data cache, I/O Unit, Floating point and graphics, Ex Unit, Control Unit, Cache control, L2 tags, L2 Cache]

  23. Razor Flip-Flop Implementation
     • Compare latched data with shadow-latch on delayed clock
     • Upon failure: place data from shadow-latch in main latch
     • Ensure shadow latch always correct using conservative design techniques
     • Correct value in shadow latch guarantees forward progress
     • Recover pipeline using microarchitectural recovery mechanism
     [Figure: Razor FF between Logic Stage L1 and Logic Stage L2; Main Flip-Flop (D, Q, clk) with a 0/1 input mux selected by Error_L, Shadow Latch on clk_del, and a comparator producing Error]
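
The compare-and-restore behavior listed on this slide can be illustrated with a small behavioral model. This is a sketch, not the circuit: it ignores metastability, treats the stage output as switching from an old value to a new value at a single arrival time, and assumes the shadow latch is always correct as long as the data arrives before the delayed clock edge. The class name `RazorFlipFlop` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorFlipFlop:
    """Behavioral (not circuit-level) model of a Razor flip-flop."""
    clock_period: float  # time between launching and capturing clk edges
    shadow_delay: float  # how far clk_del lags clk

    def sample(self, old_value, new_value, arrival_time):
        """Return (q, error) for one cycle.

        arrival_time is when the stage output settles to new_value, measured
        from the launching clock edge; before that it still holds old_value.
        """
        if arrival_time > self.clock_period + self.shadow_delay:
            # Beyond this point even the shadow latch is wrong.
            raise ValueError("data missed the shadow latch; not recoverable here")
        # Main flip-flop samples on clk and may capture the stale value.
        main = new_value if arrival_time <= self.clock_period else old_value
        # Shadow latch samples on clk_del and is assumed to be correct.
        shadow = new_value
        # Comparator flags a timing error when the two samples disagree.
        error = main != shadow
        # On an error, the shadow value is placed in the main flip-flop.
        q = shadow if error else main
        return q, error


if __name__ == "__main__":
    ff = RazorFlipFlop(clock_period=10.0, shadow_delay=5.0)
    print(ff.sample(old_value=0, new_value=1, arrival_time=8.0))   # on time
    print(ff.sample(old_value=0, new_value=1, arrival_time=12.0))  # late, corrected
```

Note that a late arrival only raises an error when the value actually changes, which is one way to see why the error rate depends on the data being processed.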

  24. Razor Flip-Flop Circuit
     [Figure: transistor-level Razor flip-flop with signals D, Q, clk, clk_b, clk_del, clk_del_b; Shadow Latch; Meta-stability detector built from Inv_n and Inv_p; output Error_L]

  25. Overcoming Short Path Constraints
     • Delayed clock imposes a short-path constraint: Min. path delay > tdelay + thold
     • Razor necessary only for latches on slow paths
     • Pad fast path for latches with mixed path delays
     • Trade-off between DVS headroom and short path constraints
     [Figure: timing diagram of clock and clock_del showing the intended path, the short path, tdelay, thold, and the minimum path delay; circuit with a plain ff on Short Paths, a Razor_ff on Long Paths, and a fast path padded with extra delay]
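
The padding rule follows directly from the constraint Min. path delay > tdelay + thold. Below is a minimal sketch of that arithmetic; the function name `short_path_padding` is hypothetical, all times must share one unit, and a real design would add margin on top of the returned value since the constraint is a strict inequality.

```python
def short_path_padding(min_path_delay, t_delay, t_hold):
    """Extra delay needed for the shortest path to reach t_delay + t_hold.

    min_path_delay -- shortest combinational delay into the Razor flip-flop
    t_delay        -- delay of the shadow-latch clock (clk_del) relative to clk
    t_hold         -- hold time of the shadow latch
    Returns 0.0 when the path already meets the bound.
    """
    required = t_delay + t_hold
    return max(0.0, required - min_path_delay)


if __name__ == "__main__":
    # Hypothetical numbers in nanoseconds.
    print(short_path_padding(min_path_delay=1.0, t_delay=2.0, t_hold=0.5))
    print(short_path_padding(min_path_delay=3.0, t_delay=2.0, t_hold=0.5))
```

A larger tdelay gives more room to run below the critical voltage but forces more padding on short paths, which is the trade-off the last bullet refers to.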

  26. Hardware Measurement Setup
     [Figure: three 18x18 multiplier pipelines, each fed by a 48-bit LFSR and producing a 36-bit result. Slow Pipeline A and Slow Pipeline B run on clk/2; the Fast Pipeline runs on clk with a stabilize stage. Outputs are compared (!=) and mismatches increment a 40-bit Error Counter]

  27. Simulation Methodology
     • Challenge: instruction latency depends on circuit evaluation latency
     • May vary with changes in stage inputs, stage logic, voltage, temperature…
     • Dynamic timing simulation combines architectural/circuit simulation
     • Initial implementation utilized a hand-generated EX-stage circuit model
     • Effort ongoing to automate extraction/decomposition/integration into SimpleScalar

  28. Supply Voltage Control System
     • Ediff = Eref - Esample
     • Current design utilizes a very simple proportional control function
     • Control algorithm implemented in software
     [Figure: control loop in which the pipeline's error signals are summed (Σ) into Esample, Ediff = Eref - Esample feeds the Voltage Control Function, and the Voltage Regulator supplies Vdd to the Pipeline; reset input]
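
A proportional control function of the kind mentioned on this slide could look like the sketch below. Only Ediff = Eref - Esample comes from the slide; the sign convention (lower Vdd when the measured error rate is below the target), the gain, and the function name `proportional_voltage_step` are assumptions. The default limits reuse the 1.1-1.8 V tuning range quoted for the prototype.

```python
def proportional_voltage_step(vdd, e_ref, e_sample, gain, v_min=1.1, v_max=1.8):
    """Return the next supply voltage from one proportional control step.

    vdd      -- current supply voltage in volts
    e_ref    -- target error rate
    e_sample -- error rate measured over the last sampling window
    gain     -- volts of adjustment per unit of error-rate difference (assumed)
    """
    e_diff = e_ref - e_sample
    # Positive e_diff: fewer errors than targeted, so the voltage can drop.
    # Negative e_diff: too many errors, so the voltage is raised.
    new_vdd = vdd - gain * e_diff
    # Keep the request inside the regulator's supported range.
    return min(v_max, max(v_min, new_vdd))


if __name__ == "__main__":
    vdd = 1.8
    # Hypothetical sequence of measured error rates over successive windows.
    for measured in [0.0, 0.0, 0.005, 0.03, 0.015]:
        vdd = proportional_voltage_step(vdd, e_ref=0.015, e_sample=measured, gain=5.0)
        print(round(vdd, 4))
```

Because the controller is software, the gain and sampling window can be changed without touching the hardware; the values used here are placeholders.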

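  29. Pipeline Recovery
     [Figure: timing diagram of instructions moving through IF, ID, EX, MEM, WB against clk and clk_d, with delayed stage signals ID.d, EX.d, MEM.d. In the No Error case instructions advance every cycle; in the Error case the error signal is raised and the instruction in MEM is redone (MEM appears twice)]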

  30. Voltage Scaling under Dynamic Workloads
     • Adapt frequency/voltage to performance demands of workload
     • Software controlled processor speed
     • Lower processor voltage during periods of low operating frequency
     • Quadratic reduction in dynamic power and energy
     • Super-quadratic reduction in leakage
     [Figure: Utilization vs. Time, with corresponding Freq and Voltage (Vdd) traces]
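
The quadratic dependence on voltage comes from the standard CMOS switching-power relation P = a * C * Vdd^2 * f. The sketch below applies it to compare two operating points; it assumes the activity factor and switched capacitance are unchanged between them, and it does not model leakage. The function names are hypothetical.

```python
def dynamic_power_ratio(v_new, v_old, f_new, f_old):
    """Ratio of dynamic power at (v_new, f_new) to that at (v_old, f_old).

    Uses P = a * C * Vdd^2 * f with the same activity factor a and
    capacitance C at both operating points.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


def dynamic_energy_ratio(v_new, v_old):
    """Ratio of dynamic energy per operation, which depends only on Vdd^2."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    # Hypothetical operating points: halve both the voltage and the frequency.
    print(dynamic_power_ratio(0.9, 1.8, 100e6, 200e6))
    print(dynamic_energy_ratio(0.9, 1.8))
```

Lowering frequency alone reduces power but not energy per operation; the energy saving comes from the voltage term.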

  31. Simulation Flow
     • Automatic creation of very detailed power/delay C-models
     [Figure: flow from High-level HDL Specification of the pipeline (PC, IF, ID, EX, MEM, WB with FFs) through Circuit Extraction with Parasitics and Variable Voltage SDF generation to a Power/Delay C-model; the C-model, the Architecture Specification, and the Voltage Control Algorithm feed SimpleScalar + DTA, which produces Detailed Power/Delay Analysis]

  32. Simulation Methodology
     • Dynamic timing simulation combines architectural/circuit simulation
     • Contrast to static timing simulation, which is only concerned with the critical path
     • SimpleScalar/Alpha architectural-level simulation
     • Gate-level simulation of per-stage logic blocks
     • Logic block model describes cells, local and global interconnect
     • Cells characterized with SPICE at varied slew/cap-load/voltage
     • Each cycle, circuit simulator evaluates delay of each stage's logic block
     [Figure: logic block annotated with example input bit values]
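
The idea of evaluating a stage's delay every cycle from its actual inputs can be shown with a deliberately simple stand-in. The sketch below is not the authors' simulator and does not model the Kogge-Stone adder used in their studies: it substitutes a ripple-carry adder, whose delay is assumed to grow with the longest carry chain, and all delay constants and function names are hypothetical.

```python
def longest_carry_chain(a, b, width=64):
    """Longest run of bit positions a single carry travels through in a + b."""
    longest = run = 0
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if carry and (x ^ y):
            run += 1   # incoming carry is propagated one more position
        elif x & y:
            run = 1    # a new carry is generated at this position
        else:
            run = 0    # any carry is absorbed here
        longest = max(longest, run)
        carry = (x & y) | (carry & (x ^ y))
    return longest


def stage_delay(a, b, base=1.0, per_bit=0.1, width=64):
    """Hypothetical data-dependent delay: fixed part plus carry-chain part."""
    return base + per_bit * longest_carry_chain(a, b, width)


def timing_error(a, b, clock_period, **delay_args):
    """True when this cycle's evaluated delay exceeds the clock period."""
    return stage_delay(a, b, **delay_args) > clock_period


if __name__ == "__main__":
    for a, b in [(1, 1), (0xFF, 1), (0xFFFFFFFF, 1)]:
        print(hex(a), hex(b), stage_delay(a, b), timing_error(a, b, clock_period=1.5))
```

A static analysis would budget for the full-width carry chain on every cycle; evaluating the delay per cycle shows that most operand pairs finish much earlier, which is the latency margin Razor tries to recover.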

  33. Simulation Analysis – Razor DVS Execution

  34. Razor Demo

  35. More Details on Meta-Stability
     • Sub-critical operation invites meta-stability
     • Meta-stability detector itself can become meta-stable
       - Double latch the error signal to obtain a sufficiently small probability
     • Flush entire pipe
     • No forward progress
     • Reduce frequency
     [Figure: Razor flip-flop (D, Q, clk, clk_b, clk_del, clk_del_b, restore) with error signals latched on pos and neg clock edges through a Dynamic Or / Latch, producing error and fail; pipeline control signals restore, bubble, flush]

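  36. Short Path Failure
     [Figure: timing diagram of instructions I1 and I2 (inst1, inst2) in IF, ID, EX, MEM, WB against clk and clk_d, with delayed stage signals ID.d, EX.d, MEM.d; a short path causes the error signal to be raised]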