A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism

Marc de Kruijf

Shuou Nomura

Karu Sankaralingam


From Hard to Harder

[Figure: scaling progression labeled 10000nm, 720nm, 4000um, 360nm, 1500um, 180nm, 90nm, and 45nm & beyond, running from "Hard" to "Harder"]


What is the Problem?

  • Non-ideal transistor scaling

    • Transistor wear-out

    • Process, voltage, and temperature (PVT) variations

    • Errors due to particle interference

    • Noise coupling & crosstalk


What is the Problem?

[Figure: a Performance Toolbox and a Reliability Toolbox. Labels: multi-core, coherence & consistency, on-chip network, branch prediction, out-of-order, dynamic verification, timing speculation, DMR, ECC, TMR, RMT, watchdog, HW checkpoints]

Need high-level analysis tools


Our Contribution

  • A model for timing speculation

    • Unifies hardware + system

    • Small set of high-level inputs

[Figure label: processor designer]

Also…

Q. What is the impact of technology scaling?

A. Further benefits are small to none.

Q. What is the impact of CMOS design style?

A. Very low power designs benefit most.

Q. What is the impact of the fault recovery mechanism?

A. Fine-grained recovery is key to high efficiencies.


Outline

  • Timing Speculation

  • Model Overview

    • Hardware Efficiency Model

    • System Recovery Model

  • Results

  • Conclusion


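Timing Speculation

[Figure: a clock waveform with the clock period (= 1/frequency) and circuit delay variations. A delay that overruns the clock period is a timing failure. A slower clock avoids the failure (OK!); timing speculation instead relies on detect & recover.]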


Model Overview

[Figure: three plots against error rate: Hardware Efficiency (Energy vs. Error rate), System Recovery (Time vs. Error rate), and Overall Efficiency (Energy vs. Error rate)]

Model Inputs

1. A hardware path delay distribution

2. Effect of variations on path delay as N(μ,σ)

3. The time between recovery checkpoints

4. The time to restore a checkpoint


Hardware Efficiency Model

Input 1: Path delay distribution

Input 2: Path delay variation (σ)

[Figure: plots with axes # Paths vs. Path delay, Error prob. vs. Clock period, and Energy vs. Clock period (e.g. frequency scaling), leading to Energy vs. Error rate]
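
One way to read the two hardware inputs together is sketched below. It is not the authors' implementation: it assumes the path delay distribution is given as a histogram of nominal delays, that each path's actual delay is normally distributed around its nominal value with standard deviation σ expressed as a fraction of that value, and that the error rate is the path-weighted probability of a delay exceeding the clock period. The names `error_rate` and `path_histogram`, and the histogram values, are hypothetical.

```python
from statistics import NormalDist


def path_failure_probability(nominal_delay, sigma_ratio, clock_period):
    """P(actual delay > clock period) for one path, delay ~ N(mu, sigma_ratio * mu)."""
    dist = NormalDist(mu=nominal_delay, sigma=sigma_ratio * nominal_delay)
    return 1.0 - dist.cdf(clock_period)


def error_rate(path_histogram, sigma_ratio, clock_period):
    """Path-weighted probability of a timing failure (an assumed aggregation rule).

    path_histogram maps a nominal path delay to the number of paths with that delay.
    """
    total_paths = sum(path_histogram.values())
    failing = sum(
        count * path_failure_probability(delay, sigma_ratio, clock_period)
        for delay, count in path_histogram.items()
    )
    return failing / total_paths


# Hypothetical histogram: delays are normalized to the longest nominal path.
example_histogram = {0.6: 50, 0.8: 30, 1.0: 20}

for period in (0.9, 1.0, 1.1):
    rate = error_rate(example_histogram, sigma_ratio=0.046, clock_period=period)
    print(f"clock period {period:.1f}: error rate {rate:.4g}")
```

Shortening the clock period raises the error rate, which is the trade-off the energy versus error rate curve captures.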


System Recovery Model

(applies to all backward error recovery systems)

overhead(rate) = failures(rate) × (waste(rate) + restore)

System Recovery Model Inputs

1. The time between recovery checkpoints (cycles)

2. The time to restore a checkpoint (restore)

[Figure: Time vs. Error rate]
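
The overhead expression can be made concrete under a simple error model. The derivation below is a sketch, not taken from the slides: it assumes each cycle fails independently with probability p, that a checkpoint interval is n cycles, and that an error is detected in the cycle where it occurs. The symbols p, n, and q are introduced here for illustration.

```latex
\begin{align*}
q &= (1-p)^{n}
  && \text{probability that an interval completes without error} \\
\mathrm{failures}(p) &= \frac{1-q}{q}
  && \text{expected failed attempts before one success} \\
\mathrm{waste}(p) &= \frac{1}{1-q}\sum_{k=1}^{n} k\,p\,(1-p)^{k-1}
  && \text{expected cycles executed in a failed attempt} \\
\mathrm{overhead}(p) &= \mathrm{failures}(p)\,\bigl(\mathrm{waste}(p) + \mathrm{restore}\bigr)
\end{align*}
```

As a check, with n = 1 the waste term is exactly one cycle and the overhead reduces to p/(1-p) × (1 + restore).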


Results

Is the model useful?

What can we learn?

  • Technology Node: 45nm, 11nm

  • CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS

  • Recovery System: Razor, Reunion, Paceline


Results

[Figure: three plots against error rate: Hardware Efficiency (Energy vs. Error rate), System Recovery (Time vs. Error rate), and Overall Efficiency (Energy vs. Error rate)]


Hardware Model Inputs

  • Path delay distribution

    • Application: H.264 decoding

    • Hardware: OpenRISC processor

  • Effect of process variations as N(μ,σ) using ITRS data

    • High Performance CMOS

      • 45nm σ = 0.046μ

      • 11nm σ = 0.051μ

    • Low Power CMOS

      • 45nm σ = 0.029μ

      • 11nm σ = 0.042μ

    • Ultra-low Power CMOS

      • 45nm σ = 0.196μ
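
To give a feel for what these σ values mean, the sketch below computes how much longer than the nominal delay μ a clock period would need to be for a single path to miss timing with a chosen probability. This is an illustration only: looking at one path in isolation and the target probability of 1e-6 are assumptions, and `guard_banded_period` is a hypothetical helper, not part of the presented model.

```python
from statistics import NormalDist

# sigma as a fraction of mu, as listed on the slide
SIGMA_OVER_MU = {
    ("High Performance", "45nm"): 0.046,
    ("High Performance", "11nm"): 0.051,
    ("Low Power", "45nm"): 0.029,
    ("Low Power", "11nm"): 0.042,
    ("Ultra-low Power", "45nm"): 0.196,
}


def guard_banded_period(sigma_over_mu, target_failure_prob=1e-6):
    """Clock period, in units of mu, at which one path with delay
    ~ N(mu, sigma) exceeds the period with the target probability."""
    if not 0.0 < target_failure_prob < 1.0:
        raise ValueError("target_failure_prob must be in (0, 1)")
    z = NormalDist().inv_cdf(1.0 - target_failure_prob)
    return 1.0 + z * sigma_over_mu


for (style, node), sigma in SIGMA_OVER_MU.items():
    print(f"{style} CMOS @ {node}: period = {guard_banded_period(sigma):.3f} x mu")
```

The larger the relative variation, the larger the guard band a conventional design has to carry, which is the margin timing speculation tries to reclaim.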


Hardware Efficiency

Energy = Power × Time

EDP = Power × Time²

[Figure: Energy and EDP vs. Error rate; Normalized EDP vs. Error rate, results for High Performance CMOS]
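
The two metrics are simple to compute; the sketch below shows them together with a normalization against a baseline design point. The function names and the numbers in the usage lines are hypothetical and serve only to illustrate the arithmetic.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time


def edp(power, time):
    """Energy-delay product: EDP = Power x Time^2."""
    return power * time ** 2


def normalized_edp(power, time, base_power, base_time):
    """EDP relative to a baseline design point (1.0 means no change)."""
    return edp(power, time) / edp(base_power, base_time)


# Hypothetical design point: 20% less power at the cost of 5% more time.
print(normalized_edp(power=0.8, time=1.05, base_power=1.0, base_time=1.0))
```

Because time enters squared, EDP penalizes slowdowns more heavily than energy alone does.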


Recovery Model Inputs

  • The time between recovery checkpoints &

  • The time to restore a checkpoint

    • Razor

      • Latch-level detection + pipeline rollback

      • 1 cycle checkpoint size & 5 cycle recovery cost

    • Reunion

      • DMR detection + checkpoint

      • 100 cycle checkpoint size & 100 cycle recovery cost

    • Paceline

      • DMR detection + checkpoint + flush

      • 100 cycle checkpoint size & 1000 cycle recovery cost
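
The sketch below plugs these three parameter pairs into the recovery overhead relation overhead = failures × (waste + restore). It is an illustration under stated assumptions rather than the authors' code: each cycle is assumed to fail independently with probability `p`, and an error is assumed to be detected in the cycle where it occurs. `RecoverySystem` and `normalized_time` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoverySystem:
    name: str
    checkpoint_cycles: int  # time between recovery checkpoints
    restore_cycles: int     # time to restore a checkpoint


RAZOR = RecoverySystem("Razor", 1, 5)
REUNION = RecoverySystem("Reunion", 100, 100)
PACELINE = RecoverySystem("Paceline", 100, 1000)


def normalized_time(p, system):
    """Execution time relative to error-free execution for per-cycle error probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    n = system.checkpoint_cycles
    success = (1.0 - p) ** n   # interval completes without an error
    fail = 1.0 - success
    if fail == 0.0:
        return 1.0
    failures = fail / success  # expected failed attempts before one success
    # Expected cycles executed in a failed attempt, given that it failed.
    waste = sum(k * p * (1.0 - p) ** (k - 1) for k in range(1, n + 1)) / fail
    overhead = failures * (waste + system.restore_cycles)
    return (n + overhead) / n


for system in (RAZOR, REUNION, PACELINE):
    times = [f"{normalized_time(p, system):.3f}" for p in (1e-5, 1e-4, 1e-3)]
    print(system.name, times)
```

Under these assumptions the coarser checkpoints and costlier restores of Reunion and Paceline make their time overhead grow much faster with error rate than Razor's.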


System Recovery

[Figure: Time vs. Error rate; Normalized Time vs. Error rate]


Overall Efficiency

1. High Performance CMOS

2. Low Power CMOS

3. Ultra-low Power CMOS

[Figure: EDP vs. Error rate]


Overall Efficiency

High Performance CMOS

23% Peak, 8-15% Typical

[Figure: Normalized EDP vs. Error rate]


Overall Efficiency

Low Power CMOS

18% Peak, 5-10% Typical

[Figure: Normalized EDP vs. Error rate]


Overall Efficiency

Ultra-low Power CMOS

47% Peak, 20-30% Typical

[Figure: Normalized EDP vs. Error rate]


Conclusions

  • A High-level Model

  • Results

    • Efficiency gains improve only minimally with scaling

    • Ultra-low power (sub-threshold) CMOS benefits most

    • Fine-grained recovery is key

  • Future Work

    • Incorporate more sources of variation

    • A tool for processor designers?

      • Under development at http://www.cs.wisc.edu/vertical


Questions?

[Figure: toolbox labeled with multi-core, coherence & consistency, on-chip network, timing speculation, branch prediction, and out-of-order]

DSN 2010


Timing Speculation

[Figure: Source of Timing Variation, with labels Manufacturing, Process, Runtime, and Application, alongside the techniques Speed Binning, Online Timing Analysis, and Timing Speculation]

Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.


System Recovery Model

System Recovery Model Inputs

1. The time between recovery checkpoints (cycles)

2. The time to restore a checkpoint (restore)

failures(rate): expected # failures before success

waste(rate): expected # cycles executed upon failure


Overall Inputs

  • Path delay distribution

    • Application: H.264 decoding

    • Hardware: OpenRISC processor

  • Effect of process variations on path delay as N(μ,σ) using ITRS data

    • High Performance CMOS @45nm σ = 0.046μ

    • Low Power CMOS @45nm σ = 0.029μ

    • Ultra-low Power CMOS @45nm σ = 0.196μ

  • The time between recovery checkpoints &

  • The time to restore a checkpoint

    • Razor – Latch-level detection + pipeline rollback (1 & 5 cycles)

    • Reunion – DMR detection + checkpoint (100 & 100 cycles)

    • Paceline – DMR detection + checkpoint + flush (100 & 1000 cycles)

