A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism

Marc de Kruijf, Shuou Nomura, Karu Sankaralingam




Presentation Transcript



A Unified Model for Timing Speculation: Evaluating the Impact of Technology Scaling, CMOS Design Style, and Fault Recovery Mechanism

Marc de Kruijf

Shuou Nomura

Karu Sankaralingam


From Hard to Harder

[Figure: scaling progression labeled 10000nm, 720nm, 360nm, 180nm, 90nm, and 45nm & beyond, with additional labels 4000um and 1500um; the progression runs from "Hard" to "Harder".]



What is the Problem?

  • Non-ideal transistor scaling

    • Transistor wear-out

    • Process, voltage, and temperature (PVT) variations

    • Errors due to particle interference

    • Noise coupling & crosstalk


What is the Problem?

[Figure: a Performance Toolbox and a Reliability Toolbox. Labeled items: Dynamic verification, Multi-core, Coherence & consistency, Timing speculation, DMR, ECC, On-chip network, TMR, RMT, Watchdog, Branch prediction, Out-of-order, HW checkpoints.]

Need high-level analysis tools



Our Contribution

  • A model for timing speculation

    • Unifies hardware + system

    • Small set of high-level inputs

[Figure label: processor designer]

Also...

Q. What is the impact of technology scaling?

A. Further benefits are small to none.

Q. What is the impact of CMOS design style?

A. Very low power designs benefit most.

Q. What is the impact of the fault recovery mechanism?

A. Fine-grained recovery is key to high efficiencies.



Outline

  • Timing Speculation

  • Model Overview

    • Hardware Efficiency Model

    • System Recovery Model

  • Results

  • Conclusion


Timing Speculation

[Figure: a clock signal with its clock period ( = 1/frequency ) shown against circuit delay variations. Labels: "Timing failure!" with "detect & recover"; "slower clock" with "OK!".]




Model Overview

[Figure: three plots.]

  • Hardware Efficiency: Energy versus Error rate

  • System Recovery: Time versus Error rate

  • Overall Efficiency: Energy versus Error rate

Model Inputs

1. A hardware path delay distribution

2. Effect of variations on path delay as N(μ,σ)

3. The time between recovery checkpoints

4. The time to restore a checkpoint


Hardware Efficiency Model

Input 1: Path delay distribution

Input 2: Path delay variation (σ)

[Figure: a series of plots. Axis labels: # Paths, Path delay, Error prob., Clock period, Energy, Error rate. Annotation on the clock period axis: e.g. frequency scaling.]
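The step from the two hardware inputs to an error probability at a given clock period can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each path's actual delay is normally distributed around its nominal delay with σ given as a fraction of μ (as in the σ = 0.046μ style figures on the Hardware Model Inputs slide), and that the per-cycle error rate is the share-weighted sum over path-delay bins. The function names and the example histogram are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def path_error_prob(nominal_delay, clock_period, sigma_ratio):
    """P(actual delay > clock period), assuming delay ~ N(mu, sigma_ratio * mu)."""
    sigma = sigma_ratio * nominal_delay
    return 1.0 - normal_cdf(clock_period, nominal_delay, sigma)


def error_rate(path_histogram, clock_period, sigma_ratio):
    """Share-weighted error probability over a path delay histogram.

    path_histogram is a list of (nominal_delay, share) pairs whose shares
    sum to 1. Treating the share as the chance that a cycle exercises a
    path in that bin is an assumption of this sketch.
    """
    return sum(share * path_error_prob(delay, clock_period, sigma_ratio)
               for delay, share in path_histogram)


if __name__ == "__main__":
    # Hypothetical histogram, delays normalized to a nominal clock period of 1.0.
    histogram = [(0.5, 0.6), (0.7, 0.3), (0.9, 0.1)]
    for period in (1.2, 1.0, 0.9):
        print(period, error_rate(histogram, period, sigma_ratio=0.046))
```

Shortening the clock period raises frequency but pushes more of the delay distribution past the clock edge, which is the trade-off the plots on this slide describe.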


System Recovery Model

(applies to all backward error recovery systems)

overhead(rate) = failures(rate) × ( waste(rate) + restore )

System Recovery Model Inputs

1. The time between recovery checkpoints (cycles)

2. The time to restore a checkpoint (restore)

[Figure: Time versus Error rate.]




Results

Is the model useful?

What can we learn?

  • Technology Node: 45nm, 11nm

  • CMOS Design Style: High Performance CMOS, Low Power CMOS, Ultra-low Power CMOS

  • Recovery System: Razor, Reunion, Paceline


Results

[Figure: three plots.]

  • Hardware Efficiency: Energy versus Error rate

  • System Recovery: Time versus Error rate

  • Overall Efficiency: Energy versus Error rate



Hardware Model Inputs

  • Path delay distribution

    • Application: H.264 decoding

    • Hardware: OpenRISC processor

  • Effect of process variations as N(μ,σ) using ITRS data

    • High Performance CMOS

      • 45nm σ = 0.046μ

      • 11nm σ = 0.051μ

    • Low Power CMOS

      • 45nm σ = 0.029μ

      • 11nm σ = 0.042μ

    • Ultra-low Power CMOS

      • 45nm σ = 0.196μ


Hardware Efficiency

Energy = Power × Time

EDP = Power × Time²

[Figure: Energy and EDP versus Error rate; Normalized EDP versus Error rate. Results for High Performance CMOS.]
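The two metrics on this slide are simple enough to write down directly. The sketch below is illustrative only: the helper names are hypothetical, and the worked example assumes dynamic power scales linearly with frequency at fixed voltage, a common first-order approximation that the slides do not state.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time


def edp(power, time):
    """Energy-delay product: EDP = Energy x Time = Power x Time^2."""
    return power * time ** 2


def normalized_edp(power, time, base_power, base_time):
    """EDP relative to a baseline design point (values below 1.0 are better)."""
    return edp(power, time) / edp(base_power, base_time)


if __name__ == "__main__":
    # Hypothetical example: clock 10% faster than the baseline.
    # Assumption: power rises 10% with frequency, execution time falls to 1/1.1.
    speedup = 1.1
    print(normalized_edp(power=speedup, time=1.0 / speedup,
                         base_power=1.0, base_time=1.0))
```

Under that assumption energy stays constant and EDP improves only through the shorter execution time, so any time lost to error recovery eats directly into the gain.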



Recovery Model Inputs

  • The time between recovery checkpoints &

  • The time to restore a checkpoint

    • Razor

      • Latch-level detection + pipeline rollback

      • 1 cycle checkpoint size & 5 cycle recovery cost

    • Reunion

      • DMR detection + checkpoint

      • 100 cycle checkpoint size & 100 cycle recovery cost

    • Paceline

      • DMR detection + checkpoint + flush

      • 100 cycle checkpoint size & 1000 cycle recovery cost
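To see how these checkpoint and restore parameters feed the recovery model's overhead(rate) = failures(rate) × (waste(rate) + restore), here is a minimal sketch. It rests on assumptions that are not in the slides: errors occur independently with a fixed per-cycle probability p, an error is detected in the cycle it occurs, and execution restarts from the last checkpoint. The function names and the `SYSTEMS` table layout are hypothetical; the cycle counts are the ones listed above.

```python
def expected_failures(p, n):
    """Expected failed attempts before an n-cycle interval completes error-free."""
    success = (1.0 - p) ** n
    return (1.0 - success) / success


def expected_waste(p, n):
    """Expected cycles executed in a failed attempt (first error at cycle K <= n)."""
    if p == 0.0:
        return 0.0
    fail = 1.0 - (1.0 - p) ** n
    total = sum(k * p * (1.0 - p) ** (k - 1) for k in range(1, n + 1))
    return total / fail


def overhead(p, n, restore):
    """Extra cycles per checkpoint interval: failures x (waste + restore)."""
    return expected_failures(p, n) * (expected_waste(p, n) + restore)


def normalized_time(p, n, restore):
    """Execution time relative to error-free execution of the same interval."""
    return (n + overhead(p, n, restore)) / n


# (cycles between checkpoints, cycles to restore), taken from the slide.
SYSTEMS = {
    "Razor": (1, 5),
    "Reunion": (100, 100),
    "Paceline": (100, 1000),
}

if __name__ == "__main__":
    for name, (n, restore) in SYSTEMS.items():
        for p in (1e-5, 1e-4, 1e-3):
            print(name, p, normalized_time(p, n, restore))
```

With the same per-cycle error probability, the coarser checkpoints and costlier restores of Reunion and Paceline produce a larger normalized time than Razor in this sketch, consistent with the slides' point that fine-grained recovery matters.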


System Recovery

[Figure: Time versus Error rate; Normalized Time versus Error rate.]



Overall Efficiency

1. High Performance CMOS

2. Low Power CMOS

3. Ultra-low Power CMOS

[Figure: EDP versus Error rate.]


Overall Efficiency

High Performance CMOS

[Figure: Normalized EDP versus Error rate.]

23% Peak, 8-15% Typical


Overall Efficiency

Low Power CMOS

[Figure: Normalized EDP versus Error rate.]

18% Peak, 5-10% Typical


Overall Efficiency

Ultra-low Power CMOS

[Figure: Normalized EDP versus Error rate.]

47% Peak, 20-30% Typical





Conclusions

  • A High-level Model

  • Results

    • Efficiency gains improve only minimally with scaling

    • Ultra-low power (sub-threshold) CMOS benefits most

    • Fine-grained recovery is key

  • Future Work

    • Incorporate more sources of variation

    • A tool for processor designers?

      • Under development at http://www.cs.wisc.edu/vertical



Questions?

[Figure: toolbox items: Multi-core, Coherence & consistency, On-chip network, Timing speculation, Branch prediction, Out-of-order.]


DSN 2010


Timing Speculation

[Figure: Source of Timing Variation, with labels Manufacturing, Process, Runtime, Application. Also labeled: Speed Binning, Online Timing Analysis, Timing Speculation.]

Figure adapted from Greskamp et al., Paceline: [...]. In PACT ’07.



System Recovery Model

System Recovery Model Inputs

1. The time between recovery checkpoints (cycles)

2. The time to restore a checkpoint (restore)

failures(rate): expected # failures before success

waste(rate): expected # cycles executed upon failure
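One way to make the two annotated terms concrete is shown below. This is a sketch under stated assumptions, not the formulation from the slides: errors are taken to occur independently with per-cycle probability p, an error is detected in the cycle it occurs, and the checkpoint interval is n cycles. The symbols p, n, and K (the cycle of the first error within an attempt) are introduced here for illustration.

```latex
\begin{align*}
\text{overhead}(p) &= \text{failures}(p) \times \bigl(\text{waste}(p) + \text{restore}\bigr) \\
\text{failures}(p) &= \frac{1 - (1-p)^{n}}{(1-p)^{n}} \\
\text{waste}(p) &= \mathbb{E}[K \mid K \le n]
  = \frac{\sum_{k=1}^{n} k \, p \, (1-p)^{k-1}}{1 - (1-p)^{n}}
\end{align*}
```

The failures term is the mean of a geometric distribution counting failed attempts before the first error-free interval, whose success probability is (1-p)^n. For n = 1 the waste term reduces to one cycle per failure.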



Overall Inputs

  • Path delay distribution

    • Application: H.264 decoding

    • Hardware: OpenRISC processor

  • Effect of process variations on path delay as N(μ,σ) using ITRS data

    • High Performance CMOS @45nm σ = 0.046μ

    • Low Power CMOS @45nm σ = 0.029μ

    • Ultra-low Power CMOS @45nm σ = 0.196μ

  • The time between recovery checkpoints &

  • The time to restore a checkpoint

    • Razor – Latch-level detection + pipeline rollback (1 & 5 cycles)

    • Reunion – DMR detection + checkpoint (100 & 100 cycles)

    • Paceline – DMR detection + checkpoint + flush (100 & 1000 cycles)

