
Runnemede: Disruptive Technologies for UHPC



Runnemede: Disruptive Technologies for UHPC. John Gustafson, Intel Labs. HPC User Forum – Houston 2011. The battle lines are drawn… “We’re going to try to make the entire exascale machine cache-coherent.” —Bill Dally, Nvidia. “Caches are for morons.” —Shekhar Borkar, Intel.


Presentation Transcript



Runnemede: Disruptive Technologies for UHPC

John Gustafson

Intel Labs

HPC User Forum – Houston 2011



The battle lines are drawn…

“We’re going to try to make the entire exascale machine cache-coherent.”

—Bill Dally, Nvidia

“Caches are for morons.”

—Shekhar Borkar, Intel



Intel’s UHPC Approach

  • Design test chips with the idea of maximizing learning.

  • Very different from producing product roadmap processor designs.

  • Going from Peta to Exa is nothing like the last few 1000x increases…



Building with Today’s Technology

TFLOP Machine today

  • Compute: 200 pJ per FLOP, 200 W

  • Memory: 0.1 B/FLOP @ 1.5 nJ per Byte, 150 W

  • Com: 100 pJ com per FLOP, 100 W

  • Disk: 10 TB disk @ 1 TB/disk @ 10 W, 100 W

  • Decode and control, translations, power supply losses, cooling, etc.: 4450 W

  • Total: 5 KW

KW Tera, MW Peta, GW Exa?
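The arithmetic behind this slide can be checked with a short sketch. The per-FLOP and per-byte energies are the ones quoted on the slide. The function names are illustrative, and the linear scaling from Tera to Peta to Exa is an assumption that simply restates the slide's closing question.

```python
FLOPS_PER_SECOND = 1e12  # a 1 TFLOP/s machine, as on the slide


def tflop_power_budget():
    """Return per-component power in watts for the slide's TFLOP machine."""
    return {
        "compute": FLOPS_PER_SECOND * 200e-12,      # 200 pJ per FLOP
        "memory": FLOPS_PER_SECOND * 0.1 * 1.5e-9,  # 0.1 B/FLOP at 1.5 nJ per byte
        "com": FLOPS_PER_SECOND * 100e-12,          # 100 pJ of communication per FLOP
        "disk": 10 * 10.0,                          # 10 disks of 1 TB at 10 W each
        "overhead": 4450.0,                         # control, translation, supply, cooling
    }


def total_watts(budget):
    """Sum all components of a power budget."""
    return sum(budget.values())


def scaled_total_watts(budget, factor):
    """Assumption: power grows linearly with FLOP rate (same technology)."""
    return total_watts(budget) * factor


if __name__ == "__main__":
    budget = tflop_power_budget()
    for name, watts in budget.items():
        print(f"{name:>8}: {watts:7.0f} W")
    print(f"Tera: {total_watts(budget) / 1e3:.0f} KW")
    print(f"Peta: {scaled_total_watts(budget, 1e3) / 1e6:.0f} MW")
    print(f"Exa:  {scaled_total_watts(budget, 1e6) / 1e9:.0f} GW")
```

Under these assumptions the four named components account for 550 W, and the remaining 4450 W of overhead dominates the 5 KW total.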



The Power & Energy Challenge

TFLOP Machine today: 5 KW

  • Compute: 200 W

  • Memory: 150 W

  • Com: 100 W

  • Disk: 100 W

  • Remaining overhead: 4450 W

TFLOP Machine then, with Exa Technology: ~20 W

  • Component values shown on the slide: 5 W, ~3 W, ~5 W, 2 W, 5 W
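A rough check of the "then" figures, as a sketch. It assumes the ~20 W value is the total for a TFLOP machine built with Exa technology, which is consistent with the other five values on the slide summing to 20. The variable and function names are illustrative.

```python
TODAY_WATTS_PER_TFLOP = 5000          # the 5 KW TFLOP machine of today
EXA_TECH_COMPONENTS_W = [5, 3, 5, 2, 5]  # component values shown on the slide
TFLOPS_PER_EXAFLOP = 1_000_000        # 1e18 / 1e12


def exa_tech_watts_per_tflop():
    """Total power of a TFLOP machine built with Exa technology (assumed sum)."""
    return sum(EXA_TECH_COMPONENTS_W)


def exaflop_machine_megawatts():
    """Power of an exaflop machine at that efficiency, in MW."""
    return exa_tech_watts_per_tflop() * TFLOPS_PER_EXAFLOP / 1e6


def improvement_factor():
    """How much less power per TFLOP than today's technology."""
    return TODAY_WATTS_PER_TFLOP / exa_tech_watts_per_tflop()


if __name__ == "__main__":
    print(exa_tech_watts_per_tflop(), "W per TFLOP")
    print(exaflop_machine_megawatts(), "MW per exaflop")
    print(improvement_factor(), "x improvement over today")
```

On this reading, the target is roughly 20 pJ per FLOP for the whole system, against 5000 pJ per FLOP today.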



Scaling Assumptions

65 nm Core + Local Memory

  • DP FP Add, Multiply; Integer Core, RF; Router: 5 mm² (50%)

  • Memory 0.35MB: 5 mm² (50%)

  • Total: 10 mm², 3 GHz, 6 GF, 1.8 W

8 nm Core + Local Memory

  • DP FP Add, Multiply; Integer Core, RF; Router: 0.17 mm² (50%)

  • Memory 0.35MB: 0.17 mm² (50%)

  • Total: 0.34 mm², 4.6 GHz, 9.2 GF, 0.24 to 0.46 W

  • Dimension shown: ~0.6 mm
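The two design points can be compared with a small sketch. The area, frequency, GF and power figures come from the slide. The `Core` class, its method names, and the exaflop core count are illustrative assumptions that ignore memory, network and all other system overheads.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    area_mm2: float
    freq_ghz: float
    gflops: float
    watts_low: float
    watts_high: float

    def flops_per_cycle(self):
        """FLOPs issued per clock cycle implied by the GF and GHz figures."""
        return self.gflops / self.freq_ghz

    def gflops_per_watt(self):
        """(worst case, best case) efficiency over the quoted power range."""
        return (self.gflops / self.watts_high, self.gflops / self.watts_low)

    def gflops_per_mm2(self):
        """Compute density of core plus local memory."""
        return self.gflops / self.area_mm2


CORE_65NM = Core("65 nm", 10.0, 3.0, 6.0, 1.8, 1.8)
CORE_8NM = Core("8 nm", 0.34, 4.6, 9.2, 0.24, 0.46)


def cores_for(target_flops, core):
    """Number of cores needed to reach a peak FLOP rate (peak only)."""
    return math.ceil(target_flops / (core.gflops * 1e9))


if __name__ == "__main__":
    for core in (CORE_65NM, CORE_8NM):
        worst, best = core.gflops_per_watt()
        print(f"{core.name}: {core.flops_per_cycle():.1f} FLOP/cycle, "
              f"{worst:.1f} to {best:.1f} GF/W, {core.gflops_per_mm2():.1f} GF/mm^2")
    print(f"8 nm cores for 1 exaflop peak: {cores_for(1e18, CORE_8NM):,}")
```

Both design points work out to two FLOPs per cycle, which matches one double-precision add plus one multiply per clock. At 9.2 GF per core, an exaflop of peak performance needs on the order of 10^8 cores.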



Near Threshold Logic

[Figure: energy efficiency (GOPS/Watt) and active leakage power (mW) plotted against supply voltage (V) from 0.2 to 1.4 V, for 65nm CMOS at 50°C. Annotations: Subthreshold Region, 320mV, 9.6X.]

H. Kaul et al, 16.6: ISSCC08



Revise DRAM Architecture

[Diagram: RAS and CAS address lines (Addr) selecting pages]

Traditional DRAM

  • Activates many pages

  • Lots of reads and writes (refresh)

  • Small amount of read data is used

  • Requires small number of pins

New DRAM architecture

  • Activates few pages

  • Read and write (refresh) what is needed

  • All read data is used

  • Requires large number of I/Os (3D)

Energy cost today: ~150 pJ/bit



Data Locality

  • Core-to-core communication on the chip: ~10 pJ per Byte

  • Chip-to-chip communication: ~100 pJ per Byte

  • Chip-to-memory communication: ~1.5 nJ per Byte, ~150 pJ per Byte

Data movement is expensive, so keep it local: (1) Core to core, (2) Chip-to-chip, (3) Memory
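To make the cost of data movement concrete, the sketch below expresses each per-byte energy as a number of FLOPs, using the 200 pJ per FLOP compute figure quoted earlier in the deck. It uses the ~1.5 nJ per Byte figure for memory; the slide also shows ~150 pJ per Byte for that path without saying when it applies. The dictionary keys and function names are illustrative.

```python
# Approximate energies from the slide, in picojoules per byte moved.
ENERGY_PJ_PER_BYTE = {
    "core_to_core": 10,
    "chip_to_chip": 100,
    "chip_to_memory": 1500,  # ~1.5 nJ per byte
}

COMPUTE_PJ_PER_FLOP = 200  # from "Building with Today's Technology"


def movement_energy_joules(num_bytes, path):
    """Energy in joules to move num_bytes over the given path."""
    return num_bytes * ENERGY_PJ_PER_BYTE[path] * 1e-12


def flops_per_byte_moved(path):
    """How many FLOPs cost the same energy as moving one byte on this path."""
    return ENERGY_PJ_PER_BYTE[path] / COMPUTE_PJ_PER_FLOP


if __name__ == "__main__":
    for path in ENERGY_PJ_PER_BYTE:
        per_double = 8 * flops_per_byte_moved(path)  # one 8-byte operand
        print(f"{path:>15}: one double costs as much as {per_double:g} FLOPs")
```

With these figures, fetching a single 8-byte operand from memory costs as much energy as 60 floating-point operations, while a core-to-core transfer of the same operand costs less than one.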



Disruptive Approach to Faults

  • We tend to assume that execution faults (soft errors, hard errors) are rare. And it’s a valid speculation. Currently.

  • Soon, we will need much more paranoia in hardware designs.



Road to Unreliability?

Resiliency will be the cornerstone



Resiliency

Minimal overhead for resiliency

Error detection

Fault isolation

Fault confinement

Reconfiguration

Recovery & Adapt

Applications

System Software

Programming system

Microcode, Platform

Microarchitecture

Circuit & Design



Execution Model and Codelets

  • Codelet - Code that can be executed non-preemptively with an “event-driven” model

  • Shared memory model based on LC (Location Consistency – a generalized single-assignment model [GaoSarkar1980])

[Diagram labels: Programming Models/Systems (Rich), Sea of Codelets, Run Time System, Hardware Abstraction, Cores, Advanced Hardware Monitoring, Net, Peripherals/Devices]
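The codelet idea can be illustrated with a toy scheduler. This is a minimal single-threaded sketch, not the Runnemede runtime: the `Codelet` and `Runtime` classes and the event names are hypothetical, and the sketch does not model Location Consistency or any hardware. It shows only the two properties named on the slide: a codelet becomes ready when the events it waits for have fired, and once started it runs to completion without preemption.

```python
from collections import deque


class Codelet:
    """A unit of work that fires once all the events it needs have occurred."""

    def __init__(self, name, body, needs=()):
        self.name = name
        self.body = body            # callable returning an iterable of events
        self.pending = set(needs)   # events not yet seen


class Runtime:
    """Toy event-driven scheduler (hypothetical, for illustration only)."""

    def __init__(self):
        self.waiting = []
        self.ready = deque()
        self.log = []

    def add(self, codelet):
        if codelet.pending:
            self.waiting.append(codelet)
        else:
            self.ready.append(codelet)

    def signal(self, event):
        """Deliver an event and move newly enabled codelets to the ready queue."""
        still_waiting = []
        for codelet in self.waiting:
            codelet.pending.discard(event)
            if codelet.pending:
                still_waiting.append(codelet)
            else:
                self.ready.append(codelet)
        self.waiting = still_waiting

    def run(self):
        while self.ready:
            codelet = self.ready.popleft()
            produced = codelet.body()   # runs to completion, never preempted
            self.log.append(codelet.name)
            for event in produced or ():
                self.signal(event)
        return self.log


def demo():
    rt = Runtime()
    rt.add(Codelet("add", lambda: ["sum"], needs=("a", "b")))
    rt.add(Codelet("report", lambda: [], needs=("sum",)))
    rt.add(Codelet("load_a", lambda: ["a"]))
    rt.add(Codelet("load_b", lambda: ["b"]))
    return rt.run()


if __name__ == "__main__":
    print(demo())
```

In `demo`, the order in which codelets were added does not matter: `add` runs only after both loads have signalled, and `report` runs only after `add`.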



Summary

  • Voltage scaling to reduce power and energy

    • Explodes parallelism

    • Cost of communication vs computation—critical balance

    • Resiliency to combat side-effects and unreliability

  • Programming system for extreme parallelism

  • Application driven, HW/SW co-design approach

  • Self-awareness & execution model to harmonize

