exponential challenges exponential rewards the future of moore s law l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Exponential Challenges, Exponential Rewards— The Future of Moore’s Law PowerPoint Presentation
Download Presentation
Exponential Challenges, Exponential Rewards— The Future of Moore’s Law

Loading in 2 Seconds...

play fullscreen
1 / 49

Exponential Challenges, Exponential Rewards— The Future of Moore’s Law - PowerPoint PPT Presentation


  • 174 Views
  • Uploaded on

Exponential Challenges, Exponential Rewards— The Future of Moore’s Law. Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004. ISSCC 2003— Gordon Moore said…. “No exponential is forever… But We can delay Forever”. Outline. The exponential challenges

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Exponential Challenges, Exponential Rewards— The Future of Moore’s Law' - bibiane


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
exponential challenges exponential rewards the future of moore s law

Exponential Challenges, Exponential Rewards—The Future of Moore’s Law

Shekhar Borkar

Intel Fellow

Circuit Research, Intel Labs

Fall, 2004

isscc 2003 gordon moore said
ISSCC 2003—Gordon Moore said…

“No exponential is forever…

But

We can delay Forever”

outline
Outline
  • The exponential challenges
  • Circuit and mArch solutions
  • Major paradigm shifts in design
  • Integration & SOC
  • The exponential reward
  • Summary
goal 1tips by 2010

How do you get there?

Goal: 1TIPS by 2010

Pentium® 4 Architecture

Pentium® Pro Architecture

Pentium® Architecture

486

386

286

8086

technology scaling

GATE

DRAIN

SOURCE

BODY

Technology Scaling

GATE

Xj

DRAIN

SOURCE

D

Tox

BODY

Leff

Technology has scaled well, will it in the future?

is transistor a good switch

I ≠0

I = 0

On

I = 1ma/u

I = ∞

I = 0

I ≠0

Off

I = 0

I ≠ 0

Sub-threshold Leakage

Is Transistor a Good Switch?
exponential challenge 1
Exponential Challenge #1

Subthreshold

Leakage

sub threshold leakage
Sub-threshold Leakage

Assume:

0.25mm, Ioff = 1na/m

5X increase each generation at 30ºC

Sub-threshold leakage increases exponentially

sd leakage power
SD Leakage Power

SD leakage power becomes prohibitive

leakage power
Leakage Power

A. Grove, IEDM 2002

Leakage power limits Vt scaling

gate oxide is near limit

90nm MOS Transistor

50nm

Gate

1.2 nm SiO2

Silicon substrate

Gate Oxide is Near Limit

If Tox scaling slows down, then Vdd scaling will have to slow down

High-K dielectric is crucial

energy per logic operation
Energy per Logic Operation

Energy per logic operation scaling will slow down

the power crisis
The Power Crisis

Business as usual is not an option

sources of variations

Lithography

Wavelength

365nm

248nm

193nm

180nm

130nm

Gap

90nm

65nm

45nm

Generation

32nm

13nm EUV

Random Dopant Fluctuations

Source: Mark Bohr, Intel

Sub-wavelength Lithography

Heat Flux (W/cm2)

Results in Vcc variation

Temperature Variation (°C)

Hot spots

Sources of Variations
frequency sd leakage

Low Freq

Low Isb

High Freq

High Isb

High Freq

Medium Isb

Frequency & SD Leakage

0.18 micron

~1000 samples

30%

20X

vt distribution

High Freq

Medium Isb

Low Freq

Low Isb

High Freq

High Isb

Vt Distribution

0.18 micron

~1000 samples

~30mV

platform requirements

3000

2500

2000

System Volume ( cubic inch)

1500

1000

500

0

PC tower

Mini tower

m-tower

Slim line

Small pc

1.5

100

Pentium ® III

75

Projected Heat Dissipation Volume

1.0

Pentium ® 4

Heat-Sink Volume (in3)

Thermal Budget (oC/W)

50

Air Flow Rate (CFM)

Projected Air Flow Rate

0.5

25

Thermal Budget

0

0

250

0

50

100

150

200

Power (W)

Platform Requirements

Shrinking volume

Quieter

Yet, High Performance

Thermal budget

decreasing

Higher heat sink volume

Higher air flow rate

exponential costs

Litho Cost

FAB Cost

G. Moore

ISSCC 03

www.icknowledge.com

$ per Transistor

$ per MIPS

Exponential Costs
product cost pressure
Product Cost Pressure

Shrinking ASP, and shrinking $ budget for power

must fit in power envelope
Must Fit in Power Envelope

)

1400

2

SiO2 Lkg

10 mm Die

1200

SD Lkg

Active

1000

800

Power (W), Power Density (W/cm

8 MB

600

4 MB

400

2 MB

1 MB

200

0

90nm

65nm

45nm

32nm

22nm

16nm

Technology, Circuits, and Architecture to constrain the power

some implications
Tox scaling will slow down—may stop?

Vdd scaling will slow down—may stop?

Vt scaling will slow down—may stop?

Approaching constant Vdd scaling

Energy/logic op will not scale

Some Implications
the gigascale dilemma
The Gigascale Dilemma
  • 1B T integration capacity will be available
  • But could be unusable due to power
  • Logic T growth will slow down
  • Transistor performance will be limited

Solutions

  • Low power design techniques
  • Improve design efficiency—Multi everywhere
  • Valued performance by even higher integration (of potentially slower transistors)
slide29

Solutions

Power—active and leakage

Variations

Microarchitecture

active power reduction

Slow

Fast

Slow

High Supply

Voltage

Multiple Supply Voltages

Low Supply

Voltage

Replicated Designs

Vdd

Vdd/2

Freq = 0.5

Vdd = 0.5

Throughput = 1

Power = 0.25

Area = 2

Pwr Den = 0.125

Freq = 1

Vdd = 1

Throughput = 1

Power = 1

Area = 1

Pwr Den = 1

Logic Block

Logic Block

Logic Block

Active Power Reduction
leakage control

Body Bias

Stack Effect

Sleep Transistor

Vbp

Vdd

+Ve

Logic Block

Equal Loading

Vbn

-Ve

5-10X

Reduction

2-1000X

Reduction

2-10X

Reduction

Leakage Control
circuit design tradeoffs

power

2

2

target frequency probability

1.5

1.5

1

1

0.5

0.5

0

0

large

high

small

low

Transistor size

Low-Vt usage

Circuit Design Tradeoffs
  • Higher probability of target frequency with:
    • Larger transistor sizes
    • Higher Low-Vt usage
  • But with power penalty
impact of critical paths

60%

# critical paths

1.4

40%

1.3

Number of dies

20%

Mean clock frequency

1.2

0%

0.9

1.1

1.3

1.5

1.1

Clock frequency

1

9

17

25

# of critical paths

Impact of Critical Paths
  • With increasing # of critical paths
    • Both s and m become smaller
    • Lower mean frequency
impact of logic depth

Logic depth: 16

1.0

to Ion-s

NMOS Ion

PMOS Ion

Delay

0.5

Ratio of

s/m

s

m

s

m

/

/

delay-s

5.6%

3.0%

4.2%

0.0

16

49

Logic depth

Impact of Logic Depth

Device I

NMOS

40%

ON

PMOS

20%

Number of samples (%)

Delay

40%

20%

0%

-16%

-8%

0%

8%

16%

Variation (%)

m architecture tradeoffs

1.5

1.5

1

1

0.5

0.5

0

0

less

more

large

small

# uArch critical paths

Logic depth

frequency

target frequency probability

mArchitecture Tradeoffs
  • Higher target frequency with:
    • Shallow logic depth
    • Larger number of critical paths
  • But with lower probability
variation tolerant design

2

2

power

1.5

1.5

target frequency probability

1

1

0.5

0.5

0

0

large

high

small

low

Transistor size

Low-Vt usage

1.5

1.5

1

1

0.5

0.5

frequency

0

target frequency probability

0

less

more

large

small

# uArch critical paths

Logic depth

Variation-tolerant Design

Balance power & frequency with variation tolerance

probabilistic design

Due to variations in:

Vdd, Vt, and Temp

Path Delay

Deterministic

Deterministic

Probabilistic

Frequency

Probabilistic

10X variation

~50% total power

# of Paths

# of Paths

Delay Target

Delay Target

Leakage Power

Probabilistic Design

Probability

Delay

Deterministic design techniques inadequate in the future

shift in design paradigm

Today:

Local Optimization

Single Variable

Tomorrow:

Global Optimization

Multi-variate

Shift in Design Paradigm
  • Multi-variable design optimization for:
    • Yield and bin splits
    • Parameter variations
    • Active and leakage power
    • Performance
adaptive body bias experiment

Resistor Network

PD & Counter

Delay

Bias Amplifier

CUT

Adaptive Body Bias--Experiment

Multiple

subsites

Resistor Network

5.3 mm

4.5 mm

1.6 X 0.24 mm, 21 sites per die

150nm CMOS

Technology

150nm CMOS

Number of

21

Die frequency: Min(F1..F21)

Die power: Sum(P1..P21)

subsites per die

0.5V FBB to

Body bias range

0.5V RBB

Bias resolution

32 mV

adaptive body bias results

noBB

ABB

within die ABB

97% highest bin

  • For given Freq and Power density
  • 100% yield with ABB
  • 97% highest freq bin with ABB for within die variability

100% yield

Adaptive Body Bias--Results

100%

60%

Accepted die

20%

0%

Higher Frequency 

design m arch efficiency
Design & mArch Efficiency

Employ efficient design & mArchitectures

memory latency
Memory Latency

CPU

Cache

Memory

Small

~few Clocks

Large

50-100ns

Assume: 50ns

Memory latency

Cache miss hurts performance

Worse at higher frequency

increase on die memory
Increase on-die Memory

Large on die memory provides:

Increased Data Bandwidth & Reduced Latency

Hence, higher performance for much lower power

multi threading
Multi-threading

Thermals & Power Delivery

designed for full HW utilization

Single Thread

Full HW Utilization

Wait for Mem

ST

Multi-Threading

Wait for Mem

MT1

Wait

MT2

MT3

Multi-threading improves performance without impacting thermals & power delivery

chip multi processing
Chip Multi-Processing

C1

C2

Cache

C3

C4

  • Multi-core, each core Multi-threaded
  • Shared cache and front side bus
  • Each core has different Vdd & Freq
  • Core hopping to spread hot spots
  • Lower junction temperature
special purpose hardware
Special Purpose Hardware

TCP Offload Engine

Opportunities:

Network processing engines

MPEG Encode/Decode engines

Speech engines

2.23 mm X 3.54 mm, 260K transistors

Special purpose HW—Best Mips/Watt

valued performance soc system on a chip
Special-purpose hardware  more MIPS/mm²

SIMD integer and FP instructions in several ISAs

Valued Performance: SOC (System on a Chip)

Heterogeneous

Si, SiGe, GaAs

Polylithic

Si Monolithic

Special

HW

Wireline

Opto-Electronics

CMOS RF

CPU

Memory

RF

Dense Memory

the exponential reward

Multi-Threaded, Multi-Core

Multi Threaded

Era of

Thread &

Processor

Level

Parallelism

Special Purpose HW

Speculative, OOO

Super Scalar

486

386

Era of Instruction

Level

Parallelism

286

8086

Era of Pipelined

Architecture

The Exponential Reward

Multi-everywhere: MT, CMP

summary delaying forever
Summary—Delaying Forever
  • Gigascale transistor integration capacity will be available—Power and Energy are the barriers
  • Variations will be even more prominent—shift from Deterministic toProbabilistic design
  • Improve design efficiency
  • Multi—everywhere, & SOC valued performance
  • Exploit integration capacity to deliver performance in power/cost envelope