IRAM: A Microprocessor for the Post-PC Era


David A. Patterson

http://cs.berkeley.edu/~patterson/talks

[email protected]

EECS, University of California

Berkeley, CA 94720-1776

Perspective on Post-PC Era
  • Post-PC Era will be driven by 2 technologies:
    1) Mobile Consumer Devices
      • e.g., successor to PDA, cell phone, wearable computers
    2) Infrastructure to Support such Devices
      • e.g., successor to Big Fat Web Servers, Database Servers
A Better Media for Mobile Multimedia MPUs: Logic+DRAM
  • Crash of DRAM market inspires new use of wafers
  • Faster logic in DRAM process
    • DRAM vendors offer faster transistors + same number of metal layers as good logic, at ≈ 20% higher cost per wafer?
  • Called Intelligent RAM (“IRAM”) since most of the transistors will be DRAM
IRAM Vision Statement

[Figure: block diagram. Labels: Proc, $, L2$, Bus, DRAM, and I/O, built in a logic fab and a DRAM fab]

Microprocessor & DRAM on a single chip:

  • on-chip memory latency 5-10X, bandwidth 50-100X
  • improve energy efficiency 2X-4X (no off-chip bus)
  • serial I/O 5-10X v. buses
  • smaller board area/volume
  • adjustable memory size/width

Potential Multimedia Architecture
  • “New” model: VSIW=Very Short Instruction Word!
    • Compact: Describe N operations with 1 short instruction
    • Predictable (real-time) performance vs. statistical performance (cache)
    • Multimedia ready: choose N*64b, 2N*32b, 4N*16b
    • Easy to get high performance
    • Compiler technology already developed, for sale!
      • Don’t have to write all programs in assembly language
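
The N*64b, 2N*32b, 4N*16b choice above can be made concrete with a small calculation. This is only a sketch, not something from the talk: `elements_per_instruction` and its parameters are hypothetical names, and it assumes a datapath of N lanes, each 64 bits wide, that divides evenly into narrower elements.

```python
def elements_per_instruction(lanes: int, element_bits: int, lane_bits: int = 64) -> int:
    """Elements processed in parallel by one vector operation.

    Assumption (not from the slides): each lane is `lane_bits` wide and can be
    split evenly into subwords of `element_bits`.
    """
    if element_bits <= 0 or lane_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the lane width")
    return lanes * (lane_bits // element_bits)


if __name__ == "__main__":
    n = 4  # hypothetical lane count, chosen for illustration
    for bits in (64, 32, 16):
        print(f"{bits}b elements: {elements_per_instruction(n, bits)} in parallel")
```

With N = 4 lanes this gives the N, 2N, 4N progression the slide describes for 64b, 32b, and 16b data.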
Revive Vector (= VSIW) Architecture!
  • Cost: ≈ $1M each? → Single-chip CMOS MPU/IRAM
  • Low latency, high BW memory system? → IRAM
  • Code density? → Much smaller than VLIW
  • Compilers? → For sale, mature (>20 years)
  • Performance? → Easy to scale speed with technology
  • Power/Energy? → Parallel to save energy, keep perf
  • Limited to scientific applications? → Multimedia apps vectorizable too: N*64b, 2N*32b, 4N*16b
V-IRAM1: 0.18 µm, Fast Logic, 200 MHz, 1.6 GFLOPS (64b) / 6.4 GOPS (16b) / 16 MB

[Figure: block diagram. Labels: 2-way superscalar processor, 16K I cache, 16K D cache, vector instruction queue, vector registers, +, x, ÷, and load/store units, datapaths marked 4 x 64 (or 8 x 32 or 16 x 16), serial I/O, I/O, memory crossbar switch, memory blocks (M)]

Tentative VIRAM-1 Floorplan
  • 0.18 µm DRAM: 16-32 MB in 16 banks x 256b
  • 0.18 µm, 5 Metal Logic
  • ≈ 200 MHz MIPS IV, 16K I$, 16K D$
  • 4 FP/int. vector units at ≈ 200 MHz
  • die: ≈ 20x20 mm
  • xtors: ≈ 130-250M
  • power: ≈2 Watts
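
The 1.6 GFLOPS (64b) and 6.4 GOPS (16b) figures quoted for V-IRAM1 can be reproduced with a short calculation. This is a sketch under a stated assumption: the slides give 4 lanes of 64 bits at 200 MHz, but the factor of two arithmetic units per lane (`units_per_lane`) is inferred here to make the numbers match and is not stated in the talk. The function name and parameters are hypothetical.

```python
def peak_gops(lanes: int, clock_hz: float, element_bits: int,
              units_per_lane: int = 2, lane_bits: int = 64) -> float:
    """Peak operations per second, in units of 1e9.

    units_per_lane=2 is an assumption (for example one add and one multiply
    unit per lane); the slides do not spell this out.
    """
    subwords = lane_bits // element_bits  # elements packed into one 64b lane
    return lanes * subwords * units_per_lane * clock_hz / 1e9


if __name__ == "__main__":
    for bits in (64, 32, 16):
        print(f"{bits}b: {peak_gops(4, 200e6, bits):.1f} G ops/s")
```

Under the same assumption, a single-lane layout at 200 MHz would peak at a quarter of these values.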

[Figure: floorplan with CPU+$, 4 vector pipes/lanes, a ring-based switch, I/O, and two memory blocks (128 Mbits / 16 MBytes each)]

VIRAM-1 Simulated Performance

Kernel                    GOPS   % Peak   Cycles/pixel (small=fast)
                                          VIRAM   MMX    TMS‘C82
16b Compositing           6.40   100%     0.13    --     --
16b iDCT                  3.10   48%      0.75    3.75   5.70
32b ColorConversion       2.95   92%      0.78    8.00   --
32b Convolution           3.16   99%      1.21    5.49   6.50
32b FP Matrix Multiply    3.19   97%      --      --     --
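
The cycles/pixel columns can be turned into ratios against VIRAM with a few lines of code. This is a sketch that uses only the numbers in the table; `cycle_ratio` is a hypothetical helper. The result is a ratio of cycle counts, not a wall-clock speedup, since the clock rates of the compared chips are not given here.

```python
# Cycles per pixel, copied from the table; "--" entries are left out.
cycles_per_pixel = {
    "16b iDCT": {"VIRAM": 0.75, "MMX": 3.75, "TMS'C82": 5.70},
    "32b ColorConversion": {"VIRAM": 0.78, "MMX": 8.00},
    "32b Convolution": {"VIRAM": 1.21, "MMX": 5.49, "TMS'C82": 6.50},
}


def cycle_ratio(kernel: str, other: str) -> float:
    """How many times more cycles per pixel `other` needs than VIRAM."""
    row = cycles_per_pixel[kernel]
    return row[other] / row["VIRAM"]


if __name__ == "__main__":
    for kernel, row in cycles_per_pixel.items():
        for chip in row:
            if chip != "VIRAM":
                print(f"{kernel}: {chip} / VIRAM = {cycle_ratio(kernel, chip):.1f}x")
```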

Tentative VIRAM-“0.25” Floorplan

Kernel       GOPS (V-1)   GOPS (V-0.25)
Comp.        6.40         1.6
iDCT         3.10         0.8
Clr.Conv.    2.95         0.8
Convol.      3.16         0.8
FP Matrix    3.19         0.8

  • Demonstrate scalability via 2nd layout (automatic from 1st)
  • 8 MB in 2 banks x 256b, 32 subbanks
  • ≈ 200 MHz CPU, 8K I$, 8K D$
  • 1 FP/int. vector unit at ≈ 200 MHz
  • die: ≈ 5 x 20 mm
  • xtors: ≈ 70M
  • power: ≈0.5 Watts
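
The scalability claim can be checked by comparing the two tentative layouts. This is a rough sketch: the `Layout` class and `scale_factors` helper are hypothetical, and every number is one of the approximate (≈) values from the slides, so the ratios are indicative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    vector_units: int
    die_width_mm: float
    die_height_mm: float
    power_watts: float
    compositing_gops: float  # 16b compositing kernel, from the GOPS table

    @property
    def die_area_mm2(self) -> float:
        return self.die_width_mm * self.die_height_mm

    @property
    def gops_per_watt(self) -> float:
        return self.compositing_gops / self.power_watts


# Approximate figures taken from the two floorplan slides.
viram_1 = Layout("VIRAM-1", 4, 20, 20, 2.0, 6.40)
viram_025 = Layout("VIRAM-0.25", 1, 5, 20, 0.5, 1.6)


def scale_factors(big: Layout, small: Layout) -> dict:
    """Ratio of each quantity between the larger and the smaller layout."""
    return {
        "vector_units": big.vector_units / small.vector_units,
        "die_area": big.die_area_mm2 / small.die_area_mm2,
        "power": big.power_watts / small.power_watts,
        "compositing_gops": big.compositing_gops / small.compositing_gops,
    }


if __name__ == "__main__":
    for key, value in scale_factors(viram_1, viram_025).items():
        print(f"{key}: {value:.1f}x")
```

On these numbers, vector units, die area, power, and compositing throughput all differ by the same factor, so GOPS per watt comes out equal for both layouts.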

[Figure: floorplan with CPU+$, 1 VU, and two memory blocks (32 Mb / 4 MB each)]

V-IRAM-1 Tentative Plan
  • Phase 1: Feasibility stage (≈H2’98)
    • Test chip, CAD agreement, architecture defined
  • Phase 2: Design & Layout Stage (≈’99)
    • Test chip, Simulated design and layout
  • Phase 3: Verification (≈1Q’00)
    • Tape-out 2Q’00
  • Phase 4: Fabrication, Testing, and Demonstration (≈3Q’00)
    • Functional integrated circuit
  • 100M transistor microprocessor before Intel?
IRAM not a new idea
  • Stone, ’70 “Logic-in memory”
  • Barron, ’78 “Transputer”
  • Dally, ’90 “J-machine”
  • Patterson, ’90 panel session
  • Kogge, ’94 “Execube”

[Figure: log-log scatter plot of Mbits of Memory (0.1 to 1000) against Bits of Arithmetic Unit (10 to 10000). Plotted designs: Terasys, PPRAM, Mitsubishi M32R/D, PIP-RAM, Computational RAM, Pentium Pro, Execube, Alpha 21164, Transputer T9, IRAMUNI?, IRAMMPP?. Legend: SIMD on chip (DRAM), Uniprocessor (SRAM), MIMD on chip (DRAM), Uniprocessor (DRAM), MIMD component (SRAM)]

IRAM Chip Challenges
  • Merged Logic-DRAM process: Cost of wafer, Impact on yield, testing cost of logic and DRAM
  • Price of on-chip DRAM vs. separate DRAM chips?
  • Time delay of transistor speeds, memory cell sizes in Merged process vs. Logic only or DRAM only
  • DRAM block: flexibility via DRAM “compiler” (vary size, width, no. subbanks) vs. fixed block;
    • synchronous interface available?
  • Applications: advantages in memory bandwidth, energy, system size to offset above challenges?
Sony Playstation 2000
  • Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5)
    • Superscalar MIPS core + vector coprocessor + graphics/DRAM
    • Claim: Toy Story realism brought to games!
Infrastructure for Next Generation
  • Servers today based on desktop MPUs: Central Processor Units + Peripheral Disks
  • What would servers look like if based on mobile, multimedia microprocessors?
  • Include processor, network interface inside disk
  • ISTORE: a HW/software architecture for building scaleable, self-maintaining storage
    • An introspective system: each processor/disk monitors itself and acts on its observations
    • No administrators to configure, monitor, tune
ISTORE-I Hardware
  • ISTORE uses “intelligent” hardware
    • Intelligent Disk “Brick”: a disk, plus a fast embedded CPU, memory, and redundant network interfaces
    • Intelligent Chassis: scaleable, redundant, fast network + UPS

[Figure labels: Device; CPU, memory, NI]

IRAM Conclusion
  • IRAM potential in mem/IO BW, energy, board area; challenges in power/performance, testing, yield
  • 10X-100X improvements based on technology shipping for 20 years (not JJ, photons, MEMS, ...)
  • Suppose IRAM is successful
  • Revolution in computer implementation
    • Potential Impact #1: turn server industry inside-out?
    • Potential Impact #2: shift semiconductor balance of power? Who ships the most memory? Most microprocessors?

Acknowledgments
  • Looking for ideas of VIRAM enabled apps
  • Contact us if you’re interested: email: [email protected], http://iram.cs.berkeley.edu/
  • Thanks for advice/support: DARPA, California MICRO, Hitachi, IBM, Intel, LG Semicon, Microsoft, Neomagic, Sandcraft, SGI/Cray, Sun Microsystems, TI, TSMC
Backup Slides

(The following slides are used to help answer questions)

Commercial IRAM highway is governed by memory per IRAM?

[Figure: roadmap of applications by memory per IRAM (32 MB, 8 MB, 2 MB). Applications shown: Laptop, Network Computer, Super PDA/Phone, Video Games, Graphics Acc.]

Near-term IRAM Applications
  • “Intelligent” Set-top
    • 2.6M Nintendo 64 (≈ $150) sold in 1st year
    • 4-chip Nintendo → 1-chip: 3D graphics, sound, fun!
  • “Intelligent” Personal Digital Assistant
    • 0.6M PalmPilots (≈ $300) sold in 1st 6 months
    • Handwriting + learn new alphabet ( = K, = T, = 4) v. Speech input
Words to Remember

“...a strategic inflection point is a time in the life of a business when its fundamentals are about to change. ... Let's not mince words: A strategic inflection point can be deadly when unattended to. Companies that begin a decline as a result of its changes rarely recover their previous greatness.”

  • Only the Paranoid Survive, Andrew S. Grove, 1996
2006 ISTORE
  • IBM MicroDrive
    • 1.7” x 1.4” x 0.2”
    • 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
    • 2006: 9 GB, 50 MB/s?
  • ISTORE node
    • MicroDrive + IRAM
  • Crossbar switches growing by Moore’s Law
    • 16 x 16 in 1999 → 64 x 64 in 2005
  • ISTORE rack (19” x 33” x 84”)
    • 1 tray (3” high) → 16 x 32 → 512 ISTORE nodes
    • 20 trays+switches+UPS → 10,240 ISTORE nodes(!)
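
The rack arithmetic above can be written out as a short calculation. This is a sketch: the node counts come from the slide, while the per-rack capacity and bandwidth totals are derived here from the slide's projected 2006 MicroDrive figures (9 GB, 50 MB/s, the latter marked with a question mark in the source), so they are speculative. All constant names are hypothetical.

```python
NODES_PER_TRAY = 16 * 32   # one 3-inch tray, per the slide
TRAYS_PER_RACK = 20        # plus switches and UPS

# Projected 2006 MicroDrive figures from the slide (speculative in the source).
DISK_GB_2006 = 9
DISK_MB_PER_S_2006 = 50

nodes_per_rack = NODES_PER_TRAY * TRAYS_PER_RACK

# Derived totals, not stated in the talk; decimal units (1 TB = 1000 GB).
rack_capacity_tb = nodes_per_rack * DISK_GB_2006 / 1000
rack_disk_bandwidth_gb_per_s = nodes_per_rack * DISK_MB_PER_S_2006 / 1000

if __name__ == "__main__":
    print(f"nodes per tray: {NODES_PER_TRAY}")
    print(f"nodes per rack: {nodes_per_rack}")
    print(f"rack capacity: {rack_capacity_tb:.2f} TB")
    print(f"aggregate disk bandwidth: {rack_disk_bandwidth_gb_per_s:.0f} GB/s")
```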