the alpha 21364 and 21464 microprocessors continuing the performance lead beyond y2k l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K PowerPoint Presentation
Download Presentation
The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K

Loading in 2 Seconds...

play fullscreen
1 / 36

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K - PowerPoint PPT Presentation


  • 669 Views
  • Uploaded on

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K Shubu Mukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury, Massachusetts

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K' - andrew


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
the alpha 21364 and 21464 microprocessors continuing the performance lead beyond y2k

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K

Shubu Mukherjee, Ph.D.

Principal Hardware Engineer

VSSAD Labs, Alpha Development Group

Compaq Computer Corporation

Shrewsbury, Massachusetts

Slides: 1998 Microprocessor Forum (Peter Bannon) and 1999 Microprocessor Forum (Joel Emer)

slide2

Alpha Microprocessor Roadmap

Higher Performance

0.125mm

0.18mm

0.35mm

21464

EV8

21364

EV7

21264 EV6

Lower Cost

0.125mm

0.28mm

21364

EV78

21264EV67

0.18mm

21264EV68

1998

1999

2000 2001 2002 2003

First System Ship

alpha 21264 microprocessor
Alpha 21264 Microprocessor
  • Architectural Features
    • First “Out-of-Order” Alpha
    • Four-wide superscalar
  • Performance
    • World’s Fastest Microprocessor (www.spec.org, 11/17/99)
    • 39 SPECINT95, 68 SPECFP95 @ 700 Mhz
      • Intel Pentium III @ 733 Mhz delivers 36 SPECINT95, 30 SPECFP95
slide4

Alpha Microprocessor Roadmap

Higher Performance

0.125mm

0.18mm

0.35mm

21464

EV8

21364

EV7

21264 EV6

Lower Cost

0.125mm

0.28mm

21364

EV78

21264EV67

0.18mm

21264EV68

1998

1999

2000 2001 2002 2003

First System Ship

slide5

Alpha 21364 Goals

  • Leadership single stream performance
    • Higher operating frequency
    • Integrated memory interface
  • Leadership multiprocessor performance
    • Integrated system / multiprocessor interface
slide6

Alpha 21364 Features

  • System-on-a-Chip
    • Alpha 21264 core with enhancements
    • Integrated L2 Cache
    • Integrated memory controller
    • Integrated network interface
  • Fault-Tolerance
    • Support for lock-step operation to enable high-availability systems.
slide7

Address In

Address Out

R

A

M

B

U

S

L2

Cache

Memory

Controller

N

S

E

WI/O

Network

Interface

16 L1

Victim Buf

16 L2

Victim Buf

21364 Chip Block Diagram

21264

Core

16 L1

Miss Buffers

64K Icache

64K Dcache

slide8

21364 Core

FETCH MAP QUEUE REG EXEC DCACHE

Stage: 0 1 2 3 4 5 6

Int Reg

Map

Int Issue Queue

(20)

Branch

Predictors

Reg File

(80)

Exec

L2 cache1.5MB

6-Set

Addr

Exec

L1 Data

Cache

64KB

2-Set

Reg File

(80)

Exec

80 in-flight instructions

plus 32 loads and 32 stores

Addr

Exec

Next-Line

Address

4 Instructions / cycle

L1 Ins.

Cache

64KB

2-Set

FP ADD

Div/Sqrt

Reg File

(72)

FP Issue Queue

(15)

FP Reg

Map

Victim Buffer

FP MUL

Miss Address

slide9

Integrated L2 Cache

  • 1.5 MB
  • 6-way set associative
  • 16 GB/s total read/write bandwidth
  • 16 Victim buffers for L1 -> L2
  • 16 Victim buffers for L2 -> Memory
  • ECC SECDED code
  • 12ns load to use latency
slide10

Integrated Memory Controller

  • Direct RAMbus
    • High data capacity per pin
    • 800 MHz operation
    • 30ns CAS latency pin to pin
  • 6 GB/sec read or write bandwidth
  • 100s of open pages
  • Directory based cache coherence
  • ECC SECDED
slide11

Integrated Network Interface

  • Direct processor-to-processor interconnect
  • 10 GB/second per processor
  • 15ns processor-to-processor latency
  • Out-of-order network with adaptive routing
  • Asynchronous clocking between processors
  • 3 GB/second I/O interface per processor
slide12

IO

M

IO

M

M

IO

M

M

IO

IO

IO

M

IO

M

IO

M

IO

M

IO

M

IO

M

IO

M

364

364

364

364

364

364

364

364

364

364

364

364

21364 System Block Diagram

slide13

Alpha 21364 Technology

  • 0.18 mm CMOS
  • 1000+ MHz
  • 100 Watts @ 1.5 volts
  • 3.5 cm2
  • 6 Layer Metal
  • 100 million transistors
    • 8 million logic
    • 92 million RAM
slide14

Alpha 21364 Status

  • 70 SPECint95 (estimated)
  • 120 SPECfp95 (estimated)
  • RTL model running
  • Tapeout: Summer 2000
slide15

21364 Summary: System on a Chip

  • Integrated L2 cache and memory controller
    • outstanding single processor performance
  • Integrated network interface
    • high performance multi-processor systems
    • scales to large number of processors
slide16

Alpha Microprocessor Overview

Higher Performance

0.125mm

0.18mm

0.35mm

21464

EV8

21364

EV7

21264 EV6

Lower Cost

0.125mm

0.28mm

21364

EV78

21264EV67

0.18mm

21264EV68

1998

1999

2000 2001 2002 2003

First System Ship

slide17

Alpha 21464 Goals

  • Leadership single stream performance
    • Higher operating frequency / better technology
    • New microarchitecture
    • Integrated memory interface (like 21364)
  • Leadership multiprocessor performance
    • Simultaneous Multithreading (with minimal change/cost)
    • Integrated system / multiprocessor interface (like 21364)
alpha 21464 technology overview
Alpha 21464 Technology Overview
  • Leading edge process technology – 1.2-2.0GHz
    • 0.125µm CMOS
    • SOI-compatible
    • Cu interconnect
    • low-k dielectrics
  • Chip characteristics
    • ~1.2V Vdd
    • ~250 Million transistors
alpha 21464 architecture overview
Alpha 21464 Architecture Overview
  • Enhanced out-of-order execution
  • 8-wide superscalar
  • Large on-chip L2 cache
  • Direct RAMBUS interface
  • On-chip router for system interconnect
  • Glueless, directory-based, ccNUMA
    • for up to 512-way multiprocessing
  • 4-way simultaneous multithreading (SMT)
instruction issue
Instruction Issue

Time

Reduced function unit utilization due to dependencies

superscalar issue
Superscalar Issue

Time

Superscalar leads to more performance, but lower utilization

predicated issue
Predicated Issue

Time

Adds to function unit utilization, but results are thrown away

chip multiprocessor
Chip Multiprocessor

Time

Limited utilization when only running one thread

fine grained multithreading
Fine Grained Multithreading

Time

Intra-thread dependencies still limit performance

simultaneous multithreading
Simultaneous Multithreading

Time

Maximum utilization of function units by independent operations

basic out of order pipeline

Decode/Map

Queue

Fetch

Reg Read

Execute

Dcache/Store Buffer

Reg Write

Retire

PC

RegisterMap

Regs

Regs

Dcache

Icache

Basic Out-of-order Pipeline

Thread-blind

smt pipeline

Decode/Map

Queue

Fetch

Reg Read

Execute

Dcache/Store Buffer

Reg Write

Retire

PC

RegisterMap

Regs

Regs

SMT Pipeline

Dcache

Icache

changes for smt
Changes for SMT
  • Basic pipeline – unchanged
  • Replicated resources
    • Program counters
    • Register maps
  • Shared resources
    • Register file (size increased)
    • Instruction queue
    • First and second level caches
    • Translation buffers
    • Branch predictor
architectural abstraction

TPU 0

TPU1

TPU2

TPU3

Icache

TLB

Dcache

Scache

Architectural Abstraction
  • 1 Processor with 4 Thread Processing Units (TPUs)
  • Shared hardware resources
21464 system block diagram

IO

M

IO

IO

M

IO

IO

M

M

M

M

IO

IO

M

M

IO

M

IO

0

1

2

3

21464 System Block Diagram

EV8

EV8

EV8

EV8

EV8

EV8

EV8

EV8

EV8

slide34

Alpha 21464 Summary

  • Leadership single stream performance
    • Higher operating frequency / better technology
    • New microarchitecture
    • Integrated memory interface (like 21364)
  • Leadership multiprocessor performance
    • Simultaneous Multithreading (with minimal changes/cost)
    • Integrated system / multiprocessor interface (like 21364)
slide35

Maintain Performance Lead Beyond Y2K

  • Alpha 21364
    • Reuses 21264 microprocessor core
    • System on a chip
  • Alpha 21464
    • New microarchitecture
    • System on a chip
    • Simultaneous Multithreading
my current research beyond 21464
My Current Research: Beyond 21464?
  • The Truth Project (w/ Joel Emer)
    • Examines different microarchitectural issues
  • The Multinet Project (w/ Rick Kessler)
    • Tightly-coupled multiprocessor networks
  • The Reliant Project (w/ Steve Reinhardt)
    • Self-Checking Microprocessors using SMT, ISCA submission
  • Asim (w/ VSSAD Labs)
    • Performance Model for Alphas beyond 21464