Hybrid Technology Petaflops System
WSCCG - Thomas Sterling


[Diagram: HTMT hardware stack. A superconducting section of 100 GHz processors (P0 ... Pn-1, Pn), each paired with cryogenic RAM (CRAM0 ... CRAMn) and joined by an interconnect; a liquid-N2 region of buffers and SRAM; an optical packet switch; DRAM banks; and optical storage at the base.]

Hybrid Technology Petaflops System

  • New device technologies

  • New component designs

  • New subsystem architecture

  • New system architecture

  • New latency management paradigm and mechanisms

  • New algorithms/applications

  • New compile-time and runtime software


Complementing Technologies Yield Superior Power/Price/Performance

[Diagram: a single-chip PIM macro combining memory stacks, sense amps, decode logic, and node logic on one die of basic silicon.]

  • Superconductor RSFQ logic provides 100x performance

  • Processor in Memory (PIM): high memory bandwidth and low power

  • Data Vortex optical communication: very high bisection bandwidth with low latency

  • Holographic storage: high capacity with low power at moderate speeds


DIVA PIM Smart Memory for Irregular Data Structures and Dynamic Databases

  • Processor in Memory

    • Merges memory & logic on a single chip

    • Exploits high internal memory bandwidth

    • Enables row-wide in-place memory operations

    • Reduces memory access latencies

    • Significant power reduction

    • Efficient fine-grain parallel processing

  • DIVA PIM Project

    • DARPA sponsored, $12.2M; USC ISI prime with Caltech ($2.4M over 4 years), Notre Dame, U. of Delaware

    • Greatly accelerates scientific computing on irregular data structures and commercial dynamic databases

    • 0.25 µm, 256 Mbit part delivered 4Q 2000

    • 4 processor/memory nodes

    • Key innovation: multithreaded execution for high efficiency through latency management

    • Active-message-driven object-oriented computation (see the sketch after this list)

    • Direct PIM-to-PIM interaction without host processor intervention
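
A hedged sketch of the active-message style in C: a parcel carries both a handler and its operands to the PIM that owns the data, so the operation runs next to memory. Here parcel_t, pim_deliver, and add_to_row are hypothetical stand-ins, not the DIVA parcel format or runtime API:

```c
/* Minimal sketch of active-message-driven PIM computation.
 * All names here are illustrative assumptions, not DIVA interfaces. */
#include <stdio.h>
#include <string.h>

typedef void (*handler_fn)(void *payload);

typedef struct {
    handler_fn handler;   /* method to invoke at the destination PIM */
    char payload[32];     /* operands travel with the message */
} parcel_t;

/* "Memory" local to the owning PIM node */
static int local_row[8] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Handler: a row-wide, in-place operation; no data leaves the chip */
static void add_to_row(void *payload)
{
    int k;
    memcpy(&k, payload, sizeof k);
    for (int i = 0; i < 8; i++)
        local_row[i] += k;
}

/* Stand-in for PIM-to-PIM delivery: the handler runs on arrival,
 * with no host processor in the loop */
static void pim_deliver(parcel_t *p)
{
    p->handler(p->payload);
}

int main(void)
{
    parcel_t p;
    int k = 10;

    p.handler = add_to_row;
    memcpy(p.payload, &k, sizeof k);
    pim_deliver(&p);                       /* "send" to the data's owner */
    printf("row[0] = %d\n", local_row[0]); /* prints 11 */
    return 0;
}
```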


HTMT Percolation Model

[Diagram: HTMT percolation pipeline. A run-time system in SRAM-PIM performs parcel assembly & disassembly, parcel dispatch & dispensing, and parcel invocation & termination, moving parcels through A-, I-, D-, and T-queues with re-use paths. A C-buffer connects the cryogenic area by DMA to CRAM under split-phase (start/done) synchronization to SRAM; DMA also moves data to DRAM-PIM.]
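
The start/done discipline in the diagram is a split-phase transaction: a transfer is launched, other work proceeds, and completion is observed later. A minimal C sketch, assuming a flag-based handle (dma_handle_t and both functions are illustrative, not HTMT interfaces):

```c
/* Split-phase synchronization sketch: issue a transfer, overlap work,
 * check a completion flag later. A real HTMT runtime would use hardware
 * queues; memcpy plus a flag is a stand-in. */
#include <stdio.h>
#include <string.h>

typedef struct {
    int done;             /* completion flag, set when transfer finishes */
    char *src, *dst;
    size_t len;
} dma_handle_t;

static void dma_start(dma_handle_t *h, char *dst, char *src, size_t len)
{
    h->done = 0;
    h->src = src; h->dst = dst; h->len = len;
    /* a real engine would run asynchronously; we complete eagerly here */
    memcpy(dst, src, len);
    h->done = 1;
}

static int dma_test(dma_handle_t *h) { return h->done; }

int main(void)
{
    char cram[16], sram[16] = "percolated data";
    dma_handle_t h;

    dma_start(&h, cram, sram, sizeof sram);  /* "start" phase */
    /* ... overlap useful work here instead of stalling ... */
    while (!dma_test(&h))                    /* "done" phase */
        ;
    printf("%s\n", cram);
    return 0;
}
```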


From Toys to Teraflops: Bridging the Beowulf Gap

Thomas Sterling

California Institute of Technology

NASA Jet Propulsion Laboratory

September 3, 1998



Death of Commercial High-End Parallel Computers?

  • No market for high-end computers

    • minimal growth in the last five years

  • The Great Extinction

    • KSR, Alliant, TMC, Intel, CRI, CCC, Multiflow, Maspar, BBN, Convex, ...

  • Must use COTS

    • fabrication costs skyrocketing

    • development lead times too short

  • Federal Agencies Fleeing

    • NSF, DARPA, NIST, NIH

  • No New Good IDEAS


BEOWULF-CLASS SYSTEMS

  • Cluster of PCs

    • Intel x86

    • DEC Alpha

    • Mac PowerPC

  • Pure M2COTS

  • Unix-like O/S with source

    • Linux, BSD, Solaris

  • Message passing programming model (see the MPI sketch after this list)

    • PVM, MPI, BSP, homebrew remedies

  • Single user environments

  • Large science and engineering applications
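
As referenced above, message passing on such a cluster is typically plain MPI. A minimal, self-contained C example using only core MPI-1 calls (any MPI implementation of the era, e.g. MPICH, should run it):

```c
/* Minimal MPI example in the style a Beowulf node would run:
 * each rank contributes its id; rank 0 receives the total. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, sum of ids = %d\n", size, sum);

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and launch one process per node; PVM and BSP codes follow the same SPMD pattern with different libraries.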


Emergence of Beowulf Clusters


Focus Tasks for Beowulf R&D

  • Applications

  • Scalability to high end

  • Low level enabling software technology

  • Grendel: middleware for managing ensembles

  • Technology transfer


Beowulf at Work


Beowulf Scalability


A 10 Gflops Beowulf

172 Intel Pentium Pro microprocessors, Center for Advanced Computing Research, California Institute of Technology


Avalon Architecture and Price


The Background


Network Topology Scaling

[Chart: latencies as a function of system scale for the candidate network topologies.]
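
The scaling trend behind such a chart can be approximated from network diameters alone. A sketch in C, where the 1 µs per-hop latency is an illustrative assumption rather than a measured figure:

```c
/* Why topology matters at scale: network diameter (worst-case hop
 * count) grows as O(n) for a ring, O(sqrt(n)) for a 2-D mesh, and
 * O(log n) for a hypercube. Compile with -lm. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double hop_us = 1.0;   /* assumed per-hop latency, for illustration */

    for (int n = 16; n <= 16384; n *= 4) {
        double ring  = n / 2.0;                       /* ring diameter */
        double mesh  = 2.0 * (sqrt((double)n) - 1.0); /* 2-D mesh diameter */
        double hcube = log2((double)n);               /* hypercube diameter */
        printf("%6d nodes: ring %7.0f us  mesh %6.1f us  hypercube %4.0f us\n",
               n, ring * hop_us, mesh * hop_us, hcube * hop_us);
    }
    return 0;
}
```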


Petaflops Clusters at POWR

David H. Bailey, James Bieda, Remy Evard, Robert Clay, Al Geist, Carl Kesselman, David E. Keyes, Andrew Lumsdaine, James R. McGraw, Piyush Mehrotra, Daniel Savarese, Bob Voigt, Michael S. Warren


Critical System Software

  • A cluster-node Unix-based OS (e.g. Linux or the like), scalable to 12,500+ nodes.

  • Fortran-90, C, and C++ compilers generating maximum-performance object code, usable under the Linux OS.

  • An efficient implementation of MPI, scalable to 12,500+ nodes (see the sketch after this list).

  • System management and job management tools, usable for systems of this size.
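
Scaling MPI to 12,500+ nodes depends on tree-structured collectives, whose depth grows as log2 of the node count, rather than linear rank-0 loops. A minimal sketch using the standard MPI_Bcast (the node count is the slide's target, not a tested configuration):

```c
/* Broadcast to P nodes: a rank-0 send loop costs O(P) message steps,
 * while MPI_Bcast's internal tree costs O(log P). At 12,500 nodes
 * that is ~12,500 steps versus ~14. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        value = 42;

    /* tree-structured collective: depth grows as log2(size) */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d of %d has value %d\n", rank, size, value);
    MPI_Finalize();
    return 0;
}
```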


System Software Research Tasks

  • Can a stripped-down Linux-like operating system be designed that is scalable to 12,500+ nodes?

  • Can vendor compilers be utilized in a Linux node environment?

  • If not, can high-performance Linux-compatible compilers be produced by third party vendors, keyed to needs of scientific computing?

  • Can MPI be scaled to 12,500+ nodes?

  • Can system management and batch submission (e.g. PBS or LSF) tools be scaled to 12,500+ nodes?

  • Can an effective performance management tool be produced for systems with 12,500+ nodes?

  • Can an effective debugger be produced for systems with 12,500+ nodes? Can the debugger being specified by the Parallel Tools Consortium be adapted for these systems?


Technology Transfer

  • Information-hungry neo-users

    • how to implement

    • how to maintain

    • how to apply

  • Web-based assembly and how-to information

  • Red Hat CD-ROM including Extreme Linux

  • Tutorials

  • MIT Press book: “How to Build a Beowulf”

  • DOE and NASA workshops

    • JPC4: joint personal computer cluster computing conference

  • so many talks


Godzilla Meets Bambi: NT versus Linux

  • Not in competition; they complement each other

  • Linux was not created by suits

    • created by people who wanted to create it

    • distributed by people who wanted to share it

    • used by people who want to use it

  • If Linux dies

    • it will not be killed by NT

    • it will be buried by Linux users

  • Linux provides

    • Unix-like O/S, which has been the mainstream of scientific computing

    • Open source code

    • Low/no cost


Have to Run Big Problems on Big Machines?

  • It’s work, not peak flops

  • A user’s throughput over the application cycle

  • Big machines yield little slices

    • due to time and space sharing

  • But data set memory requirements:

    • wide range of data set needs, spanning three orders of magnitude

    • latency-tolerant algorithms enable out-of-core computation (see the sketch after this list)

  • What is the Beowulf breakpoint for price/performance?
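
A minimal sketch of the out-of-core idea: stream a data set larger than memory through a fixed-size buffer, so the memory footprint stays constant regardless of problem size. The file name and block size are illustrative assumptions:

```c
/* Out-of-core processing sketch: work block-by-block through a file
 * too large for main memory, tolerating disk latency per block. */
#include <stdio.h>
#include <stdlib.h>

#define BLOCK (1 << 20)   /* 1 MB working buffer regardless of file size */

int main(void)
{
    FILE *f = fopen("dataset.bin", "rb");   /* hypothetical input file */
    unsigned char *buf = malloc(BLOCK);
    double sum = 0.0;
    size_t n;

    if (!f || !buf)
        return 1;

    while ((n = fread(buf, 1, BLOCK, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];

    printf("checksum: %.0f\n", sum);
    free(buf);
    fclose(f);
    return 0;
}
```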




Alternative APIs

  • Mostly MPI

  • PVM, also

  • custom messaging for performance

  • BSP

    • SPMD, global name space, implicit messaging (see the superstep sketch after this list)

  • Hrunting

    • software-supported distributed shared memory

  • EARTH

    • Guang Gao, U. of Delaware

    • software-supported multithreading
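
As referenced above, BSP organizes SPMD programs into supersteps: local computation, communication, then a global barrier. A hedged sketch layered on MPI (BSPlib itself is not shown; the ring exchange is illustrative):

```c
/* BSP-style superstep on top of MPI: compute locally, exchange,
 * then barrier-synchronize before the next superstep. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, local, neighbor;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = rank * rank;      /* superstep: local computation */

    /* superstep: communication - shift values around a ring */
    MPI_Sendrecv(&local, 1, MPI_INT, (rank + 1) % size, 0,
                 &neighbor, 1, MPI_INT, (rank + size - 1) % size, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Barrier(MPI_COMM_WORLD);   /* end of superstep */

    printf("rank %d received %d\n", rank, neighbor);
    MPI_Finalize();
    return 0;
}
```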


Grendel Suite

  • Targets effective management of ensembles

  • Embraces “NIH” (nothing in-house)

  • Surrogate customer for Beowulf community

  • Borrows software products from research projects

  • Capabilities required:

    • communication layers

    • numerical libs

    • program development tools

    • scheduling and runtime

    • debug and availability

    • external I/O

    • secondary/mass storage

    • general system admin


Towards the Future: What Can We Expect?

  • 2 GFLOPS peak processors (see the cost sketch after this list)

  • $1000 per processor

  • 1 Gbps at < $250 per port

  • New backplane performance, e.g. PCI++

  • Lightweight communications, < 10 µs latency

  • Optimized math libraries

  • 1 Gbyte main memory per node

  • 24 Gbyte disk storage per node

  • De facto standardized middleware
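
Multiplying out the first three projections gives a rough bill of materials for a peak-teraflops cluster; this is a sketch from the slide's own figures, not a quote:

```c
/* Back-of-envelope cost from the slide's projections. */
#include <stdio.h>

int main(void)
{
    double target_gflops = 1000.0;   /* 1 peak Tflops */
    double per_cpu_gflops = 2.0;     /* slide: 2 GFLOPS peak processors */
    double cpu_cost = 1000.0;        /* slide: $1000 per processor */
    double port_cost = 250.0;        /* slide: 1 Gbps at < $250 per port */

    int nodes = (int)(target_gflops / per_cpu_gflops);   /* 500 nodes */
    double total = nodes * (cpu_cost + port_cost);

    /* ~$625K before memory, disks, and integration */
    printf("%d nodes, ~$%.0fK for CPUs and network ports\n",
           nodes, total / 1000.0);
    return 0;
}
```

At roughly $625K for CPUs and network ports, the remaining budget for memory, disks, and integration is consistent with the sub-$1M peak teraflops projected for 2002.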




Million $$ Teraflops Beowulf?

  • Today: ~$3M for a peak Tflops

  • Before 2002: $1M for a peak Tflops

  • Performance efficiency is a serious challenge

  • System integration

    • does vendor support of massive parallelism have to mean massive markup?

  • System administration: boring but necessary

  • Maintenance without vendors; how?

    • new kinds of vendors for support

  • Heterogeneity will become a major aspect
