
PACT 98

http://www.research.microsoft.com/barc/gbell/pact.ppt


What Architectures? Compilers? Run-time environments? Programming models? … Any Apps?
Parallel Architectures and Compilers Techniques
Paris, 14 October 1998

Gordon Bell

Microsoft

Talk plan
  • Where are we today?
  • History… predicting the future
    • Ancient
    • Strategic Computing Initiative and ASCI
    • Bell Prize since 1987
    • Apps & architecture taxonomy
  • Petaflops: when, … how, how much
  • New ideas: Grid, Globus, Legion
  • Bonus: Input to Thursday panel
1998: ISVs, buyers, & users?
  • Technical: supers dying; DSM (and SMPs) trying
    • Mainline: user & ISV apps ported to PCs & workstations
    • Supers (legacy code) market lives ...
    • Vector apps (e.g. ISVs) ported to DSM (& SMP)
    • MPI for custom and a few leading-edge ISVs (see the sketch after this list)
    • Leading edge, one-of-a-kind apps: Clusters of 16, 256, ...1000s built from uni, SMP, or DSM
  • Commercial: mainframes, SMPs (&DSMs), and clusters are interchangeable (control is the issue)
    • Dbase & tp: SMPs compete with mainframes if central control is an issue else clusters
    • Data warehousing: may emerge… just a Dbase
    • High growth, web and stream servers: Clusters have the advantage
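
Since MPI comes up here and throughout the talk as the programming model for clusters, the following is a minimal, hypothetical sketch (in C, not code from the talk) of that model: every process does some local work and rank 0 collects the result.

    /* Minimal MPI sketch: local work on each rank, reduced to rank 0. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double partial = (double)rank;   /* stand-in for a local computation */
        double total = 0.0;
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d processes = %g\n", size, total);
        MPI_Finalize();
        return 0;
    }

The same source runs unchanged on a uniprocessor, an SMP, or a cluster of thousands of nodes, which is exactly the portability argument that makes MPI attractive to the leading-edge ISVs mentioned above.
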
c2000 Architecture Taxonomy

[Diagram: two mainline branches.
SMP: Xpt-connected SMPs; Xpt-SMP vector; Xpt-multithread (Tera); “multi”; Xpt-“multi” hybrid; DSM-SCI (commodity); DSM (high bandwidth).
Multicomputers aka Clusters … MPP (16-(64)-10K processors): commodity “multis” & switches; proprietary “multis” & switches; proprietary DSMs.]

TOP500 Technical Systems by Vendor (sans PC and mainframe clusters)

[Chart: count of the 500 systems broken down by vendor (CRI, SGI, Convex, IBM, HP, Sun, TMC, Intel, DEC, Japanese vendors, Other), plotted twice yearly from Jun-93 to Jun-98.]

Parallelism of Jobs on NCSA Origin Cluster

[Two pie charts -- share by # of jobs and share by CPU delivered -- broken down by job parallelism (1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128 CPUs). 20 weeks of data, March 16 - Aug 2, 1998: 15,028 jobs / 883,777 CPU-hrs.]

How are users using the Origin Array?

[3D bar chart: CPU-hours delivered (0-120,000) by job size (1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128 CPUs) and by memory per CPU (0-64, 64-128, 128-256, 256-384, 384-512, 512+ MB).]

National Academic Community Large Project Requests September 1998

Over 5 Million NUs Requested

One NU = One XMP Processor-Hour

Source: National Resource Allocation Committee

GB's Estimate of Parallelism in Engineering & Scientific Applications (Gordon’s WAG)

[Sketch: log (# apps) vs. granularity & degree of coupling (comp./comm.), spanning PCs, WSs, supers, and the scalable multiprocessors -- clusters aka MPPs aka multicomputers -- and covering both dusty decks for supers and new or scaled-up apps. Estimated mix: scalar 60%, vector 15%, vector & // 5%, one-of >> // 5%, embarrassingly & perfectly parallel 15%.]

Application Taxonomy

[Diagram: two branches.
Technical: general purpose, non-parallelizable codes (PCs have it!); vectorizable; vectorizable & //able (supers & small DSMs); hand-tuned one-of, MPP coarse grain, and MPP embarrassingly // (clusters of PCs…).
Commercial: database and database/TP; web host; stream audio/video -- if central control & rich then IBM or large SMPs, else PC clusters.]

One-processor performance as % of Linpack

[Chart: per-processor performance as a percentage of Linpack for CFD, biomolecular, chemistry, materials, and QCD codes; the values shown are 22%, 25%, 19%, 14%, 33%, and 26%.]

Growth in Computational Resources Used for UK Weather Forecasting

[Chart, 1950-2000: sustained performance rising about ten orders of magnitude (from roughly 100 ops/s to ~10 Tflops) through machines including Leo, Mercury, KDF9, 195, 205, and YMP. Annotation: 10^10 in 50 yrs = 1.58^50.]
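
A one-line check of the annotated growth rate (my arithmetic, not from the slide):

    \[ x^{50} = 10^{10} \;\Rightarrow\; x = 10^{10/50} = 10^{0.2} \approx 1.585, \]

i.e. roughly 58% sustained growth per year over the 50 years shown.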

I think there is a world market for maybe five computers.

Thomas Watson Senior, Chairman of IBM, 1943

The scientific market is still about that size… 3 computers
  • When scientific processing was 100% of the industry, it was a good predictor
  • $3 Billion: 6 vendors, 7 architectures
  • DOE buys 3 very big ($100-$200 M) machines every 3-4 years
Our Tax Dollars At Work: ASCI for Stockpile Stewardship
  • Intel/Sandia: 9000 x 1-node PPro
  • LLNL/IBM: 512 x 8 PowerPC (SP2)
  • LANL/Cray: ?
  • Maui Supercomputer Center: 512 x 1 SP2

“LARC doesn’t need 30,000 words!” --Von Neumann, 1955.
  • “During the review, someone said: ‘von Neumann was right. 30,000 words was too much IF all the users were as skilled as von Neumann ... for ordinary people, 30,000 was barely enough!’” -- Edward Teller, 1995
  • The memory was approved.
  • Memory solves many problems!
In Dec. 1995 computers with 1,000 processors will do most of the scientific processing.

Danny Hillis 1990 (1 paper or 1 company)

The Bell-Hillis Bet: Massive Parallelism in 1995

[Table: TMC vs. world-wide supers, compared on applications, petaflops/month, and revenue.]

Bell-Hillis Bet: wasn’t paid off!
  • My goal was not necessarily to just win the bet!
  • Hennessy and Patterson were to evaluate what was really happening…
  • Wanted to understand degree of MPP progress and programmability
DARPA, 1985 Strategic Computing Initiative (SCI)

  • A 50X LISP machine -- Tom Knight, Symbolics
  • A 1,000-node multiprocessor, a teraflops by 1995 -- Gordon Bell, Encore
  • → All of the ~20 HPCC projects failed!

SCI (c1980s): Strategic Computing Initiative funded

ATT/Columbia (Non Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like connection machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine),

Those who gave up their lives in SCI’s search for parallelism

Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC (independent of ETA), Cogent, Culler, Cydrome, Denelcor, Elxsi, ETA, Evans & Sutherland Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, Multiflow, Myrias, Pixar, Prisma, SAXPY, SCS, Supertek (part of Cray), Suprenum (German National effort), Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Vitec, Vitesse, Wavetracer.

Worlton: "Bandwagon Effect" explains massive parallelism

Bandwagon: A propaganda device by which the purported acceptance of an idea ...is claimed in order to win further public acceptance.

Pullers: vendors, CS community

Pushers: funding bureaucrats & deficit

Riders: innovators and early adopters

4 flat tires: training, system software, applications, and "guideposts"

Spectators: most users, 3rd party ISVs

Parallel processing is a constant distance away.

Our vision ... is a system of millions of hosts… in a loose confederation. Users will have the illusion of a very powerful desktop computer through which they can manipulate objects.

Grimshaw, Wulf, et al., “Legion”, CACM, Jan. 1997

Bell Prize: 1000x 1987-1998
  • 1987 Ncube, 1,000 computers: showed that with more memory, apps scaled
  • 1987 Cray XMP: 4 proc. @ 200 Mflops/proc
  • 1996 Intel: 9,000 proc. @ 200 Mflops/proc; 1998 Bell Prize at 600 Gflops RAP
  • Parallelism gains
    • 10x in parallelism over Ncube
    • 2000x in parallelism over XMP
  • Spend 2-4x more
  • Cost effectiveness: 5x; ECL → CMOS; SRAM → DRAM
  • Moore’s Law = 100x
  • Clock: 2-10x; CMOS-ECL speed cross-over
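
Reading the slide's own numbers as a back-of-envelope check (my arithmetic, not on the slide):

    \[ 4 \times 0.2\,\mathrm{GF} = 0.8\,\mathrm{GF}\ (1987), \qquad 9000 \times 0.2\,\mathrm{GF} \approx 1.8\,\mathrm{TF\ PAP}\ (1996), \qquad \frac{600\,\mathrm{GF\ RAP}\ (1998)}{0.8\,\mathrm{GF}} \approx 750, \]

i.e. roughly three orders of magnitude of achieved performance in eleven years -- the "1000x" of the title.
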
No more 1000X/decade. We are now (hopefully) only limited by Moore’s Law and not limited by memory access.

1 GF to 10 GF took 2 years

10 GF to 100 GF took 3 years

100 GF to 1 TF took >5 years

2n+1 or 2^(n-1)+1 years for each successive 10x?

1998 Observations vs. 1989 Predictions for technical

  • Got a TFlops PAP 12/1996 vs 1995. Really impressive progress! (RAP<1 TF)
  • More diversity… results in NO software!
    • Predicted: SIMD, mC, hoped for scalable SMP
    • Got: Supers, mCv, mC, SMP, SMP/DSM; SIMD disappeared
  • $3B (un-profitable?) industry; 10 platforms
  • PCs and workstations diverted users
  • MPP apps DID NOT materialize
Observation: CMOS supers replaced ECL in Japan
  • 2.2 Gflops vector units have dual use
    • In traditional mPv supers
    • as basis for computers in mC
  • Software apps are present
  • Vector processor out-performs n micros for many scientific apps
  • It’s memory bandwidth, cache prediction, and inter-communication
Observation: price & performance
  • Breaking $30M barrier increases PAP
  • Eliminating “state computers” increased prices, but got fewer, more committed suppliers, less variation, and more focus
  • Commodity micros aka Intel are critical to improvement. DEC, IBM, and SUN are ??
  • Conjecture: supers and MPPs may be equally cost-effective despite PAP
    • Memory bandwidth determines performance & price
    • “You get what you pay for” aka “there’s no free lunch”
Observation: MPPs 1, Users <1
  • MPPs with relatively low-speed micros and lower memory bandwidth ran over supers, but didn’t kill ’em.
  • Did the U.S. industry enter an abyss?
      • Is crying “Unfair trade” hypocritical?
      • Are users denied tools?
      • Are users not “getting with the program”?
  • Challenge: we must learn to program clusters...
      • Cache idiosyncrasies
      • Limited memory bandwidth
      • Long Inter-communication delays
      • Very large numbers of computers
Strong recommendation: Utilize in situ workstations!
  • NoW (Berkeley) set sort record, decrypting
  • Grid, Globus, Condor and other projects
  • Need “standard” interface and programming model for clusters using “commodity” platforms & fast switches
  • Giga- and tera-bit links and switches allow geo-distributed systems
  • Each PC in a computational environment should have an additional 1GB/9GB!
Petaflops by 2010

DOE Accelerated Strategic Computing Initiative (ASCI)

DOE’s 1997 “PathForward” Accelerated Strategic Computing Initiative (ASCI)
  • 1997 1-2 Tflops: $100M
  • 1999-2001 10-30 Tflops $200M??
  • 2004 100 Tflops
  • 2010 Petaflops

When is a Petaflops possible? What price?

  • Moore’s Law 100x. But how fast can the clock tick?
  • Increase parallelism 10K → 100K 10x
  • Spend more ($100M → $500M) 5x
  • Centralize center or fast network 3x
  • Commoditization (competition) 3x

Gordon Bell, ACM 1997
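
As a rough check (my arithmetic, not from the slide), the listed multipliers more than cover the gap from the ~1 Tflops of 1996-97 to a petaflops:

    \[ 100 \times 10 \times 5 \times 3 \times 3 = 45{,}000 \;\gg\; \frac{10^{15}}{10^{12}} = 1000, \]

so only a subset of the factors needs to be realized in full.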

Micros gains if 20, 40, & 60% / year

[Chart: projected microprocessor performance, 1995-2045, on a log scale from 10^6 to 10^21 ops/s. At 20%/year the curve reaches teraops, at 40%/year petaops, and at 60%/year exaops.]
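
A small compounding sketch of the three curves, assuming (my assumption, not the slide's) a roughly 1 Gflops micro as the 1995 starting point:

    /* Project per-microprocessor performance under 20/40/60% annual growth. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double base = 1e9;                  /* assumed 1995 baseline, ops/s */
        const double rates[] = {0.20, 0.40, 0.60};
        printf("year      20%%/yr      40%%/yr      60%%/yr\n");
        for (int year = 1995; year <= 2045; year += 10) {
            printf("%d", year);
            for (int i = 0; i < 3; i++)
                printf("  %9.2e", base * pow(1.0 + rates[i], year - 1995));
            printf("\n");
        }
        return 0;
    }

By 2045 the three rates land near 10^13, 10^16, and 10^19 ops/s, matching the teraops, petaops, and exaops bands on the chart.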

Processor Limit: DRAM Gap (“Moore’s Law”)

[Chart, 1980-2000, log performance scale 1-1000: µProc performance grows ~60%/year while DRAM grows ~7%/year, so the processor-memory performance gap grows ~50%/year.]

  • Alpha 21264 full cache miss / instructions executed: 180 ns / 1.7 ns = 108 clks x 4, or 432 instructions
  • Caches in Pentium Pro: 64% area, 88% transistors
  • *Taken from Patterson-Keeton talk to SIGMOD
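
The miss arithmetic spelled out (assuming a ~600 MHz part, i.e. a 1.67 ns cycle that the slide rounds to 1.7 ns):

    \[ \frac{180\,\mathrm{ns}}{1.67\,\mathrm{ns/clk}} \approx 108\ \text{clocks}, \qquad 108 \times 4\ \text{issue slots/clock} \approx 432\ \text{instruction slots lost per miss}. \]
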
Five Scalabilities
  • Size scalable -- designed from a few components, with no bottlenecks
  • Generation scaling -- no rewrite/recompile is required across generations of computers
  • Reliability scaling
  • Geographic scaling -- compute anywhere (e.g. multiple sites or in situ workstation sites)
  • Problem x machine scalability -- ability of an algorithm or program to exist at a range of sizes that run efficiently on a given, scalable computer.
  • Problem x machine space => run time: problem scale and machine scale (#p) determine run time, which implies speedup and efficiency.
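
For reference, the standard definitions behind "speedup and efficiency", together with the scaled-speedup (Gustafson) relation that underlies the application-scaling argument on the next slide (not stated in the talk; s is the serial fraction):

    \[ S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}, \qquad S_{\mathrm{scaled}}(p) = s + (1-s)\,p. \]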

The Law of Massive Parallelism (mine) is based on application scaling

There exists a problem that can be made sufficiently large such that any network of computers can run it efficiently, given enough memory, searching, & work -- but this problem may be unrelated to any other.

A ... any parallel problem can be scaled to run efficiently on an arbitrary network of computers, given enough memory and time… but it may be completely impractical

Challenge to theoreticians and tool builders: How well will (or won’t) an algorithm run?

Challenge for software and programmers: Can a package be scalable & portable? Are there models?

Challenge to users: Do larger-scale, faster, longer runs increase problem insight, and not just total flops or flop/s?

Challenge to funders: Is the cost justified?

Manyflops for Manybucks: what are the goals of spending?
  • Getting the most flops, independent of how much taxpayers give to spend on computers?
  • Building or owning large machines?
  • Doing a job (stockpile stewardship)?
  • Understanding and publishing about parallelism?
  • Making parallelism accessible?
  • Forcing other labs to follow?
Or more parallelism… and use installed machines
  • 10,000 nodes in 1998 or 10x Increase
  • Assume 100K nodes
  • 10 Gflops/10GBy/100GB nodes or low end c2010 PCs
  • Communication is first problem… use the network
  • Programming is still the major barrier
  • Will any problems fit it?
The Alliance LES NT Supercluster

“Supercomputer performance at mail-order prices”-- Jim Gray, Microsoft

  • Andrew Chien, CS UIUC-->UCSD
  • Rob Pennington, NCSA
  • Myrinet Network, HPVM, Fast Msgs
  • Microsoft NT OS, MPI API

192 HP 300 MHz

64 Compaq 333 MHz

2D Navier-Stokes Kernel - Performance

Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Pre-conditioner

Sustaining 7 GF on a 128-Proc. NT Cluster

Danesh Tafti, Rob Pennington, NCSA; Andrew Chien (UIUC, UCSD)
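
For readers who have not seen the method named above, here is a minimal serial sketch of a preconditioned conjugate gradient solve; a simple Jacobi (diagonal) preconditioner and a tiny made-up SPD system stand in for the multi-level additive Schwarz Richardson preconditioner and the real CFD matrices -- this is not the NCSA code.

    /* Minimal preconditioned conjugate gradient (PCG) sketch, Jacobi preconditioner. */
    #include <stdio.h>
    #include <math.h>

    #define N 4  /* tiny illustrative system */

    /* y = A*x for a dense symmetric positive-definite matrix A */
    static void matvec(const double A[N][N], const double *x, double *y) {
        for (int i = 0; i < N; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
        }
    }

    static double dot(const double *a, const double *b) {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    int main(void) {
        double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};  /* made-up SPD matrix */
        double b[N] = {1, 2, 3, 4}, x[N] = {0}, r[N], z[N], p[N], Ap[N];

        matvec(A, x, r);                                    /* r = b - A*x */
        for (int i = 0; i < N; i++) r[i] = b[i] - r[i];
        for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];  /* Jacobi: z = M^-1 r */
        for (int i = 0; i < N; i++) p[i] = z[i];
        double rz = dot(r, z);

        for (int k = 0; k < 100 && sqrt(dot(r, r)) > 1e-10; k++) {
            matvec(A, p, Ap);
            double alpha = rz / dot(p, Ap);
            for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];
            double rz_new = dot(r, z);
            for (int i = 0; i < N; i++) p[i] = z[i] + (rz_new / rz) * p[i];
            rz = rz_new;
        }
        for (int i = 0; i < N; i++) printf("x[%d] = %f\n", i, x[i]);
        return 0;
    }

On a cluster the same iteration is typically distributed (e.g. over MPI), with the matrix-vector product and the dot products computed across processes.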

The Grid: Blueprint for a New Computing Infrastructure, Ian Foster and Carl Kesselman (Eds), Morgan Kaufmann, 1999
  • Published July 1998;

ISBN 1-55860-475-8

  • 22 chapters by expert authors including:
    • Andrew Chien,
    • Jack Dongarra,
    • Tom DeFanti,
    • Andrew Grimshaw,
    • Roch Guerin,
    • Ken Kennedy,
    • Paul Messina,
    • Cliff Neuman,
    • Jon Postel,
    • Larry Smarr,
    • Rick Stevens,
    • Charlie Catlett
    • John Toole
    • and many others

“A source book for the history of the future” -- Vint Cerf

http://www.mkp.com/grids

The Grid

“Dependable, consistent, pervasive access to [high-end] resources”

  • Dependable: Can provide performance and functionality guarantees
  • Consistent: Uniform interfaces to a wide variety of resources
  • Pervasive: Ability to “plug in” from anywhere
Alliance Grid Technology Roadmap: It’s just not flops or records/sec

[Diagram: a technology roadmap grouped under User Interface, Visualization, Middleware, Compute, and Data, spanning Habanero, Cave5D, Workbenches, Webflow, NetMeeting, Tango, Virtual Director, H.320/323, VRML, Java3D, RealNetworks, ActiveX, Java, SCIRun, Abilene, vBNS, MREN, CAVERNsoft, Globus, LDAP, QoS, OpenMP, MPI, HPF, DSM, Clusters, SANs, HPVM/FM, svPablo, JavaGrande, Condor, Symera (DCOM), Emerge (Z39.50), XML, DMF, ODBC, HDF-5, and SRB.]

Globus Approach

  • Focus on architecture issues
    • Propose set of core services as basic infrastructure
    • Use to construct high-level, domain-specific solutions
  • Design principles
    • Keep participation cost low
    • Enable local control
    • Support for adaptation

[Layer diagram: applications on top of diverse global services, on top of core Globus services, on top of the local OS.]

Globus Toolkit: Core Services
  • Scheduling (Globus Resource Alloc. Manager)
    • Low-level scheduler API
  • Information (Metacomputing Directory Service)
    • Uniform access to structure/state information
  • Communications (Nexus)
    • Multimethod communication + QoS management
  • Security (Globus Security Infrastructure)
    • Single sign-on, key management
  • Health and status (Heartbeat monitor)
  • Remote file access (Global Access to Secondary Storage)
Summary of some beliefs
  • 1000x increase in PAP has not been accompanied by RAP, insight, infrastructure, and use.
  • What was the PACT/$?
  • “The PC World Challenge” is to provide commodity, clustered parallelism to commercial and technical communities
  • Only comes true if ISVs believe and act
  • Grid etc., using world-wide resources including in situ PCs, is the new idea

PACT 98

http://www.research.microsoft.com/barc/gbell/pact.ppt