PACT 98. http://www.research.microsoft.com/barc/gbell/pact.ppt. What Architectures? Compilers? Run-time environments? Programming models? … Any Apps? Parallel Architectures and Compilers Techniques, Paris, 14 October 1998. Gordon Bell, Microsoft. Talk plan. Where are we today?
Xpt-connected SMPs
DSM - SCI (commodity)
DSM (high bandwidth)
Commodity “multis” & switches
Proprietary “multis” & switches
Multicomputers aka Clusters … MPP
16-(64)-10K processors
Jun-98 TOP500 Technical Systems by Vendor (sans PC and mainframe clusters)
Over 5 Million NUs Requested
One NU = One XMP Processor-Hour
Source: National Resource Allocation Committee
WAG: GB's Estimate of Parallelism in Engineering & Scientific Applications
Clusters aka MPPs aka multicomputers
[Chart axes: log (# apps) vs. granularity & degree of coupling (comp./comm.)]
General purpose, non-parallelizable codes (PCs have it!)
Vectorizable & //able (supers & small DSMs)
Hand tuned, one-of-a-kind; MPP coarse grain; MPP embarrassingly // (clusters of PCs...)
If central control & rich then IBM or large SMPs
else PC Clusters
WAG: 10 Processor Linpack (Gflops); 10 P apps x10; Apps % 1 P Linpack; Apps % 10 P Linpack
10^10 in 50 yrs = 1.58^50
Thomas Watson Senior, Chairman of IBM, 1943
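The arithmetic on the slide checks out: a factor of 10^10 over 50 years implies an annual multiplier r with r^50 = 10^10, i.e. r = 10^(10/50) ≈ 1.58, or roughly 58% growth per year. A quick sketch:

```python
# Bell's arithmetic: 10^10 growth over 50 years implies an annual
# multiplier r satisfying r**50 == 10**10, i.e. r = 10**(10/50).
years = 50
factor = 10 ** 10
r = factor ** (1 / years)
print(round(r, 3))              # annual multiplier, ~1.585
print(round((r - 1) * 100, 1))  # percent growth per year, ~58.5
```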
LLNL/IBM: 512x8 PowerPC (SP2)
Maui Supercomputer Center
512x1 SP2. Our Tax Dollars At Work: ASCI for Stockpile Stewardship
Navy Delphi Panel, 1969
Danny Hillis 1990 (1 paper or 1 company)
Petaflops / mo.
ATT/Columbia (Non Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like connection machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine),
Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC (independent of ETA), Cogent, Culler, Cydrome, Dennelcor, Elexsi, ETA, Evans & Sutherland Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, Multiflow, Myrias, Pixar, Prisma, SAXPY, SCS, Supertek (part of Cray), Suprenum (German National effort), Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Vitec, Vitesse, Wavetracer.
Bandwagon: A propaganda device by which the purported acceptance of an idea ...is claimed in order to win further public acceptance.
Pullers: vendors, CS community
Pushers: funding bureaucrats & deficit
Riders: innovators and early adopters
4 flat tires: training, system software, applications, and "guideposts"
Spectators: most users, 3rd party ISVs
Our vision ... is a system of millions of hosts… in a loose confederation. Users will have the illusion of a very powerful desktop computer through which they can manipulate objects.
Grimshaw, Wulf, et al “Legion” CACM Jan. 1997
1 GF to 10 GF took 2 years
10 GF to 100 GF took 3 years
100 GF to 1 TF took >5 years
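Assuming each milestone is a flat 10x gain, the implied annual growth rate for an n-year step is 10^(1/n); a quick sketch of how the pace slowed:

```python
# Each decade-of-flops milestone is a 10x gain; spread over n years,
# that is an annual multiplier of 10**(1/n).
milestones = [("1 GF -> 10 GF", 2), ("10 GF -> 100 GF", 3), ("100 GF -> 1 TF", 5)]
for label, years in milestones:
    rate = 10 ** (1 / years)
    print(f"{label}: {years} yrs, ~{(rate - 1) * 100:.0f}%/yr")
```

The annualized rates fall from roughly 216%/yr to 115%/yr to 58%/yr across the three steps.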
2n+1 or 2^(n-1)+1?
DOE Accelerated Strategic Computing Initiative (ASCI)
When is a Petaflops possible? What price?
Gordon Bell, ACM 1997
Performance Gap (grows 50% / year)
1999 Processor Limit: DRAM Gap
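The slide's 50%/yr gap figure is consistent with the commonly cited rates (which are assumptions here, not from the slide): processor performance improving ~60%/yr while DRAM speed improves ~7%/yr, so the ratio compounds at roughly 50%/yr:

```python
# Commonly cited rates (assumed, not stated on the slide): processors
# ~60%/yr, DRAM ~7%/yr. Their ratio compounds at roughly 50%/yr --
# the processor-memory "DRAM gap".
cpu_rate, dram_rate = 1.60, 1.07
per_year_gap = cpu_rate / dram_rate
print(round((per_year_gap - 1) * 100))    # ~50 (%/yr)
for years in (1, 5, 10):
    print(years, round(per_year_gap ** years, 1))
```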
Size scalable -- designed from a few components, with no bottlenecks
Generation scaling -- no rewrite/recompile is required across generations of computers
Geographic scaling -- compute anywhere (e.g. multiple sites or in situ workstation sites)
Problem x machine scalability -- ability of an algorithm or program to exist at a range of sizes that run efficiently on a given, scalable computer.
Problem x machine space => run time: problem scale, machine scale (#p), run time; implies speedup and efficiency.
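The speedup and efficiency the slide refers to are the standard definitions S(p) = T(1)/T(p) and E(p) = S(p)/p; a minimal sketch with hypothetical timings:

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p): how much faster the p-processor run is."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """E(p) = S(p) / p: fraction of ideal linear speedup achieved."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical run: 100 s on one processor, 8 s on 16 processors.
print(speedup(100, 8))         # 12.5
print(efficiency(100, 8, 16))  # 0.78125
```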
WAG: The Law of Massive Parallelism (mine) is based on application scaling.
There exists a problem that can be made sufficiently large such that any network of computers can run it efficiently given enough memory, searching, & work -- but this problem may be unrelated to any other.
A ... any parallel problem can be scaled to run efficiently on an arbitrary network of computers, given enough memory and time… but it may be completely impractical
Challenge to theoreticians and tool builders: How well will (or won't) an algorithm run?
Challenge for software and programmers: Can package be scalable & portable? Are there models?
Challenge to users: Do larger scale, faster, longer run times, increase problem insight and not just total flop or flops?
Challenge to funders: Is the cost justified?
“Supercomputer performance at mail-order prices”-- Jim Gray, Microsoft
192 HP 300 MHz
64 Compaq 333 MHz
Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Pre-conditioner
7 GF on
Danesh Tafti, Rob Pennington, NCSA; Andrew Chien (UIUC, UCSD)
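As a sketch of the solver family named above: a minimal preconditioned conjugate gradient loop for a symmetric positive-definite system Ax = b. The NCSA code used a multi-level additive Schwarz Richardson preconditioner; to keep this self-contained, the example substitutes a simple Jacobi (diagonal) preconditioner on a tiny SPD system, but the structure of the iteration is the same.

```python
def pcg(A, b, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradient with a Jacobi (diagonal)
    preconditioner, standing in for the multi-level Schwarz one."""
    n = len(b)
    minv = [1.0 / A[i][i] for i in range(n)]  # M^-1 = diag(A)^-1
    x = [0.0] * n
    r = b[:]                                  # residual r = b - A*x0, x0 = 0
    z = [minv[i] * r[i] for i in range(n)]    # preconditioned residual
    p = z[:]                                  # initial search direction
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# Tiny SPD example (hypothetical data): 4x + y = 1, x + 3y = 2.
x = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print([round(v, 4) for v in x])  # solution ~ [1/11, 7/11]
```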
“A source book for the history of the future” -- Vint Cerf
“Dependable, consistent, pervasive access to
Symera (DCOM). Alliance Grid Technology Roadmap: it's not just flops or records/sec
Applications
Diverse global svcs