How Much Commodity is Enough? The Red Storm Architecture
William J. Camp, James L. Tomkins & Rob Leland
CCIM, Sandia National Laboratories, Albuquerque, NM
Sandia MPPs (since 1987)
1987: 1024-processor nCUBE10 [512 Mflops]
[Figure label: Net I/O]
A partitioned, scalable computing architecture
[Figure label: Volume]
Computing domains at Sandia
Unix (Linux) login node with Unix environment
User sees a coherent, single system
640 Visualization, Service & I/O Nodes
Node architecture
ASIC NIC + Router
DRAM 1 (or 2) Gbyte or more
Six links to other nodes in X, Y,
ASIC = Application-Specific Integrated Circuit, or a “custom chip”
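The six-link node connectivity can be illustrated with a minimal sketch. It assumes that the third dimension is Z and that the mesh does not wrap around at its edges; neither is stated in the text above. The function name `mesh_neighbors` and the mesh dimensions are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

def mesh_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes directly linked to `coord` in a 3-D mesh.

    Assumption: no wraparound links, so nodes on a face, edge, or corner
    of the mesh have fewer than six neighbors.
    """
    neighbors = []
    for axis in range(3):            # 0 = X, 1 = Y, 2 = Z (Z is assumed)
        for step in (-1, 1):         # one link in each direction per axis
            candidate = list(coord)
            candidate[axis] += step
            if 0 <= candidate[axis] < dims[axis]:
                neighbors.append(tuple(candidate))
    return neighbors

# Hypothetical 3 x 3 x 3 mesh: the center node uses all six links.
interior = mesh_neighbors((1, 1, 1), (3, 3, 3))
corner = mesh_neighbors((0, 0, 0), (3, 3, 3))
```

An interior node gets two links per dimension, which is where the count of six comes from.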
Compute Node Cabinet
CPU Boards
[Figure label: Front]
Thor’s Hammer Cabinet Layout
Scalability - Full System Hardware and System Software
Usability - Required Functionality Only
Reliability - Hardware and System Software
Expense minimization - use commodity, high-volume parts
SURE poses Computer System Requirements:
- Build on commodity
- Leverage Open Source (e.g., Linux)
- Add to commodity selectively (in Red Storm there is basically one truly custom part!)
- Leverage experience with previous scalable supercomputers
Overall System Scalability - Complex scientific applications such as molecular dynamics, hydrodynamics & radiation transport should achieve scaled parallel efficiencies greater than 50% on the full system (~20,000 processors).
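As a rough illustration of the 50% target, here is a minimal sketch of one common definition of scaled (weak-scaling) parallel efficiency: run time at the base processor count divided by run time at the larger count, with work per processor held constant. The definition and the timings are assumptions for illustration, not figures from Sandia.

```python
def scaled_parallel_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: base run time over scaled run time.

    Assumes the problem size grows with the processor count so that
    each processor keeps the same amount of work.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("run times must be positive")
    return t_base / t_scaled

# Hypothetical run times in seconds, keyed by processor count.
timings = {1: 100.0, 1024: 125.0, 20000: 180.0}
base = timings[1]
efficiencies = {p: scaled_parallel_efficiency(base, t) for p, t in timings.items()}

# The requirement asks for more than 50% on the full system.
meets_target = efficiencies[20000] > 0.50
```

With these made-up timings, a run that takes 1.8 times longer on 20,000 processors than on one still clears the 50% bar.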
System Software Performance scales nearly perfectly with the number of processors to the full size of the computer (~30,000 processors). This means that System Software time (overhead) remains nearly constant with the size of the system or scales at most logarithmically with the system size.
- Full re-boot time scales logarithmically with the system size.
- Job loading is logarithmic with the number of processors.
- Parallel I/O performance is not sensitive to # of PEs doing I/O
- Communication Network software must be scalable.
- No connection-based protocols among compute nodes.
- Message buffer space independent of # of processors.
- Compute node OS gets out of the way of the application.
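The logarithmic scaling asked of re-boot and job loading can be sketched with a tree-style fan-out, in which every node that already holds the boot image or executable forwards it to more nodes in each round. This is a generic illustration, not the Red Storm implementation; `fanout_rounds` and `sequential_rounds` are hypothetical names.

```python
def fanout_rounds(num_nodes: int, fanout: int = 1) -> int:
    """Rounds for one source to reach `num_nodes` nodes when every node
    that already has the data forwards it to `fanout` new nodes per round."""
    if num_nodes < 1 or fanout < 1:
        raise ValueError("num_nodes and fanout must be at least 1")
    reached, rounds = 1, 0
    while reached < num_nodes:
        reached += reached * fanout   # each informed node informs `fanout` more
        rounds += 1
    return rounds

def sequential_rounds(num_nodes: int) -> int:
    """Rounds if a single source sends to every other node one at a time."""
    return max(num_nodes - 1, 0)

# Round counts at a few system sizes, including those quoted in the requirements.
rounds_by_size = {n: fanout_rounds(n) for n in (1024, 20000, 30000)}
```

With `fanout=1` the number of informed nodes doubles each round, so the round count grows as the base-2 logarithm of the system size, while the one-at-a-time approach grows linearly.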
Application Code Support:
Software that supports scalability of the Computer System
MPI Support for Full System Size
Parallel I/O Library
Tools that Scale to the Full Size of the Computer System
Full-featured Linux OS support at the user interface
Reduces need for new capital investment
Reduces need for large new capital facilities
[Figure label: Sys Admin]
Cplant on a slide
[Figure label: Sys Admin]
IA-32 Cplant on a slide
For most large scientific and engineering applications, performance is determined more by parallel scalability than by the speed of individual CPUs.
Processor, interconnect, and I/O performance must be balanced to achieve good overall performance.
To date, only a few tightly-coupled, parallel computer systems have been able to demonstrate a high level of scalability on a broad set of scientific and engineering applications.
[Table residue: columns include Node Speed Rating (MFlops) and Network Link BW; one surviving value, 0.04]
Let’s Compare Balance In Parallel Systems
                    Blue Gene Light**    Red Storm*        Ratio
Node Speed          5.6 GF               5.6 GF            (1x)
Node Memory         0.25-0.5 GB          2 (1-8) GB        (4x nom.)
Network latency     7 µs                 2 µs              (2/7x)
Network BW          0.28 GB/s            6.0 GB/s          (22x)
BW Bytes/Flops      0.05                 1.1               (22x)
Bi-Section B/F      0.0016               0.038             (24x)
#nodes/problem      40,000               10,000            (1/4x)
* 100 TF version of Red Storm
** 360 TF version of BGL
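The BW Bytes/Flops row appears to be the Network BW row divided by the Node Speed row. A minimal sketch of that arithmetic follows, using only the values listed in the comparison; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSpec:
    name: str
    node_speed_gflops: float   # "Node Speed" row, in GF
    network_bw_gbytes: float   # "Network BW" row, in GB/s

    def bytes_per_flop(self) -> float:
        """Network bytes available per floating-point operation."""
        return self.network_bw_gbytes / self.node_speed_gflops

bgl = NodeSpec("Blue Gene Light", node_speed_gflops=5.6, network_bw_gbytes=0.28)
red_storm = NodeSpec("Red Storm", node_speed_gflops=5.6, network_bw_gbytes=6.0)

# How many times more network bandwidth per flop Red Storm has than BGL.
balance_ratio = red_storm.bytes_per_flop() / bgl.bytes_per_flop()
```

The unrounded ratio comes out near 21.4, close to the 22x listed; dividing the rounded entries 1.1 and 0.05 gives 22.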
Molecular dynamics problem
Efficiency ratio =
Cost ratio = 1.8
Average efficiency ratio over the five codes that consume >80% of Sandia’s cycles
Sandia’s top priority computing workload:
Random variation at small processor counts
Large differential in efficiency at large processor counts
Los Alamos’ Radiation transport code