Overview of HPC – Eye Towards Petascale Computing
Amit Majumdar
Scientific Computing Applications Group
San Diego Supercomputer Center
University of California San Diego
Topics:
Supercomputing in General
Supercomputers at SDSC
Eye Towards Petascale Computing
TeraGrid: Integrating NSF Cyberinfrastructure
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Capability computing: the full power of a machine is used for a single scientific problem, utilizing its CPUs, memory, interconnect, and I/O performance.
It enables the solution of problems that cannot otherwise be solved in a reasonable period of time; the figure of merit is time to solution.
E.g., moving from a two-dimensional to a three-dimensional simulation, using finer grids, or using more realistic models.
Capacity computing: modest problems are tackled, often simultaneously, on a machine, each with less demanding requirements.
Smaller or cheaper systems are used for capacity computing, where smaller problems are solved, e.g., parametric studies or exploring design alternatives.
The main figure of merit is sustained performance per unit cost.
Strong scaling: for a fixed problem size, how does the time to solution vary with the number of processors?
Run a fixed-size problem and plot the speedup.
When the scaling of parallel codes is discussed, it is normally strong scaling that is being referred to.
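As a minimal sketch in Python (the processor counts and timings below are hypothetical placeholders, not measurements from the talk), strong-scaling speedup and parallel efficiency can be computed from measured runtimes for the same fixed-size problem:

```python
# Strong scaling: fixed total problem size, increasing processor count.
# Timings are hypothetical placeholders, not real measurements.
procs = [1, 2, 4, 8, 16, 32]
times = [100.0, 51.0, 26.5, 14.0, 7.8, 4.9]  # seconds, T(p)

for p, t in zip(procs, times):
    speedup = times[0] / t      # S(p) = T(1) / T(p)
    efficiency = speedup / p    # E(p) = S(p) / p
    print(f"p={p:3d}  T={t:6.1f}s  S={speedup:5.2f}  E={efficiency:4.2f}")
```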
Weak scaling: how the time to solution varies with processor count for a fixed problem size per processor.
It is interesting for O(N) algorithms, where perfect weak scaling is a constant time to solution, independent of processor count.
Deviations from this indicate that either the algorithm is not truly O(N), or the overhead due to parallelism is increasing, or both.
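A matching sketch for weak scaling (timings again hypothetical, loosely shaped like the Argon example below): the problem size grows with the processor count, so for a perfectly weak-scaling O(N) code the time per step stays flat and the efficiency T(1)/T(p) stays near 1:

```python
# Weak scaling: fixed problem size PER processor, so total size grows with p.
# Timings are hypothetical placeholders, not real measurements.
procs = [1, 4, 16, 64, 256, 1024]
times = [0.60, 0.61, 0.62, 0.64, 0.67, 0.70]  # seconds per step, T(p)

for p, t in zip(procs, times):
    efficiency = times[0] / t   # E(p) = T(1) / T(p); 1.0 is perfect
    print(f"p={p:4d}  T={t:4.2f} s/step  E={efficiency:4.2f}")
```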
Weak scaling for Argon: the smallest system size is 32,000 atoms, the largest 32,768,000. The scaling is very good, the time per step increasing only from 0.6 s to 0.7 s in going from 1 processor to 1024. This simulation is a direct test of the linked-cell algorithm, as it requires only short-ranged forces, and the results show it is behaving as expected.
Weak scaling for water: the time per step increases from 1.9 s on 1 processor, where the system size is 20,736 particles, to 3.9 s on 1024 (system size 21,233,664). In this case not only must Ewald terms be calculated, but also constraint forces. These forces are short-ranged and should scale as O(N); however, their calculation requires a large number of short messages to be sent, so latency effects become appreciable.
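Why short messages expose latency: under the usual linear message-cost model T(n) = alpha + n/beta, the fixed latency alpha dominates for small n. A minimal sketch in Python (the alpha and beta values are illustrative only, not taken from the talk):

```python
# Linear message-cost model: T(n) = alpha + n / beta.
# alpha (per-message latency) and beta (bandwidth) are illustrative values.
alpha = 5e-6   # 5 microseconds per message
beta = 1e9     # 1 GB/s link bandwidth

for n in [64, 1024, 65536, 1048576]:   # message sizes in bytes
    t = alpha + n / beta
    latency_share = alpha / t
    print(f"n={n:8d} B  T={t*1e6:8.1f} us  latency share={latency_share:6.1%}")
```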
SDSC's focus: applications in the top two quadrants.
[Figure: quadrant chart with axes "Compute (increasing FLOPS)" and "Data (increasing I/O and storage)"; labeled environments include Data Storage/Preservation Env, Extreme I/O Environment, SDSC Data Science Env, Campus/Departmental/Desktop Computing, and Traditional HEC Env.]
TeraGrid Linux Cluster
Blue Gene Data
Storage Area Network Disk: 6 PB capacity (~3 PB used)
SDSC procured a 1-rack system in 12/04; it was used initially for code evaluation and benchmarking, and entered production in 10/05. (The LLNL system is 64 racks.)
The SDSC rack has the maximum ratio of I/O nodes to compute nodes, 1:8 (LLNL's is 1:64). Each of the 128 I/O nodes in the rack has a 1 Gbps Ethernet connection, giving a potential 16 GB/s per rack.
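The per-rack figure is straightforward link arithmetic (8 bits per byte):

$$128 \ \text{I/O nodes} \times 1\,\text{Gb/s per node} = 128\,\text{Gb/s} = \tfrac{128}{8}\,\text{GB/s} = 16\,\text{GB/s per rack}$$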
3.1 Petascale Hardware
3.2 Petascale Software
When I talk about petaflop computing, what I have in mind is the longer-term perspective, the time when the HPC community enters the age of petascale computing.
What I mean is the time when you must achieve petaflop Rmax performance to make the TOP500 list. An intriguing question is, when will this happen?
If you do a straight-line extrapolation from today's TOP500 list, you come up with the year 2016. In any case, it's eight to ten years from now, and we will have to master several challenges to reach the age of petascale computing.
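A minimal sketch of such an extrapolation in Python (the entry-level Rmax values are rough illustrative figures, not actual TOP500 data): exponential growth is a straight line in log space, so fit log10(Rmax) of the #500 system against the year and solve for the 1 Pflop/s crossing:

```python
import numpy as np

# Rmax (Tflop/s) needed to enter the TOP500, by year.
# Values are rough illustrative figures, not actual TOP500 data.
years = np.array([1999, 2001, 2003, 2005, 2007])
rmax_entry = np.array([0.033, 0.094, 0.245, 1.17, 4.0])  # Tflop/s

# Straight line in log space: log10(Rmax) = a * year + b
a, b = np.polyfit(years, np.log10(rmax_entry), 1)

# Solve a * year + b = log10(1000 Tflop/s) for the petaflop entry year.
petaflop_year = (np.log10(1000.0) - b) / a
print(f"Entry-level Rmax reaches 1 Pflop/s around {petaflop_year:.0f}")
```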
Source: "Getting Up to Speed: The Future of Supercomputing", National Research Council, 2004
Example: radar cross-section application on the Intel Paragon XP
Performance Modeling & Characterization
Application signature: the operations that the application needs to carry out, collected as counts of each operation type (number of op1, op2, and op3).
Machine profile: the rates at which a machine can perform the different operations (rate of op1, op2, and op3).
Convolution: mapping of the machine's performance (rates) onto the application's needed operations:

$$\text{Execution time} \;=\; \frac{N_{\text{op1}}}{R_{\text{op1}}} \;\oplus\; \frac{N_{\text{op2}}}{R_{\text{op2}}} \;\oplus\; \frac{N_{\text{op3}}}{R_{\text{op3}}}$$

where the operator $\oplus$ can be $+$ or $\max$, depending on operation overlap.
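A minimal sketch of this convolution in Python (the operation names, counts, and rates are hypothetical placeholders): per-operation times combine with + when the operation types serialize, and with max when they fully overlap:

```python
# Convolve an application signature (operation counts) with a
# machine profile (operation rates). Names and values are hypothetical.
signature = {"flop": 2.0e12, "mem_ref": 5.0e11, "msg": 1.0e6}   # counts
profile   = {"flop": 4.0e9,  "mem_ref": 1.0e9,  "msg": 2.0e4}   # ops/sec

# Time spent in each operation type: N_op / R_op.
times = {op: signature[op] / profile[op] for op in signature}

t_serial  = sum(times.values())   # operator '+': no overlap between op types
t_overlap = max(times.values())   # operator 'MAX': perfect overlap
print(f"no overlap: {t_serial:.1f} s   full overlap: {t_overlap:.1f} s")
```

The two operators bracket the real execution time: actual machines overlap some, but not all, operation types.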