PARALLEL PROCESSING: The NAS Parallel Benchmarks
Daniel Gross, Chen Haiout
NASA (NAS Division)
Aims of the NASA Advanced Supercomputing (NAS) Division: develop, demonstrate, and deliver innovative computing capabilities to enable NASA projects and missions.
The NAS Parallel Benchmarks
The EP (Embarrassingly Parallel) benchmark accumulates two-dimensional statistics from a large number of Gaussian pseudo-random numbers. Because the problem requires almost no communication, this benchmark provides, in some sense, an estimate of the upper achievable limit of floating-point performance on a particular system.
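Why EP needs almost no communication can be seen in a small sketch. It is loosely modeled on EP's published description, and several details are assumptions here: the linear congruential generator x_{k+1} = 5^13 · x_k mod 2^46 with seed 271828183, the Marsaglia polar method for turning uniform pairs into Gaussian pairs, and the number of square-annulus bins (NQ = 10). The hypothetical functions ep_chunk and ep_parallel split the pairs into chunks that each "processor" handles with its own jumped-ahead generator state; the only data exchanged is the final sum of counts.

```python
import math

# Assumed NPB-style generator constants (x_{k+1} = A * x_k mod 2^46);
# NQ, the number of annulus bins, is also an assumption of this sketch.
A = 5 ** 13
MOD = 2 ** 46
SEED = 271828183
NQ = 10


def ep_chunk(start_pair, num_pairs):
    """Tally Gaussian pairs for pair indices [start_pair, start_pair + num_pairs)."""
    # Jump the generator ahead 2 * start_pair steps (two uniforms per pair),
    # so each worker starts at its own offset with no shared state.
    x = SEED * pow(A, 2 * start_pair, MOD) % MOD
    q = [0] * NQ        # counts of pairs per square annulus
    sx = sy = 0.0       # running sums of the Gaussian deviates
    for _ in range(num_pairs):
        x = A * x % MOD
        u1 = x / MOD
        x = A * x % MOD
        u2 = x / MOD
        # Polar method: map uniforms to (-1, 1), keep points inside the unit disk.
        a, b = 2.0 * u1 - 1.0, 2.0 * u2 - 1.0
        t = a * a + b * b
        if 0.0 < t <= 1.0:
            f = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = a * f, b * f
            l = int(max(abs(gx), abs(gy)))  # square-annulus index
            if l < NQ:
                q[l] += 1
            sx += gx
            sy += gy
    return q, sx, sy


def ep_parallel(total_pairs, workers):
    """Independent chunks; the only 'communication' is the final reduction."""
    chunk = total_pairs // workers
    q_total, sx_total, sy_total = [0] * NQ, 0.0, 0.0
    for w in range(workers):
        start = w * chunk
        n = chunk if w < workers - 1 else total_pairs - start
        q, sx, sy = ep_chunk(start, n)  # in a real run, one call per processor
        q_total = [p + r for p, r in zip(q_total, q)]
        sx_total += sx
        sy_total += sy
    return q_total, sx_total, sy_total
```

Calling ep_chunk(0, N) serially and ep_parallel(N, 4) gives identical annulus counts, and the sums agree up to floating-point rounding, which is the property that lets EP scale almost perfectly.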
SP, the scalar pentadiagonal benchmark, solves multiple independent systems of non-diagonally dominant scalar pentadiagonal equations. A complete SP solution requires 400 iterations.
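Each of these independent systems can be solved by Gaussian elimination restricted to the band. The following is a minimal sketch, not the NPB SP source: the hypothetical function solve_pentadiagonal stores the matrix as five diagonals per row and eliminates without pivoting, which keeps all work inside the band. The test matrix uses the stencil (1, -4, 6, -4, 1), which is symmetric positive definite yet not diagonally dominant, so elimination without pivoting is still safe for it.

```python
def solve_pentadiagonal(band, rhs):
    """Solve A x = rhs for pentadiagonal A by Gaussian elimination without pivoting.

    band[i][o + 2] holds A[i][i + o] for offsets o in -2..2; entries that
    would fall outside the matrix are never read.
    """
    n = len(rhs)
    band = [row[:] for row in band]  # work on copies of the inputs
    r = list(rhs)
    # Forward elimination: each pivot row clears at most two rows below it,
    # and no fill-in appears outside the band.
    for k in range(n):
        pivot = band[k][2]
        for i in (k + 1, k + 2):
            if i >= n:
                break
            factor = band[i][k - i + 2] / pivot
            for j in range(k, min(k + 3, n)):
                band[i][j - i + 2] -= factor * band[k][j - k + 2]
            r[i] -= factor * r[k]
    # Back substitution using the two remaining superdiagonals.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = r[i]
        for o in (1, 2):
            if i + o < n:
                s -= band[i][2 + o] * x[i + o]
        x[i] = s / band[i][2]
    return x
```

Because the systems are independent, many such solves (one per grid line) can run concurrently; the cost per system is linear in its length.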
MG uses a multigrid method to compute the solution of the three-dimensional scalar Poisson equation.
This code is a good test of both short- and long-distance highly structured communication.
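The multigrid idea behind MG can be sketched in one dimension. This is an illustration under stated assumptions, not the NPB MG kernel: it solves the 1D Poisson problem -u'' = f with zero boundary values instead of the 3D problem, and the weighted Jacobi smoother, full-weighting restriction, and linear interpolation are choices made for this sketch. The hypothetical function v_cycle smooths on the fine grid, recurses on a grid with half as many points, and interpolates the correction back. In a distributed version, fine-grid smoothing needs only nearest-neighbor exchanges, while the coarse levels couple points that are far apart on the fine grid, which is where the mix of short- and long-distance communication comes from.

```python
import math


def residual(u, f, h):
    """r = f - A u for the operator A u = (2u_i - u_{i-1} - u_{i+1}) / h^2."""
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; boundary values stay fixed."""
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jacobi
        u = new
    return u


def v_cycle(u, f, h):
    """One V-cycle on a grid of 2^k + 1 points (both boundaries included)."""
    n = len(u)
    if n <= 3:
        # Coarsest grid has a single interior point: solve it exactly.
        u = u[:]
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = smooth(u, f, h)                    # pre-smoothing damps short-range error
    r = residual(u, f, h)
    nc = (n - 1) // 2 + 1
    rc = [0.0] * nc
    for j in range(1, nc - 1):             # full-weighting restriction
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)  # coarse grid removes long-range error
    e = [0.0] * n
    for j in range(nc):                    # linear interpolation to the fine grid
        e[2 * j] = ec[j]
    for j in range(nc - 1):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    u = [ui + ei for ui, ei in zip(u, e)]
    return smooth(u, f, h)                 # post-smoothing
```

Repeating v_cycle a handful of times on a 65-point grid with f = π² sin(πx) reduces the residual by many orders of magnitude, leaving only the discretization error relative to the exact solution sin(πx).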
FT contains the computational kernel of a three-dimensional FFT-based spectral method.
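A spectral method of this kind transforms the data once, updates every Fourier mode independently, and transforms back. The sketch below assumes, as its model problem, the diffusion equation u_t = α∇²u on a periodic unit cube, where each mode decays by exp(-4π²α|k|²t); the grid size, α, and t are illustrative, and the hypothetical functions fft, fft3d, and evolve are not the FT benchmark's code. The 3D transform is built from three passes of 1D radix-2 FFTs, one along each axis.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fft3d(u, inverse=False):
    """3D FFT of an n x n x n nested list as 1D FFTs along each axis in turn."""
    n = len(u)
    v = [[fft(u[i][j], inverse) for j in range(n)] for i in range(n)]  # last axis
    for i in range(n):                                                 # middle axis
        for k in range(n):
            col = fft([v[i][j][k] for j in range(n)], inverse)
            for j in range(n):
                v[i][j][k] = col[j]
    for j in range(n):                                                 # first axis
        for k in range(n):
            col = fft([v[i][j][k] for i in range(n)], inverse)
            for i in range(n):
                v[i][j][k] = col[i]
    if inverse:
        scale = 1.0 / n ** 3
        v = [[[x * scale for x in row] for row in plane] for plane in v]
    return v


def evolve(u0, alpha, t):
    """Advance u_t = alpha * laplacian(u) by time t, mode by mode in Fourier space."""
    n = len(u0)
    freq = [k if k < n // 2 else k - n for k in range(n)]  # signed wavenumbers
    uh = fft3d(u0)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                k2 = freq[i] ** 2 + freq[j] ** 2 + freq[k] ** 2
                uh[i][j][k] *= math.exp(-4 * math.pi ** 2 * alpha * k2 * t)
    return fft3d(uh, inverse=True)
```

The per-mode update needs no communication at all; the cost lies in the transforms. When the grid is distributed by planes, at least one of the three 1D passes runs along the distributed axis, which is why a parallel 3D FFT typically needs an all-to-all transpose.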
BT, the block tridiagonal benchmark, solves systems of equations resulting from an approximately factored finite-difference discretization of the Navier-Stokes equations.
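Along each grid line the factored operator yields a block tridiagonal system A_i x_{i-1} + B_i x_i + C_i x_{i+1} = d_i. A generic way to solve it is the block Thomas algorithm: eliminate forward with small dense solves, then back-substitute. This is a sketch under assumptions, not the NPB BT source: the hypothetical helpers matmul, sub, and solve use plain nested lists, and the test uses 2 x 2 blocks for brevity, whereas BT's blocks are 5 x 5 (one per grid point, for the five flow variables).

```python
def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sub(a, b):
    """Elementwise a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def solve(m, rhs):
    """Solve m @ X = rhs (rhs is a matrix) by Gaussian elimination with partial pivoting."""
    n, p = len(m), len(rhs[0])
    a = [row[:] + r[:] for row, r in zip(m, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + p):
                a[r][c] -= f * a[col][c]
    x = [[0.0] * p for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for c in range(p):
            s = a[r][n + c] - sum(a[r][k] * x[k][c] for k in range(r + 1, n))
            x[r][c] = s / a[r][r]
    return x


def block_thomas(A, B, C, d):
    """A[i], B[i], C[i]: m x m blocks (A[0] and C[-1] unused); d[i]: m x 1 columns."""
    n = len(B)
    cp, dp = [None] * n, [None] * n
    if n > 1:
        cp[0] = solve(B[0], C[0])
    dp[0] = solve(B[0], d[0])
    for i in range(1, n):                      # forward elimination
        m = sub(B[i], matmul(A[i], cp[i - 1]))
        if i < n - 1:
            cp[i] = solve(m, C[i])
        dp[i] = solve(m, sub(d[i], matmul(A[i], dp[i - 1])))
    x = [None] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = sub(dp[i], matmul(cp[i], x[i + 1]))
    return x
```

The work per grid line is a fixed number of small dense block operations per point, so the cost grows linearly with line length, as with the scalar SP systems but with more arithmetic per point.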
In September 1996, two medium-scale parallel systems called “Loki” and “Hyglac” were installed.
Each consisted of sixteen Pentium Pro (200 MHz) PCs with 16 Mbytes of memory and 3.2 and 2.5 Gbytes of disk per node, respectively. Each system was integrated using two Fast Ethernet NICs in each node.
Both systems ran a complex N-body gravitational simulation of 2 million particles using an advanced tree-code algorithm, sustaining 1.19 Gflops and 1.26 Gflops, respectively. When the two systems were connected together, the same code was run again and sustained over 2 Gflops, without further optimization of the code for the new configuration.
The Berkeley NOW (Network Of Workstations) system comprises 105 Sun Ultra 170 workstations connected by a Myricom Myrinet network. Each node includes a 167 MHz UltraSPARC-I microprocessor with 512 KB of cache, 128 MB of RAM, and two 2.3 GB disks.
The Cray T3E-1200 is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. It provides a shared physical address space across up to 2048 processors connected by a 3D torus interconnect. Each node contains one Alpha 21164 processor capable of 1200 Mflops. The system logic runs at 75 MHz, and the processor runs at a multiple of this clock: 600 MHz in the T3E-1200. Torus links provide a raw bandwidth of 650 MB/s in each direction, keeping the network in balance with the faster processors and memory.
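The T3E figures fit together through simple arithmetic. In this short calculation, every constant comes from the paragraph except FLOPS_PER_CYCLE, which is an assumption: the Alpha 21164 can issue one floating-point add and one floating-point multiply per cycle. The bytes-per-flop ratio is one common way to express "balance" between network and processor; it is a framing added here, not a figure from the source.

```python
# Figures from the text, except FLOPS_PER_CYCLE (assumed: one FP add plus
# one FP multiply issued per cycle on the Alpha 21164).
SYSTEM_CLOCK_MHZ = 75
CPU_CLOCK_MHZ = 600
MAX_PROCESSORS = 2048
LINK_MB_PER_S = 650        # raw torus-link bandwidth per direction
FLOPS_PER_CYCLE = 2

clock_multiplier = CPU_CLOCK_MHZ // SYSTEM_CLOCK_MHZ            # 600 / 75 = 8
peak_mflops_per_pe = CPU_CLOCK_MHZ * FLOPS_PER_CYCLE            # 1200 Mflops per processor
peak_gflops_system = peak_mflops_per_pe * MAX_PROCESSORS / 1000  # theoretical ceiling, about 2457.6 Gflops
link_bytes_per_flop = LINK_MB_PER_S / peak_mflops_per_pe        # about 0.54 bytes per peak flop, per direction

if __name__ == "__main__":
    print(clock_multiplier, peak_mflops_per_pe,
          round(peak_gflops_system, 1), round(link_bytes_per_flop, 2))
```

The aggregate figure is a peak for the largest configuration; sustained rates on real codes, such as the NAS benchmarks, are considerably lower.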
• Old PII processors at 300 MHz – Will be
• 8 PIII processors at 450 MHz
• 4 PIII processors at 733 MHz
• The new machines:
– Dual AMD Athlon MP 2000+ @ 1,666 MHz, 1 GB memory.