Benchmarks for Parallel Systems

Sources/Credits:

  • "Performance of Various Computers Using Standard Linear Equations Software", Jack Dongarra, University of Tennessee, Knoxville, TN 37996. Computer Science Technical Report CS-89-85, April 8, 2004. http://www.netlib.org/benchmark/performance.ps

  • Top500: http://www.top500.org (courtesy: Jack Dongarra)

  • LINPACK FAQ: http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html

  • "The LINPACK Benchmark: Past, Present, and Future", Jack Dongarra, Piotr Luszczek, and Antoine Petitet

  • NAS Parallel Benchmarks: http://www.nas.nasa.gov/Software/NPB/


LINPACK (Dongarra: 1979)

  • Solves a dense system of linear equations

  • Originally appeared in the LINPACK Users' Guide

  • LINPACK – 1979

  • Three benchmarks: N=100, N=1000, and the Highly Parallel Computing benchmark


LINPACK benchmark

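  • Implemented on top of the Level 1 BLAS (BLAS1)

  • 2 main operations – DGEFA (LU factorization by Gaussian elimination with partial pivoting, O(n³)) and DGESL (solves Ax = b using the factors, O(n²))

  • Major operation (97%) – DAXPY: y = y + αx

  • The DAXPY multiply-add is executed about n³/3 + n² times, hence approximately 2n³/3 + 2n² flops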

  • 64-bit floating point arithmetic
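To make the operation count concrete, here is a minimal Python sketch of DAXPY and of where n³/3 comes from. The function names `daxpy`, `dgefa_updates`, and `linpack_flops` are hypothetical illustrations, not taken from the LINPACK sources. The sketch assumes the column-oriented elimination of DGEFA, where step k applies a DAXPY of length n-k-1 to each of the n-k-1 trailing columns, and it counts each element update as one multiply plus one add.

```python
def daxpy(alpha: float, x: list, y: list) -> None:
    """In-place y <- y + alpha * x (the Level 1 BLAS DAXPY operation)."""
    for i in range(len(y)):
        y[i] += alpha * x[i]


def dgefa_updates(n: int) -> int:
    """Count DAXPY element updates in column-oriented Gaussian elimination.

    At step k each of the n-k-1 trailing columns receives a DAXPY of
    length n-k-1, so the total is sum_{m=1}^{n-1} m^2, about n^3/3.
    """
    return sum((n - k - 1) ** 2 for k in range(n - 1))


def linpack_flops(n: int) -> float:
    """Nominal LINPACK count: 2n^3/3 + 2n^2 (one multiply + one add per update)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


y = [1.0, 2.0, 3.0]
daxpy(2.0, [1.0, 1.0, 1.0], y)
print(y)                   # [3.0, 4.0, 5.0]
print(dgefa_updates(100))  # 328350, close to 100**3 / 3
print(linpack_flops(100))
```

The extra n² term in the nominal count comes from the triangular solves in DGESL, which this sketch does not model separately.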


LINPACK

  • N=100: 100x100 system of equations. The code may not be changed; the user supplies only a timing routine called SECOND. Only compiler optimizations are allowed

  • N=1000: 1000x1000 system. The user may implement any code, provided it delivers the required accuracy: Towards Peak Performance (TPP). The driver program always uses 2n³/3 + 2n² as the operation count

  • "Highly Parallel Computing" benchmark – any software may be used and the matrix size can be chosen freely. This is the variant used for the Top500

  • Based on 64-bit floating point arithmetic
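Because the highly parallel benchmark lets the user choose the matrix size, and the rate is always computed from the fixed count 2n³/3 + 2n², two small calculations come up in practice. The sketch below is illustrative only: the helper names are hypothetical, and the choice to fill 80% of memory with the matrix is an assumed rule of thumb, not part of the benchmark rules.

```python
import math


def max_problem_size(memory_bytes: float, fraction: float = 0.8) -> int:
    """Largest N whose N x N matrix of 8-byte doubles fits in `fraction` of memory.

    The 0.8 default is an assumed rule of thumb, not a benchmark rule.
    """
    return int(math.sqrt(fraction * memory_bytes / 8))


def rate_gflops(n: int, seconds: float) -> float:
    """Rate as the driver reports it: the operation count is always
    2n^3/3 + 2n^2, whatever algorithm the solver actually used."""
    return (2.0 * n**3 / 3.0 + 2.0 * n**2) / seconds / 1e9


n = max_problem_size(16 * 2**30)  # hypothetical node with 16 GiB
print(n)                          # 41448
print(rate_gflops(n, 3600.0))     # GFlop/s if the solve took one hour
```

Fixing the operation count means a solver that does extra or fewer arithmetic operations is still credited with 2n³/3 + 2n², so reported rates stay comparable across implementations.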


LINPACK

  • 100x100 – inner-loop optimization

  • 1000x1000 – three-loop / whole-program optimization

  • Scalable parallel program – largest problem that fits in memory

  • Template of the LINPACK code

    • Generate

    • Solve

    • Check

    • Time
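The Generate / Solve / Check / Time template can be sketched in pure Python as below. This is an illustration of the structure, not the benchmark's actual code: the random generator, seed, and function names are assumptions, and the solver is a row-oriented Gaussian elimination with partial pivoting in the spirit of DGEFA + DGESL. The check is modeled on the normalized residual printed by the classic LINPACK driver, where ||A|| is taken as the largest |a_ij|.

```python
import random
import sys
import time


def generate(n: int, seed: int = 0):
    """Generate: random n x n matrix A and b = A @ [1, ..., 1], so the exact x is all ones."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]
    return a, b


def solve(a, b):
    """Solve: Gaussian elimination with partial pivoting, then back substitution.

    Works on copies; the right-hand side is eliminated alongside the matrix.
    """
    n = len(a)
    u = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(u[i][k]))  # pivot search
        u[k], u[p] = u[p], u[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]
            for j in range(k + 1, n):  # AXPY-style row update
                u[i][j] -= m * u[k][j]
            x[i] -= m * x[k]
    for i in range(n - 1, -1, -1):
        s = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / u[i][i]
    return x


def check(a, b, x) -> float:
    """Check: ||Ax - b|| / (n * ||A|| * ||x|| * eps), with ||A|| = max |a_ij|
    and max norms for the residual and x."""
    n = len(a)
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm_a = max(abs(v) for row in a for v in row)
    norm_x = max(abs(v) for v in x)
    return resid / (n * norm_a * norm_x * sys.float_info.epsilon)


def run(n: int = 100):
    """Time: only the solve is timed; the rate uses the fixed 2n^3/3 + 2n^2 count."""
    a, b = generate(n)
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = time.perf_counter() - start
    mflops = (2.0 * n**3 / 3.0 + 2.0 * n**2) / elapsed / 1e6
    return check(a, b, x), mflops, x


resid, mflops, x = run(100)
print(f"normalized residual {resid:.2f}, {mflops:.1f} Mflop/s")
```

A normalized residual of order 1 indicates that the solution is as accurate as the floating-point arithmetic allows.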



HPL Algorithm

  • 2-D block-cyclic data distribution

  • Right-looking LU

  • Panel factorization: various options

    - Crout, left- or right-looking recursive variants based on matrix multiply

    - number of sub-panels

    - recursive stopping criteria

    - pivot search and broadcast by binary exchange
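The 2-D block-cyclic distribution can be sketched as an index mapping. In this sketch, blocks of nb rows are dealt round-robin over the p process rows, and blocks of nb columns are dealt round-robin over the q process columns. It assumes 0-based indices, the same block size in both dimensions, and that the first block lives on process (0, 0). The function names are hypothetical and do not come from the HPL sources.

```python
def owner_and_local(g: int, nb: int, nprocs: int):
    """1-D block-cyclic map: global index g -> (process coordinate, local index).

    Blocks of nb consecutive indices are dealt round-robin to nprocs
    processes, starting at process 0.
    """
    block = g // nb
    return block % nprocs, (block // nprocs) * nb + g % nb


def block_cyclic_2d(i: int, j: int, nb: int, p: int, q: int):
    """2-D map on a p x q grid: rows cycle over process rows, columns over process columns."""
    pr, li = owner_and_local(i, nb, p)
    pc, lj = owner_and_local(j, nb, q)
    return (pr, pc), (li, lj)


# Owner map of an 8 x 8 matrix with nb = 2 on a 2 x 2 process grid.
for i in range(8):
    row = []
    for j in range(8):
        (pr, pc), _ = block_cyclic_2d(i, j, 2, 2, 2)
        row.append(f"P{pr}{pc}")
    print(" ".join(row))
```

The cyclic dealing keeps every process busy as the right-looking LU shrinks the trailing matrix. A plain block distribution would leave the processes that own the early rows and columns idle.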


HPL Algorithm

  • Panel broadcast

  • Update of trailing matrix:

    - look-ahead pipeline

  • Validity check

    - the scaled residual should be O(1)
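For reference, the scaled residual printed by recent HPL releases (2.x) has roughly the form below, with ε the machine epsilon and n the matrix order. Older releases reported several residuals instead. A run is typically reported as passing when this value is below a user-set threshold, commonly 16.0 in the sample HPL.dat; treat both details as recollections to be checked against the HPL documentation.

```latex
\[
  r_{\text{scaled}}
  = \frac{\lVert A x - b \rVert_\infty}
         {\varepsilon \,\bigl(\lVert A \rVert_\infty \lVert x \rVert_\infty + \lVert b \rVert_\infty\bigr)\, n}
  = O(1)
\]
```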


Top500 (www.top500.org)

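  • Top500 list started in 1993

  • Published twice a year – June and November

  • Top500 reports Rmax (best measured LINPACK performance), Nmax (problem size at which Rmax was achieved), N1/2 (problem size at which half of Rmax is achieved), and Rpeak (theoretical peak performance)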
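Rpeak is a paper number: it assumes every core completes its maximum number of floating-point operations on every cycle. Comparing it with the measured Rmax gives the efficiency of the LINPACK run. The machine parameters and the Rmax value below are hypothetical, chosen only to illustrate the arithmetic.

```python
def rpeak_gflops(nodes: int, cores_per_node: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core completes flops_per_cycle operations every cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of peak actually achieved on the LINPACK run."""
    return rmax / rpeak


# Hypothetical machine and hypothetical Rmax, for illustration only.
rpeak = rpeak_gflops(nodes=100, cores_per_node=32, ghz=2.0, flops_per_cycle=16)
print(rpeak)                        # 102400.0 (GFlop/s)
print(efficiency(75_000.0, rpeak))  # 0.732421875
```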



NAS Parallel Benchmarks - NPB

  • Also used to evaluate supercomputers

  • A set of 8 programs derived from CFD (computational fluid dynamics) applications

  • 5 kernels, 3 pseudo-applications

  • NPB 1 – original benchmarks

  • NPB 2 – NAS's MPI implementations. NPB 2.4 Class D has more work and more I/O

  • NPB 3 – implementations in OpenMP, HPF, and Java

  • GridNPB3 – for computational grids

  • NPB 3 multi-zone – for hybrid parallelism


NPB 1.0 (March 1994)

  • Defines class A and class B versions

  • “Paper and pencil” algorithmic specifications

  • Generic benchmarks, as compared to the MPI-based LINPACK

  • General rules for implementations – Fortran 90 or C, 64-bit arithmetic, etc.

  • Sample implementations provided


Kernel Benchmarks

  • EP – embarrassingly parallel

  • MG – multigrid. Regular communication

  • CG – conjugate gradient. Irregular long-distance communication

  • FT – solves a 3-D PDE using FFTs. Rigorous test of long-distance communication

  • IS – large integer sort

  • Detailed rules regarding:

    - brief statement of the problem

    - algorithm to be used

    - validation of results

    - where to insert timing calls

    - method for generating random numbers

    - submission of results
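To illustrate what a "paper and pencil" kernel specification pins down, the sketch below follows the EP kernel as I understand it. It uses the NPB linear congruential generator x_{k+1} = a·x_k mod 2^46 with a = 5^13, turns consecutive pairs of uniforms into Gaussian deviates with the Marsaglia polar method, and tallies the deviates by square annulus. The seed 271828183 is the value I recall for EP and should be treated as an assumption. This is a serial sketch, not the NPB reference code, and the function names are hypothetical.

```python
import math

A = 5**13           # NPB multiplier
MOD = 2**46         # NPB modulus
SEED = 271828183    # assumed EP seed (see lead-in)


def randlc(state: int):
    """One LCG step x <- a*x mod 2^46; returns (new state, uniform in (0, 1)).

    With an odd seed the state stays odd, so the uniform is never 0.
    """
    state = (A * state) % MOD
    return state, state / MOD


def ep(num_pairs: int, seed: int = SEED):
    """EP-style kernel: Gaussian pairs via the polar method, counted by annulus."""
    q = [0] * 10          # q[l] counts pairs with l <= max(|X|, |Y|) < l + 1
    sx = sy = 0.0
    accepted = 0
    state = seed
    for _ in range(num_pairs):
        state, r1 = randlc(state)
        state, r2 = randlc(state)
        x, y = 2.0 * r1 - 1.0, 2.0 * r2 - 1.0
        t = x * x + y * y
        if 0.0 < t <= 1.0:  # accept points inside the unit disk
            f = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = x * f, y * f
            l = int(max(abs(gx), abs(gy)))
            if l < len(q):
                q[l] += 1
            sx += gx
            sy += gy
            accepted += 1
    return q, sx, sy, accepted


q, sx, sy, accepted = ep(10_000)
print(q, accepted / 10_000)  # acceptance rate should be close to pi/4
```

Every pair is independent, which is what makes the kernel embarrassingly parallel. A parallel version only needs each process to jump ahead in the generator to its own starting point and to sum q, sx, and sy at the end.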


Pseudo applications / Synthetic CFDs

  • Benchmark 1 – perform a few iterations of the approximate factorization algorithm (BT, block tridiagonal systems)

  • Benchmark 2 – perform a few iterations of the diagonal form of the approximate factorization algorithm (SP, scalar pentadiagonal systems)

  • Benchmark 3 – perform a few iterations of SSOR (LU)
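SSOR (symmetric successive over-relaxation) is a forward Gauss-Seidel sweep followed by a backward sweep, each relaxed by a factor ω. The LU pseudo-application applies it to coupled block systems from a 3-D flow solver. The sketch below is a deliberately simplified scalar stand-in: it solves the 1-D model problem with A = tridiag(-1, 2, -1). The function name, ω = 1.5, and the iteration count are illustrative assumptions.

```python
def ssor(b, omega=1.5, iters=300):
    """SSOR for A = tridiag(-1, 2, -1): each iteration is a forward then a backward sweep."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                left = x[i - 1] if i > 0 else 0.0
                right = x[i + 1] if i < n - 1 else 0.0
                gs = (b[i] + left + right) / 2.0  # Gauss-Seidel value for row i
                x[i] += omega * (gs - x[i])       # over-relaxed update
    return x


n = 10
b = [1.0] + [0.0] * (n - 2) + [1.0]  # b = A @ [1, ..., 1]
x = ssor(b)
print(max(abs(v - 1.0) for v in x))  # error against the exact solution of all ones
```

The symmetric forward-plus-backward ordering makes the iteration operator symmetric for symmetric A. Each sweep is a recurrence along the ordering, so a parallel implementation has to pipeline or reorder it. That data dependence is part of what the LU benchmark exercises.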


Class A and Class B

[Table: Class A | Sample Code | Class B]


NPB 2.0 (1995)

  • MPI and Fortran 77 implementations

  • 2 parallel kernels (MG, FT) and 3 simulated applications (LU, SP, BT)

  • Class C – larger problem sizes

  • Benchmark rules – distinguish 0%, up to 5%, and more than 5% change in the source code


NPB 2.2 (1996), 2.4 (2002), 2.4 I/O (Jan 2003)

  • EP and IS added

  • FT rewritten

  • NPB 2.4 – adds class D, with a rationale for the class D sizes

  • 2.4 I/O – a new benchmark problem based on BT (BTIO) to test parallel output capabilities

  • An MPI implementation of BTIO using MPI-IO, with different options (e.g., with or without collective buffering)


