
The HPC Challenge (HPCC) Benchmark Suite


Presentation Transcript


  1. The HPC Challenge (HPCC) Benchmark Suite, Piotr Luszczek, http://icl.cs.utk.edu/hpcc/

  2. HPCC Components
     • HPL (High Performance LINPACK): solves a dense linear system Ax = b
     • STREAM: memory-bandwidth kernels
       -------------------------------------------------------
       name    kernel                   bytes/iter  FLOPS/iter
       -------------------------------------------------------
       COPY:   a(i) = b(i)                      16           0
       SCALE:  a(i) = q*b(i)                    16           1
       SUM:    a(i) = b(i) + c(i)               24           1
       TRIAD:  a(i) = b(i) + q*c(i)             24           2
       -------------------------------------------------------
     • PTRANS (parallel matrix transpose): A ← A^T + B
     • RandomAccess: random XOR updates of 64-bit words in a large table, T[k] ← T[k] XOR a_i
     • FFT: 1D discrete Fourier transform, z_k = Σ_j x_j exp(-2π·sqrt(-1)·jk/n)
     • Matrix-matrix multiply: C ← s*C + t*A*B
     • b_eff: effective interconnect bandwidth/latency (ping-pong)
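     As a concrete illustration of the STREAM row of the table, the four kernels can be written as plain C loops. This is only a minimal sketch (the function name and array length are made up here, and it is not the HPCC reference implementation), but the per-iteration byte and FLOP counts match the table, assuming 8-byte doubles.

        /* Sketch of the four STREAM kernels; counts assume 8-byte doubles. */
        void stream_kernels(double *a, const double *b, const double *c, double q, int n) {
            for (int i = 0; i < n; i++) a[i] = b[i];            /* COPY:  16 bytes/iter, 0 FLOPS */
            for (int i = 0; i < n; i++) a[i] = q * b[i];        /* SCALE: 16 bytes/iter, 1 FLOP  */
            for (int i = 0; i < n; i++) a[i] = b[i] + c[i];     /* SUM:   24 bytes/iter, 1 FLOP  */
            for (int i = 0; i < n; i++) a[i] = b[i] + q * c[i]; /* TRIAD: 24 bytes/iter, 2 FLOPS */
        }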

  3. HPC Challenge Benchmarks vs. Select Applications (HPCC: Motivation and Measurement)
     • Concept: benchmarks are placed on a map of temporal locality (low to high) versus spatial locality (low to high): DGEMM and HPL sit at high temporal and high spatial locality, FFT at high temporal but lower spatial locality, PTRANS and STREAM at high spatial but low temporal locality, and RandomAccess at low spatial and low temporal locality.
     • Measurement: a companion map, generated by PMaC @ SDSC, places select mission-partner applications on the same axes.
     • Spatial and temporal data locality here is for one node/processor, i.e., locally or “in the small”.
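     To make the locality axes concrete, a kernel from the high-temporal-locality corner of the map (DGEMM/HPL-like) can be sketched as a naive matrix-matrix multiply. This is an illustrative example only (the function below is not part of HPCC, which relies on a tuned BLAS for DGEMM), chosen because every element of B and C is reused n times, unlike the streaming and random-update kernels in the other corners.

        /* Naive matrix multiply: high temporal locality, since each element
         * of B and C is reused n times (illustrative sketch only). */
        void matmul_naive(int n, const double *B, const double *C, double *A) {
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++)
                        sum += B[i*n + k] * C[k*n + j]; /* row of B reused across j, column of C across i */
                    A[i*n + j] = sum;
                }
        }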

  4. HPCC: Scope and Naming Conventions
     • Each test (HPL, STREAM, FFT, RandomAccess(1m), HPL(25%), ...) is named by the mode in which it runs: Single (S), Embarrassingly Parallel (EP), or Global (G).
     • Single: one CPU thread on a single processor of the system. Embarrassingly Parallel: every processor runs its own independent copy, with no communication during the kernel. Global: all processors cooperate across the interconnect.
     • Computational resources: CPU core(s), memory, interconnect, network.
     • Software modules: vectorization and OpenMP within a node, MPI across the network.
     • The figure shows each mode on a cluster of processors (P) with local memories (M) connected by a network.
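     A minimal sketch of the Embarrassingly Parallel (EP) mode, assuming MPI and a STREAM-style triad as the local kernel (hypothetical stand-alone code, not the HPCC harness): every rank times its own independent copy of the kernel with no communication during the measurement, and only the scalar results are combined at the end.

        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define N (1 << 22)   /* arbitrary per-rank vector length for the sketch */

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double *a = malloc(N * sizeof *a);
            double *b = malloc(N * sizeof *b);
            double *c = malloc(N * sizeof *c);
            for (int i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

            /* EP mode: each rank runs the same local kernel independently. */
            double t0 = MPI_Wtime();
            for (int i = 0; i < N; i++)
                a[i] = b[i] + 3.0 * c[i];                 /* TRIAD: 24 bytes/iter */
            double local_gbs = 24.0 * N / (MPI_Wtime() - t0) / 1e9;

            /* Only the per-rank results are aggregated (sum of bandwidths). */
            double total_gbs = 0.0;
            MPI_Reduce(&local_gbs, &total_gbs, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("EP triad bandwidth over %d ranks: %.2f GB/s (a[0]=%g)\n", size, total_gbs, a[0]);

            free(a); free(b); free(c);
            MPI_Finalize();
            return 0;
        }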

  5. HPCC: Hardware Probes
     ------------------------------------------------------------------------------------------
     HPC Challenge benchmark                              HPCS performance target (improvement)
     ------------------------------------------------------------------------------------------
     Top500/HPL: solves a system Ax = b                   2 Petaflops (8x)
     STREAM: vector operations A = B + s x C              6.5 Petabyte/s (40x)
     FFT: 1D Fast Fourier Transform Z = FFT(X)            0.5 Petaflops (200x)
     RandomAccess: random updates T(i) = XOR( T(i), r )   64,000 GUPS (2000x)
     ------------------------------------------------------------------------------------------
     • Corresponding memory hierarchy: Registers, Cache, Local Memory, Remote Memory, Disk; data moves between levels as instruction operands, blocks, messages, and pages, characterized by bandwidth and latency.
     • The HPCS program has developed a new suite of benchmarks (HPC Challenge).
     • Each benchmark focuses on a different part of the memory hierarchy.
     • HPCS program performance targets will flatten the memory hierarchy, improve real application performance, and make programming easier.
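     The RandomAccess probe's update rule T(i) = XOR( T(i), r ) can be sketched as below; this is a hypothetical stand-alone version that substitutes a simple linear congruential generator for the official HPCC random stream, so it shows the latency-bound access pattern behind the GUPS target rather than the benchmark's exact rules.

        #include <stdint.h>

        /* Random XOR updates to a large table: each update touches an essentially
         * random 64-bit word, so the rate (GUPS) is dominated by memory latency. */
        void random_access_updates(uint64_t *T, uint64_t table_size, uint64_t n_updates) {
            uint64_t r = 1;
            for (uint64_t i = 0; i < n_updates; i++) {
                r = r * 6364136223846793005ULL + 1442695040888963407ULL; /* toy LCG, not HPCC's generator */
                T[r & (table_size - 1)] ^= r;   /* table_size assumed to be a power of two */
            }
        }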

  6. HPCC: Official Submission Process
     • Prerequisites: a C compiler, BLAS, and MPI.
     • Base run: download, install, run, upload results, confirm via @email@; provide a detailed description of the installation and execution environment.
     • Optional optimized run: tune, run, upload results, confirm via @email@; only some routines can be replaced, the data layout needs to be preserved, and multiple languages can be used.
     • Results are immediately available on the web site: interactive HTML, XML, MS Excel, and Kiviat charts (radar plots).

  7. HPCC Submissions over Time
     [Charts: for each of HPL [Tflop/s], STREAM [GB/s], FFT [Gflop/s], and RandomAccess [GUPS], the sum over all submissions and the #1 result, plotted over time.]

  8. HPCC: Comparing 3 Interconnects
     • Kiviat chart (radar plot) of 3 AMD Opteron clusters: 2.2 GHz clock, 64 processors each, with different interconnect types: vendor, commodity, GigE.
     • The systems cannot be differentiated based on G-HPL or matrix-matrix multiply.
     • Available on the HPCC website: http://icl.cs.utk.edu/hpcc/

  9. HPCC: Sample Results’ Analysis
     • All results in words/second; highlights the memory hierarchy.
     • Chart: effective bandwidth (words/second, from Kilo to Peta) for systems in Top500 order, with the spread across the hierarchy annotated as roughly 10^2 for HPCS, 10^4 for HPC systems, and 10^6 for clusters.
     • Clusters: hierarchy steepens. HPC systems: hierarchy constant. HPCS goals: hierarchy flattens, easier to program.

  10. HPCC Augmenting June’07 TOP500
      [Chart: TOP500 rating alongside data provided by the HPCC database.]

  11. Contacts Piotr Luszczek http://icl.cs.utk.edu/hpcc/
