Presentation Outline

  • A word or two about our program

  • Our HPC system acquisition process

  • Program benchmark suite

  • Evolution of benchmark-based performance metrics

  • Where do we go from here?





HPCMP Serves a Large, Diverse DoD User Community

  • 519 projects and 4,086 users at approximately 130 sites

  • Requirements categorized in 10 Computational Technology Areas (CTA)

  • FY08 non-real-time requirements of 1,108 Habu-equivalents

Users by CTA:

  • Computational Fluid Dynamics – 1,572 users

  • Computational Electromagnetics & Acoustics – 337 users

  • Electronics, Networking, and Systems/C4I – 114 users

  • Computational Structural Mechanics – 437 users

  • Environmental Quality Modeling & Simulation – 147 users

  • Forces Modeling & Simulation – 182 users

  • Computational Chemistry, Biology & Materials Science – 408 users

  • Climate/Weather/Ocean Modeling & Simulation – 241 users

  • Signal/Image Processing – 353 users

  • Integrated Modeling & Test Environments – 139 users

  • Other (self-characterized) – 156 users


High Performance Computing Centers

Strategic consolidation of resources:

  • 4 Major Shared Resource Centers (MSRCs)

  • 4 Allocated Distributed Centers (ADCs)


HPCMP Center Resources

[Chart: Total HPCMP end-of-year computational capabilities, 1993–2007, broken out by MSRCs and ADCs (DCs).]

Note: Computational capability reflects available GFLOPS during the fiscal year.


HPC Modernization Program (MSRCs)

[Chart: MSRC systems by fiscal year, FY03–FY07. As of: August 2007.]


HPC Modernization Program (ADCs)

[Chart: ADC systems by fiscal year, FY03–FY06. As of: August 2007.]


Overview of TI-XX Acquisition Process

  • Determination of requirements, usage, and allocations

  • Choose application benchmarks, test cases, and weights

  • Measure benchmark times on the DoD standard system

  • Measure benchmark times on existing DoD systems

  • Vendors provide measured and projected times on offered systems

  • Determine performance for each offered system and each existing system on each application test case

  • Determine performance for each offered system

  • Use optimizer to determine price/performance for each offered system and combination of systems (see the sketch below)

  • Additional inputs: usability/past-performance information on offered systems, center facility requirements, life-cycle costs for offered systems, and vendor pricing

  • Collective acquisition decision
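To make the optimizer step concrete, here is a minimal brute-force sketch in Python. All system names, scores, costs, and the budget are hypothetical, and the real TI-XX optimizer weighs more inputs (usability, past performance, facility requirements, life-cycle costs) than this illustration does.

```python
from itertools import combinations

# Hypothetical offered systems: (name, performance score in DoD
# standard-system equivalents, life-cycle cost in $M). Illustrative only.
systems = [
    ("System A", 6.0, 18.0),
    ("System B", 4.5, 12.0),
    ("System C", 9.0, 30.0),
]
budget = 42.0  # $M, hypothetical budget cap

# Brute-force search over all combinations of offered systems within
# budget, maximizing total delivered performance.
best_perf, best_combo = 0.0, ()
for r in range(1, len(systems) + 1):
    for combo in combinations(systems, r):
        cost = sum(c for _, _, c in combo)
        perf = sum(p for _, p, _ in combo)
        if cost <= budget and perf > best_perf:
            best_perf, best_combo = perf, combo

print(best_perf, [name for name, _, _ in best_combo])
```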


TI-08 Synthetic Test Suite

  • CPUBench – Floating point execution rate

  • ICBench – Interconnect bandwidth and latency

  • LANBench – External network interface and connection bandwidth

  • MEMBench – Memory bandwidth (MultiMAPS)

  • OSBench – Operating system noise (PSNAP from LANL)

  • SPIOBench – Streaming parallel I/O bandwidth


TI-08 Application Benchmark Codes

  • AMR – Gas dynamics code (C++/Fortran, MPI, 40,000 SLOC)

  • AVUS (Cobalt-60) – Turbulent flow CFD code (Fortran, MPI, 19,000 SLOC)

  • CTH – Shock physics code (~43% Fortran/~57% C, MPI, 436,000 SLOC)

  • GAMESS – Quantum chemistry code (Fortran, MPI, 330,000 SLOC)

  • HYCOM – Ocean circulation modeling code (Fortran, MPI, 31,000 SLOC)

  • ICEPIC – Particle-in-cell magnetohydrodynamics code (C, MPI, 60,000 SLOC)

  • LAMMPS – Molecular dynamics code (C++, MPI, 45,400 SLOC)

  • OOCore – Out-of-core solver mimicking electromagnetics code (Fortran, MPI, 39,000 SLOC)

  • Overflow2 – CFD code originally developed by NASA (Fortran, MPI, 83,600 SLOC)

  • WRF – Multi-agency mesoscale atmospheric modeling code (Fortran and C, MPI, 100,000 SLOC)



Determination of Performance

  • Establish a DoD standard benchmark time for each application benchmark case

    • ERDC dual-core Cray XT3 (Sapphire) chosen as the standard DoD system

    • Standard benchmark times on the DoD standard system measured at 128 processors for standard test cases and 512 processors for large test cases

    • The split in weight between standard and large application test cases is made at 256 processors

  • Benchmark timings (at least four on each test case) are requested for systems that meet or beat the DoD standard benchmark times by at least a factor of two (preferably four)

  • Benchmark timings may be extrapolated provided they are guaranteed, but at least two actual timings must be provided for each test case


Determination of Performance (cont.)

Curve fit: Time = A/N + B + C*N, where

  N = number of processing cores

  A/N = time for the parallel portion of the code (parallel base)

  B = time for the serial portion of the code

  C*N = parallel penalty (parallel overhead)

Constraints:

  A/N ≥ 0 (parallel base time is non-negative)

  Tmin ≥ B ≥ 0 (serial time is non-negative and does not exceed the minimum observed time)


Determination of Performance (cont.)

Curve fit approach: for each candidate value of B (Tmin ≥ B ≥ 0),

  • Determine A: Time – B = A/N

  • Determine C: Time – (A/N + B) = C*N

  • Calculate the fit quality over the M observed core counts, where (Ni, Ti) is the time Ti observed at Ni cores

Select the value of B with the largest fit quality. A sketch of this procedure follows.
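A minimal Python sketch of the procedure, assuming NumPy. The scan over B and the two single-parameter solves follow the steps above; treating each solve as least squares, and using one minus the RMS relative residual as the fit-quality metric, are assumptions, since the exact fit-quality formula is not reproduced here.

```python
import numpy as np

def fit_scaling_curve(cores, times, n_b_steps=1000):
    """Fit Time = A/N + B + C*N subject to A/N >= 0 and Tmin >= B >= 0.

    Scans candidate values of B, solves for A and then C at each step,
    and keeps the B with the best fit quality. The quality metric here
    (1 minus the RMS relative residual) is an illustrative stand-in for
    the program's actual formula. No constraint is placed on C, since
    constraints are listed only for A and B.
    """
    N = np.asarray(cores, dtype=float)
    T = np.asarray(times, dtype=float)
    best = None
    for B in np.linspace(0.0, T.min(), n_b_steps):
        x = 1.0 / N
        # Determine A: least-squares solve of Time - B = A/N, clamped at 0.
        A = max(0.0, float(np.dot(x, T - B) / np.dot(x, x)))
        # Determine C: least-squares solve of Time - (A/N + B) = C*N.
        resid = T - (A / N + B)
        C = float(np.dot(N, resid) / np.dot(N, N))
        model = A / N + B + C * N
        quality = 1.0 - float(np.sqrt(np.mean(((T - model) / T) ** 2)))
        if best is None or quality > best[0]:
            best = (quality, A, B, C)
    return best  # (quality, A, B, C)

# Hypothetical timings (seconds) observed at several core counts:
quality, A, B, C = fit_scaling_curve([32, 64, 128, 256, 512],
                                     [410.0, 215.0, 120.0, 75.0, 60.0])
print(f"A={A:.0f}  B={B:.1f}  C={C:.4f}  fit quality={quality:.3f}")
```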


Determination of Performance (cont.)

Calculate score (in DoD standard system equivalents):

  Score = (C / STM) × (Sbase / Cbase), where

  C = number of compute cores in the target system

  Cbase = number of compute cores in the standard system

  Sbase = number of compute cores in the standard execution

  STM = size-to-match = number of compute cores of the target system required to match the performance of Sbase cores of the standard system
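A small worked example of the score arithmetic. The default values for s_base and c_base below are hypothetical placeholders, not the actual parameters of the DoD standard system.

```python
def score(c_target, stm, s_base=128, c_base=4096):
    """Score in DoD standard-system equivalents: (C / STM) * (Sbase / Cbase).

    The s_base and c_base defaults are illustrative assumptions, not the
    DoD standard system's actual configuration.
    """
    return (c_target / stm) * (s_base / c_base)

# A 2,048-core offered system that matches a 128-core standard run
# using only 96 of its cores:
print(score(c_target=2048, stm=96))  # ~0.67 standard-system equivalents
```
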
What’s Next?

  • Continue to evolve application benchmarks to accurately represent the HPCMP computational workload

  • Increase profiling and performance modeling to understand application performance better

  • Use performance predictions to supplement application benchmark measurements and guide vendors in designing more efficient systems

