HPC and the ROMS BENCHMARK Program

Kate Hedstrom

August 2003

Outline
  • New ARSC systems
  • Experience with ROMS benchmark problem
  • Other computer news
New ARSC Systems
  • Cray X1
    • 128 MSP (1.5 TFLOPS)
    • 4 GB/MSP
    • Water cooled
  • IBM p690+ and p655+
    • 5 TFLOPS total
    • At least 2 GB/cpu
    • Air cooled
    • Arriving in September, switch later
Cray
  • Cray X1 Node
    • Node is a 4-way SMP
    • 16 GB/node
    • Each MSP has four vector/scalar processors
    • Processors in MSP share cache
    • Node usable as 4 MSPs or 16 SSPs
    • IEEE floating point hardware
Cray
  • Programming Environment
    • Fortran, C, C++
    • Support for
      • MPI
      • SHMEM
      • Co-Array Fortran
      • UPC
      • OpenMP (Fall 2003)
  • Compiling runs on the CPES (a Sun V480), invisibly to the user
IBM
  • Two p690+
    • Like our Regatta, but faster, more memory (8 GB/cpu)
    • Shared memory across all 32 cpus
    • For big OpenMP jobs
  • Six p655+ towers
    • Like our SP, but faster, more memory (2 GB/cpu)
    • Shared memory on each 8-cpu node; 92 nodes in all
    • For big MPI jobs and small OpenMP jobs
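As a rough sanity check on the quoted 5 TFLOPS aggregate, peak rate is cpus x flops-per-cycle x clock. The node and cpu counts below come from the slides; the 1.7 GHz clock and 4 flops/cycle (two fused multiply-add pipes) are assumptions about the Power4+ parts of that era.

```python
# Back-of-the-envelope peak for the new IBM systems.
# Node counts are from the slides; the clock speed and
# flops-per-cycle figures are assumptions.
FLOPS_PER_CYCLE = 4          # two FMA units, assumed
CLOCK_HZ = 1.7e9             # assumed Power4+ clock

p690_cpus = 2 * 32           # two p690+ frames, 32 cpus each
p655_cpus = 92 * 8           # 92 p655+ nodes, 8 cpus each

peak_tflops = (p690_cpus + p655_cpus) * FLOPS_PER_CYCLE * CLOCK_HZ / 1e12
print(f"{peak_tflops:.1f} TFLOPS peak")   # prints: 5.4 TFLOPS peak
```

Under these assumed clock numbers the total lands near the quoted 5 TFLOPS.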
Benchmark Problem
  • No external files to read
  • Three different resolutions
  • Periodic channel representing the Antarctic Circumpolar Current (ACC)
  • Steep bathymetry
  • Idealized winds, clouds, etc., but full computation of atmospheric boundary layer
  • KPP vertical mixing
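The slide does not spell out the three resolutions; in the ROMS distribution the BENCHMARK cases double the horizontal grid each step (512x64, 1024x128, 2048x256 interior points, 30 vertical levels). Taking those sizes as an assumption, a quick sketch of the problem sizes:

```python
# Rough sizing for the three ROMS BENCHMARK resolutions.
# The grid dimensions and 30 levels are assumptions based on
# the ROMS distribution; the slide only says "three resolutions".
cases = {"benchmark1": (512, 64),
         "benchmark2": (1024, 128),
         "benchmark3": (2048, 256)}
N = 30                               # vertical levels (assumed)

for name, (Lm, Mm) in cases.items():
    pts = Lm * Mm * N                # interior 3-D grid points
    mb_per_array = pts * 8 / 2**20   # one 8-byte double per point
    print(f"{name}: {pts:,} points, {mb_per_array:.0f} MB per 3-D array")
```

Each step quadruples the horizontal work, which is what makes the suite useful for scaling comparisons across machines.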
IBM and SX6 Notes
  • SX6 is 8 GFLOPS, Power4 is 5.2 GFLOPS peak
  • Both achieve less than 10% of peak
  • The IBM scales better; a Cray contact says the SX6 is even worse across more than one node
  • The SX6 does best with a 1xN tiling; the IBM does better closer to MxM, even though this problem is 512x64
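The 1xN-versus-MxM observation is the classic vector/cache split: 1xN tiles keep the inner (i) loop at full length for the SX6's vector pipes, while squarish tiles minimize each tile's halo perimeter, which suits the Power4's caches. A small sketch of the arithmetic (illustrative tile counts and ghost width, not ROMS code):

```python
# Compare inner-loop length and halo size for two tilings of the
# 512x64 benchmark grid split over 16 tiles (numbers are illustrative).
def tile_stats(L, M, ni, nj, ghost=2):
    """Per-tile inner-loop length and halo points for an ni x nj tiling."""
    tl, tm = L // ni, M // nj                  # interior tile size
    halo = 2 * ghost * (tl + tm + 2 * ghost)   # ghost-cell perimeter area
    return tl, halo

L, M = 512, 64
for ni, nj in [(1, 16), (4, 4)]:
    inner, halo = tile_stats(L, M, ni, nj)
    print(f"{ni}x{nj}: inner loop {inner}, halo {halo} points per tile")
```

With 16 tiles, 1x16 keeps the inner loop at 512 (good for long vectors) but pays a 2080-point halo per tile, while 4x4 shortens the inner loop to 128 and cuts the halo to 592 points.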
Cray X1 Notes
  • Have choice of MSP or SSP mode
    • Four SSPs faster than one MSP
    • Sixteen MSPs much faster than 64 SSPs
  • On one MSP, vanilla ROMS spends:
    • 66% in bulk_flux
    • 28% in LMD
    • 2% in 2-D engine
  • Slower than either Power4 or SX6
  • A compiler option can inline lmd_wscale, which vastly speeds up LMD
  • John Levesque has offered to rewrite bulk_flux; the aim is 6-8 times faster than the Power4 for CCSM
Clusters
  • Can buy rack mounted turnkey systems running Linux
  • Need to spend money on:
    • Memory
    • Processors - single-cpu nodes may be best
    • Switch - low latency, high bandwidth
    • Disk storage
Don Morton’s Experience
  • No such thing as turnkey Beowulf
  • Need someone to take care of it:
    • Configure queuing system to make it useful for more than one user
    • Security updates
    • Backups
DARPA Petaflops Award
  • Sun, IBM, and Cray each awarded ~$50 million for phase-two development
  • Two will be awarded phase three in 2006
  • The goal is a petaflops machine by about 2010, along with easier programming and a more robust operating environment
    • Sun - new switch between cpus, memory
    • IBM - huge cache on chip
    • Cray - heavyweight, lightweight cpus
Conclusions
  • Things are still exciting in the computer industry
  • The only thing you can count on is change