
Sun HPC10000


Presentation Transcript


  1. Sun HPC10000
     Richard Frost, San Diego Supercomputer Center

  2. Hardware Overview
  • Architecture:
    • 64 GB shared memory (UMA)
    • 64 UltraSPARC II processors
    • 400 MHz, 800 Mflops per processor
  • I/O: 6.4 GB/s peak per SBus, 100 MB/s/SBus sustained
  • Network
    • Sun Ultra Port Architecture (UPA)
    • Gigaplane-XB: 16 x 16 data crossbar links every system board
    • 4 separate address buses
    • Bandwidth = 200 MB/sec per PE, ~12.8 GB/sec total
    • Read latency of ~400 ns
  • Peak speed: 51.2 Gflops (arithmetic check below)
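  A quick check of the aggregate figures above, assuming the 800 Mflops and 200 MB/sec numbers are per processor:

    peak speed      = 64 processors x 800 Mflops = 51.2 Gflops
    total bandwidth = 64 PEs x 200 MB/sec        = 12.8 GB/sec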

  3. Software Overview
  • Queuing system
    • LSF = Load Sharing Facility
  • For MPI jobs
    • pam is part of the 'hpc' queue but not part of the others
    • Choose the # of PEs in the LSF script
    • Put pam in front of the executable for non-hpc queues (example scripts below)
  • For threaded (OpenMP or Pthreads) jobs
    • Choose the # of PEs in the LSF script
    • Set the number of threads in the LSF script
    • Start the executable without pam
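  A minimal sketch of the two submission patterns above. The queue name, PE counts, and program names are hypothetical; the #BSUB embedded options and the pam prefix follow standard LSF usage, so check the site documentation for the exact local configuration.

    #!/bin/sh
    # MPI job in a non-hpc queue: pam must be added by hand
    #BSUB -q standard             # hypothetical queue name
    #BSUB -n 16                   # number of PEs
    #BSUB -o mpi_job.%J.out       # job output file
    pam ./my_mpi_program          # pam launches the Sun MPI executable

    #!/bin/sh
    # Threaded (OpenMP) job: set the thread count, do not use pam
    #BSUB -q standard             # hypothetical queue name
    #BSUB -n 8                    # PEs reserved for the threads
    OMP_NUM_THREADS=8             # standard OpenMP variable; assumed honored here
    export OMP_NUM_THREADS
    ./my_threaded_program

  Submit either script with bsub < jobscript.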

  4. Software Overview
  • Compilers and programming tools (example compile lines below)
    • Serial: f77, f90, cc, CC
    • MPI: tmf90, tmf77, tmcc, tmCC (must use the -lmpi flag)
    • Sun Fortran OpenMP: tmf90 (must use -mp=openmp -xparallel)
    • KAI C and Fortran OpenMP: guidef90, guidec
    • Debuggers and performance monitors from Sun and KAI
  • Libraries
    • Sun Performance Library (serial) and SSL (parallel) numerical libraries
  • Applications software and tools
    • Abaqus, Gaussian98, Nastran, SPRNG, …
    • Prism, TotalView debuggers
    • See http://www.npaci.edu/Applications/
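  Typical compile lines for the tool chains listed above. Only the wrapper names and flags come from the slide; the source and output file names are placeholders, so treat the exact command shapes as a sketch.

    # Serial Fortran 90 and C
    f90 -o prog prog.f90
    cc  -o prog prog.c

    # Sun MPI via the tm wrappers (the -lmpi flag is required)
    tmf90 -o prog_mpi prog.f90 -lmpi
    tmcc  -o prog_mpi prog.c   -lmpi

    # Sun Fortran OpenMP
    tmf90 -mp=openmp -xparallel -o prog_omp prog.f90

    # KAI OpenMP (Guide)
    guidef90 -o prog_kai prog.f90
    guidec   -o prog_kai prog.c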

  5. Programming Models
  • Data parallel
    • OpenMP can be very effective
    • MPI for portability, especially when data sets exceed the Sun's 60+ GB of memory
  • Task parallel
    • Either OpenMP or MPI
    • The choice depends on application needs, including problem size
  • Task farming
    • MPI is the best choice
  • Event-driven dense communication
    • MPI is the best choice
  • Multi-level parallelism
    • Sun MPI + Sun F90 OpenMP recommended (build sketch below)
    • Sun MPI + KAI OpenMP (Guide) will work in simple situations
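  For the multi-level case, a plausible build line simply combines the MPI and OpenMP options from the compiler slide; the exact combination and the file names are assumptions, not taken from the slides.

    # Hybrid Sun MPI + Sun Fortran OpenMP build (assumed flag combination)
    tmf90 -mp=openmp -xparallel -o hybrid hybrid.f90 -lmpi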

  6. Recommended Porting Methods
  • Port MPI code 'as is'
    • Modify the makefile for compiler options (tmXX … -lmpi); see the makefile sketch below
    • Submit jobs using an LSF script
    • Put pam in front of the executable for non-hpc queues
  • Port OpenMP code 'as is'
    • Modify the makefile for compiler options (-mp=openmp -xparallel)
    • Set the number of threads in the LSF script
    • Submit the job without pam
  • Use Prism or TotalView for debugging and tuning
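  If the existing makefile exposes conventional compiler variables, the "modify the makefile" steps can often be handled as command-line overrides instead of edits. FC, CC, FFLAGS, and LDFLAGS are assumptions about the project being ported, not names taken from the slides.

    # MPI port: swap in the tm wrappers and add -lmpi at link time
    make FC=tmf90 CC=tmcc LDFLAGS="-lmpi"

    # OpenMP port: add the Sun OpenMP flags
    make FC=tmf90 FFLAGS="-mp=openmp -xparallel"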

  7. Performance Considerations
  • CPUs
    • Most applications will be CPU bound
    • Large-memory jobs (> 1 GB) must use 64-bit addressing (build sketch below)
  • Network/interconnect
    • Good bandwidth with moderate to large messages
    • Moderately high latency for intermittent small messages
    • Messaging "warm-up" can improve data locality, i.e., the first iteration may be the slowest
  • I/O
    • Sun was the first vendor to implement complete MPI I/O
    • Tuned to the architecture
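  The slide does not name the 64-bit compile option; on Sun compilers of this generation it would typically be -xarch=v9, so the line below is an assumption meant to illustrate the idea rather than a documented recipe.

    # Assumed 64-bit build for address spaces larger than 1 GB
    tmf90 -xarch=v9 -o prog64 prog.f90 -lmpi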

  8. Performance Results/Comparisons
  • Performance comparison: HPC10000 and T90
    • CMRR Monte Carlo code for thermal instability: wall-clock time on 40 HPC10K processors is 2.5 hrs; on 1 T90 processor, 18.05 hrs (for this case, 1 T90 processor is equivalent to ~5.5 HPC10K processors)
  • Scaling for this case:

      #Procs   Time (hrs)   Speedup
      20       5.5          1.0
      30       3.8          1.45
      40       2.8          1.96
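  The speedup column appears to be relative to the 20-processor run, i.e. speedup(N) = T(20) / T(N): 5.5 / 3.8 ≈ 1.45 and 5.5 / 2.8 ≈ 1.96.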

  9. Performance Results/Comparisons

  10. Performance Results/Comparisons

  11. User Experiences
  • Half the machine was allocated to NPACI users on 1/1/00
  • Universities: SUNY Buffalo, Auburn U, Stanford U, Massachusetts General Hospital, Penn State, Cornell U, UCSD, UNM, ...
  • Topics: parallel adaptive FEM for bone structures, plasma processes in the magnetosheath, LES, light scattering of neuron cells, study of mesoscale microstructures, SMP performance, etc.
  • Other projects, including SAC projects:
    • Bertram: magnetic recording studies, CMRR, Physics, UCSD
    • Bower: neuron modeling, Neuroscience, Caltech
    • Hauschildt: stellar atmospheres, Astronomy, U Georgia
    • Bourne: Protein Data Bank, SDSC/UCSD

  12. Future Developments…
  • 64-CPU ultra upgrade to Solaris 8
  • Same (identical) environment as gaos
  • ultra becomes a "compute only" engine
    • ultra logins restricted to necessary sysadmin tasks
    • user logins restricted to gaos
  • Sun to release OpenMP for C
