
NERSC

Presentation Transcript


  1. NERSC • Today’s mission: Accelerate scientific discovery at the DOE Office of Science through high performance computing and extreme data analysis • National Energy Research Scientific Computing Center • Established 1974, first unclassified supercomputer center • Original mission: to enable computational science as a complement to magnetically confined plasma experiments

  2. NERSC: Production Computing for the DOE Office of Science • Diverse workload: • 4,500 users, 600 projects • 700 codes; 100s of users daily • Allocations controlled primarily by DOE • 80% DOE Annual Production awards (ERCAP): • From 10K hours to ~10M hours • Proposal-based; DOE chooses • 10% DOE ASCR Leadership Computing Challenge • 10% NERSC reserve (“NISE”)

  3. DOE View of Workload [Chart: NERSC 2013 Allocations by DOE Office]

  4. Science View of Workload [Chart: NERSC 2013 Allocations by Science Area]

  5. ASCR Facilities

  6. NERSC High End Computing and Storage Capabilities
  • Large-Scale Computing Systems
    • Hopper (NERSC-6): Early Cray Gemini system; 6,384 compute nodes, 153,216 cores; 144 Tflop/s on applications; 1.3 Pflop/s peak
    • Edison (NERSC-7): Early Cray Aries system (2013); over 200 Tflop/s on applications, 2 Pflop/s peak; 333 TB of memory, 6.4 PB of disk
  • Midrange (275 Tflop/s peak)
    • Carver: IBM iDataPlex cluster; 10,740 cores; 132 TF
    • PDSF (HEP/NP): ~2,300-core cluster; 30 TF
    • Genepool (JGI): ~8,200-core cluster; 113 TF; 2.1 PB Isilon file system
  • Analytics & Testbeds
    • IBM x3850: 1 TB and 2 TB nodes
    • Dirac: 50 Nvidia GPU nodes
    • Jesup: IBM iDataPlex; data analytics; HTC
  • NERSC Global Filesystem (NGF): uses IBM’s GPFS; 8.5 PB capacity; 15 GB/s of bandwidth
  • HPSS Archival Storage: 240 PB capacity; 5 tape libraries; 200 TB disk cache

  7. JGI Historical Usage [Chart: usage for repo m342 (PI: Eddy Rubin) and repo m1045 (PI: Victor Markowitz)] *Note: % charged may differ from usage/allocation due to refunds

  8. Why are we giving hours back?
  2012 JGI’s available hours: 31,536,000 (GP1) + 4,905,600 (highmem) + 20,000,000 (NERSC) = 56,441,600
  • System instability: JGI clusters were consolidated into Genepool, and the system went through a period of instability (June-Sept 2012); users relied more heavily on other systems and on checkpointing
  • Some analysis jobs couldn’t run on Genepool: Hadoop-style jobs and MPP work (RAxML, Ray) needed to be run on Carver/Hopper
  2013 JGI’s available hours: 31,536,000 (GP1) + 30,835,200 (GP2) + 6,937,920 (highmem) + 20,000,000 (NERSC) = 89,309,120
  • System stability improvements: major improvements to the scheduler, file systems, and user workloads have improved Genepool stability/availability; fewer jobs are being rerun
  • System expansion / configuration: Genepool doubled in compute power at the end of 2012; nodes are configured for both MPP and traditional JGI workloads (e.g. Hadoop jobs now run here)
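A minimal C sketch (not part of the original slides) that simply re-adds the per-system figures quoted above, as a quick check on the 2012 and 2013 totals:

    #include <stdio.h>

    int main(void) {
        /* 2012 available hours: GP1 + highmem + NERSC allocation (figures from the slide) */
        long hours_2012 = 31536000L + 4905600L + 20000000L;
        /* 2013 available hours: GP1 + GP2 + highmem + NERSC allocation */
        long hours_2013 = 31536000L + 30835200L + 6937920L + 20000000L;

        printf("2012: %ld hours\n", hours_2012);   /* prints 56441600 */
        printf("2013: %ld hours\n", hours_2013);   /* prints 89309120 */
        return 0;
    }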

  9. Not everything can run on Hopper/Edison/Carver
  This analysis CAN and has been done on Hopper or Carver (single large analysis runs, codes that can run at scale, jobs that tolerate long queue waits):
  • Phylogenetic tree reconstruction • USEARCH • HMMER • BLAST (inefficient) • Metagenome assembly research (MPI-based assemblers like Ray) • Hadoop
  Goal: Improve efficiency (particularly I/O) in existing workflows
  This analysis is critical to JGI’s mission and CANNOT currently run on Hopper or Carver (high-throughput, automated, slot-scheduled jobs/pipelines; they require large-memory nodes, local disk, and external database access):
  • Illumina pipeline • SMRTPortal • Fungal annotation pipeline • RQC pipeline • Jigsaw • Large-memory assemblies • IMG production runs
  Goal: Determine which workflows could migrate to Hopper/Edison (e.g. Jigsaw)

  10. Genepool is sufficient today, but what about next year?
  Goal: Accurately predict that, given X compute nodes, we can analyze Y samples over the course of Z months.
  • Step 1: Collect data (IN PROGRESS)
    • On Genepool (jobs run by program, type of analysis, time to complete, queue wait time) - procmon
    • On file systems (amount of data created per job, access patterns of the job) - NGF scripts
  • Step 2: Analyze the data (2013-14)
    • Predict compute time needed per sample sequenced; define acceptable queue wait and turnaround times - MATH
    • Predict space needed per project - MATH
  • Step 3: Sanity check predictions (2014)
    • Add columns to the LIMS system giving PMs the ability to enter predictions for compute and disk space needs
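To illustrate the Step 2 “MATH”, the following C sketch turns X compute nodes and a per-sample compute cost into an estimate of Y samples over Z months. Every input value, and the simple utilization model itself, is an assumption chosen for illustration rather than a procmon or NGF measurement:

    #include <stdio.h>

    int main(void) {
        /* All inputs are illustrative assumptions, not measured values. */
        double nodes                 = 400.0;   /* X: compute nodes available */
        double cores_per_node        = 16.0;    /* assumed node size */
        double utilization           = 0.80;    /* fraction of capacity usefully scheduled */
        double core_hours_per_sample = 500.0;   /* predicted compute cost per sample */
        double months                = 12.0;    /* Z: planning horizon */

        double wall_hours = months * 730.0;     /* ~730 wall-clock hours per month */
        double core_hours = nodes * cores_per_node * utilization * wall_hours;
        double samples    = core_hours / core_hours_per_sample;   /* Y */

        printf("Estimated samples over %.0f months: %.0f\n", months, samples);
        return 0;
    }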

  11. High-Impact Results on Day One
  NERSC’s users started running production codes immediately on Edison. 408M MPP hours were delivered in 2013 through Oct. 16. Top projects: carbon sequestration, artificial photosynthesis, complex novel materials, cosmic background radiation analysis. Edison is very similar to Hopper, but with 2-5 times the performance per core on most codes. [Chart: NERSC 8 Benchmark Performance]

  12. Edison is the premier production computing platform for the DOE Office of Science
  • New Cray XC30 with Intel Ivy Bridge processors and Aries interconnect
  • Designed to support HPC and data-intensive work
  • Performs 2-4x Hopper per node on real applications
  • Outstanding scalability for massively parallel apps
  • Easy adoption for users: runs current apps unmodified
  • Ambient cooled for extreme energy efficiency
  System highlights: 5,200 compute nodes; 124.5K processing cores; 333 terabytes of memory; 2.4 petaflops peak; 530 TB/s memory bandwidth; 11 TB/s global bandwidth; 1.3 MW per PF; 6.4 PB storage @ 140 GB/s
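One way to see where the 2.4 petaflops figure comes from is to multiply the node count by an assumed per-core peak. The Ivy Bridge parameters below (24 cores per node, 2.4 GHz, 8 double-precision flops per cycle with AVX) are assumptions, so this is a back-of-the-envelope consistency check rather than an official spec:

    #include <stdio.h>

    int main(void) {
        /* Assumed Ivy Bridge node parameters (illustrative, not from the slide) */
        double nodes          = 5200.0;
        double cores_per_node = 24.0;   /* two 12-core sockets per node */
        double ghz            = 2.4;    /* nominal clock rate */
        double flops_per_cyc  = 8.0;    /* double-precision flops per cycle with AVX */

        double gflops_per_core = ghz * flops_per_cyc;                  /* 19.2 GF/s per core */
        double peak_pflops = nodes * cores_per_node * gflops_per_core / 1.0e6;

        printf("Cores: %.0f\n", nodes * cores_per_node);   /* ~124.8K, close to the 124.5K quoted */
        printf("Peak:  %.2f PF\n", peak_pflops);           /* ~2.40 PF peak */
        return 0;
    }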

  13. [Chart slide; footnote: * STREAM]

  14. Supported Programming Languages, Models and Compilers

  15. How to compile • Hopper and Edison are Cray supercomputers and have specialized compilers/compiler wrappers that are optimized for these systems (demo)

  16. How to compile
  • Use modules to find available software (same as Genepool)
  • For the Cray and GNU programming environments, all Cray scientific and math libraries are available (compile as you would on Genepool)
  • For the Intel programming environment, some libraries are different (contact consult@nersc.gov if you have trouble with your builds)
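To make the wrapper workflow concrete, here is a minimal MPI example in C, assuming the standard Cray programming-environment setup described above. The Cray wrappers (cc, CC, ftn) pull in MPI and the Cray scientific libraries automatically; the module names and launch line in the comments are illustrative and may differ on your system:

    /*
     * Build and run sketch (assumed workflow; adjust module names as needed):
     *   module avail                          # browse available software, as on Genepool
     *   module swap PrgEnv-intel PrgEnv-gnu   # optionally switch programming environments
     *   cc hello_mpi.c -o hello_mpi           # compile with the Cray wrapper, not mpicc
     *   aprun -n 24 ./hello_mpi               # launch on the compute nodes
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }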
