Presentation Transcript

Application Scalability and High Productivity Computing

Nicholas J Wright

John Shalf

Harvey Wasserman

Advanced Technologies Group

NERSC/LBNL

NERSC: National Energy Research Scientific Computing Center
  • Mission: Accelerate the pace of scientific discovery by providing high performance computing, information, data, and communications services for all DOE Office of Science (SC) research.
  • The production computing facility for DOE SC.
  • Berkeley Lab Computing Sciences Directorate
    • Computational Research Division (CRD), ESnet
    • NERSC
NERSC is the Primary Computing Center for DOE Office of Science
  • NERSC serves a large population: over 3,000 users, 400 projects, 500 codes
  • NERSC serves the DOE SC mission
    • Allocated by DOE program managers
    • Not limited to the largest-scale jobs
    • Not open to non-DOE applications
  • Strategy: Science First
    • Requirements workshops by office
    • Procurements based on science codes
    • Partnerships with vendors to meet science requirements

NERSC Systems for Science
  • Large-Scale Computing Systems
    • Franklin (NERSC-5): Cray XT4
      • 9,532 compute nodes; 38,128 cores
      • ~25 Tflop/s on applications; 356 Tflop/s peak
    • Hopper (NERSC-6): Cray XE6
      • Phase 1: Cray XT5, 668 nodes, 5,344 cores
      • Phase 2: 1.25 Pflop/s peak (late-2010 delivery)
  • Clusters (140 Tflop/s total)
    • Carver: IBM iDataPlex cluster
    • PDSF (HEP/NP): ~1K-core throughput cluster
    • Magellan: cloud testbed, IBM iDataPlex cluster
    • GenePool (JGI): ~5K-core throughput cluster
  • NERSC Global Filesystem (NGF)
    • Uses IBM's GPFS
    • 1.5 PB capacity; 5.5 GB/s of bandwidth
  • Analytics
    • Euclid (512 GB shared memory)
    • Dirac GPU testbed (48 nodes)
  • HPSS Archival Storage
    • 40 PB capacity
    • 4 tape libraries
    • 150 TB disk cache

NERSC Roadmap

How do we ensure that users' performance follows this trend and that their productivity is unaffected?

[Roadmap chart: peak Tflop/s over time, plotted against the Top500 trend. Franklin (N5): 101 TF peak, 19 TF sustained; Franklin (N5) + quad core: 352 TF peak, 36 TF sustained; Hopper (N6): >1 PF peak; NERSC-7: 10 PF peak; NERSC-8: 100 PF peak; NERSC-9: 1 EF peak.]

Users expect a 10x improvement in capability every 3-4 years.

Hardware Trends: The Multicore Era
  • Moore's Law continues unabated
  • Power constraints mean that core counts, not clock speeds, will double every 18 months
  • Memory capacity is not doubling at the same rate, so GB per core will decrease

Power is the Leading Design Constraint

Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith

… and the power costs will still be staggering

From Peter Kogge, DARPA Exascale Study

$1M per megawatt per year, even with CHEAP power! (A megawatt sustained for a year is roughly 8.8 million kWh.)

Changing Notion of “System Balance”
  • If you pay 5% more to double the FPUs and get a 10% performance improvement, it's a win, despite lowering your percentage of peak (see the worked example after this list)
  • If you pay 2x more for memory bandwidth (in power or cost) and get 35% more performance, it's a net loss, even though percentage of peak looks better
  • Real example: we could give up ALL of the flops to improve memory bandwidth by 20% on the 2018 system
  • We have a fixed budget
    • Sustained-to-peak flop rate is the wrong metric if flops are cheap
    • Balance means balancing your checkbook and balancing your power budget
    • Requires application co-design to make the right trade-offs
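
The budget arithmetic in the first two bullets can be made explicit. The short sketch below is purely illustrative (the 5%/10% and 2x/35% figures are the ones quoted above, not measurements); it simply compares performance gained against budget spent.

#include <stdio.h>

/* Performance gained per unit of budget spent: values above 1.0 mean the
 * change buys more performance than it costs; below 1.0 is a net loss. */
static double perf_per_cost(double perf_gain, double cost_factor)
{
    return perf_gain / cost_factor;
}

int main(void)
{
    /* Double the FPUs: +5% cost for +10% performance -> ~1.05, a win,
     * even though the achieved fraction of peak goes down. */
    printf("extra FPUs:     %.2f\n", perf_per_cost(1.10, 1.05));

    /* Spend 2x on memory bandwidth for +35% performance -> ~0.68, a loss,
     * even though the achieved fraction of peak looks better. */
    printf("more memory BW: %.2f\n", perf_per_cost(1.35, 2.00));
    return 0;
}

Anything above 1.0 delivers more science per dollar (or per watt), regardless of what it does to the percent-of-peak figure.
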
Summary: Technology Trends
  • Number of cores ↑
    • Flops will be "free"
  • Memory capacity per core ↓
  • Memory bandwidth per core ↓
  • Network bandwidth per core ↓
  • I/O bandwidth ↓
Navigating Technology Phase Transitions

[The same roadmap chart (peak Tflop/s vs. the Top500 trend, from Franklin (N5) at 101 TF peak / 19 TF sustained through NERSC-9 at 1 EF peak), annotated with the programming-model eras: COTS/MPP + MPI; COTS/MPP + MPI (+ OpenMP); GPU CUDA/OpenCL or manycore (BG/Q, R); Exascale + ???]


Application Scalability

How can a user continue to be productive in the face of these disruptive technology trends?

Source of Workload Information
  • Documents
    • 2005 DOE Greenbook
    • 2006-2010 NERSC Plan
    • LCF Studies and Reports
    • Workshop Reports
    • 2008 NERSC assessment
  • Allocations analysis
  • User discussion
New Model for Collecting Requirements
  • Joint DOE Program Office / NERSC Workshops
  • Modeled after ESnet method
    • Two workshops per year
    • Describe science-based needs over 3-5 years
  • Case study narratives
    • First workshop is BER, May 7-8
Application Trends

  • Weak scaling
    • Time to solution is often a non-linear function of problem size
  • Strong scaling
    • Latency or the serial fraction will get you in the end (see the sketch below)
  • Add features to models: "new" weak scaling

[Two schematic plots of performance vs. "processors", one illustrating weak scaling and one illustrating strong scaling.]
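
To make the strong-scaling bullet concrete, here is a minimal sketch of Amdahl's law (a textbook result, not something taken from the slides): with a parallel fraction f, the speedup on P processors is 1 / ((1 - f) + f/P), so even a 1% serial fraction caps the speedup near 100.

#include <stdio.h>

/* Amdahl's law: speedup on p processors when a fraction f of the work
 * parallelizes perfectly and the remaining (1 - f) stays serial. */
static double amdahl_speedup(double f, int p)
{
    return 1.0 / ((1.0 - f) + f / p);
}

int main(void)
{
    const double f = 0.99;                      /* 99% parallel, 1% serial */
    const int procs[] = {16, 256, 4096, 65536};

    for (int i = 0; i < 4; i++)
        printf("P = %6d  speedup = %6.1f\n",
               procs[i], amdahl_speedup(f, procs[i]));

    /* Speedup saturates near 1/(1 - f) = 100 no matter how many processors
     * are added: the serial fraction gets you in the end. */
    return 0;
}

Weak scaling sidesteps this limit by growing the problem with the machine, which is why adding features to models ("new" weak scaling) is an attractive way to use ever more cores.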

Develop Best Practices in Multicore Programming

NERSC/Cray Programming Models “Center of Excellence” combines:

  • LBNL strength in languages, tuning, performance analysis
  • Cray strength in languages, compilers, benchmarking

Goals:

  • Immediate goal: training material for Hopper users on hybrid OpenMP/MPI (a minimal sketch follows below)
  • Long-term goal: input into the exascale programming model

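As a companion to the training material mentioned above, here is a minimal hybrid MPI + OpenMP sketch (illustrative only, not the NERSC training material itself): one MPI rank per socket or NUMA node, with OpenMP threads sharing the work inside each rank. Launch syntax is system-specific; on a Cray XT/XE, ranks and threads are typically placed with aprun's -n, -N, and -d options together with OMP_NUM_THREADS.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Request a threaded MPI; FUNNELED is enough when only the master
     * thread makes MPI calls, as in this sketch. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* OpenMP threads split the per-rank work; this loop is a stand-in
     * for the real computation. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    /* MPI combines the per-rank results across the machine. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f (threads per rank: %d)\n",
               global, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}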

Develop Best Practices in Multicore Programming

Conclusions so far:

  • Mixed OpenMP/MPI saves significant memory (a rough model below illustrates why)
  • The impact on running time varies with the application
  • 1 MPI process per socket is often a good choice

Next, run on Hopper:

  • 12 vs. 6 cores per socket
  • Gemini vs. SeaStar interconnect

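A rough back-of-the-envelope model of the memory-saving conclusion (the 24-core node and the 0.2 GB per-rank overhead are assumptions for illustration, not Hopper measurements): per-rank costs such as MPI buffers, replicated tables, and ghost cells are paid once per rank, so fewer ranks per node means a smaller footprint.

#include <stdio.h>

int main(void)
{
    const int cores_per_node = 24;             /* assumed node size */
    const double overhead_per_rank_gb = 0.2;   /* hypothetical per-rank cost */
    const int thread_counts[] = {1, 2, 4, 6, 12, 24};

    for (int i = 0; i < 6; i++) {
        int threads = thread_counts[i];
        int ranks = cores_per_node / threads;  /* keep every core busy */
        printf("%2d threads/rank -> %2d ranks/node -> %4.1f GB per-rank overhead\n",
               threads, ranks, ranks * overhead_per_rank_gb);
    }
    return 0;
}

Whether the smaller footprint also helps running time depends on the application, as the conclusions above note.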


Co-Design

Eating our own dogfood

Inserting Scientific Apps into the Hardware Development Process
  • Research Accelerator for Multi-Processors (RAMP)
    • Simulate hardware before it is built!
    • Break slow feedback loop for system designs
    • Enables tightly coupled hardware/software/science co-design (not possible with the conventional approach)
Summary
  • Disruptive technology changes are coming
  • By exploring
    • New programming models (and revisiting old ones)
    • Hardware/software co-design
  • We hope to ensure that scientists' productivity remains high!