A Look at Application Performance Sensitivity to the Bandwidth and Latency of Infiniband Networks

Darren J. Kerbyson
Performance and Architecture Laboratory (PAL)
http://www.c3.lanl.gov/pal
Computer and Computational Sciences Division
Los Alamos National Laboratory



Performance and Architecture Lab
  • Performance analysis team at Los Alamos
    • Measurement
    • Modeling
    • Simulation
  • Large-scale:
    • systems (10,000s to 100,000s of processors)
    • applications
  • Analyze existing systems (or near-to-market systems)
  • Examine possible future systems
    • e.g. IBM PERCS (DARPA HPCS), next generation Blue Gene, …
  • Recent work includes:
    • Modeling and optimization of ASCI Q (SC03 best paper)
    • Comparison of systems: e.g. Earth Simulator, & Top 5 (CCPE05)
    • Blue Gene/L (SC04)
    • Large-scale Optical Circuit Switch network (SC05)
Assessing impact of network performance
  • Context
    • What would the performance improvement be if we had:
      • a network with higher bandwidth?
      • a network with lower latency?
    • Is it worth procuring an enhanced configuration?
  • Approach
    • Use application performance models
      • Application abstraction encapsulating performance related features
        • Compute factors: single processor / node performance
        • Parallel factors: boundary exchanges, collectives etc.
      • Parameterized in terms of system characteristics
        • Node characteristics
        • Network characteristics (inc. bandwidth and latency)
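The structure of such a model can be sketched as follows. This is a minimal illustration, not the PAL models themselves: the class names, fields, and the simple "latency plus size over bandwidth" message cost are assumptions chosen to show how compute factors and parallel factors can be parameterized by node and network characteristics.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Network characteristics (hypothetical parameter names)."""
    latency_s: float       # near-neighbor latency, in seconds
    bandwidth_Bps: float   # achievable bandwidth, in bytes per second


@dataclass
class Workload:
    """Application abstraction (hypothetical, one iteration on one processor)."""
    compute_s: float       # compute factor: measured single-processor time
    n_messages: int        # parallel factor: boundary-exchange messages
    message_bytes: int     # size of each message


def message_time(net: Network, nbytes: int) -> float:
    # Assumed linear cost model: fixed latency plus transfer time.
    return net.latency_s + nbytes / net.bandwidth_Bps


def iteration_time(work: Workload, net: Network) -> float:
    # Assumes no overlap between computation and communication.
    return work.compute_s + work.n_messages * message_time(net, work.message_bytes)
```

Once the workload parameters are fixed, swapping in a different `Network` shows how the predicted iteration time responds to a change in bandwidth or latency.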
Applications
  • Three applications of interest to Los Alamos:
    • Sweep3D: kernel application representing the heart of a deterministic SN transport calculation
    • SAGE: AMR hydrocode for shock propagation
    • Partisn: Deterministic SN transport code
  • Performance models previously developed
    • Validated on large-scale systems including:
      • Blue Gene/L (Lawrence Livermore) up to 32K nodes
      • Red Storm (Sandia) up to 8K processors
      • ASCI Q (Los Alamos) up to 8K processors
    • Typical ~10% error
  • Once validated can be used to explore performance on new systems
Network Characteristics
  • Latency of 4µs seems optimistic (currently)
  • Latency of 1.5µs is close to PathScale (1.29µs)
  • Achievable bandwidth assumed is ~80% of peak
  • Infiniband fabric assumed to be a 12-ary fat-tree with switch latency of 200ns.
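To see how the fat-tree assumption affects latency between distant nodes, the sketch below estimates a worst-case latency from the near-neighbor latency and the 200ns switch latency. It relies on the standard k-ary n-tree property that k^n end nodes need n switch levels and that the longest path crosses 2n-1 switches. It also assumes, as a simplification not stated on the slide, that a near-neighbor message already crosses exactly one switch. The function names are hypothetical.

```python
def fat_tree_levels(n_nodes: int, arity: int = 12) -> int:
    """Smallest number of switch levels n such that arity**n >= n_nodes."""
    levels = 1
    capacity = arity
    while capacity < n_nodes:
        capacity *= arity
        levels += 1
    return levels


def worst_case_latency_us(n_nodes: int, near_neighbor_us: float,
                          switch_us: float = 0.2, arity: int = 12) -> float:
    """Near-neighbor latency plus the extra switch hops on the longest path."""
    levels = fat_tree_levels(n_nodes, arity)
    # Longest path crosses 2*levels - 1 switches; the near-neighbor
    # path is assumed to cross one of them already.
    extra_switches = (2 * levels - 1) - 1
    return near_neighbor_us + extra_switches * switch_us
```

For example, 512 processors in 4-way nodes give 128 network endpoints, which fit in two levels of a 12-ary tree (capacity 144), so under these assumptions the longest path adds two switch hops to the near-neighbor latency.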
Performance studies
  • Sensitivity to network bandwidth and latency
    • 4x, 8x, & 12x bandwidths
    • 4µs and 1.5µs near-neighbor latencies
  • Effect of node size
    • Varying the number of processors in a node
      • Assumes single-core but applicable to multi-core
    • Assumes nodes built from 2GHz AMD Opterons
    • Use of measured single processor performance
  • Vary system size
    • From 1 processor up to 8,192 processors
    • Concentrate on 256, 512 and 1024 processor clusters
Communication cost - example

[Figure: communication cost, two panels: Sweep3D and SAGE]

  • 4x, 8x, 12x IB with near-neighbor latency of 4µs
  • 4-way nodes
Performance sensitivity: Partisn
  • Relative to a baseline configuration:
    • 4-way, 4x IB with 4µs latency
  • X-axis indicates node-size sensitivity (1 to 8 way)
  • Bar height indicates bandwidth sensitivity
    • 4x = lowest bar value
    • 12x = highest bar value
    • 8x = white ‘mid’ line
  • Difference between solid and shaded bars indicates latency sensitivity (4µs vs. 1.5µs)
Performance sensitivity: Partisn
  • 512 processor cluster
  • Highest sensitivity to node-size
    • Multiple processors sharing NIC
  • More sensitive to bandwidth than latency
Performance sensitivity: Sweep3D
  • 512 processor cluster
  • Highest sensitivity to latency
    • Most messages are small (~1KB)
  • Similar sensitivity to bandwidth and to node-size
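Why small messages make an application latency-sensitive can be illustrated with a simple linear cost model (time = latency + size / bandwidth). Under that model, latency and transfer time are equal at a message size of latency × bandwidth, and smaller messages are dominated by latency. The 0.8 GB/s figure used below is an assumption: it takes 1 GB/s as the peak data rate of a 4x link and applies the ~80% achievable fraction quoted earlier. The function names are hypothetical.

```python
def crossover_bytes(latency_s: float, bandwidth_Bps: float) -> float:
    """Message size at which transfer time equals latency (linear cost model)."""
    return latency_s * bandwidth_Bps


def latency_share(nbytes: int, latency_s: float, bandwidth_Bps: float) -> float:
    """Fraction of a single message's time that is spent in latency."""
    return latency_s / (latency_s + nbytes / bandwidth_Bps)


# Assumed achievable 4x bandwidth: 80% of an assumed 1 GB/s peak.
BW_4X = 0.8e9
cross_4us = crossover_bytes(4e-6, BW_4X)
share_1kb = latency_share(1024, 4e-6, BW_4X)
```

With these assumed numbers the crossover is about 3.2 KB, so a ~1KB message spends most of its time in latency, and reducing latency helps more than adding bandwidth.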
Performance sensitivity: SAGE
  • 512 processor cluster
  • Similar sensitivity to bandwidth and node size (1 to 4-way)
    • No change from 4 to 8-way due to application effect
  • Little sensitivity to latency
Sensitivity summary
  • Says nothing about cost, or relative workload usage
Conclusions
  • Performance improvement due to an enhanced network is application dependent
    • Bandwidth on SAGE
    • Latency on Sweep3D
    • Mixture (node size and bandwidth) on Partisn
  • Compute performance dampens any performance enhancement of the network
    • Faster processors would increase performance sensitivity to network
  • Performance modeling can be used to assess configurations prior to procurement
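The dampening effect can be illustrated with an Amdahl-style sketch: if only the communication share of the runtime is sped up, the overall gain is limited by the compute share. This is a generic illustration rather than the model used in the study, and the function names and example times are hypothetical.

```python
def comm_fraction(compute_s: float, comm_s: float) -> float:
    """Share of runtime spent communicating (assumes no overlap)."""
    return comm_s / (compute_s + comm_s)


def overall_speedup(fraction: float, network_speedup: float) -> float:
    """Amdahl-style speedup when only the communication share gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / network_speedup)


# Hypothetical times: 0.8 s compute and 0.2 s communication per step.
baseline = overall_speedup(comm_fraction(0.8, 0.2), 2.0)
# Same communication, but processors twice as fast (0.4 s compute).
faster_cpu = overall_speedup(comm_fraction(0.4, 0.2), 2.0)
```

In this example, doubling network speed gives a smaller overall gain with the slower processors than with the faster ones, which matches the point that faster processors increase sensitivity to the network.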