A Look at Application Performance Sensitivity to the Bandwidth and Latency of Infiniband Networks. Darren J. Kerbyson Performance and Architecture Laboratory ( PAL ) http://www.c3.lanl.gov/pal Computer and Computational Sciences Division Los Alamos National Laboratory.

A Look at Application Performance Sensitivity to the Bandwidth and Latency of Infiniband Networks

Darren J. Kerbyson

Performance and Architecture Laboratory (PAL)

http://www.c3.lanl.gov/pal

Computer and Computational Sciences Division

Los Alamos National Laboratory


Performance and Architecture Lab

  • Performance analysis team at Los Alamos

    • Measurement

    • Modeling

    • Simulation

  • Large-scale:

    • systems (10,000s to 100,000s processors)

    • applications

  • Analyze existing systems (or near-to-market systems)

  • Examine possible future systems

    • e.g. IBM PERCS (DARPA HPCS), next generation Blue Gene, …

  • Recent work includes:

    • Modeling and optimization of ASCI Q (SC03 best paper)

    • Comparison of systems: e.g. Earth Simulator, & Top 5 (CCPE05)

    • Blue Gene/L (SC04)

    • Large-scale Optical Circuit Switch network (SC05)


Assessing impact of network performance

  • Context

    • What would the performance improvement be if we had:

      • a network with higher bandwidth ?

      • a network with lower latency ?

    • Is it worth procuring an enhanced configuration ?

  • Approach

    • Use application performance models

      • Application abstraction encapsulating performance related features

        • Compute factors: single processor / node performance

        • Parallel factors: boundary exchanges, collectives etc.

      • Parameterized in terms of system characteristics

        • Node characteristics

        • Network characteristics (inc. bandwidth and latency)
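As a rough illustration of the approach above, a model of this kind can be sketched as a compute term plus a communication term, each parameterized by system characteristics. The class and function names, the linear latency-plus-size/bandwidth message cost, and every numeric input below are illustrative assumptions, not the actual Los Alamos application models.

```python
from dataclasses import dataclass


@dataclass
class Network:
    latency_s: float       # small-message latency in seconds
    bandwidth_Bps: float   # achievable bandwidth in bytes per second


def message_time(net: Network, size_bytes: float) -> float:
    """Assumed linear cost: latency plus message size over bandwidth."""
    return net.latency_s + size_bytes / net.bandwidth_Bps


def iteration_time(compute_s: float, n_messages: int,
                   size_bytes: float, net: Network) -> float:
    """Compute factor plus parallel factor for one iteration (no overlap assumed)."""
    return compute_s + n_messages * message_time(net, size_bytes)


# Hypothetical inputs: 1 ms of compute and 4 boundary messages of 8 KB each
net = Network(latency_s=4e-6, bandwidth_Bps=0.8e9)
t = iteration_time(1e-3, 4, 8192, net)
print(f"iteration time: {t * 1e3:.3f} ms")
```

Swapping in a different `Network` instance is then enough to ask the "what if" questions posed under Context.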


Applications

  • Three applications of interest to Los Alamos:

    • Sweep3D: kernel application representing the heart of a deterministic SN transport calculation

    • SAGE: AMR hydrocode for shock propagation

    • Partisn: Deterministic SN transport code

  • Performance models previously developed

    • Validated on large-scale systems including:

      • Blue Gene/L (Lawrence Livermore) up to 32K nodes

      • Red Storm (Sandia) up to 8K processors

      • ASCI Q (Los Alamos) up to 8K processors

    • Typical ~10% error

  • Once validated can be used to explore performance on new systems



Network Characteristics

  • Latency of 4µs seems optimistic (currently)

  • Latency of 1.5µs is close to PathScale (1.29µs)

  • Achievable bandwidth assumed is ~80% of peak

  • Infiniband fabric assumed to be a 12-ary fat-tree with switch latency of 200ns.
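The network assumptions above can be turned into model inputs along the following lines. This is a sketch only: the peak link rates are assumed to be single-data-rate InfiniBand data rates (1, 2 and 3 GB/s for 4x, 8x and 12x), the fat-tree is treated as a k-ary n-tree with k = 12, and the near-neighbor latency is assumed to already include one switch traversal. None of these details are stated in the slides.

```python
SWITCH_LATENCY_S = 200e-9   # per-switch latency, from the slide
ARITY = 12                  # 12-ary fat-tree, from the slide
EFFICIENCY = 0.8            # achievable bandwidth ~80% of peak, from the slide

# Assumption: SDR InfiniBand data rates in bytes per second
PEAK_BPS = {"4x": 1.0e9, "8x": 2.0e9, "12x": 3.0e9}


def achievable_bandwidth(link: str) -> float:
    """Achievable bandwidth as a fixed fraction of the assumed peak."""
    return EFFICIENCY * PEAK_BPS[link]


def tree_levels(n_nodes: int, arity: int = ARITY) -> int:
    """Smallest number of levels such that arity**levels >= n_nodes."""
    levels = 1
    while arity ** levels < n_nodes:
        levels += 1
    return levels


def worst_case_latency(n_nodes: int, near_neighbor_s: float) -> float:
    """Farthest pair crosses 2*levels - 1 switches; one is already
    counted in the near-neighbor latency (an assumption of this sketch)."""
    extra_switches = 2 * tree_levels(n_nodes) - 2
    return near_neighbor_s + extra_switches * SWITCH_LATENCY_S


print(achievable_bandwidth("4x"), tree_levels(512), worst_case_latency(512, 4e-6))
```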


Performance studies

  • Sensitivity to network bandwidth and latency

    • 4x, 8x, and 12x bandwidths

    • 4µs and 1.5µs near-neighbor latency

  • Effect of node size

    • Varying the number of processors in a node

      • Assumes single-core but applicable to multi-core

    • Assumes node: 2GHz AMD Opterons

    • Use of measured single processor performance

  • Vary system size

    • From 1 processor up to 8,192 processors

    • Concentrate on 256, 512 and 1024 processor clusters
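The study design above amounts to a parameter sweep. A minimal sketch of enumerating that configuration space follows; the node sizes of 1, 2, 4 and 8 processors are an assumed set of steps, and the dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

BANDWIDTHS = ["4x", "8x", "12x"]     # from the slide
LATENCIES_US = [4.0, 1.5]            # from the slide
NODE_SIZES = [1, 2, 4, 8]            # assumed steps for the node-size study
CLUSTER_SIZES = [256, 512, 1024]     # processor counts of main interest

configs = [
    {
        "link": link,
        "latency_us": latency,
        "procs_per_node": ways,
        "processors": procs,
        "nodes": procs // ways,      # nodes needed to host all processors
    }
    for link, latency, ways, procs in product(
        BANDWIDTHS, LATENCIES_US, NODE_SIZES, CLUSTER_SIZES
    )
]

print(len(configs), "configurations to evaluate with each application model")
```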


Communication cost - example

[Figure panels: Sweep3D and SAGE]

  • 4x, 8x, 12x IB with near-neighbor latency of 4µs

  • 4-way nodes


Performance sensitivity: Partisn

  • Relative to a baseline configuration:

    • 4-way, 4x IB with 4µs latency

  • X-axis indicates node-size sensitivity (1 to 8 way)

  • Bar height indicates bandwidth sensitivity

    • 4x = lowest bar value

    • 12x = highest bar value

    • 8x = white ‘mid’ line

  • Difference between solid and shaded bars indicates latency sensitivity (4µs vs. 1.5µs)
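The bar values described above can be read as performance normalized to the baseline. The sketch below assumes relative performance is the baseline runtime divided by the configuration's runtime, so that values above 1.0 are faster. The runtimes are hypothetical placeholders, not results from the study.

```python
def relative_performance(t_baseline_s: float, t_config_s: float) -> float:
    """Values above 1.0 mean the configuration outperforms the baseline."""
    return t_baseline_s / t_config_s


# Hypothetical runtimes in seconds for one node size, keyed by (link, latency_us)
times = {
    ("4x", 4.0): 1.00, ("8x", 4.0): 0.90, ("12x", 4.0): 0.86,
    ("4x", 1.5): 0.97, ("8x", 1.5): 0.87, ("12x", 1.5): 0.83,
}

baseline = times[("4x", 4.0)]   # 4x IB with 4µs latency
bars = {cfg: relative_performance(baseline, t) for cfg, t in times.items()}

for (link, latency_us), value in sorted(bars.items()):
    print(f"{link:>3} @ {latency_us}us: {value:.3f}")
```

In this reading, the spread between the 4x and 12x values at a fixed latency is the bar height, and the gap between the two latency series is the solid-versus-shaded difference.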


Performance sensitivity: Partisn

  • 512 processor cluster

  • Highest sensitivity to node-size

    • Multiple processors sharing NIC

  • More sensitive to bandwidth than latency
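One simple way to see why node size matters when processors share a NIC is to divide the NIC bandwidth among the processors communicating at once. This is a worst-case simplification assumed for illustration; the Partisn model itself may treat contention differently, and the message size used is hypothetical.

```python
def per_processor_bandwidth(nic_bandwidth_Bps: float, procs_per_node: int) -> float:
    """Assumes every processor in the node sends at once and shares the NIC equally."""
    return nic_bandwidth_Bps / procs_per_node


def shared_nic_message_time(size_bytes: float, latency_s: float,
                            nic_bandwidth_Bps: float, procs_per_node: int) -> float:
    """Latency plus transfer time at the processor's share of NIC bandwidth."""
    share = per_processor_bandwidth(nic_bandwidth_Bps, procs_per_node)
    return latency_s + size_bytes / share


# Hypothetical 64 KB message on a link with 0.8 GB/s achievable bandwidth
for ways in (1, 2, 4, 8):
    t = shared_nic_message_time(65536, 4e-6, 0.8e9, ways)
    print(f"{ways}-way node: {t * 1e6:.1f} us per message")
```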


Performance sensitivity: Sweep3D

  • 512 processor cluster

  • Highest sensitivity to latency

    • Most messages are small (~1KB)

  • Similar sensitivity to bandwidth and to node-size
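A back-of-the-envelope check shows why small messages make latency the dominant term. The sketch assumes a linear latency-plus-size/bandwidth cost and achievable bandwidths of 0.8 GB/s (4x) and 2.4 GB/s (12x), which are 80% of assumed peak rates rather than figures from the slides.

```python
SIZE_BYTES = 1024   # ~1 KB messages, from the slide


def msg_time(latency_s: float, bandwidth_Bps: float,
             size_bytes: int = SIZE_BYTES) -> float:
    """Assumed linear cost of one message."""
    return latency_s + size_bytes / bandwidth_Bps


base = msg_time(4e-6, 0.8e9)             # 4x link, 4µs latency
faster_link = msg_time(4e-6, 2.4e9)      # 12x link, same latency
lower_latency = msg_time(1.5e-6, 0.8e9)  # 4x link, 1.5µs latency

print(f"gain from 12x bandwidth: {base / faster_link:.2f}x")
print(f"gain from 1.5us latency: {base / lower_latency:.2f}x")
```

Under these assumptions the per-message gain from lower latency (roughly 1.9x) clearly exceeds the gain from tripling bandwidth (roughly 1.2x). These are per-message costs only, not application-level speedups.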


Performance sensitivity: SAGE

  • 512 processor cluster

  • Similar sensitivity to bandwidth and node size (1 to 4-way)

    • No change from 4 to 8-way due to application effect

  • Little sensitivity to latency


Sensitivity summary

  • Says nothing about cost, or relative workload usage


Conclusions

  • Performance improvement due to an enhanced network is application dependent

    • Bandwidth on SAGE

    • Latency on Sweep3D

    • Mixture (Node-size and bandwidth) on Partisn

  • Compute time dampens the benefit of any network performance enhancement

    • Faster processors would increase performance sensitivity to network

  • Performance modeling can be used to assess configurations prior to procurement
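The dampening effect noted in the conclusions can be illustrated with an Amdahl-style calculation. The sketch assumes compute and communication do not overlap, and the time split used (0.8 s compute, 0.2 s communication) is hypothetical.

```python
def network_gain(compute_s: float, comm_s: float,
                 comm_speedup: float, cpu_speedup: float = 1.0) -> float:
    """Overall speedup from a faster network, given a processor speedup.

    Assumes total time = compute + communication with no overlap.
    """
    compute = compute_s / cpu_speedup
    return (compute + comm_s) / (compute + comm_s / comm_speedup)


# Hypothetical split: 0.8 s compute, 0.2 s communication, network 2x faster
today = network_gain(0.8, 0.2, comm_speedup=2.0)
faster_cpus = network_gain(0.8, 0.2, comm_speedup=2.0, cpu_speedup=2.0)

print(f"gain with current processors: {today:.3f}x")
print(f"gain with 2x faster processors: {faster_cpus:.3f}x")
```

With the compute term halved, the same network improvement yields a larger overall gain, which is the sense in which faster processors increase sensitivity to the network.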