A Look at Application Performance Sensitivity to the Bandwidth and Latency of InfiniBand Networks

Darren J. Kerbyson

Performance and Architecture Laboratory (PAL)

http://www.c3.lanl.gov/pal

Computer and Computational Sciences Division

Los Alamos National Laboratory


Performance and Architecture Lab

  • Performance analysis team at Los Alamos

    • Measurement

    • Modeling

    • Simulation

  • Large-scale:

    • systems (10,000s to 100,000s of processors)

    • applications

  • Analyze existing systems (or near-to-market systems)

  • Examine possible future systems

    • e.g. IBM PERCS (DARPA HPCS), next generation Blue Gene, …

  • Recent work includes:

    • Modeling and optimization of ASCI Q (SC03 best paper)

    • Comparison of systems, e.g. the Earth Simulator and the Top 5 (CCPE05)

    • Blue Gene/L (SC04)

    • Large-scale Optical Circuit Switch network (SC05)


Assessing impact of network performance

  • Context

    • What would the performance improvement be if we had:

      • a network with higher bandwidth?

      • a network with lower latency?

    • Is it worth procuring an enhanced configuration?

  • Approach

    • Use application performance models

      • Application abstraction encapsulating performance-related features

        • Compute factors: single-processor / node performance

        • Parallel factors: boundary exchanges, collectives, etc.

      • Parameterized in terms of system characteristics

        • Node characteristics

        • Network characteristics (inc. bandwidth and latency)
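To make "parameterized in terms of system characteristics" concrete, here is a minimal sketch of the simplest such parameterization: a latency-bandwidth (Hockney-style) message cost added to compute time. This is an illustrative assumption, not the PAL application models themselves, which capture much more structure (pipelining, collectives, contention). The function names are hypothetical.

```python
def message_time_us(size_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    """Time in microseconds to send one message under a latency-bandwidth model.

    latency_us     -- zero-byte (near-neighbor) latency in microseconds
    bandwidth_mb_s -- achievable bandwidth in MB/s (1 MB = 1e6 bytes)
    """
    # 1 MB/s equals 1 byte per microsecond, so size / bandwidth is already in us.
    return latency_us + size_bytes / bandwidth_mb_s


def iteration_time_us(compute_us: float, messages: list[tuple[float, int]],
                      latency_us: float, bandwidth_mb_s: float) -> float:
    """Compute time plus the non-overlapped cost of each (size, count) message class."""
    comm_us = sum(count * message_time_us(size, latency_us, bandwidth_mb_s)
                  for size, count in messages)
    return compute_us + comm_us
```

Swapping in a different latency or bandwidth and re-evaluating is exactly the "what if" question posed above; the node characteristics enter through `compute_us`.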


Applications

  • Three applications of interest to Los Alamos:

    • Sweep3D: kernel application representing the heart of a deterministic SN transport calculation

    • SAGE: AMR hydrocode for shock propagation

    • Partisn: Deterministic SN transport code

  • Performance models previously developed

    • Validated on large-scale systems including:

      • Blue Gene/L (Lawrence Livermore) up to 32K nodes

      • Red Storm (Sandia) up to 8K processors

      • ASCI Q (Los Alamos) up to 8K processors

    • Typical prediction error ~10%

  • Once validated, the models can be used to explore performance on new systems


Application characteristics


Network Characteristics

  • Latency of 4µs seems optimistic (currently)

  • Latency of 1.5µs is close to PathScale (1.29µs)

  • Achievable bandwidth is assumed to be ~80% of peak

  • InfiniBand fabric assumed to be a 12-ary fat-tree with a switch latency of 200ns
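The network assumptions above can be turned into model inputs as follows. This is a sketch under extra assumptions not stated on the slide: single data rate (SDR) InfiniBand at 250 MB/s of data per lane per direction (2.5 Gb/s signaling with 8b/10b coding), near neighbors sharing one leaf switch, and each switch in the 12-ary fat-tree having 12 down-links, so the farthest pair in an L-level tree crosses 2L-1 switches. The names are hypothetical.

```python
LANE_DATA_RATE_MB_S = 250.0  # assumption: SDR InfiniBand data rate per lane
EFFICIENCY = 0.8             # achievable fraction of peak (from the slide)
SWITCH_LATENCY_US = 0.2      # 200 ns per switch (from the slide)


def achievable_bandwidth_mb_s(lanes: int) -> float:
    """Achievable bandwidth for a 4x, 8x or 12x link (lanes = 4, 8, 12)."""
    return lanes * LANE_DATA_RATE_MB_S * EFFICIENCY


def fat_tree_levels(nodes: int, arity: int = 12) -> int:
    """Smallest number of levels L with arity**L >= nodes."""
    levels = 1
    while arity ** levels < nodes:
        levels += 1
    return levels


def worst_case_latency_us(near_neighbor_us: float, nodes: int, arity: int = 12) -> float:
    """Near-neighbor latency plus the extra switches crossed by the farthest pair."""
    # A near-neighbor message crosses 1 switch; the farthest pair crosses 2L-1.
    extra_switches = 2 * fat_tree_levels(nodes, arity) - 2
    return near_neighbor_us + extra_switches * SWITCH_LATENCY_US
```

For example, a 512-processor cluster of 4-way nodes has 128 nodes, which needs two levels under these assumptions, so the farthest pair would see 4µs + 2 × 200ns = 4.4µs, while 4x, 8x and 12x links give roughly 800, 1600 and 2400 MB/s.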


Performance studies

  • Sensitivity to network bandwidth and latency

    • 4x, 8x, & 12x bandwidths

    • 4µs and 1.5µs near-neighbor latencies

  • Effect of node size

    • Varying the number of processors in a node

      • Assumes single-core but applicable to multi-core

    • Node assumed to use 2GHz AMD Opteron processors

    • Use of measured single processor performance

  • Vary system size

    • From 1 processor up to 8,192 processors

    • Concentrate on 256, 512 and 1024 processor clusters
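The study is a full sweep over the parameters listed above. A sketch of that design space, assuming node sizes of 1, 2, 4 and 8 processors (the slides give only the 1- to 8-way range); the structure of each configuration record is hypothetical:

```python
from itertools import product

LINK_WIDTHS = (4, 8, 12)          # InfiniBand 4x, 8x, 12x
LATENCIES_US = (4.0, 1.5)         # near-neighbor latency
NODE_SIZES = (1, 2, 4, 8)         # assumption: processors per node
CLUSTER_SIZES = (256, 512, 1024)  # processors


def configurations():
    """Yield one record per (width, latency, node size, cluster size) combination."""
    for width, latency_us, ppn, procs in product(
            LINK_WIDTHS, LATENCIES_US, NODE_SIZES, CLUSTER_SIZES):
        yield {"width": width, "latency_us": latency_us,
               "ppn": ppn, "procs": procs, "nodes": procs // ppn}
```

This gives 3 × 2 × 4 × 3 = 72 configurations; each would be fed to a validated model rather than measured, which is what makes such a sweep cheap.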


Communication cost - example

[Figure panels: Sweep3D and SAGE]

  • 4x, 8x, 12x IB with near-neighbor latency of 4µs

  • 4-way nodes
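Why wider links help some codes more than others can be seen from a single message at a fixed 4µs latency. This sketch uses the latency-bandwidth model and assumes achievable bandwidths of 800, 1600 and 2400 MB/s for 4x, 8x and 12x (about 80% of SDR peak); the 1 KB size echoes the Sweep3D slide, while the 100 KB size is purely hypothetical.

```python
ACHIEVABLE_MB_S = {4: 800.0, 8: 1600.0, 12: 2400.0}  # assumption: ~80% of SDR peak


def message_time_us(size_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    # 1 MB/s = 1 byte/us, so size / bandwidth is in microseconds.
    return latency_us + size_bytes / bandwidth_mb_s


def cost_by_width(size_bytes: float, latency_us: float = 4.0) -> dict[int, float]:
    """Single-message time for each link width at a fixed near-neighbor latency."""
    return {width: message_time_us(size_bytes, latency_us, bw)
            for width, bw in ACHIEVABLE_MB_S.items()}


small = cost_by_width(1_000)    # ~1 KB message
large = cost_by_width(100_000)  # hypothetical 100 KB message
```

Going from 4x to 12x cuts the 1 KB message from 5.25µs to about 4.42µs (latency dominates), but the 100 KB message from 129µs to about 45.7µs.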


Performance sensitivity: Partisn

  • Relative to a baseline configuration:

    • 4-way, 4x IB with 4µs latency

  • X-axis indicates node-size sensitivity (1- to 8-way nodes)

  • Bar height indicates bandwidth sensitivity

    • 4x = lowest bar value

    • 12x = highest bar value

    • 8x = white ‘mid’ line

  • Difference between solid and shaded bars indicates latency sensitivity (4µs vs. 1.5µs)
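Each bar can be read as predicted performance relative to the baseline. A minimal sketch, assuming relative performance is baseline run time divided by configuration run time (so values above 1.0 are faster); the run times in `example_runtimes` are hypothetical placeholders, not results from the study.

```python
BASELINE = (4, 4, 4.0)  # (processors per node, link width, latency in us), from the slide


def bar(runtimes: dict, ppn: int, latency_us: float) -> tuple[float, float, float]:
    """Return the (4x, 8x, 12x) values of one bar: lowest, white mid line, highest.

    runtimes maps (ppn, width, latency_us) -> predicted run time (any unit).
    """
    base = runtimes[BASELINE]
    return tuple(base / runtimes[(ppn, width, latency_us)] for width in (4, 8, 12))


# Hypothetical predicted run times for 4-way nodes at 4 us latency.
example_runtimes = {(4, 4, 4.0): 100.0, (4, 8, 4.0): 80.0, (4, 12, 4.0): 75.0}
low, mid, high = bar(example_runtimes, 4, 4.0)
```

A solid bar and its shaded partner would come from two calls that differ only in `latency_us`.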


Performance sensitivity: Partisn

  • 512 processor cluster

  • Highest sensitivity to node size

    • Multiple processors share a single NIC

  • More sensitive to bandwidth than to latency
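The NIC-sharing effect can be approximated by dividing a node's achievable bandwidth among its processors. This is a worst-case sketch, assuming all processors on a node communicate off-node at the same time and share the NIC evenly, with the same assumed 800/1600/2400 MB/s figures for 4x/8x/12x; it is not how the Partisn model itself treats contention.

```python
ACHIEVABLE_MB_S = {4: 800.0, 8: 1600.0, 12: 2400.0}  # assumption: ~80% of SDR peak


def per_processor_bandwidth(width: int, ppn: int) -> float:
    """Worst case: `ppn` processors send off-node at once and split one NIC evenly."""
    return ACHIEVABLE_MB_S[width] / ppn


for ppn in (1, 2, 4, 8):
    print(ppn, [per_processor_bandwidth(width, ppn) for width in (4, 8, 12)])
```

Under this assumption an 8-way node on 12x leaves each processor 300 MB/s, less than a 1-way node on 4x (800 MB/s), which suggests why node size and bandwidth trade off against each other for a bandwidth-bound code.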


Performance sensitivity: Sweep3D

  • 512 processor cluster

  • Highest sensitivity to latency

    • Most messages are small (~1KB)

  • Similar sensitivity to bandwidth and to node-size
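A back-of-the-envelope check of why ~1KB messages make Sweep3D latency-sensitive, using the latency-bandwidth model with assumed achievable bandwidths of 800 MB/s (4x) and 2400 MB/s (12x). It compares one message, not the whole pipelined wavefront, so it only indicates the direction of the effect.

```python
def message_time_us(size_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    # 1 MB/s = 1 byte/us, so size / bandwidth is in microseconds.
    return latency_us + size_bytes / bandwidth_mb_s


SIZE = 1000  # ~1 KB, the typical Sweep3D message size from the slide

base = message_time_us(SIZE, 4.0, 800.0)           # 4x, 4 us
lower_latency = message_time_us(SIZE, 1.5, 800.0)  # 4x, 1.5 us
wider_link = message_time_us(SIZE, 4.0, 2400.0)    # 12x, 4 us

print(f"latency gain {base / lower_latency:.2f}x, bandwidth gain {base / wider_link:.2f}x")
# prints: latency gain 1.91x, bandwidth gain 1.19x
```

Cutting latency by 2.5µs removes far more time than tripling bandwidth, because only 1.25µs of the 5.25µs baseline is spent on transfer.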


Performance sensitivity: SAGE

  • 512 processor cluster

  • Similar sensitivity to bandwidth and node size (1 to 4-way)

    • No change from 4- to 8-way nodes due to an application effect

  • Little sensitivity to latency


Sensitivity summary

  • Says nothing about cost, or about each application's relative usage in the workload


Conclusions

  • Performance improvements from an enhanced network are application-dependent

    • SAGE: bandwidth

    • Sweep3D: latency

    • Partisn: a mixture (node size and bandwidth)

  • Compute performance dampens any performance gain from an enhanced network

    • Faster processors would increase performance sensitivity to network

  • Performance modeling can be used to assess configurations prior to procurement
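The dampening effect of compute time follows an Amdahl-style argument: a better network shrinks only the communication part of the run time. A minimal sketch, with a hypothetical 80 s compute / 20 s communication split and a network that makes communication twice as fast:

```python
def network_speedup(t_compute: float, t_comm: float, comm_factor: float) -> float:
    """Whole-application speedup when only communication time shrinks by comm_factor."""
    return (t_compute + t_comm) / (t_compute + t_comm / comm_factor)


# Hypothetical split: 80 s compute, 20 s communication; network halves comm time.
slow_cpu = network_speedup(80.0, 20.0, 2.0)  # about 1.11x
fast_cpu = network_speedup(40.0, 20.0, 2.0)  # processors twice as fast: 1.20x
```

With processors twice as fast, the same network improvement yields a larger overall gain, which is the sense in which faster processors increase sensitivity to the network.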

