26th IEEE International Parallel & Distributed Processing Symposium

A uGNI-Based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect

Yanhua Sun, Gengbin Zheng, Laxmikant (Sanjay) Kale

Parallel Programming Lab

University of Illinois at Urbana-Champaign

Ryan Olson, Cray Inc

Terry R. Jones, Oak Ridge National Lab


Motivation

  • Modern interconnects are complex

  • Multiple programming models/languages are developed

    How can applications written in alternative programming models attain good performance on different interconnects?

    This talk: the Charm++ programming model on the Gemini Interconnect


Outline

Overview of Charm++, Gemini and uGNI

Design of uGNI-based Charm++

Optimizations to improve communication

Micro-benchmark and application results


Charm++ Software Architecture

  • Charm++ is an object-based, over-decomposition programming model

  • Adaptive intelligent runtime

    • dynamic load balancing

    • fault tolerance

  • Scales to 300K cores

  • Portable

  • Runs on MPI


Gemini Interconnect

  • Low latency (700 ns)

  • High bandwidth (8 GB/s)

  • Scales to 100,000 nodes

  • Hardware support for one-sided communication

    • Fast Memory Access (FMA)

    • Block Transfer Engine (BTE)
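
The latency and bandwidth figures above can be combined into a rough first-order estimate of transfer time. The sketch below assumes a simple latency-plus-bandwidth model, which the slides do not state; the function names are hypothetical, and real transfers over FMA or BTE will deviate from this estimate.

```python
# Hypothetical first-order cost model, not a measurement from the talk.
LATENCY_S = 700e-9     # 700 ns, figure quoted on the slide
BANDWIDTH_BPS = 8e9    # 8 GB/s, figure quoted on the slide


def transfer_time(nbytes: int) -> float:
    """Estimated one-way time in seconds: latency + size / bandwidth."""
    if nbytes < 0:
        raise ValueError("message size must be non-negative")
    return LATENCY_S + nbytes / BANDWIDTH_BPS


def break_even_size() -> float:
    """Message size (bytes) at which the latency term equals the bandwidth term."""
    return LATENCY_S * BANDWIDTH_BPS


if __name__ == "__main__":
    for size in (8, 1024, 1 << 20):
        print(f"{size:>8} bytes: {transfer_time(size) * 1e6:.3f} us")
    print(f"break-even size: about {break_even_size():.0f} bytes")
```

Under this model, messages well below the break-even size (about 5600 bytes with these figures) are dominated by latency, which is one way to read why small and large messages are handled differently later in the talk.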


uGNI

  • User-level Generic Network Interface

    • Memory registration/de-registration

    • Post FMA/BTE transactions

    • Completion Queues


Design of uGNI-based Charm++

  • Small messages (less than 1024 bytes)

    • Sent directly through SMSG with data_tag
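
The size-based choice of send path can be sketched as follows. Only the 1024-byte threshold and the data_tag label come from the slide; the large-message path is not preserved in this transcript, so the "rendezvous" branch, the "control" message, and every function name here are assumptions for illustration, not the uGNI or Charm++ API.

```python
# Illustrative sketch only; names other than the 1024-byte threshold and
# data_tag are hypothetical and do not come from uGNI or Charm++.
SMSG_LIMIT = 1024  # bytes, threshold stated on the slide


def choose_path(nbytes: int) -> str:
    """Pick a send path for a message of nbytes bytes."""
    if nbytes < 0:
        raise ValueError("message size must be non-negative")
    if nbytes < SMSG_LIMIT:
        return "smsg"        # payload travels inside the SMSG, tagged data_tag
    return "rendezvous"      # ASSUMPTION: larger messages need a separate protocol


def send(payload: bytes) -> list:
    """Return the wire messages this model would emit for payload."""
    if choose_path(len(payload)) == "smsg":
        return [("data_tag", payload)]
    # ASSUMPTION: announce the size first; the data then moves one-sided.
    return [("control", len(payload)), ("one_sided_transfer", payload)]
```

In this model a small message costs exactly one wire message, while anything at or above the threshold costs an extra control message before the data moves.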


Baseline Pingpong Performance


Persistent Messages

  • Communication with a fixed pattern

    • Same communicating processors

    • Same data size

  • Re-use memory

    • Avoid memory allocation

    • Avoid the first handshake message
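
The savings listed above can be illustrated with a toy model that counts receiver-side allocations and handshake messages. This is a minimal sketch under stated assumptions: Receiver, baseline_send, and PersistentChannel are hypothetical names, and the copy into a bytearray merely stands in for a one-sided transfer; it is not the Charm++ persistent-message API.

```python
# Toy model: counts allocations and handshakes, not real network traffic.
class Receiver:
    def __init__(self):
        self.allocations = 0

    def allocate(self, nbytes: int) -> bytearray:
        self.allocations += 1
        return bytearray(nbytes)


def baseline_send(receiver: Receiver, payload: bytes, stats: dict) -> bytearray:
    """ASSUMED baseline: every send needs a handshake and a fresh buffer."""
    stats["handshakes"] += 1                 # control message announcing the size
    buf = receiver.allocate(len(payload))    # per-message allocation
    buf[:] = payload                         # stands in for the data transfer
    return buf


class PersistentChannel:
    """Fixed sender/receiver pair and size bound, set up once and re-used."""

    def __init__(self, receiver: Receiver, max_bytes: int):
        self.buf = receiver.allocate(max_bytes)  # one-time setup cost

    def send(self, payload: bytes) -> memoryview:
        n = len(payload)
        if n > len(self.buf):
            raise ValueError("payload exceeds the persistent buffer")
        self.buf[:n] = payload               # no handshake, no new allocation
        return memoryview(self.buf)[:n]


if __name__ == "__main__":
    stats = {"handshakes": 0}
    r1 = Receiver()
    for _ in range(3):
        baseline_send(r1, b"x" * 4096, stats)
    r2 = Receiver()
    channel = PersistentChannel(r2, 4096)
    for _ in range(3):
        channel.send(b"x" * 4096)
    print(stats["handshakes"], r1.allocations, r2.allocations)
```

The trade-off in this model is that the persistent buffer stays allocated for the lifetime of the channel, so it pays off only when the same pattern repeats.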


Persistent Messages

Baseline design to transfer data

Transfer persistent messages



Memory Pool

Memory registration/de-registration is expensive

Charm++ controls all memory allocation/de-allocation

Pre-allocate and register big chunks of memory

Allocation/de-allocation is served from the memory pool
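
The idea can be sketched with a fixed-size block pool: one large chunk is allocated (and, on real hardware, registered) up front, and later requests are served from it without further registration. The class below is a simplified illustration with hypothetical names; fixed-size blocks are an assumption made for brevity, and the registrations counter only stands in for a registration call.

```python
# Simplified sketch: a fixed-size block pool carved from one big chunk.
class MemoryPool:
    def __init__(self, block_size: int, num_blocks: int):
        self.block_size = block_size
        self.chunk = bytearray(block_size * num_blocks)  # one big pre-allocated chunk
        self.registrations = 1  # stands in for registering the chunk once
        self.free_offsets = list(range(0, len(self.chunk), block_size))

    def alloc(self):
        """Return (offset, view) for a free block; no new registration needed."""
        if not self.free_offsets:
            raise MemoryError("pool exhausted")
        off = self.free_offsets.pop()
        return off, memoryview(self.chunk)[off:off + self.block_size]

    def free(self, off: int) -> None:
        """Return a block to the pool; the chunk itself stays registered."""
        self.free_offsets.append(off)


if __name__ == "__main__":
    pool = MemoryPool(block_size=4096, num_blocks=8)
    off, block = pool.alloc()
    block[:5] = b"hello"
    pool.free(off)
    print(pool.registrations)
```

However many times alloc and free are called, the registration count in this model stays at one, which is the cost the pool is meant to amortize.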





NQueens (fine-grained)


NAMD 100M-atom on Titan

[Figure annotations: 17%, 32%, 70% efficiency]


Conclusion

  • Gemini Interconnect, Charm++

  • Optimizations

    • Persistent messages

    • Memory pool

  • Micro-benchmark and application results

    http://charm.cs.uiuc.edu/software

