
Program Analysis and Tuning

Program Analysis and Tuning. The German High Performance Computing Centre for Climate and Earth System Research Panagiotis Adamidis. Climate Simulation. We use a computer model of the climate system


Presentation Transcript


  1. Program Analysis and Tuning The German High Performance Computing Centre for Climate and Earth System Research Panagiotis Adamidis

  2. Climate Simulation We use a computer model of the climate system • a computer program, which simulates an abstract model (mathematical representation) of the climate system • reproducing the relevant features based on • theoretical principles (e.g. laws of nature) • observed relationships

  3. "Blizzard" – IBM Power6 System • Peak performance: 158 TeraFlop/s (158 trillion floating point operations per second) • 264 IBM Power6 nodes • 16 dual-core CPUs per node (altogether 8,448 compute cores) • more than 20 TeraByte memory • 7,000 TeraByte of disk space until 2011 • InfiniBand network: 7.6 TeraByte/s (aggregated) • Figure caption: High performance computing system "Blizzard" at DKRZ, showing compute nodes (orange), InfiniBand switch (red), disks (green)
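The core count on this slide follows from the node figures, and the same numbers give a rough peak rate per node and per core. The sketch below only redoes that arithmetic; the per-node and per-core values are derived here and are not stated on the slide.

```python
# Figures quoted on the slide.
NODES = 264
CPUS_PER_NODE = 16
CORES_PER_CPU = 2        # "dual-core"
PEAK_TFLOPS = 158.0      # system peak in TeraFlop/s

cores_per_node = CPUS_PER_NODE * CORES_PER_CPU
total_cores = NODES * cores_per_node

# Derived values (an illustration, not taken from the slide).
peak_per_node_gflops = PEAK_TFLOPS * 1000.0 / NODES
peak_per_core_gflops = PEAK_TFLOPS * 1000.0 / total_cores

print(f"cores per node: {cores_per_node}")
print(f"total cores:    {total_cores}")
print(f"peak per node:  {peak_per_node_gflops:.1f} GFlop/s")
print(f"peak per core:  {peak_per_core_gflops:.1f} GFlop/s")
```

The total of 8,448 cores matches the figure given on the slide.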

  4. Hybrid World (diagram labels: Node, OpenMP ×3, Message Passing)

  5. Parallel Compiler • Why can't I just say f90 -Parallel mycode.f and everything works fine? • Logical dependencies • Data dependencies
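A data dependency is one reason a compiler cannot parallelize every loop on its own. The sketch below is an illustration, not taken from the slides: it runs two loops in an arbitrary iteration order, which is roughly what happens when iterations are handed to different threads. The function names and the sample data are hypothetical.

```python
def scale_in_order(a, factor, order):
    """Independent iterations: out[i] depends only on a[i]."""
    out = [0.0] * len(a)
    for i in order:
        out[i] = factor * a[i]
    return out


def recurrence_in_order(a, order):
    """Loop-carried dependence: out[i] needs the finished out[i - 1]."""
    out = [0.0] * len(a)
    for i in order:
        previous = out[i - 1] if i > 0 else 0.0
        out[i] = previous + a[i]
    return out


data = [1.0, 2.0, 3.0, 4.0]
forward = list(range(len(data)))
backward = forward[::-1]

# The independent loop gives the same answer in any order.
same = scale_in_order(data, 2.0, forward) == scale_in_order(data, 2.0, backward)

# The recurrence only gives the running sum when iterated in order.
ordered = recurrence_in_order(data, forward)
shuffled = recurrence_in_order(data, backward)

print("independent loop is order-safe:", same)
print("recurrence, in order:    ", ordered)
print("recurrence, out of order:", shuffled)
```

The first loop can be split across threads safely. The second cannot be split without restructuring it, because each iteration reads a value that an earlier iteration writes.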

  6. Multiprocessor – Shared Memory (diagram labels: CPU ×4, Network, Memory Module ×4)

  7. Concepts - Shared Memory Directives (diagram of a single process over time: Master Thread, then Parallel Region with Team of Threads, then Master Thread, then Parallel Region with Team of Threads, then Master Thread)

  8. Amdahl's law
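The slide gives only the title. As a reminder, Amdahl's law in its usual form says that if a fraction p of the runtime can be parallelized over n processors, the speedup is 1 / ((1 - p) + p / n). The sketch below evaluates it; the parallel fraction of 0.95 is an assumed example value, not a figure from the presentation.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed example: 95% of the runtime is parallelizable.
p = 0.95
for n in (1, 32, 1024, 8448):
    print(f"{n:5d} processors -> speedup {amdahl_speedup(p, n):.2f}")

# With infinitely many processors the speedup is bounded by 1 / (1 - p).
upper_bound = 1.0 / (1.0 - p)
print(f"upper bound: {upper_bound:.1f}")
```

Even with all 8,448 cores, a code that is 5% serial stays below a speedup of 20, which is why the serial part deserves attention first.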

  9. Hybrid World (diagram labels: Node, OpenMP ×3, Message Passing)

  10. Processes and Threads (diagram labels: OpenMP ×2, Message Passing)

  11. "Blizzard" – IBM Power6 System • Peak performance: 158 TeraFlop/s (158 trillion floating point operations per second) • 264 IBM Power6 nodes • 16 dual-core CPUs per node (altogether 8,448 compute cores) • more than 20 TeraByte memory • 7,000 TeraByte of disk space until 2011 • InfiniBand network: 7.6 TeraByte/s (aggregated) • Figure caption: High performance computing system "Blizzard" at DKRZ, showing compute nodes (orange), InfiniBand switch (red), disks (green)

  12. Bottlenecks • Bottlenecks of Massively Parallel Computing Systems • Memory Bandwidth • Communication Network • Idle Processors
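The "idle processors" bottleneck can be put into a number. The sketch below assumes a simple model that is not from the slides: every process must wait for the slowest one before the step ends, so the wall time is the maximum busy time. The function name and the sample timings are hypothetical.

```python
def idle_fraction(busy_seconds):
    """Share of processor time spent waiting for the slowest process.

    Assumes all processes synchronize at the end of the step, so the
    wall-clock time equals the largest per-process busy time.
    """
    if not busy_seconds:
        raise ValueError("need at least one process")
    wall_time = max(busy_seconds)
    capacity = wall_time * len(busy_seconds)
    return 1.0 - sum(busy_seconds) / capacity


balanced = [10.0, 10.0, 10.0, 10.0]
imbalanced = [10.0, 10.0, 10.0, 20.0]   # one process has twice the work

print(f"balanced:   {idle_fraction(balanced):.3f}")
print(f"imbalanced: {idle_fraction(imbalanced):.3f}")
```

In the imbalanced case three of the four processes sit idle for half of the step, so 37.5% of the available processor time is lost.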

  13. Memory Hierarchy

  14. Data Movement

  15. Data Movement in Parallel Systems

  16. Hybrid World (diagram labels: Node, OpenMP ×3, Message Passing)

  17. The World of MPI (diagram labels: CPU ×4, Memory Module ×2, Network)

  18. Processes and Threads (diagram labels: OpenMP ×2, Message Passing)

  19. Motivation • Improve the efficiency of a parallel program running on High Performance Computers • Typical Workflow • Development of a parallel program • Measurement and runtime analysis of the code • Optimizing the code

  20. Performance Engineering • Profiling • Summarize performance data per process/thread during execution • "statistical" analysis • Tracing • Trace record with performance data and timestamp per process/thread • e.g. MPI messages
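The difference between the two approaches shows up in the data they keep. The sketch below is a minimal illustration in a single process. It uses explicit instrumentation rather than statistical sampling, and the region names and helper functions are hypothetical. The profile keeps one summary row per region, while the trace keeps one timestamped record per event.

```python
import time
from collections import defaultdict

trace = []                                  # tracing: one record per event
profile = defaultdict(lambda: [0, 0.0])     # profiling: [calls, total seconds]


def instrumented(region, func, *args):
    """Run func(*args) and record it in both the trace and the profile."""
    start = time.perf_counter()
    result = func(*args)
    end = time.perf_counter()
    trace.append((region, start, end))      # keeps when each event happened
    profile[region][0] += 1                 # keeps only the aggregate
    profile[region][1] += end - start
    return result


def compute(n):
    return sum(i * i for i in range(n))


def reduce_partial(values):
    return sum(values)


partials = [instrumented("compute", compute, n) for n in (1000, 2000, 3000)]
total = instrumented("reduce", reduce_partial, partials)

print("profile (summary per region):")
for region, (calls, seconds) in profile.items():
    print(f"  {region}: {calls} calls, {seconds:.6f} s")

print("trace (one record per event):")
for region, start, end in trace:
    print(f"  {region}: start={start:.6f} end={end:.6f}")
```

The profile stays small no matter how long the program runs. The trace grows with every event but preserves ordering, which is what makes it useful for following individual MPI messages.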

  21. Optimization • Compilers cannot optimize everything automatically • Optimization is not just finding the right compiler flag • Major algorithmic changes are necessary
