
Synchronizing the timestamps of concurrent events in traces of hybrid MPI/ OpenMP applications




  1. Synchronizing the timestamps of concurrent events in traces of hybrid MPI/OpenMP applications

  2. Cluster systems
  • Cluster systems represent the majority of today's supercomputers
  • Availability of inexpensive commodity components
  • Vast diversity: architecture, interconnect technology, software environment
  • Message-passing and shared-memory programming models for communication and synchronization

  3. Event tracing
  • Application areas: performance analysis, time-line visualization, wait-state analysis, performance modeling, performance prediction, debugging
  • Events are recorded at runtime to enable post-mortem analysis of dynamic program behavior
  • Each event includes at least a timestamp, a location, and an event type
  [Diagram: events such as Send, Recv, and Barrier are recorded, written to per-process traces, and optionally merged]

  4. Problem: Non-synchronized clocks

  5. Outline

  6. Clock synchronization
  • Query time from reference clocks synchronized at regular intervals (Mills)
  • Determine medial smoothing function based on send/receive differences (Duda, Hofman, Hilgers)
  • Restore and preserve logical correctness (Lamport, Mattern, Fidge, Rabenseifner)
  • Measure offset values and determine interpolation function (Dunigan, Maillet, Tron, Doleschal)
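The offset-interpolation approach from the last bullet can be sketched as follows. This is a minimal illustration under the assumption of linear clock drift between two measurements; all names are hypothetical and not taken from the presentation.

```python
def make_offset_corrector(t0, offset0, t1, offset1):
    """Map a local timestamp to reference time by linearly interpolating
    between two offsets measured against a reference clock at local
    times t0 and t1. Assumes drift is linear in between."""
    slope = (offset1 - offset0) / (t1 - t0)

    def correct(t):
        # Offset at time t under the linear-drift assumption
        return t + offset0 + slope * (t - t0)

    return correct

# Example: offsets of 0.5 s at t=0 and 0.7 s at t=100 against the reference
correct = make_offset_corrector(0.0, 0.5, 100.0, 0.7)
```

In practice the offsets would be obtained with message exchanges against a designated reference process, and a per-process interpolation function would be applied during trace post-processing.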

  7. Controlled logical clock
  [Diagram: a receive event that appears to occur before its matching send is advanced so that it follows the send by at least the minimum message latency µmin]
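A heavily simplified sketch of the idea behind the controlled logical clock: a receive that appears to happen less than the minimum message latency µ after its matching send is advanced, and the shift is propagated to later events on the same process. The real algorithm amortizes such jumps with a control factor rather than propagating them in full; the names and data layout here are illustrative only.

```python
def clc_forward(events, mu):
    """One forward pass over a single process's event list.
    Each event is a dict with 'time'; receive events additionally carry
    'send_time', the (already corrected) timestamp of the matching send.
    Returns the list of corrected timestamps. Simplification: the full
    jump is applied to all later events instead of being amortized."""
    corrected = []
    shift = 0.0
    for ev in events:
        t = ev['time'] + shift
        if 'send_time' in ev:
            required = ev['send_time'] + mu  # clock condition
            if t < required:
                shift += required - t  # advance this and all later events
                t = required
        corrected.append(t)
    return corrected
```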

  8. MPI semantics
  [Diagram: logical send and receive events defined for MPI point-to-point and collective operations]

  9. Limitations of the CLC algorithm
  • Neither restores nor preserves the clock condition under OpenMP event semantics
  • May introduce violations at locations that were previously intact
  [Diagram: an omp_barrier interleaved with MPI send/receive events, illustrating a newly introduced violation]

  10. Consider OpenMP constructs as composed of multiple logical messages
  • Define logical send/receive pairs for each flavor
  • Collective communication (e.g., omp_barrier)

  11. OpenMP semantics
  • Tasking
  [Diagram: logical send/receive events for OpenMP fork/join and tasking constructs]

  12. Happened-before relation
  • An operation may have multiple logical receive and send events
  • Multiple receives are used to synchronize multiple clocks
  • The latest send event is the relevant send event
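The latest-send rule from the bullets above can be stated as a one-line helper (a sketch with hypothetical names): when an operation has several logical send events, only the latest of them constrains the corrected receive timestamp.

```python
def corrected_receive_time(recv_time, send_times, mu):
    """Advance a receive event, if necessary, so that it occurs at
    least mu after the latest of its logical send events."""
    latest_send = max(send_times)
    return max(recv_time, latest_send + mu)
```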

  13. Replay communication
  • Traverse the trace in parallel
  • Exchange data at synchronization points
  • Use an operation of the same type (MPI functions, OpenMP constructs)
  • Parallelization
    • Correct local traces in parallel
    • Keep the whole trace in memory
    • Exploit distributed memory & processing capabilities

  14. Forward replay
  [Diagram: forward replay proceeding through successive omp_barrier synchronization points]

  15. Backward amortization
  • Avoid new violations
  • Do not advance a send event farther than its matching receive

  16. Backward replay
  • Data needed on the sender side
  • Communication direction
    • Communication proceeds in the backward direction
    • Roles of sender and receiver are inverted
  • Traversal direction
    • Start at the end of the trace
    • Avoid deadlocks

  17. Amortization interval
  • Piece-wise correction of the events preceding a corrected receive
  • Correction bounded by min(LCk'(corr. receive event) − µ − LCib)
  [Diagram: differences of send events to LCib within the amortization interval ∆t]
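The piece-wise correction can be pictured as spreading the jump introduced at a corrected receive linearly over the preceding events within the amortization interval. This sketch omits the bound stated on the slide (not advancing a send past min(LCk'(corr. receive event) − µ − LCib)); all names are illustrative.

```python
def backward_amortize(times, i_recv, delta, interval):
    """Distribute a forward jump of size `delta`, applied at event
    i_recv, over the events preceding it: events close to the receive
    are advanced by nearly the full delta, events `interval` or more
    before it are not moved at all."""
    t_recv = times[i_recv]
    out = list(times)
    for i in range(i_recv):
        age = t_recv - times[i]
        if age < interval:
            out[i] += delta * (1.0 - age / interval)
    return out
```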

  18. Experimental evaluation
  • Nicole cluster (JSC@FZJ)
    • 32 compute nodes, each with 2 quad-core Opterons running at 2.4 GHz
    • InfiniBand interconnect
  • Applications
    • PEPC (4 threads per process)
    • Jacobi solver (2 threads per process)
  • Evaluation focused on the frequency of clock-condition violations, accuracy, and scalability of the correction
  • A significant percentage of messages was violated (up to 5%); after correction, all traces were free of clock-condition violations

  19. Accuracy of the algorithm
  • Event position
    • Absolute deviations correspond to the value of clock-condition violations
    • Relative deviations are negligible
  • Event distance
    • Larger relative deviations are possible
    • Impact on analysis results is negligible
  • Correction changed the length of local intervals only marginally

  20. Synchronizing hybrid codes
  • The original trace contained violations of MPI semantics only
  • Roughly half of the corrections correspond to OpenMP semantics
  • The algorithm preserved OpenMP semantics

  21. Scalability

  22. Summary

  23. Outlook
  • Exploit knowledge of MPI-internal messaging inside collective operations using PERUSE
  • Leverage periodic offset measurements at global synchronization points

  24. Thanks!
