
Synchronizing the timestamps of concurrent events in traces of hybrid MPI/ OpenMP applications




  1. Synchronizing the timestamps of concurrent events in traces of hybrid MPI/OpenMP applications

  2. Cluster systems
  • Cluster systems represent the majority of today's supercomputers
  • Availability of inexpensive commodity components
  • Vast diversity: architecture, interconnect technology, software environment
  • Message-passing and shared-memory programming models for communication and synchronization

  3. Event tracing
  • Application areas: performance analysis, time-line visualization, wait-state analysis, performance modeling, performance prediction, debugging
  • Events are recorded at runtime to enable post-mortem analysis of dynamic program behavior
  • Each event includes at least a timestamp, a location, and an event type
  [Diagram: events such as Send, Recv, and Barrier are recorded, written to per-process traces, and optionally merged]

  4. Problem: Non-synchronized clocks

  5. Outline

  6. Clock synchronization
  • Query time from reference clocks synchronized at regular intervals (Mills)
  • Determine medial smoothing function based on send/receive differences (Duda, Hofman, Hilgers)
  • Restore and preserve logical correctness (Lamport, Mattern, Fidge, Rabenseifner)
  • Measure offset values and determine interpolation function (Dunigan, Maillet, Tron, Doleschal)
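The offset-interpolation approach from the last bullet can be sketched as follows. This is a minimal illustration under the assumption of linear clock drift between two measurements; all names are hypothetical and not taken from the presentation.

```python
def make_offset_corrector(t0, offset0, t1, offset1):
    """Map a local timestamp to reference time by linearly interpolating
    between two offsets measured against a reference clock at local
    times t0 and t1. Assumes drift is linear in between."""
    slope = (offset1 - offset0) / (t1 - t0)

    def correct(t):
        # Offset at time t under the linear-drift assumption
        return t + offset0 + slope * (t - t0)

    return correct

# Example: offsets of 0.5 s at t=0 and 0.7 s at t=100 against the reference
correct = make_offset_corrector(0.0, 0.5, 100.0, 0.7)
```

In practice the offsets would be obtained with message exchanges against a designated reference process, and a per-process interpolation function would be applied during trace post-processing.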

  7. Controlled logical clock
  [Diagram: a receive event that appears to occur before its matching send is advanced so that it follows the send by at least the minimum message latency µmin]
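A heavily simplified sketch of the idea behind the controlled logical clock: a receive that appears to happen less than the minimum message latency µ after its matching send is advanced, and the shift is propagated to later events on the same process. The real algorithm amortizes such jumps with a control factor rather than propagating them in full; the names and data layout here are illustrative only.

```python
def clc_forward(events, mu):
    """One forward pass over a single process's event list.
    Each event is a dict with 'time'; receive events additionally carry
    'send_time', the (already corrected) timestamp of the matching send.
    Returns the list of corrected timestamps. Simplification: the full
    jump is applied to all later events instead of being amortized."""
    corrected = []
    shift = 0.0
    for ev in events:
        t = ev['time'] + shift
        if 'send_time' in ev:
            required = ev['send_time'] + mu  # clock condition
            if t < required:
                shift += required - t  # advance this and all later events
                t = required
        corrected.append(t)
    return corrected
```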

  8. MPI semantics
  [Diagram: logical send and receive events defined for MPI point-to-point and collective operations]

  9. Limitations of the CLC algorithm
  • Neither restores nor preserves the clock condition under OpenMP event semantics
  • May introduce violations at locations that were previously intact
  [Diagram: an omp_barrier interleaved with MPI send/receive events, illustrating a newly introduced violation]

  10. Consider OpenMP constructs as composed of multiple logical messages
  • Define logical send/receive pairs for each flavor
  • Collective communication (e.g., omp_barrier)

  11. OpenMP semantics
  • Tasking
  [Diagram: logical send/receive events for OpenMP fork/join and tasking constructs]

  12. Happened-before relation
  • An operation may have multiple logical receive and send events
  • Multiple receives are used to synchronize multiple clocks
  • The latest send event is the relevant send event
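The latest-send rule from the bullets above can be stated as a one-line helper (a sketch with hypothetical names): when an operation has several logical send events, only the latest of them constrains the corrected receive timestamp.

```python
def corrected_receive_time(recv_time, send_times, mu):
    """Advance a receive event, if necessary, so that it occurs at
    least mu after the latest of its logical send events."""
    latest_send = max(send_times)
    return max(recv_time, latest_send + mu)
```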

  13. Replay communication
  • Traverse the trace in parallel
  • Exchange data at synchronization points
  • Use an operation of the same type (MPI functions, OpenMP constructs)
  • Parallelization
    • Correct local traces in parallel
    • Keep the whole trace in memory
    • Exploit distributed memory & processing capabilities

  14. Forward replay
  [Diagram: forward replay proceeding through successive omp_barrier synchronization points]

  15. Backward amortization
  • Avoid new violations
  • Do not advance a send event farther than its matching receive

  16. Backward replay
  • Data needed on the sender side
  • Communication direction
    • Communication proceeds in the backward direction
    • Roles of sender and receiver are inverted
  • Traversal direction
    • Start at the end of the trace
    • Avoid deadlocks

  17. Amortization interval
  • Piece-wise correction of the events preceding a corrected receive
  • Correction bounded by min(LCk'(corr. receive event) − µ − LCib)
  [Diagram: differences of send events to LCib within the amortization interval ∆t]
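The piece-wise correction can be pictured as spreading the jump introduced at a corrected receive linearly over the preceding events within the amortization interval. This sketch omits the bound stated on the slide (not advancing a send past min(LCk'(corr. receive event) − µ − LCib)); all names are illustrative.

```python
def backward_amortize(times, i_recv, delta, interval):
    """Distribute a forward jump of size `delta`, applied at event
    i_recv, over the events preceding it: events close to the receive
    are advanced by nearly the full delta, events `interval` or more
    before it are not moved at all."""
    t_recv = times[i_recv]
    out = list(times)
    for i in range(i_recv):
        age = t_recv - times[i]
        if age < interval:
            out[i] += delta * (1.0 - age / interval)
    return out
```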

  18. Experimental evaluation
  • Nicole cluster (JSC@FZJ)
    • 32 compute nodes, each with 2 quad-core Opterons running at 2.4 GHz
    • InfiniBand interconnect
  • Applications
    • PEPC (4 threads per process)
    • Jacobi solver (2 threads per process)
  • Evaluation focused on the frequency of clock-condition violations, accuracy, and scalability of the correction
  • A significant percentage of messages was violated (up to 5%); after correction, all traces were free of clock-condition violations

  19. Accuracy of the algorithm
  • Event position
    • Absolute deviations correspond to the value of clock-condition violations
    • Relative deviations are negligible
  • Event distance
    • Larger relative deviations are possible
    • Impact on analysis results is negligible
  • Correction changed the length of local intervals only marginally

  20. Synchronizing hybrid codes
  • The original trace contained violations of MPI semantics only
  • Roughly half of the corrections correspond to OpenMP semantics
  • The algorithm preserved OpenMP semantics

  21. Scalability

  22. Summary

  23. Outlook
  • Exploit knowledge of MPI-internal messaging inside collective operations using PERUSE
  • Leverage periodic offset measurements at global synchronization points

  24. Thanks!
