Correcting Error in Message Passing Systems
By Jan Bækgaard Pedersen
A Tool in Millipede Interactive Parallel Debugger
Overview
• Multi Level Interactive Parallel Debugging
• Millipede
• The problem – Deadlocks
• Deadlock detection module
• The idea
• The Algorithm
• Theoretical justification
Multi Level Interactive Parallel Debugging
[Diagram: Parallel Debugging, surrounded by Message debugging, Debugging straight line code, Protocol debugging, Visualization]
Use a tool that is tailored to the specific debugging task.
• Sequential tool to debug sequential code.
• Other tools to debug:
  • Message passing errors
  • Message contents
  • Protocol errors
  • Protocol verification
  • Deadlock correction
Millipede
• Communication Visualization Module: graphical view of the message passing / protocol.
• Deadlock Detection & Correction Module: detect and analyze deadlocks, and report the cause and fix.
• Comm. Protocol Verification Module: online verification of the comm. protocol while running.
• Message Debugging Module: inspect, control and change contents of messages.
• Sequential Debugging Module: debugging of the sequential code of the parallel program.
The Problem
• Message passing programs can deadlock.
The Idea
• S = {s_0, s_1, …, s_{n-1}}: list of deadlocked* senders.
• R = {r_0, r_1, …, r_{n-1}}: list of deadlocked* receivers.
• s_i = (a, b), where a and b are process identifiers and a is fixed by the sender.
• r_i = (a, b), where a and b are process identifiers and b is fixed by the receiver.
• s_i = (a_i, b_i) matches r_j = (a_j, b_j) if a_i = a_j and b_i = b_j.
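
The definitions above can be sketched in a few lines of Python. This is only an illustration: the tuple layout (source, destination) follows the slide, but the function names `hamming` and `matches` and the process numbers are assumptions made for the example.

```python
def hamming(send, recv):
    """Number of fields (0, 1 or 2) in which a send and a receive disagree."""
    # Each comparison yields a bool; adding two bools gives an int in Python.
    return (send[0] != recv[0]) + (send[1] != recv[1])


def matches(send, recv):
    """A send matches a receive when both fields agree."""
    return hamming(send, recv) == 0


# Hypothetical blocked operations:
send = (0, 4)  # process 0 is blocked sending to process 4 (a = 0 is fixed)
recv = (1, 4)  # process 4 is blocked receiving from process 1 (b = 4 is fixed)

print(matches(send, recv))  # the source fields differ, so this pair does not match
print(hamming(send, recv))  # one field would have to change
```

Here either the receiver would have to expect process 0, or the send would have to come from process 1; since a is fixed by the sender, only the receiver's source field is open to change in this pair.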
The Idea
• Find permutations S' of S and R' of R such that:
  • the number of mismatches is minimal, i.e.
  • a minimal number of fields must be changed for the deadlock to disappear.
• Report the needed changes to the user.
Compute Hamming distances between all possible permutations of S and R, and pick the ones with the minimal distance.
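
A brute-force version of this search might look like the sketch below. It is an illustration under assumptions, not the tool's implementation: the name `min_mismatch_pairing` and the sample data are hypothetical, and the sketch keeps S in a fixed order and permutes only R, which already covers every possible pairing.

```python
from itertools import permutations


def hamming(send, recv):
    """Number of fields in which a send and a receive disagree."""
    return (send[0] != recv[0]) + (send[1] != recv[1])


def min_mismatch_pairing(senders, receivers):
    """Try every pairing of senders with receivers and keep the cheapest.

    Returns (total Hamming distance, list of (send, recv) pairs).
    On ties, the first pairing found is kept.
    """
    best_cost, best_pairs = None, None
    for perm in permutations(receivers):
        cost = sum(hamming(s, r) for s, r in zip(senders, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, list(zip(senders, perm))
    return best_cost, best_pairs


# Hypothetical deadlock: processes 0 and 1 are blocked sending,
# processes 4 and 3 are blocked receiving.
senders = [(0, 4), (1, 3)]
receivers = [(1, 4), (2, 3)]

cost, pairs = min_mismatch_pairing(senders, receivers)
print(cost)   # 2
print(pairs)  # [((0, 4), (1, 4)), ((1, 3), (2, 3))]
```

The search visits n! pairings, so it is only practical for a handful of blocked processes.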
The Algorithm
Let G = (V, E) be a directed graph.
• V = V_s ∪ V_r (V_s senders, V_r receivers).
• E is constructed in the following way:
  • For all messages m left in message queues:
    • If m = (s, r) is an outstanding send (s ∈ V_s, r ∈ V_r), add edge (s, r) to E with capacity 2.
    • If m = (r, s) is an outstanding receive (s ∈ V_s, r ∈ V_r), add edge (r, s) to E with capacity 2.
  • Iterate backwards through all delivered messages and add edges (u, v) and (v, u) to E with capacity 2 if no other edge in E has u or v as source or destination.
  • Add edges with capacity 1 to E to make G complete.
Run maximum bipartite graph matching to get a matching.
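
The graph construction can be sketched as follows. Several things here are assumptions rather than part of the slides: the function names, the decision to score a sender/receiver pair by adding the capacities of the two directed edges between them, and the use of exhaustive search as a stand-in for a real maximum bipartite matching algorithm. The step that walks backwards through delivered messages is left out of the sketch.

```python
from itertools import permutations


def build_capacities(sender_ids, receiver_ids, sends, receives):
    """Build the capacity map of the complete directed graph G.

    sends:    (s, r) pairs, process s has an outstanding send to r
    receives: (r, s) pairs, process r has an outstanding receive from s
    Edges with an endpoint outside V_s / V_r are skipped.
    """
    vs, vr = set(sender_ids), set(receiver_ids)
    cap = {}
    for s, r in sends:
        if s in vs and r in vr:
            cap[(s, r)] = 2
    for r, s in receives:
        if s in vs and r in vr:
            cap[(r, s)] = 2
    # Complete the graph with capacity-1 edges in both directions.
    for s in sender_ids:
        for r in receiver_ids:
            cap.setdefault((s, r), 1)
            cap.setdefault((r, s), 1)
    return cap


def best_matching(sender_ids, receiver_ids, cap):
    """Exhaustive stand-in for maximum-weight bipartite matching."""
    best_weight, best = -1, None
    for perm in permutations(receiver_ids):
        weight = sum(cap[(s, r)] + cap[(r, s)] for s, r in zip(sender_ids, perm))
        if weight > best_weight:
            best_weight, best = weight, list(zip(sender_ids, perm))
    return best_weight, best


# Hypothetical deadlock: 0 sends to 4, 1 sends to 3,
# 4 receives from 1, 3 receives from 2 (2 is not a blocked sender).
cap = build_capacities([0, 1], [3, 4],
                       sends=[(0, 4), (1, 3)],
                       receives=[(4, 1), (3, 2)])
print(best_matching([0, 1], [3, 4], cap))  # (6, [(0, 4), (1, 3)])
```

Under this scoring, a pair gains one unit of weight for each field that already agrees, so maximising the total weight is the same as minimising the number of fields to change.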
Theoretical Justification
• How accurate is this algorithm?
• How is accuracy defined?
  • Given a working system without deadlocks.
  • Introduce a number of errors.
  • Run the algorithm.
  • If the algorithm often suggests the original working program as a fix, the accuracy is good.
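
One way to try out this definition of accuracy is a small simulation. Everything in the sketch is an assumption made for illustration: the error model (overwriting the non-fixed field of a randomly chosen send or receive), the exhaustive pairing search, and the conservative rule that a trial only counts as a success when the original pairing is the unique minimum. It assumes n ≥ 2 and at most 2n errors, and it does not reproduce any results from the talk.

```python
import random
from itertools import permutations


def hamming(send, recv):
    return (send[0] != recv[0]) + (send[1] != recv[1])


def best_pairings(senders, receivers):
    """All index permutations of the receivers with minimal total distance."""
    best, best_perms = None, []
    for perm in permutations(range(len(senders))):
        d = sum(hamming(senders[i], receivers[j]) for i, j in enumerate(perm))
        if best is None or d < best:
            best, best_perms = d, [perm]
        elif d == best:
            best_perms.append(perm)
    return best_perms


def trial(n, errors, rng):
    """Build a working system, inject errors, check whether it is recovered."""
    dests = list(range(n, 2 * n))          # receivers are processes n..2n-1
    rng.shuffle(dests)
    senders = [(i, dests[i]) for i in range(n)]
    receivers = list(senders)              # working: every receive matches
    slots = [(side, i) for side in "SR" for i in range(n)]
    for side, i in rng.sample(slots, errors):
        if side == "S":                    # corrupt the sender's destination
            a, b = senders[i]
            senders[i] = (a, rng.choice([d for d in range(n, 2 * n) if d != b]))
        else:                              # corrupt the receiver's source
            a, b = receivers[i]
            receivers[i] = (rng.choice([x for x in range(n) if x != a]), b)
    return best_pairings(senders, receivers) == [tuple(range(n))]


def accuracy(n, errors, trials=200, seed=0):
    """Fraction of trials in which the original pairing is the unique fix."""
    rng = random.Random(seed)
    return sum(trial(n, errors, rng) for _ in range(trials)) / trials


print(accuracy(4, 0), accuracy(4, 1), accuracy(4, 3))
```

With no errors the original pairing has distance 0 and is always recovered; the interesting question is how quickly the fraction drops as more fields are corrupted.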
Examples
[Figure labels: 3 moves, 4 moves, 7 moves; 3 moves]