
Correcting Error in Message Passing Systems

By Jan Bækgaard Pedersen. A tool in the Millipede interactive parallel debugger. Overview: Multi Level Interactive Parallel Debugging; Millipede; the problem (deadlocks); the deadlock detection module; the idea; the algorithm.


Presentation Transcript


  1. Correcting Error in Message Passing Systems. By Jan Bækgaard Pedersen. A Tool in the Millipede Interactive Parallel Debugger.

  2. Overview • Multi Level Interactive Parallel Debugging • Millipede • The problem – Deadlocks • Deadlock detection module • The idea • The Algorithm • Theoretical justification

  3. Multi Level Interactive Parallel Debugging • Use a tool that is tailored to the specific debugging task. • Sequential tool to debug sequential code. • Other tools to debug: • Message passing errors • Message contents • Protocol errors • Protocol verification • Deadlock correction. [Diagram labels: Parallel Debugging; Debugging straight line code; Message debugging; Protocol debugging; Visualization]

  4. Millipede • Communication Visualization Module: graphical view of the message passing / protocol. • Deadlock Detection & Correction Module: detect and analyze deadlocks, and report the cause and fix. • Comm. Protocol Verification Module: online verification of the comm. protocol while running. • Message Debugging Module: inspect, control and change the contents of messages. • Sequential Debugging Module: debugging of the sequential code of the parallel program.

  5. The Problem • Message passing programs can deadlock:
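As a small illustration of the kind of deadlock meant here, the sketch below runs two threads that each perform a blocking receive before sending anything. It is not taken from the presentation: the function name `run_pair`, the use of Python threads and queues as stand-ins for message passing processes, and the timeout (used only so the demonstration terminates) are all assumptions.

```python
import queue
import threading


def run_pair(timeout=0.2):
    """Run two 'processes' that both receive before they send.

    Returns the sorted ids of the processes that got stuck in receive.
    """
    inbox = {0: queue.Queue(), 1: queue.Queue()}
    stuck = []

    def process(me, peer):
        try:
            # Blocking receive comes first; nobody has sent anything yet.
            inbox[me].get(timeout=timeout)
        except queue.Empty:
            # Without the timeout this receive would block forever.
            stuck.append(me)
            return
        # Never reached: the matching send sits behind the receive.
        inbox[peer].put("hello")

    threads = [
        threading.Thread(target=process, args=(0, 1)),
        threading.Thread(target=process, args=(1, 0)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stuck)


if __name__ == "__main__":
    print("stuck processes:", run_pair())
```

Swapping the order of the send and the receive in either process removes the deadlock in this toy case.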

  6. The Idea S = {s_0, s_1, …, s_{n-1}}: list of deadlocked* senders. R = {r_0, r_1, …, r_{n-1}}: list of deadlocked* receivers. s_i = (a, b), where a and b are process identifiers, with a fixed by the sender. r_i = (a, b), where a and b are process identifiers, with b fixed by the receiver. s_i = (a_i, b_i) matches r_j = (a_j, b_j) if (a_i = a_j) and (b_i = b_j).
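The matching rule on this slide can be written down directly. The sketch below assumes that `a` is the sending process and `b` is the receiving process in both tuples; the slide only says which field is fixed, so this reading, along with the names `Endpoint` and `matches`, is an assumption.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    a: int  # assumed: id of the sending process
    b: int  # assumed: id of the receiving process


def matches(s: Endpoint, r: Endpoint) -> bool:
    """A send matches a receive when both fields agree."""
    return s.a == r.a and s.b == r.b


if __name__ == "__main__":
    send = Endpoint(a=0, b=1)       # process 0 sends to process 1
    good_recv = Endpoint(a=0, b=1)  # process 1 expects a message from 0
    bad_recv = Endpoint(a=2, b=1)   # process 1 expects a message from 2
    print(matches(send, good_recv), matches(send, bad_recv))
```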

  7. The Idea • Find permutations S' of S and R' of R such that: • The number of mismatches is minimal, i.e. • A minimal number of fields must be changed for the deadlock to disappear. • Report the needed changes to the user. Compute Hamming distances between all possible permutations of S and R, and pick the ones with the minimal distance.
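A brute-force reading of this slide is sketched below. It keeps S in a fixed order and permutes only R, which covers the same set of pairings as permuting both lists. The function names and the sample data are assumptions, and the search is factorial in the list length, so it is only meant for very small inputs.

```python
from itertools import permutations


def hamming(s, r):
    """Number of fields (0, 1 or 2) in which a send and a receive differ."""
    return (s[0] != r[0]) + (s[1] != r[1])


def minimal_fix(S, R):
    """Try every pairing of S with R and keep one with the fewest mismatches.

    Returns (total_distance, pairs), where pairs is a list of (send, receive).
    """
    best = min(
        permutations(R),
        key=lambda perm: sum(hamming(s, r) for s, r in zip(S, perm)),
    )
    pairs = list(zip(S, best))
    return sum(hamming(s, r) for s, r in pairs), pairs


if __name__ == "__main__":
    S = [(0, 1), (1, 2)]  # hypothetical deadlocked sends
    R = [(1, 2), (0, 3)]  # hypothetical deadlocked receives
    distance, pairs = minimal_fix(S, R)
    print(distance, pairs)
    # -> 1 [((0, 1), (0, 3)), ((1, 2), (1, 2))]
```

In this made-up case a single field change (the destination of the send from process 0) is enough for every send to find a matching receive.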

  8. The Algorithm Let G = (V, E) be a directed graph • V = Vs ∪ Vr (Vs senders, Vr receivers) • E is constructed in the following way: • For all messages m left in message queues: • If m = (s, r) is an outstanding send (s ∈ Vs, r ∈ Vr), add edge (s, r) to E with capacity 2. • If m = (r, s) is an outstanding receive (s ∈ Vs, r ∈ Vr), add edge (r, s) to E with capacity 2. • Iterate backwards through all delivered messages and add edges (u, v) and (v, u) to E with capacity 2 if no other edge exists in E with u or v as source or destination. • Add edges with capacity 1 to E to make G complete. Run maximum bipartite graph matching to get a matching.
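The construction above might be sketched as follows. Several points are assumptions rather than the presentation's method: capacities are treated as edge weights, the two directed edges between a sender and a receiver are collapsed into one weight per (sender, receiver) pair, the sender and receiver sets are assumed to have equal size, and an exhaustive search over permutations stands in for a real maximum bipartite matching algorithm. All names are hypothetical.

```python
from itertools import permutations


def build_weights(senders, receivers, out_sends, out_recvs, delivered):
    """Assign weight 2 to preferred sender/receiver pairs and 1 to the rest.

    out_sends: (s, r) pairs for outstanding sends.
    out_recvs: (r, s) pairs for outstanding receives.
    delivered: (u, v) pairs for delivered messages, oldest first.
    """
    weights = {}
    used_s, used_r = set(), set()

    def prefer(s, r):
        weights[(s, r)] = 2
        used_s.add(s)
        used_r.add(r)

    for s, r in out_sends:
        prefer(s, r)
    for r, s in out_recvs:
        prefer(s, r)
    # Walk the history backwards; only use it for nodes with no edge yet.
    for u, v in reversed(delivered):
        if u in senders and v in receivers:
            if u not in used_s and v not in used_r:
                prefer(u, v)
    # Complete the graph with low-weight edges.
    for s in senders:
        for r in receivers:
            weights.setdefault((s, r), 1)
    return weights


def best_matching(senders, receivers, weights):
    """Exhaustive maximum-weight perfect matching (small inputs only)."""
    best = max(
        permutations(receivers),
        key=lambda perm: sum(weights[(s, r)] for s, r in zip(senders, perm)),
    )
    return list(zip(senders, best))


if __name__ == "__main__":
    senders, receivers = [0, 1], [2, 3]
    w = build_weights(senders, receivers,
                      out_sends=[(0, 3)], out_recvs=[], delivered=[(1, 2)])
    print(best_matching(senders, receivers, w))
```

For larger inputs the exhaustive search would need to be replaced by a polynomial-time matching algorithm; the weights are the only part that depends on the slide.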

  9. Theoretical Justification • How accurate is this algorithm? • How is accuracy defined? • Given a working system without deadlocks. • Introduce a number of errors. • Run the algorithm. • If the algorithm often suggests the original working program as a fix, the accuracy is good.
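The accuracy experiment described on this slide could be simulated roughly as below. This is a sketch under stated assumptions: messages are (source, destination) tuples, an error overwrites either a send's destination or a receive's source with a random process id, the repair step is a brute-force minimal Hamming distance pairing, and accuracy is the fraction of trials in which the repaired message set equals the original one. The function names and the error model are hypothetical and not taken from the presentation.

```python
import random
from itertools import permutations


def repair(sends, recvs):
    """Pair sends with receives using the fewest field changes.

    The repaired message keeps the sender's source field and the
    receiver's destination field, since those are the fixed ones.
    """
    def cost(perm):
        return sum((s[0] != r[0]) + (s[1] != r[1])
                   for s, r in zip(sends, perm))

    best = min(permutations(recvs), key=cost)
    return sorted((s[0], r[1]) for s, r in zip(sends, best))


def accuracy(original, n_errors, n_procs, trials, seed=0):
    """Fraction of trials where the repair recovers the original messages."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sends = list(original)
        recvs = list(original)
        for _ in range(n_errors):
            i = rng.randrange(len(original))
            if rng.random() < 0.5:
                # Error in a send: wrong destination.
                sends[i] = (sends[i][0], rng.randrange(n_procs))
            else:
                # Error in a receive: wrong expected source.
                recvs[i] = (rng.randrange(n_procs), recvs[i][1])
        hits += repair(sends, recvs) == sorted(original)
    return hits / trials


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical working system
    for k in range(4):
        print(k, "errors ->", accuracy(ring, k, n_procs=4, trials=200))
```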

  10. [Figure labels: 3 moves, 4 moves, 7 moves]

  11. Example

  12. Example

  13. Examples 3 moves
