Source Level Debugging of Parallel Programs

Source Level Debugging of Parallel Programs. Roland Wismüller, LRR-TUM, TU München, Germany. Outline: introduction to source level debuggers; debuggers for parallel programs; current / future work at LRR-TUM. What is a Debugger? A tool to remove bugs? No! A tool to find bugs? No!


  1. Source Level Debugging of Parallel Programs • Roland Wismüller • LRR-TUM, TU München, Germany

  2. Outline • Introduction: source level debuggers • Debuggers for parallel programs • Current / future work at LRR-TUM

  3. What is a Debugger? • A tool to remove bugs? • No! • A tool to find bugs? • No! • A tool to examine program executions? • Yes!

  4. Source Level Debugging

  5. Compilation Example

  6. Setting a Breakpoint

  7. Setting a Breakpoint

  8. Printing a Variable

  9. Continue Execution • 4) cont must execute original instruction • code: call foo; trap; move r0,r1 • replace trap with original instruction

  10. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; move r0,r1 • execute a single step

  11. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; move r0,r1 • insert trap again

  12. Continue Execution • 4) cont must execute original instruction • code: call foo; trap; move r0,r1 • continue execution
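
The four steps on slides 9 to 12 can be sketched on a toy machine. This is only an illustration: `ToyProcess` and `ToyDebugger` are hypothetical names, instructions are plain strings, and no real debugging interface is used.

```python
TRAP = "trap"


class ToyProcess:
    """A toy target: a list of instruction strings and a program counter."""

    def __init__(self, code):
        self.code = list(code)
        self.pc = 0
        self.executed = []  # trace of instructions that actually ran

    def step(self):
        """Execute one instruction; return False if a trap is at the PC."""
        insn = self.code[self.pc]
        if insn == TRAP:
            return False
        self.executed.append(insn)
        self.pc += 1
        return True

    def run(self):
        """Run until a trap is hit or the code ends."""
        while self.pc < len(self.code) and self.step():
            pass


class ToyDebugger:
    def __init__(self, proc):
        self.proc = proc
        self.saved = {}  # breakpoint address -> original instruction

    def set_breakpoint(self, addr):
        self.saved[addr] = self.proc.code[addr]
        self.proc.code[addr] = TRAP

    def cont(self):
        addr = self.proc.pc
        if addr in self.saved:
            self.proc.code[addr] = self.saved[addr]  # replace trap with original
            self.proc.step()                         # execute a single step
            self.proc.code[addr] = TRAP              # insert trap again
        self.proc.run()                              # continue execution


proc = ToyProcess(["call foo", "add #4,sp", "move r0,r1"])
dbg = ToyDebugger(proc)
dbg.set_breakpoint(1)
proc.run()            # stops at the trap that replaced "add #4,sp"
stopped_at = proc.pc
dbg.cont()            # steps over the breakpoint, then runs to the end
```

After `cont`, the trace contains the original instruction even though the trap is back in place, which is the property the slides ask for.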

  13. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; move r0,r1 • execute a single step • Problem: • there may be no support for single stepping

  14. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; move r0,r1 • execute a single step • replace next instruction with a trap

  15. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; trap • continue execution

  16. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; trap • insert original trap & instruction

  17. Continue Execution • 4) cont must execute original instruction • code: call foo; trap; move r0,r1 • continue execution • Still a problem: • original instruction may be a jump / call / ret • we have to emulate these instructions!
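
The workaround on slides 13 to 17 (a temporary trap on the next instruction instead of a hardware single step) might look like this on a toy instruction list. The function names are hypothetical, and the sketch assumes straight-line code, which is exactly the limitation slide 17 points out for jump / call / ret.

```python
TRAP = "trap"


def run(code, pc, trace):
    """Run from pc until a trap or the end of code; return the new pc."""
    while pc < len(code) and code[pc] != TRAP:
        trace.append(code[pc])
        pc += 1
    return pc


def cont_without_single_step(code, bp, saved, trace):
    """Resume from the breakpoint at address bp using only traps and run()."""
    nxt = bp + 1                 # assumption: the next instruction is at bp + 1
    code[bp] = saved[bp]         # put the original instruction back
    temp = None
    if nxt < len(code) and code[nxt] != TRAP:
        temp = code[nxt]
        code[nxt] = TRAP         # replace next instruction with a trap
    pc = run(code, bp, trace)    # runs exactly one instruction, then stops
    if temp is not None:
        code[nxt] = temp         # restore the instruction under the temporary trap
    code[bp] = TRAP              # re-insert the original breakpoint trap
    return run(code, pc, trace)  # continue execution


code = ["call foo", "add #4,sp", "move r0,r1"]
saved = {1: code[1]}
code[1] = TRAP                   # breakpoint on "add #4,sp"
trace = []
pc = run(code, 0, trace)         # stops at the breakpoint
final_pc = cont_without_single_step(code, pc, saved, trace)
```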

  18. Continue Execution • 4) cont must execute original instruction • code: call foo; add #4,sp; move r0,r1 • execute a single step • A different problem: • multithreading: • another thread may bypass our breakpoint

  22. Continue Execution • 4) cont must execute original instruction • code: call foo; trap; move r0,r1 • original instruction (kept elsewhere): add #4,sp • Solution: • don’t remove the trap • execute original instruction somewhere else

  26. Continue Execution • 4) cont must execute original instruction • code: call foo; trap; move r0,r1 • original instruction (kept elsewhere): add #4,sp • Still a problem: • instruction may depend on the PC value • we have to emulate these instructions!
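
A minimal sketch of the "execute original instruction somewhere else" idea, again on a toy machine with two threads sharing one code list. The names `Thread`, `run`, `step_out_of_line` and the one-slot `scratch` area are assumptions for illustration; the sketch also assumes the instruction does not depend on the PC, which is the case slide 26 warns about.

```python
TRAP = "trap"


class Thread:
    def __init__(self, name):
        self.name = name
        self.pc = 0
        self.trace = []


def run(code, thread):
    """Run a thread until it hits a trap or runs off the end of the code."""
    while thread.pc < len(code) and code[thread.pc] != TRAP:
        thread.trace.append(code[thread.pc])
        thread.pc += 1
    return thread.pc < len(code)      # True if the thread stopped at a trap


def step_out_of_line(saved, scratch, thread):
    """Step a thread over its breakpoint without modifying the shared code."""
    bp = thread.pc
    scratch[0] = saved[bp]            # copy the original instruction elsewhere
    thread.trace.append(scratch[0])   # "execute" it from the scratch slot
    thread.pc = bp + 1                # resume behind the breakpoint


code = ["call foo", TRAP, "move r0,r1"]
saved = {1: "add #4,sp"}
scratch = [None]

a, b = Thread("a"), Thread("b")
run(code, a)                          # thread a stops at the breakpoint
step_out_of_line(saved, scratch, a)   # the trap stays in place meanwhile
b_stopped = run(code, b)              # so thread b cannot bypass it
run(code, a)                          # thread a continues to the end
```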

  27. Optimization Effects

  28. Optimization Effects • variable table: i, register i5, short • print i reads i5 • prints z!!
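
The effect on slide 28 can be reproduced with a toy variable table. Everything here is a made-up illustration: the value 42, the `current_owner` map (standing in for some form of live-range information), and both function names are assumptions, not part of any real debug format.

```python
# Debug information emitted by the compiler: variable -> (register, type)
variable_table = {"i": ("i5", "short")}

# Register contents at the point where the program is stopped.
registers = {"i5": 42}

# Assumption for illustration: the optimizer has reused i5 for z,
# because i is no longer live at this point.
current_owner = {"i5": "z"}


def print_variable(name):
    """Naive lookup: trusts the variable table unconditionally."""
    reg, _type = variable_table[name]
    return registers[reg]


def print_variable_checked(name):
    """Lookup that also consults which variable currently owns the register."""
    reg, _type = variable_table[name]
    owner = current_owner[reg]
    if owner != name:
        return f"{name}: <not available, {reg} currently holds {owner}>"
    return f"{name} = {registers[reg]}"


naive = print_variable("i")            # silently returns the value of z
checked = print_variable_checked("i")  # reports that i cannot be read
```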

  29. Parallel Debugging • Additional properties of parallel programs • Requirements for parallel debuggers • Problems and solution techniques

  30. Parallel Programs • Multiple processes and/or threads • created dynamically • many of them • Program distributed across several hosts • Additional state components: • communication subsystem

  31. Multiple Processes / Threads • Naming processes / threads • system ids • may not be unique, not persistent • not user friendly • debugger-generated ids • usually: small integers • selection based on additional information • naming processes / threads that do not exist yet • DETOP: pattern matching

  32. Thread Selection in DETOP • debugger id • function • executable • system id • node list • selection pattern

  33. Scalability • Input: use process / thread sets • commands are executed for each member • e.g. [1,2,3] print i or [2,7] break 123 • sometimes: named sets • problems: • command semantics may differ for the processes, e.g. different executables / call stacks • when to evaluate named sets?
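
A small sketch of set-prefixed commands such as `[1,2,3] print i`. The parser and the `handler` callback are hypothetical, and accepting ranges like `3-5` on input is an assumption; the slides only show comma-separated ids in commands.

```python
import re


def parse_set_command(line):
    """Split '[1,2,3] print i' into ([1, 2, 3], 'print i')."""
    m = re.match(r"\s*\[([^\]]*)\]\s*(.+)", line)
    if m is None:
        raise ValueError(f"no process set in: {line!r}")
    ids = []
    for part in m.group(1).split(","):
        part = part.strip()
        if "-" in part:                      # assumed range syntax, e.g. 3-5
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids, m.group(2).strip()


def execute_for_each(line, handler):
    """Run the command once per member of the set and collect the results."""
    ids, command = parse_set_command(line)
    return {pid: handler(pid, command) for pid in ids}


# A stand-in handler; a real debugger would talk to the process here.
results = execute_for_each("[2,7] break 123", lambda pid, cmd: f"{cmd} @ {pid}")
```

Because the handler runs per member, each process can interpret the command against its own executable and call stack, which is where the differing semantics mentioned on the slide show up.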

  34. DETOP User Interface

  35. Scalability • Output: aggregation • simple case: aggregate identical results • e.g. [1]: 12.3, [2]: 4.1, [3]: 12.3, [4]: 12.3, [5]: 12.3 becomes [1,3-5]: 12.3, [2]: 4.1 • complex case: aggregate partially identical results • impossible cases: asynchronous events
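
The simple case (aggregating identical results) can be sketched as follows. The function names are hypothetical; the input and expected output are the values shown on the slide.

```python
def compress_ids(ids):
    """Turn [1, 3, 4, 5] into '1,3-5'."""
    ids = sorted(ids)
    parts = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i == prev + 1:        # still inside a consecutive run
            prev = i
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = i
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def aggregate(results):
    """Group process ids by identical result value."""
    groups = {}
    for pid, value in results.items():
        groups.setdefault(value, []).append(pid)
    # Order the output lines by the smallest id in each group.
    lines = [(min(pids), f"[{compress_ids(pids)}]: {value}")
             for value, pids in groups.items()]
    return [text for _, text in sorted(lines)]


output = aggregate({1: 12.3, 2: 4.1, 3: 12.3, 4: 12.3, 5: 12.3})
# output == ["[1,3-5]: 12.3", "[2]: 4.1"]
```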

  36. Aggregating Stacks: Call Tree
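
One way to aggregate stacks into a call tree is to merge the backtraces of all threads into a prefix tree and label each node with the threads that pass through it. This is a generic sketch, not DETOP's implementation; the function names and the sample stacks are invented for illustration.

```python
def build_call_tree(stacks):
    """stacks: thread id -> list of frames, outermost frame first."""
    root = {"threads": [], "children": {}}
    for tid in sorted(stacks):
        node = root
        node["threads"].append(tid)
        for frame in stacks[tid]:
            node = node["children"].setdefault(
                frame, {"threads": [], "children": {}})
            node["threads"].append(tid)
    return root


def render(node, depth=0):
    """Return the tree as indented lines: 'function [thread ids]'."""
    lines = []
    for name, child in node["children"].items():
        lines.append(f"{'  ' * depth}{name} {child['threads']}")
        lines.extend(render(child, depth + 1))
    return lines


stacks = {
    1: ["main", "solve", "recv"],
    2: ["main", "solve", "recv"],
    3: ["main", "io", "write"],
}
tree_lines = render(build_call_tree(stacks))
```

Threads 1 and 2 share every frame and collapse into one branch, while thread 3 splits off below `main`.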

  37. Concurrency Issues • What happens if a thread stops? • stop all threads in all processes • stop all threads in the same process • stop only that thread • What happens if I continue a thread? • start all threads in all processes • start all threads in the same process • start only that thread • When does the debugger accept input? • only when all processes are stopped • always

  38. Concurrency Issues • What happens if a thread stops? • stop all threads in all processes (BP option) • stop all threads in the same process (BP option) • stop only that thread • What happens if I continue a thread? • start all threads in all processes (separate command) • start all threads in the same process (use pattern) • start only that thread • When does the debugger accept input? • only when all processes are stopped • always
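
The three stop scopes listed above can be modelled as a per-breakpoint option. The `Scope` enum and `threads_to_stop` are hypothetical names used only to make the alternatives concrete; threads are identified here by made-up (process id, thread id) pairs.

```python
from enum import Enum


class Scope(Enum):
    THREAD = "only that thread"
    PROCESS = "all threads in the same process"
    ALL = "all threads in all processes"


def threads_to_stop(threads, hit, scope):
    """threads: list of (process id, thread id); hit: the pair that stopped."""
    if scope is Scope.THREAD:
        return [hit]
    if scope is Scope.PROCESS:
        return [t for t in threads if t[0] == hit[0]]
    return list(threads)


threads = [(1, 1), (1, 2), (2, 1)]
hit = (1, 2)
only_thread = threads_to_stop(threads, hit, Scope.THREAD)
same_process = threads_to_stop(threads, hit, Scope.PROCESS)
everything = threads_to_stop(threads, hit, Scope.ALL)
```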

  39. Additional State Components • E.g. message buffers, blocked processes • Usually no support from debuggers • additional dependency on programming library implementation • Often other tools (visualizers) will show this information • use them together with the debugger (?) • interoperable tools

  40. Interoperable Tools • Multiple, loosely coupled tools are used on the same program • Concrete scenario: • debugger that allows the user to ‘time-warp’ • i.e. return to previous program states without rerunning the program • speeds up the debugging cycle of long-running programs

  41. ‘Time-Warp’ Debugger • Tools that need to interoperate: • parallel debugger (DETOP) • checkpointing system for parallel programs (CoCheck, based on Condor) • deterministic execution controller (codex) • means to specify the state to return to (VISTOP: state based program flow visualizer)

  42. Preconditions for Interoperability • Common monitoring infrastructure • OMIS / OCM • Mechanisms for informing tools on modifications of state done by other tools • e.g. VISTOP must know when DETOP stops a process, as event buffer must be read • Mechanisms for direct tool interaction • e.g. VISTOP to CoCheck: ‘restart from checkpoint’

  43. OMIS • Basis: • objects + services • event / action paradigm • scalability by using object sets • location transparency • Example: thread_creates_proc([t_1,t_2]): thread_stop([$proc, $new_proc]) thread_get_backtrace([$thread],0)
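
The event / action paradigm with object sets can be illustrated with a generic dispatcher. This is not OMIS or OCM code: the `Monitor` class, its methods, and the logged tuples are assumptions chosen to mirror the shape of the example above, where an event on a set of threads triggers actions that use the event's parameters.

```python
class Monitor:
    """A toy event/action monitor (hypothetical, not the OMIS interface)."""

    def __init__(self):
        self.rules = []   # (event name, set of watched objects, action)
        self.log = []     # actions that were carried out

    def on(self, event, objects, action):
        """Register an action for an event on a set of objects."""
        self.rules.append((event, set(objects), action))

    def raise_event(self, event, source, **params):
        """Deliver an event; run every matching rule's action."""
        for name, objects, action in self.rules:
            if name == event and source in objects:
                action(self, source, **params)


def stop_and_backtrace(monitor, source, new_proc):
    # Stand-ins for the two actions in the slide's example.
    monitor.log.append(("stop", [source, new_proc]))
    monitor.log.append(("backtrace", source))


monitor = Monitor()
monitor.on("thread_creates_proc", ["t_1", "t_2"], stop_and_backtrace)
monitor.raise_event("thread_creates_proc", "t_1", new_proc="p_9")
monitor.raise_event("thread_creates_proc", "t_3", new_proc="p_10")  # not watched
```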

  44. Interoperability Problems • A tool may violate preconditions of another tool • DETOP can stop a process • checkpointing is initiated by sending a signal • stopped process won’t handle signal! • we cannot hide the state change from the checkpointer • this case cannot be handled easily

  45. The End • Debuggers are far from trivial • Parallel debuggers are even more complex • Lots of open (maybe unsolvable) research issues • Interoperability may ease implementation of enhanced functionality
