TDB: THE INTERACTIVE DISTRIBUTED DEBUGGING TOOL FOR PARALLEL MPI PROGRAMS
Authors: A. Adamovich, M. Kovalenko
RCMS PSI RAS, Pereslavl-Zalessky, Russia
History of the Development
T-system: RCMS PSI RAS, since the early 90s
T-system and its environment:
The TDB architecture:
1) The primary daemon
2) The secondary daemon
3) The central server
4) The client component
5) The debugging server
The user can select the exact set of computational nodes that are available for debugging MPI tasks.
The list of all nodes available for MPI task debugging can be obtained by sending a request to the TDB daemons.
The primary TDB daemon runs on the front-end, and secondary TDB daemons run on the computational nodes of the cluster. TDB daemons are monitor processes.
Secondary daemons collect useful information about the status of the computational nodes, and the primary daemon accumulates it.
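The collect-and-accumulate relationship between the daemons can be illustrated with a minimal sketch. It is not TDB's actual protocol: the `NodeReport` fields, the `PrimaryDaemon` class, and the host names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeReport:
    """Status snapshot a secondary daemon might send (hypothetical fields)."""
    hostname: str
    available: bool
    running_tasks: int = 0


@dataclass
class PrimaryDaemon:
    """Accumulates the latest report received from each secondary daemon."""
    reports: Dict[str, NodeReport] = field(default_factory=dict)

    def accept(self, report: NodeReport) -> None:
        # Keep only the most recent report per node.
        self.reports[report.hostname] = report

    def available_nodes(self) -> List[str]:
        # Answers a "which nodes can be used for debugging?" request.
        return sorted(h for h, r in self.reports.items() if r.available)


primary = PrimaryDaemon()
primary.accept(NodeReport("node01", available=True))
primary.accept(NodeReport("node02", available=False, running_tasks=4))
primary.accept(NodeReport("node03", available=True))
print(primary.available_nodes())  # ['node01', 'node03']
```

In this sketch a newer report from the same node simply overwrites the older one, so the primary daemon always answers from the latest known state.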
Used to configure various TDB, GTDB, and MPI implementation settings.
Upper bar: common MPI-node status
Green - all processes of the node are running
Yellow - at least one of the processes is stopped
Red - at least one process caught a signal
The common status bar gives the user a simpler and clearer view of the state of the processes being debugged. All status subcomponents are implemented as button widgets:
when clicked, they open the corresponding process (or processes) for individual exploration in the PROCS GTDB mode.
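The color rule for the upper bar can be sketched as a small function. The names `ProcState` and `node_color` are hypothetical, and two points are assumptions not stated in the slides: red takes precedence over yellow when a node has both a signaled and a stopped process, and a node with no processes is shown as green.

```python
from enum import Enum
from typing import Iterable


class ProcState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    SIGNALED = "signaled"  # the process caught a signal


def node_color(states: Iterable[ProcState]) -> str:
    """Map the states of one node's processes to a status color."""
    states = list(states)
    # Assumption: a caught signal is the most urgent condition.
    if any(s is ProcState.SIGNALED for s in states):
        return "red"
    if any(s is ProcState.STOPPED for s in states):
        return "yellow"
    # All processes are running (or, by assumption, there are none).
    return "green"


print(node_color([ProcState.RUNNING, ProcState.RUNNING]))   # green
print(node_color([ProcState.RUNNING, ProcState.STOPPED]))   # yellow
print(node_color([ProcState.STOPPED, ProcState.SIGNALED]))  # red
```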
The component is used to work with various types of breakpoints supported in TDB:
all of them may have conditions.
TDB also implements a special type of breakpoint, the so-called “group breakpoint”. A group breakpoint allows the user to set a number of uniform breakpoints in a group of parallel processes. The user can set, delete, disable, or enable a group breakpoint with one command or click.
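One way to picture a group breakpoint is as a single object that fans out to one uniform breakpoint per process. The sketch below is illustrative only and does not reflect TDB's internals: the class names, the `location` and `condition` fields, the source location "main.c:42", and the process identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Breakpoint:
    """An ordinary per-process breakpoint (hypothetical model)."""
    location: str
    condition: Optional[str] = None
    enabled: bool = True


@dataclass
class GroupBreakpoint:
    """One logical breakpoint applied uniformly to a group of processes."""
    location: str
    processes: List[str]
    condition: Optional[str] = None
    members: Dict[str, Breakpoint] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Setting the group breakpoint creates one identical
        # breakpoint in every process of the group.
        for proc in self.processes:
            self.members[proc] = Breakpoint(self.location, self.condition)

    def set_enabled(self, enabled: bool) -> None:
        # Enable or disable all member breakpoints in one operation.
        for bp in self.members.values():
            bp.enabled = enabled

    def delete(self) -> None:
        # Remove the breakpoint from every process in the group.
        self.members.clear()


gbp = GroupBreakpoint("main.c:42", ["0:0", "1:0", "2:0"], condition="i == 0")
gbp.set_enabled(False)
print([bp.enabled for bp in gbp.members.values()])  # [False, False, False]
gbp.delete()
print(len(gbp.members))  # 0
```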
GTDB in the MAIN -> PROC mode. Process 2:0 is the active (selected, explored) process...
Example of defining dynamic groups using the "dgroup" command
We continue the execution of the processes in the masters dynamic group, and they then stop at the previously set breakpoints in the loop.
As we can see, the ‘i’ variable equals zero on all processes in the masters group (the "print" command was applied to the masters group). To get out of the loop, we set the ‘i’ variable to 1 on all masters.
We continue execution of the masters group processes, but after the loop execution is stopped by the SIGSEGV signal.
In the Main mode, the user can work with one selected (active) process or group.
In the Procs mode, the user can examine any process individually.
The component is implemented as two “notebooks”, one nested inside the other.
The first (outer, placed vertically) notebook is the MPI-nodes notebook. Its tabs contain information about the corresponding processes and the common MPI-node statuses, colored as in the node status component.
The second (inner, placed horizontally) notebook is a notebook of processes...