AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks


40th IEEE/IFIP International Conference on Dependable Systems and Networks


Ignacio Laguna, Saurabh Bagchi, Greg Bronevetsky, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz

June 30th, 2010

- Large systems will have millions of cores in the near future
- Increased difficulty in developing correct HPC applications
- Traditional debuggers don't perform well at this scale

- Faults come from various sources
- Hardware: soft errors, physical degradation, design bugs
- Software: coding bugs, misconfigurations

Example: Sequoia (2011) will have 1.6 million cores

- Need for tools to help developers find root cause quickly

Questions a developer has to answer when an application fails:

- Which line of code?
- Which code region?
- Which parallel task failed?
- When did it fail?

AutomaDeD helps answer these questions to find the root cause.

AutomaDeD workflow:

- Offline: phase annotation of the application (Task1, Task2, …, Taskn)
- Online: a PNMPI profiler builds one model per task (Model1, Model2, …, Modeln)
- Offline: clustering of the models reports (1) abnormal phases, (2) abnormal tasks, and (3) characteristic transitions

Task behavior can be compared along three dimensions:

- Between runs: Run 1, Run 2, Run 3 of the same MPI application (tasks 1, 2, 3, …, n in each)
- Spatial: between tasks within a run
- Temporal: between time points within a run

- Like a Markov model but with time between transitions
- Nodes: application states
- Edges: transitions from one state to another

- Each edge is labeled with a transition probability and the time spent in the current state before the transition

Example: from state A, the model moves to state B with probability 0.2 after 5μs, to state C with probability 0.7 after 15μs, and to state D with probability 0.1 after 500μs.

- States correspond to:
- Calls to MPI routines
- Code between MPI routines

Application code and the corresponding semi-Markov model states:

main() {

MPI_Init()

… Computation …

MPI_Send(…, 1, MPI_INTEGER, …);

for(…)

foo();

MPI_Recv(…, 1, MPI_INTEGER, …);

MPI_Finalize();

}

foo() {

MPI_Send(…, 1024, MPI_DOUBLE, …);

…Computation…

MPI_Recv(…, 1024, MPI_DOUBLE, …);

…Computation…

}

The resulting chain of states (a different state is created for each calling context):

main()/Init → Computation → main()/Send-INT → main()/foo()/Send-DBL → Computation → main()/foo()/Recv-DBL → Computation → main()/Recv-INT → main()/Finalize

Edge times are modeled from the data samples of observed time values in two ways:

- Gaussian distribution (parametric model): cheaper, but lower accuracy
- Histogram (non-parametric model): bucket counts approximate the density function (with line connectors and a Gaussian tail); more expensive, but greater accuracy


- Phases denote regions of execution that are repeated dynamically
- Developers annotate phase boundaries in the code
- Calls to MPI_Pcontrol are intercepted by the wrapper library

Sample Code:

main() {

MPI_Init()

… Computation …

MPI_Send(…, MPI_INTEGER, …);

for(…) {

MPI_Send(…, MPI_DOUBLE, …);

…Computation…

MPI_Recv(…, MPI_DOUBLE, …);

MPI_Pcontrol(1);

}

…Computation…

MPI_Recv(…, MPI_INTEGER, …);

MPI_Finalize();

}

Each task's execution is split over time into phases (Phase 1, Phase 2, Phase 3, Phase 4, …), and one SMM is built per task per phase, for tasks 1 through n.


- Goal: find the phase that differs the most from the other phases

With sample runs available:

- Build SMM1, SMM2, …, SMMn for each phase (Phase 1, Phase 2, …, Phase M) of the faulty run
- Compare each phase's SMMs to the counterpart phase in the sample runs

Without sample runs:

- Compare each phase to all other phases of the same run
- Compute a deviation score per phase

Diss(SMM1, SMM2) = L2 Norm (Transition prob.) + L2 Norm (Time prob.)

Hierarchical clustering of the task SMMs (tasks 1-4 shown in the original figure):

- Step 1: each task starts in its own cluster
- Step 2: merge the two most similar clusters
- Step 3: keep merging the most similar clusters
- Step 4: do we stop here, or do we collapse everything into one cluster?

We need a dissimilarity threshold to decide when to stop.

- The user provides the application's natural cluster count k
- Sample runs, if available, are used to compute the clustering threshold τ that produces k clusters
- Otherwise, τ is computed from the start of the execution
- The threshold is based on the highest increase in dissimilarity between merge steps
- During real runs, tasks are clustered using threshold τ

Cluster isolation separates the buggy task into an unusual cluster.

Master-worker application example (bug injected in task 9): the normal execution yields two clusters of tasks (1-9); the buggy execution yields three, with task 9 separated into its own unusual cluster.

- Method 1:
- Find the edge that distinguishes the faulty cluster from the others
- Recall: SMM dissimilarity is based in part on the L2 norm of the SMMs' parameters

- Method 2:
- Find an unusual individual edge
- An edge that takes an unusual amount of time compared to previously observed times

Visualization of results: the isolated transition (cluster 2) is highlighted.

- NAS Parallel Benchmarks (MPI programs): BT, CG, FT, MG, LU and SP; 16 tasks, Class A input
- 2,000 fault-injection experiments per application
- Detection accuracy: ~90% for loop faults and message drops, ~60% for extra threads
- Training = sample runs available; training is significantly better than no training
- Histograms perform better than Gaussians

Comparisons evaluated: training vs. no training; some-fault vs. no-fault samples; Gaussian vs. histogram models (per fault type).

Cluster isolation:

- Results assume the phase was detected accurately
- Accuracy is highly variable; up to 90% for extra threads
- Poor detection elsewhere because of fault propagation: buggy task → normal task(s)

Transition isolation:

- The erroneous transition lies in the top 5 candidates identified by AutomaDeD
- Accuracy ~90% for loop faults; highly variable for others
- Less variable if event-order information is used

Case study: a job-execution script failed to clean up at job end

- The MPI task executor (mpirun, version 0.9.9) left runaway processes on the nodes
- Simulation: execute BT (the affected application) while running runaway applications (LU, MG or SP) concurrently; the runaway tasks interfere with normal BT execution

Affected application: BT benchmark

Interfering applications: SP, LU, MG benchmarks

Abnormal phase detected in phase 1 in SP and LU, and in phase 2 in MG

Constant (average) SMM difference in regular BT runs

- Contributions:
- A novel way to model and compare parallel tasks' behavior
- Focus debugging efforts on the time period, tasks and code region where the bug first manifests
- Accuracy up to ~90% for phase detection, cluster isolation and transition isolation (delays and hangs)

- Ongoing work:
- Scaling the implementation to work on millions of tasks
- Improving accuracy through different statistical models (e.g., kernel density estimation, Gaussian mixture models)