UPC CHECK: A scalable tool for detecting run-time errors in Unified Parallel C

Indranil Roy

High Performance Computing (HPC) group


Segmentation error. Core dumped.


A good error message

Thread 0 encountered invalid arguments in function upc_all_broadcast at line 26 in file /home/jjc/ex1.upc.

Error: Parameter (sizeof(int ) * shval) passes non-positive value of 0 to nbytes argument

Variable shval was declared at line 10 in file /home/jjc/ex1.upc.
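A check that produces a report of this shape could look roughly like the following. This is a minimal sketch in plain C, not UPC-CHECK's actual instrumentation: the helper name check_nbytes and all of its parameters are hypothetical, and the message layout simply mirrors the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: if the value passed as nbytes is zero, formats a
 * report into buf and returns 1; otherwise leaves buf empty and returns 0.
 * nbytes is unsigned, so "non-positive" can only mean zero here. */
int check_nbytes(char *buf, size_t buflen, int thread, const char *func,
                 const char *file, int line, const char *expr, size_t nbytes)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (nbytes != 0)
        return 0;
    snprintf(buf, buflen,
             "Thread %d encountered invalid arguments in function %s "
             "at line %d in file %s.\n"
             "Error: Parameter %s passes non-positive value of 0 "
             "to nbytes argument\n",
             thread, func, line, file, expr);
    return 1;
}
```

A source-to-source translator would insert such a call just before the collective, passing the argument expression as a string together with the file and line. Reporting where shval was declared needs symbol information from the translator and is not modeled here.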


Outline

  • Understanding Unified Parallel C

  • UPC-CHECK 1.0 tool

    • How does it work?

    • Usability

    • Error coverage and quality of error reports generated

    • Testing

    • Overheads

    • Scalability

    • Known limitations

  • Challenges in argument error detection

  • Deadlock detection algorithm

  • Demo


Understanding Unified Parallel C

  • Distributed memory model

  • Shared memory model


Understanding Unified Parallel C

  • Unified Parallel C

    • Distributed Shared Memory Model or Partitioned Global Address Space Model


UPC-CHECK v1.0

  • Source to source translator

  • Pre-compiler

  • Error handling

    • Argument errors

    • Deadlocks


UPC-CHECK: Usability

  • Portable

    • Machine independent

    • Compiler independent

  • Ease of use

    • Easy to install

      install_UPC-CHECK

    • Easy to run

  • Freely available

    wget http://hpcgroup.public.iastate.edu/UPC-CHECK/UPC-CHECK.tar.gz


UPC-CHECK 1.0: Usability

  • Usage

    upc-check [compiler options] [--upccheck:flag [--upccheck:flag] ...] -c sourcefile.upc

    -a|-d_argument_check          disables argument checking (enabled by default)

    -d|-d_deadlock_check          disables deadlock checking (enabled by default)

    -s|-e_track_func_call_stack   enables tracing of function call stack (disabled by default)

    -h|--h|-help                  prints help for UPC-CHECK

  • Just replace your compile-command with upc-check.


Quality of error reports generated

  • Coyle, J., Hoekstra, J., Kraeva, M., Luecke, G. R., Kleiman, R., Srinivas, V., Tripathi, A., Weiss, O., Wehe, A., Xu, Y., Yahya, M. (2008). UPC Run-Time Error Detection Test Suite. http://kraeva.public.iastate.edu/rted/UPC.TestPlan.pdf, Iowa State University, High Performance Computing Group.

    • A score of 5 is given for a detailed error message that will assist a programmer to fix the error.

    • A score of 4 is given for error messages with more information than a score of 3 and less than 5. This is tailored for each test.

    • A score of 3 is given for error messages with the correct error name, line number and the name of the file where the error occurred.

    • A score of 2 is given for error messages with the correct error name and line number where the error occurred but not the file name where the error occurred.

    • A score of 1 is given for error messages with the correct error name.

    • A score of 0 is given when the error was not detected.


UPC-CHECK 1.0: Testing

  • 400 error test-cases

  • 1800 false-positive cases

  • Additional testing for deadlocks

  • Testing across application programs


UPC-CHECK 1.0: Overhead

  • Base memory requirement

    • ~ 128 KB per thread

    • With every acquired or requested shared memory lock, the requirement goes up by around 256 B

    • While tracking the function call stack, with every level of nested function call, the memory requirement goes up by around 512 B

  • Increase of code section

    • ~ 100 lines of instrumentation per UPC operation

    • ~12000 lines from support files
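The memory figures above combine into a simple per-thread estimate. The sketch below only restates that arithmetic. The function name is hypothetical, 1 KB is assumed to mean 1024 bytes, and the slide's numbers are approximate, so the result is a rough guide rather than a measured value.

```c
#include <assert.h>
#include <stddef.h>

/* Rough per-thread memory estimate in bytes, using the slide's figures.
 * locks      : locks currently acquired or requested by the thread
 * call_depth : nesting depth of function calls; pass 0 when call-stack
 *              tracking is disabled */
size_t upc_check_mem_estimate(size_t locks, size_t call_depth)
{
    /* keep this toy arithmetic well inside the range of size_t */
    assert(locks < 1000000 && call_depth < 1000000);
    const size_t base = 128 * 1024;       /* ~128 KB base requirement */
    const size_t per_lock = 256;          /* ~256 B per lock */
    const size_t per_call_level = 512;    /* ~512 B per nested call */
    return base + per_lock * locks + per_call_level * call_depth;
}
```

For instance, a thread with 4 locks in flight at call depth 10 comes to 131072 + 1024 + 5120 = 137216 bytes, so the base requirement dominates for typical programs.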


Efficiency overhead

(Figure: efficiency overhead chart.)


UPC-CHECK 1.0: Scalability

  • CROW cluster

  • Cray compiler

  • Cray run-time environment

  • 128 threads


UPC-CHECK v1.0: Known limitations

  • UPC-CHECK will not test the single-valued requirement of upc_forall statements.

  • Since UPC-CHECK works on UPC source programs, it will be unable to handle any deadlocks which are created in a library that a user might be using.

  • UPC-CHECK should not be used for programs where the 'main' function lies within a header file

    • Best effort will be made, but may lead to memory leaks at end of execution.


Challenges in checking argument errors

  • Engineering challenges

    • Exhaustiveness

    • Argument checks against multiple functions

    • Handling vector arguments

    • Dependency of one argument on another argument

    • Data-structures used

    • Displaying the errors


A novel Deadlock Detection Algorithm

  • Dynamic

  • Optimal

    • O(1) for deadlocks created by collective routines

    • O(n) for deadlocks created by locks

  • Distributed

  • Scalable


A few more terms: “collective” operations

  • “Collective” is a constraint placed on some language operations which requires evaluation of such operations to be matched across all threads. The behavior of collective operations is undefined unless all threads execute the same sequence of collective operations.

  • “Single valued” refers to an operand to a collective operation, which has the same value on every thread. The behavior of the operation is otherwise undefined.
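To make the collective requirement concrete, the sketch below compares recorded sequences of collective operations across threads after the fact. It is an offline illustration only, not the run-time algorithm presented later. The function name, the flat array layout and the integer operation ids are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* seqs holds nthreads rows of len entries; seqs[t * len + k] is the id of
 * the k-th collective operation executed by thread t.
 * Returns the first position k at which some thread differs from thread 0,
 * or -1 if every thread executed the same sequence. */
long first_collective_mismatch(const int *seqs, size_t nthreads, size_t len)
{
    assert(seqs != NULL && nthreads > 0);
    for (size_t k = 0; k < len; k++)
        for (size_t t = 1; t < nthreads; t++)
            if (seqs[t * len + k] != seqs[k])
                return (long)k;
    return -1;
}
```

A result other than -1 marks the first point where the program's behavior becomes undefined under the definition above.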


Central idea

  • The collective requirement simply states a relative ordering property of calls to collective operations that must be maintained in the parallel execution trace for all executions of any legal program.


(Figure: execution trace with threads on one axis and time on the other.)


Deadlocks in UPC

1. Not all threads are waiting at the same collective routine

(Figure: threads 0 to T-1 plotted against time.)


2. Some threads are waiting at the same collective routine when at least one of the threads has reached end-of-execution

(Figure: threads 0 to T-1 plotted against time, with end-of-execution marked.)

One of the threads at a collective routine is holding a lock that at least one of the threads is trying to acquire.

(Figure: threads 0 to T-1 plotted against time.)


5. Circular dependency for acquiring locks amongst threads

Definition: A thread i is dependent on another thread j if thread i is trying to acquire a lock held by thread j.

(Figure: threads 0 to T-1 plotted against time.)


Chain of dependency for acquiring locks leads to a thread which is waiting at a collective routine.

(Figure: threads 0 to T-1 plotted against time.)


Chain of dependency for acquiring locks leads to a thread which has reached end of execution.

(Figure: threads 0 to T-1 plotted against time, with end-of-execution marked.)


Algorithm: Get all the threads in the picture

(Figure: threads 0, 1, 2, 3, ..., i-1, i, i+1, i+2, ..., j, ..., T-3, T-2, T-1.)


Validation method: A basic block

(Figure: two panels showing threads i-1 and i against time, each labelled R.)


Implementation: Algorithm 1

shared [1] deadlock_ctxt_t unified_deadlock_ctxt[THREADS];

(Figure: animation frames over threads i-1, i and i+1.)


Atomicity and serialization of status checks

  • One centralized lock solution

    • Efficiency hit – complete serialization

  • Decentralized lock solution: one lock per thread

    • shared [1] upc_lock_t upc_check_deadlock_detection_lock[THREADS];

(Figure: threads 0, 1, 2, ..., i, i+1, ..., T-3, T-2, T-1.)


Avoiding deadlocks created by the checks

(Figure: threads 0, 1, 2, ..., i, i+1, ..., T-3, T-2, T-1.)


Scheme 1 of acquiring locks

Even thread: lock[i] then lock[(i+1) % THREADS]

Odd thread: lock[(i+1) % THREADS] then lock[i]

(Figure: threads 0, 1, 2, ..., i, i+1, ..., T-3, T-2, T-1; the legend distinguishes the first lock acquired from the second lock acquired.)
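One way to see why Scheme 1 cannot deadlock is to draw an edge from the first lock to the second lock for every thread and look for a directed cycle, since a deadlock among the checks would need one. The sketch below does this by brute force for small thread counts. The function names are hypothetical, and the naive order used for contrast (every thread takes its own lock first) is an assumption about what the scheme is meant to avoid.

```c
#include <assert.h>
#include <stdbool.h>

/* Order in which thread i takes its two locks. scheme1 == false models
 * the naive order in which every thread takes its own lock first. */
void lock_order(int i, int nthreads, bool scheme1, int *first, int *second)
{
    assert(nthreads > 0 && i >= 0 && i < nthreads);
    int next = (i + 1) % nthreads;
    if (!scheme1 || i % 2 == 0) { *first = i;    *second = next; }
    else                        { *first = next; *second = i;    }
}

/* True if lock `target` can be reached from lock `from` by following
 * first -> second edges, using at most `steps` edges. */
static bool reaches(int from, int target, int nthreads, bool scheme1, int steps)
{
    if (steps == 0)
        return false;
    for (int t = 0; t < nthreads; t++) {
        int first, second;
        lock_order(t, nthreads, scheme1, &first, &second);
        if (first != from)
            continue;
        if (second == target ||
            reaches(second, target, nthreads, scheme1, steps - 1))
            return true;
    }
    return false;
}

/* True if the lock-order graph contains a directed cycle. */
bool lock_order_has_cycle(int nthreads, bool scheme1)
{
    for (int l = 0; l < nthreads; l++)
        if (reaches(l, l, nthreads, scheme1, nthreads))
            return true;
    return false;
}
```

For two or more threads the naive order always produces the ring 0 -> 1 -> ... -> T-1 -> 0, while under Scheme 1 every edge points from an even-numbered lock to an odd-numbered one, apart from a single edge from lock T-1 to lock 0 when T is odd, so no cycle can close.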


Scheme 1: Maximum latency of acquiring locks for even number of threads

(Figure: two dependency chains, one over threads i-1, i, i+1, i+2 and one over threads i-2, i-1, i, i+1, each annotated with the acquisition order 1 2 2 1 1 2 2 1.)

Longest dependency chains when i is even

Longest dependency chains when i is odd

Maximum latency is 3, i.e. O(1)


Maximum latency: when the total number of threads is odd

Maximum latency is 4, i.e. O(1)


Efficiency

  • The number of threads for which any thread has to wait before entering its critical section is O(1).

  • The number of remote memory accesses is O(1), as any thread i only accesses memory related to the state of thread i and thread (i+1) % THREADS.

  • Optimal!


When thread reaches a upc_lock

  • Track requested locks and acquired locks

  • Look out for cyclical hold-and-wait conditions

  • Look out for chain of hold-and-wait conditions which lead to a thread blocked at a collective routine

    • If a thread has reached a collective routine, check if there is a request for a lock that the thread is holding

  • Look out for chain of hold-and-wait conditions which lead to a thread which has reached end-of-execution

    • If a thread is exiting without freeing all locks held by it, then check if there is a request for a lock that the thread is holding


Papers

  • Coyle, J., Hoekstra, J., Kraeva, M., Luecke, G. R., Kleiman, R., Roy, I. (2009). UPC Compile-Time Error Detection Test Suite. http://kraeva.public.iastate.edu/rted/UPCct.TestPlan.pdf, Iowa State University High Performance Computing Group.

  • Roy, I., Luecke, G. R., Coyle, J., Kraeva, M., Hoekstra, J. (2011). UPC-CHECK: A run-time error detection tool for programs written in UPC. Preprint

  • Roy, I., Luecke, G. R., Coyle, J., Kraeva, M., Hoekstra, J. (2011). An O(1) algorithm to detect deadlocks in collective routines in the distributed shared memory model. Preprint


Thank you

