
CHESS : Systematic Testing of Concurrent Programs


Presentation Transcript


  1. CHESS : Systematic Testing of Concurrent Programs
  Madan Musuvathi and Shaz Qadeer, Microsoft Research

  2. Testing multithreaded programs is HARD
  • Specific thread interleavings expose subtle errors
  • Testing often misses these errors
  • Even when found, errors are hard to debug
    • No repeatable trace
    • Source of the bug is far away from where it manifests

  3. Concurrency is a real problem
  • Windows 2000 hot fixes
    • Concurrency errors most common defects among “detectable errors”
    • Incorrect synchronization and protocol errors most common defects among all coding errors
  • Windows Server 2003 late-cycle defects
    • Synchronization errors second in the list, next to buffer overruns
  • Race conditions can result in security exploits

  4. Current practice
  • Concurrency testing == Stress testing
  • Example: testing a concurrent queue
    • Create 100 threads performing queue operations
    • Run for days/weeks
    • Pepper the code with sleep( random() )
  • Stress increases the likelihood of rare interleavings
  • Makes any error found hard to debug

  5. CHESS: Unit testing for concurrency
  • Example: testing a concurrent queue
    • Create 1 reader thread and 1 writer thread
    • Exhaustively try all thread interleavings
  • Run the test repeatedly on a specialized scheduler
    • Explore a different thread interleaving each time
    • Use model-checking techniques to avoid redundancy
  • Check for assertions and deadlocks in every run
  • The error trace is repeatable

  6. Systematic Stress Testing Using CHESS
  • The tester provides a test scenario: TestScenario() { … }
  • CHESS runs the scenario in a loop: while(not done) { TestScenario() }
    • Every run takes a different interleaving
    • Every run is repeatable
  • CHESS sits between the program and the Win32 API, above the kernel (threads, scheduler, synchronization objects)

  7. Conditions on Test Scenario
  • Test scenario should terminate in all interleavings
  • Test scenario should be idempotent
    • Free all resources (handles, memory, …)
    • Clear the hardware state
  • Key observation: existing stress tests already have these properties, because they repeatedly run forever

  8. Perturb the System as Little as Possible
  • Run the system as is
    • On the actual OS and hardware
    • Using system threads and synchronization
  • Detour Win32 API calls
    • To control and introduce nondeterminism
  • Advantages
    • Avoid reporting false errors
    • Easy to add to existing test frameworks
    • Use existing debuggers

  9. Implementation details
  • Handle all the Win32 synchronization mechanisms
    • Critical sections, locks, semaphores, events, …
    • Threadpools
    • Asynchronous procedure calls
    • Timers
    • IO completions
  • No modification to the kernel scheduler / Win32 library
  • CHESS drives the system along a desired interleaving by ‘hijacking’ the scheduler

  10. Controlling the Scheduling Nondeterminism
  • Nondeterministic choices for the scheduler
    • Determine when to context switch
    • On context switch, pick the next runnable thread to run
    • On resource release, wake up one of the waiting threads
  • Hijack these choices from the scheduler
    • Ensure at most one thread is runnable
      • No thread is waiting on a resource
    • At chosen schedule points, block the current thread while waking the next thread
  • Emulate program execution on a uniprocessor with context switches only at synchronization points

  11. Partial-order reduction
  • Many thread interleavings are equivalent
    • Accesses to separate memory locations by different threads can be reordered
  • Avoid exploring equivalent thread interleavings
  • Example of two equivalent interleavings:
    T1: x := 1; T2: y := 2    and    T2: y := 2; T1: x := 1

  12. Partial-order reduction in CHESS
  • Algorithm:
    • Assume the program is data-race free
    • Context switch only at synchronization points
    • Check for data races in each execution
  • Theorem: if the algorithm terminates without reporting races, then the program has no assertion failures

  13. Executions on Multi-cores
  • CHESS checks for data races
  • If a test scenario manifests a bug on a multi-core machine, then CHESS will either report a data race or find the bug
  • CHESS systematically enumerates all sequentially consistent executions
    • Any data-race-free multi-core execution is equivalent to a sequentially consistent execution

  14. State space explosion
  [State-space diagram: Thread 1 runs x = 1; y = 1; and Thread 2 runs x = 2; y = 2. Starting from (x, y) = (0, 0), interleaving the four writes produces a branching tree of intermediate states such as (1,0), (2,0), (1,1), (2,2), (1,2), (2,1), illustrating how the set of reachable interleavings blows up.]

  15. State space explosion
  • n threads, k steps each
  • Number of executions = O( n^(nk) )
    • Exponential in both n and k
    • Typically: n < 10, k > 100
  • Limits scalability to large programs (large k)

  16. Bounding execution depth
  • Works very well for message-passing programs
    • Limit the number of message exchanges
    • Message-processing code executed atomically
    • Can go ‘deep’ in the state space
  • Does not work for multithreaded programs
    • Even toy programs can have a large number of steps (shared-variable accesses)

  17. Iterative context bounding
  • Prioritize executions with a small number of preemptions
  • Two kinds of context switches:
    • Preemptions – forced by the scheduler, e.g. time-slice expiration
    • Non-preemptions – a thread voluntarily yields, e.g. blocking on an unavailable lock, thread end
  • Example: Thread 1 runs x = 1; if (p != 0) { x = p->f; } while Thread 2 runs p = 0;
    A preemption after the null check lets Thread 2 set p = 0 before the dereference; the switch to Thread 2 at Thread 1’s end is a non-preemption.

  18. Iterative context-bounding algorithm
  • The scheduler has a budget of c preemptions
    • Nondeterministically choose the preemption points
    • Resort to non-preemptive scheduling after c preemptions
  • Once all executions explored with c preemptions, try with c+1 preemptions
  • Iterative context bounding has desirable properties
    • Property 0: Easy to implement

  19. Property 1: Polynomial state space
  • Terminating program with fixed inputs and deterministic threads
  • n threads, k steps each, c preemptions
  • Number of executions <= C(nk, c) · (n+c)! = O( (n^2 k)^c · n! )
    • Exponential in n and c, but not in k
  • Intuition: choose the c preemption points, then permute the resulting n+c atomic blocks

  20. Property 2: Deep exploration possible with small bounds
  • A context-bounded execution has unbounded depth
    • A thread may execute an unbounded number of steps within each context
  • Even a context bound of zero yields complete terminating executions

  21. Property 3: Finds the ‘simplest’ error trace
  • Finds the smallest number of preemptions leading to the error
  • The number of preemptions is a better metric of error complexity than execution length

  22. Property 4: Coverage metric
  • If the search terminates with a context bound of c, then any remaining error must require at least c+1 preemptions
  • Intuitive estimate for
    • The complexity of the bugs remaining in the program
    • The chance of their occurrence in practice

  23. Property 5: Lots of bugs with a small number of preemptions
  • A non-blocking implementation of the work-stealing queue algorithm
    • Bounded circular buffer accessed concurrently by readers and stealers
  • Developer provided
    • A test harness
    • Three buggy variations of the program
  • Each bug found with at most 2 preemptions
    • Executions with 35 preemptions are possible!

  24. Context-bounding + Partial-order reduction
  • Algorithm:
    • Assume the program is data-race free
    • Context switch only at synchronization points
    • Explore executions with c preemptions
    • Check for data races in each execution
  • Theorem: if the algorithm terminates without reporting races, then the program has no assertion failures reachable with c preemptions
    • Requires that a thread can block only at synchronization points
    • Proof: Musuvathi and Qadeer, PLDI 2007

  25. Bugs found

  26. Example Dryad bug:

    // Function called by the main thread
    void TestChannel(WorkQueue* workQueue, ...) {
        // Creating a channel allocates worker threads
        RChannelReader* channel =
            new RChannelReaderImpl(..., workQueue);
        // ... do work here
        channel->Close();
        // Wrong assumption that channel->Close()
        // waits for worker threads to be finished
        delete channel;
        // BUG: deleting the channel when worker threads
        // still have a valid reference to the channel
    }

    // Function called by a worker thread of RChannelReaderImpl
    void RChannelReaderImpl::AlertApplication(RChannelItem* item) {
        // Notify application
        // XXX: Preempt here for the bug
        EnterCriticalSection(&m_baseCS);
        // Process before exit
        LeaveCriticalSection(&m_baseCS);
    }


  31. Facts about Dryad error trace
  • Long error trace, but requires only one preemption
    • Depth bounding cannot find it without a lot of luck
  • The error trace has 6 non-preempting context switches
    • It is important to leave the number of non-preempting context switches unbounded
  • This (and the other 6 errors) in Dryad remained in spite of careful regression testing and months of production use

  32. Bugs found

  33. Coverage vs. Context-bound

  34. Dryad (coverage vs. time)

  35. Current CHESS applications (work in progress)
  • Dryad (library for distributed dataflow programming)
  • Singularity/Midori (OS in managed code)
  • User-mode drivers
  • Cosmos (distributed file system)
  • SQL database

  36. Conclusion
  • Concurrency is important
  • Building robust concurrent software is still a challenge
    • Lack of debugging and testing tools
  • CHESS: concurrency unit testing
    • Exhaustively try all interleavings
    • Attempt to seamlessly integrate with existing test frameworks
    • Provide replay capability
  • Iterative context-bounding algorithm key to the design
