
Testing Concurrent Programs




  1. Testing Concurrent Programs
  • Why test? Eliminate bugs?
  • Software Engineering vs Computer Science perspectives
  • What properties are we testing for?
    • Safety properties – nothing bad happens
      • No deadlock
      • All expected results returned
    • Liveness properties – something good happens, eventually
    • Performance

  2. Alternatives to Testing
  • “Proving programs correct”
    • A formalized, mathematical characterization of what programs do
    • Lacking: a formalization of performance; a characterization of JIT compilation; a characterization of GC; …
    • Often unwieldy
  • What is “correct”? A specification
    • The size of the specification often exceeds the size of the code – what confidence do you have that the spec is correct?
  • Overall, usually unwieldy even for sequential programs
  • Sometimes successful for carefully defined properties
    • Takes lots of experience
    • Usually applied in very-high-value settings: life-critical embedded systems; national-security and banking cryptographic systems

  3. Code Reviews
  • Basic conformance to organizational coding standards
    • Embodies a belief that coding standards result in more correct code
  • A way to involve experts – especially helpful in situations like concurrent code, where language behavior is subtle – practice helps!
  • Just more eyes
  • Can be counterproductive if the review turns into a clash of egos

  4. Static Analysis Tools
  • Get a computer to help with code review
  • Can keep in mind much more context than any human
  • Can be absolutely rigorous in checking
    • For example: check that every method accessing a given field does so consistently – either always while synchronized or never
    • Premise: programmers are (mostly) consistent
  • Can check for indicators of a great many common mistakes
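A minimal sketch of the kind of inconsistency such a tool flags. The class and field names here are hypothetical, chosen only to make the pattern visible: the field is guarded by a lock in one method but read without it in another.

```java
// Illustrative example of inconsistent synchronization: `count` is
// accessed under the intrinsic lock in increment() but not in get().
// A static analyzer that sees a field guarded *most* of the time can
// flag the unguarded access as a likely bug.
public class Counter {
    private int count = 0;

    // Writes count while holding the lock...
    public synchronized void increment() {
        count++;
    }

    // ...but this read is unsynchronized – the inconsistency a tool
    // would report. (It should be synchronized as well.)
    public int get() {
        return count;
    }
}
```

Single-threaded, the class still behaves as expected, which is exactly why this bug survives ordinary testing and is better caught by analysis.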

  5. FindBugs for Java
  • http://findbugs.sourceforge.net/demo.html
  • Checks for hundreds of different errors
  • Actually checks the compiled code (class files)
  • Ru: Invokes run on a thread – did you mean to start it instead? (RU_INVOKE_RUN)
    • This method explicitly invokes run() on an object. In general, classes implement the Runnable interface because their run() method is meant to be invoked in a new thread, in which case Thread.start() is the right method to call.
  • SP: Method spins on field (SP_SPIN_ON_FIELD)
    • This method spins in a loop reading a field. The compiler may legally hoist the read out of the loop, turning the code into an infinite loop. The class should be changed to use proper synchronization (including wait and notify calls).
  • Checks are based on mistakes observed in real Java code – code that compiles!
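The RU_INVOKE_RUN mistake in miniature. This is an illustrative sketch, not FindBugs' own example: the task records which thread executed it, showing that `run()` executes on the calling thread while `start()` executes on the new one.

```java
import java.util.concurrent.atomic.AtomicReference;

public class RunVsStart {
    // Records the name of the thread that actually executed the task body.
    static final AtomicReference<String> executor = new AtomicReference<>();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> executor.set(Thread.currentThread().getName());

        Thread t = new Thread(task, "worker");
        t.run();  // BUG: executes the task synchronously on the current thread
        System.out.println("after run():   " + executor.get());   // prints "main"

        Thread t2 = new Thread(task, "worker");
        t2.start(); // Correct: executes the task on the new "worker" thread
        t2.join();
        System.out.println("after start(): " + executor.get());   // prints "worker"
    }
}
```

Because the buggy version still runs the task body, it often goes unnoticed; only the lost concurrency reveals it, which is why a static check is so valuable here.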

  6. Aspect-Oriented Testing
  • AOP (aspect-oriented programming) is a method for systematically modifying source code at compile time or run time
    • E.g., attach code to method entry and exit, exception handling, etc., based on the names and types of methods
  • Used for implementing “cross-cutting” concerns such as locking and logging – things that should be done the same way at multiple locations in a program
  • Also handy for inserting testing code
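A plain-JDK approximation of the idea, without an AOP framework: a dynamic proxy attaches extra (logging or testing) code to every method entry and exit of an interface, never touching the wrapped object's source. The `Account` interface and `trace` helper are hypothetical names for illustration.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class TracingProxy {
    public interface Account {            // hypothetical interface for illustration
        void deposit(int amount);
    }

    public static final List<String> log = new ArrayList<>();

    // Wraps `target` so that cross-cutting code runs on every call.
    @SuppressWarnings("unchecked")
    public static <T> T trace(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface},
                (proxy, method, args) -> {
                    log.add("enter " + method.getName());   // entry advice
                    Object result = method.invoke(target, args);
                    log.add("exit " + method.getName());    // exit advice
                    return result;
                });
    }

    public static void main(String[] args) {
        Account raw = amount -> { /* real deposit logic would go here */ };
        Account traced = trace(raw, Account.class);
        traced.deposit(10);
        System.out.println(log); // [enter deposit, exit deposit]
    }
}
```

Real AOP tools (e.g., AspectJ) do this more generally, including for classes and at compile time, but the proxy shows the essence: testing code is woven in uniformly rather than copy-pasted into every method.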

  7. Performance Testing
  • There are no good tools for predicting system performance (though algorithmic analysis can suggest relative asymptotic behavior)
    • So we have to do performance testing
  • Will the code meet the performance spec?
    • The answer may depend heavily on machine architecture, exact input, etc.
  • Also used to tune program choices, e.g., buffer and thread-pool sizes

  8. Kinds of Performance
  • Latency, or responsiveness – average (units: time per operation, e.g., ms)
  • Latency distribution – variance
  • Throughput (units: operations per unit time, e.g., requests/s)
  • Resource consumption
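A back-of-envelope illustration of the units involved. The operation count and elapsed time are assumed numbers, not measurements:

```java
public class Units {
    public static void main(String[] args) {
        long ops = 1_000_000;                    // assumed: operations completed
        double elapsedSeconds = 4.0;             // assumed: wall-clock time

        double throughput = ops / elapsedSeconds;          // ops per second
        double avgLatencyMs = elapsedSeconds * 1000 / ops; // ms per op (average)

        System.out.printf("throughput  = %.0f ops/s%n", throughput);
        System.out.printf("avg latency = %.4f ms/op%n", avgLatencyMs);
    }
}
```

Note that the average alone hides the distribution: a system averaging 0.004 ms/op may still have occasional multi-millisecond outliers, which is why the variance is listed separately above.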

  9. Gotchas in Java Performance Testing (General)
  • Garbage collection
    • May take arbitrarily long and be triggered at arbitrary points
    • Turn it off – or, better, make tests big enough to require multiple GC cycles
  • Dynamic compilation
    • The just-in-time (JIT) compiler is invoked at arbitrary points – the time it takes affects the measured time of the program
    • Timing a mix of interpreted and compiled execution is nonsensical – the result depends on when compilation happened
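A sketch of one common mitigation for the JIT gotcha: run the workload untimed until the hot path has had a chance to be compiled, then take the measurement. `workload` is a stand-in for the code under test, and the warm-up counts are assumptions, not tuned values:

```java
public class WarmedTimer {
    // Hypothetical workload standing in for the code under test.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up phase: give the JIT a chance to compile the hot path
        // before any timing begins.
        for (int i = 0; i < 10_000; i++) workload(1_000);

        long start = System.nanoTime();
        long result = workload(1_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("result=" + result + " elapsed(ns)=" + elapsed);
    }
}
```

This does not control GC; for that, the slide's advice applies – make the timed region large enough that several collections fall inside it and their cost is averaged in.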

  10. Concurrency Performance Gotchas
  • Beware “micro” benchmarks
    • They focus narrowly on the performance of synchronization and contention primitives
    • They miss the big picture of overall application performance – it is better to have no contention than a fast contention primitive
  • Ensure that significant concurrency actually occurs in the run
    • Use a small multiprocessor (MP) – you want more runnable threads than processors

  11. Counterintuitive Concurrent Performance
  • Examine Figure 12.1: the best performance occurs with one thread!

  12. Testing Safety Properties of Concurrent Programs
  • Remember interleavings: there are a lot of them, and any one of them may have a bug!
  • The book’s suggestion: first test for sequential correctness, then
    • Create a lot of different interleavings
    • Use various numbers of threads
    • Start them all at once
    • Run them for a long time
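A sketch of that advice in harness form: run the same test at several thread counts with many iterations per thread, and check a safety property (here, that no increment is lost). The counter is a stand-in for the real class under test, and the thread counts are arbitrary choices:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class InterleavingTest {
    // Runs nThreads threads, each performing the given number of
    // increments, and returns the observed total.
    public static int run(int nThreads, int incrementsPerThread)
            throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++)
                    counter.incrementAndGet();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Vary the number of threads; every configuration must satisfy
        // the same safety property: all expected results returned.
        for (int n : new int[]{1, 2, 4, 8}) {
            int total = run(n, 100_000);
            System.out.println(n + " threads -> " + total);
            if (total != n * 100_000) throw new AssertionError("lost updates!");
        }
    }
}
```

The n = 1 run is the "sequential correctness first" step; the larger runs then probe many interleavings of the same code.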

  13. Ways to Ensure Many Different Interleavings
  • Use a barrier or latch to ensure that all threads start the test at the same time
    • Otherwise the scheduler may run them all sequentially, especially if thread-creation overhead is high relative to the test body
  • Use strategically placed Thread.yield() calls at points where an ill-timed thread switch could expose a bug
    • Unfortunately, yield is only a hint to the scheduler
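The latch idea can be sketched with a "start gate": a `CountDownLatch` holds every worker at the starting line until all threads exist, so the scheduler cannot simply run them one after another. The thread count is an arbitrary choice for the sketch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StartGate {
    public static void main(String[] args) throws InterruptedException {
        final int nThreads = 4;
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch endGate = new CountDownLatch(nThreads);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < nThreads; i++) {
            new Thread(() -> {
                try {
                    startGate.await();           // block until the gate opens
                    completed.incrementAndGet(); // the test body would go here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    endGate.countDown();
                }
            }).start();
        }

        startGate.countDown();  // release all threads at (nearly) the same time
        endGate.await();        // wait for every thread to finish
        System.out.println("threads completed: " + completed.get());
    }
}
```

A `CyclicBarrier` works equally well and additionally lets the same threads re-synchronize for repeated trials.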

  14. Testing Is Expensive
  • A comprehensive test framework may be larger and more costly than the program being tested, especially for very-high-assurance needs
  • One good idea: if a test ever finds a bug (or is developed in response to an observed bug), keep it in the test suite forever – so-called regression testing
