
A Novel Test Coverage Metric for Concurrently-Accessed Software Components




  1. A Novel Test Coverage Metric for Concurrently-Accessed Software Components. Serdar Tasiran, Tayfun Elmas, Guven Bolukbasi, M. Erkan Keremoglu. Koç University, Istanbul, Turkey. PLDI 2005, June 12-15, Chicago, U.S.

  2. Our Focus
  • Widely-used software systems are built on concurrently-accessed software components
    • File systems, databases, internet services
    • Standard Java and C# class libraries
  • Intricate synchronization mechanisms to improve performance
    • Prone to concurrency errors
  • Concurrency errors
    • Data loss/corruption
    • Difficult to detect and reproduce through testing

  3. The Location Pairs Metric
  • Goal of the metric: to help answer the question "If I am worried about concurrency errors only, what unexamined scenario should I try to trigger?"
  • Coverage metrics: link between validation tools
    • Communicate partial results and testing goals between tools
    • Direct tools toward unexplored, distinct new executions
  • The "location pairs" (LP) metric
    • Directed at concurrency errors ONLY
    • Focus: "high-level" data races
      • Atomicity violations
      • Refinement violations
    • All variables may be lock-protected, but operations are not implemented atomically (a minimal example follows)
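To make "high-level data race" concrete, here is a minimal sketch (not from the talk; the Account class is hypothetical) where every field access is lock-protected, yet a compound operation is not atomic:

    // Hypothetical example: each method is synchronized, so there is no
    // low-level data race, but deposit() is still not atomic.
    public class Account {
        private int balance;

        public synchronized int getBalance() { return balance; }
        public synchronized void setBalance(int b) { balance = b; }

        // High-level race: another thread can update balance between the
        // atomic read and the atomic write below, and its update is lost.
        public void deposit(int amount) {
            int b = getBalance();
            setBalance(b + amount);
        }
    }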

  4. Outline
  • Runtime Refinement Checking
  • Examples of Refinement/Atomicity Violations
  • The "Location Pairs" Metric
  • Discussion, Ongoing Work

  5. Refinement as Correctness Criterion
  [Figure: four client threads concurrently call LookUp(3), Insert(3), Delete(3), and Insert(4) on the component implementation; their low-level actions (A[0].elt=3, Unlock A[0], read A[0], A[1].elt=4, the "true"/"success" return values) interleave.]
  • Client threads invoke operations concurrently
  • To client threads, data structure operations should appear to be executed
    • atomically
    • in a linear order

  6. Runtime Refinement Checking
  • Refinement: for each execution of Impl, there exists an "equivalent", atomic execution of the data structure Spec
  • Spec: "atomized" version of Impl (sketched below)
    • Client methods run one at a time
    • Obtained from Impl itself
  • Use refinement as the correctness criterion
    • More thorough than assertions
    • More observability than pure testing
  • Runtime verification: check refinement using execution traces
    • Can handle industrial-scale programs
    • Intermediate between testing & exhaustive verification
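A minimal sketch of how a Spec can be obtained from Impl by serializing all client methods behind one global lock. This is an assumed structure, not the talk's implementation; MultisetImpl and its method names are hypothetical:

    // Sketch: the Spec reuses the Impl's code but serializes every client
    // method behind one global lock, so each method runs atomically.
    public class MultisetSpec {
        private final MultisetImpl impl;      // hypothetical concurrent Impl
        private final Object global = new Object();

        public MultisetSpec(MultisetImpl impl) { this.impl = impl; }

        public boolean insert(int x) { synchronized (global) { return impl.insert(x); } }
        public boolean delete(int x) { synchronized (global) { return impl.delete(x); } }
        public boolean lookUp(int x) { synchronized (global) { return impl.lookUp(x); } }
    }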

  7. The VYRD Tool
  [Figure: a multi-threaded test runs on Impl and writes every action (method calls such as Insert(3) and LookUp(3), field updates such as A[0].elt=3, lock releases, return values) to a log. A replay mechanism reads the log, re-executes the logged actions on a replay copy Impl_replay, and runs the same methods atomically on Spec. A refinement checker compares trace_Impl and trace_Spec.]
  • At certain points for each method, take "state snapshots"
  • Check consistency of data structure contents (a schematic sketch follows)
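A schematic of the trace comparison, with illustrative names only; the slides do not show the tool's real interfaces. The sketch replays the logged method invocations atomically on the Spec (using the MultisetSpec sketch above) in a candidate linearization order and compares return values:

    import java.util.List;

    // Hypothetical log entry: one completed method invocation on Impl.
    record LoggedCall(String method, int arg, Object implResult) {}

    final class ReplayChecker {
        // Replay each call atomically on the Spec, in witness order, and
        // flag a refinement violation if any return value disagrees.
        static void check(List<LoggedCall> witnessOrder, MultisetSpec spec) {
            for (LoggedCall c : witnessOrder) {
                Object specResult = switch (c.method()) {
                    case "insert" -> spec.insert(c.arg());
                    case "delete" -> spec.delete(c.arg());
                    case "lookUp" -> spec.lookUp(c.arg());
                    default -> throw new IllegalArgumentException(c.method());
                };
                if (!specResult.equals(c.implResult()))
                    throw new AssertionError("refinement violation at " + c);
            }
        }
    }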

  8. The Vyrd Experience
  • Scalable method: caught previously undetected, serious but subtle bugs in industrial-scale designs
    • Boxwood (30K LOC)
    • Scan Filesystem (Windows NT)
    • Java libraries with known bugs
  • Reasonable runtime overhead
  • Key novelty: checking refinement improves observability
    • Catches bugs that are triggered but not observed by testing
    • A significant improvement over pure testing

  9. Experience: The Boxwood Project
  [Figure: Boxwood architecture. The BLinkTree module: a root pointer node at the root level, internal pointer nodes at the levels below, leaf pointer nodes at level 0, and data nodes beneath them. The BLinkTree module reads and writes through the Cache module (clean and dirty cache entries), which in turn reads and writes through the Chunk Manager module (global disk allocator, replicated disk manager).]

  10. Refinement vs. Testing: Improved Observability
  • Using Vyrd, caught previously undetected bugs in
    • the Boxwood Cache
    • the Scan File System (Windows NT)
  • Bug manifestation:
    • Cache entry is correct and marked "clean"
    • Permanent storage has corrupted data
  • Hard to catch through testing: as long as "Read"s hit in the Cache, the return value is correct
  • Caught through testing only if
    • the Cache fills and the clean entry is evicted,
    • the entry is not written to permanent storage again (it is marked "clean"),
    • the entry is then read from permanent storage after eviction,
    • with no "Write"s to the entry in the meantime

  11. Outline
  • Runtime Refinement Checking
  • Examples of Refinement/Atomicity Violations
  • The "Location Pairs" Metric
  • Discussion, Ongoing Work

  12. Idea behind the LP Metric
  • Observation: the bug occurs whenever
    • Method1 executes up to line X, then a context switch occurs
    • Method2 starts execution from line Y
    • provided there is a data dependency between
      • Method1's code "right before" line X: BlockX
      • Method2's code "right after" line Y: BlockY
  • The description of the bug in the log follows the pattern above
  • The only requirement on program state, other threads, etc. is to make the interleaving above possible
    • May require many other threads, complicated program state, ...
  • A "one-bit" data abstraction captures the error scenario
    • depdt: is there a data dependency between BlockX and BlockY? (see the sketch below)
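A minimal sketch of the one-bit abstraction, assuming the read and write sets of each block are available (for example from a static analysis); all names here are illustrative, not the talk's implementation:

    import java.util.Collections;
    import java.util.Set;

    // A code block summarized by the memory locations it reads and writes.
    record Block(Set<String> reads, Set<String> writes) {}

    final class Dependency {
        // Two blocks are dependent if one writes a location the other
        // reads or writes (write/read, read/write, or write/write).
        static boolean depdt(Block x, Block y) {
            return !Collections.disjoint(x.writes(), y.reads())
                || !Collections.disjoint(x.reads(), y.writes())
                || !Collections.disjoint(x.writes(), y.writes());
        }
    }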

  13. Atomicity violation in java.lang.StringBuffer:

    public synchronized StringBuffer append(StringBuffer sb) {
        int len = sb.length();
        int newCount = count + len;
        if (newCount > value.length)
            ensureCapacity(newCount);
        sb.getChars(0, len, value, count);
        count = newCount;
        return this;
    }

    public synchronized void setLength(int newLength) {
        ...
        if (count < newLength) {
            ...
        } else {
            count = newLength;
            ...
        }
    }
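The race, in outline: both methods are synchronized on their own receiver, so if the argument sb is shared, a second thread can call sb.setLength() between append's read of sb.length() and its call to sb.getChars(), leaving len stale. A hypothetical two-thread driver that can exercise this interleaving is sketched below; on the historical buggy library version it can copy stale characters or throw an index-out-of-bounds exception, while on fixed versions it runs silently, and being timing-dependent it may need many iterations:

    // Hypothetical driver for the (historical) StringBuffer atomicity bug.
    public class StringBufferRace {
        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 100_000; i++) {
                StringBuffer target = new StringBuffer();
                StringBuffer shared = new StringBuffer("abcdefgh");
                Thread t1 = new Thread(() -> target.append(shared));
                Thread t2 = new Thread(() -> shared.setLength(0));
                t1.start(); t2.start();
                t1.join();  t2.join();
            }
        }
    }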

  14. Experience: Concurrency Bug in the Cache
  [Figure: five snapshots of the Cache and the Chunk Manager, each mapping the same handle to a byte array, during the sequence: Flush() starts; Write(handle, AB) starts; Write(handle, AB) ends; Flush() ends. Because Flush() writes the cache entry's byte array to disk while Write() is still updating it in place, the Cache and the Chunk Manager end up holding different byte arrays for the same handle: corrupted data in persistent storage.]

  15. The racing code (Boxwood Cache):

    private static void CpToCache(byte[] buf, CacheEntry te, int lsn, Handle h) {
        ...
        // copies the new bytes into the cache entry in place
        for (int i = 0; i < buf.length; i++) {
            te.data[i] = buf[i];
        }
        te.lsn = lsn;
        ...
    }

    public static void Flush(int lsn) {
        ...
        lock (clean) {
            // writes te.data to persistent storage; can run while CpToCache
            // is still copying into te.data, flushing a half-updated array
            BoxMain.alloc.Write(h, te.data, te.data.length, 0, 0, WRITE_TYPE_RAW);
        }
        ...
    }

  16. Outline
  • Runtime Refinement Checking
  • Examples of Refinement/Atomicity Violations
  • The "Location Pairs" Metric
  • Discussion, Ongoing Work

  17. From code to atomic actions: append()

    public synchronized StringBuffer append(StringBuffer sb) {
    1:      int len = sb.length();
    2:      int newCount = count + len;
    3:      if (newCount > value.length)
    4:          ensureCapacity(newCount);
    5:      sb.getChars(0, len, value, count);
    6:      count = newCount;
    7:      return this;
    8: }

  The corresponding sequence of atomic actions, with CFG locations (L1, L2, ...) marking points between actions:

    acquire(this)
    invoke sb.length()
    L1: int len = sb.length()
    L2: int newCount = count + len
    if (newCount > value.length)
    expandCapacity(newCount)
    invoke sb.getChars()
    sb.getChars(0, len, value, count)
    count = newCount
    return this

  18. Coverage FSM State
  • A state is a tuple (LX, pend1, LY, pend2, depdt), where
    • LX: location in the CFG of Method 1
    • LY: location in the CFG of Method 2
    • pend1: is an "interesting" action in Method 1 expected next?
    • pend2: is an "interesting" action in Method 2 expected next?
    • depdt: do the actions following LX and LY have a data dependency?
  (As a data type, see the sketch below.)
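As a data type, the FSM state might look like the following; this is an illustrative sketch, and the types and names are not from the talk:

    // Illustrative: a coverage-FSM state for one pair of methods.
    record CoverageState(
        int lx,          // location in Method 1's CFG
        boolean pend1,   // "interesting" action in Method 1 expected next?
        int ly,          // location in Method 2's CFG
        boolean pend2,   // "interesting" action in Method 2 expected next?
        boolean depdt    // data dependency between the blocks after lx and ly?
    ) {}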

  19. Coverage FSM (example fragment)
  [Figure: a fragment of the coverage FSM over locations L1/L2 of Method 1 and L3/L4 of Method 2, with four states, (L1, !pend1, L3, !pend2, !depdt), (L1, !pend1, L3, !pend2, depdt), (L1, pend1, L3, !pend2, !depdt), and (L2, !pend1, L3, pend2, !depdt), connected by transitions t1: L1 → L2 (Method 1 takes an action) and t2: L3 → L4 (Method 2 takes an action).]

  20. Coverage Goal
  • The pend1 bit gets set when
    • the depdt bit is TRUE, and
    • Method2 takes an action
    • Intuition: Method1's dependent action must follow
  • Must cover all (reachable) transitions of the form
    • p = (LX_p, TRUE, LY, pend2_p, depdt_p) → q = (LX_q, pend1_q, LY, pend2_q, depdt_q)
    • p = (LX, pend1_p, LY_p, TRUE, depdt_p) → q = (LX, pend1_q, LY_q, pend2_q, depdt_q)
  • Separate coverage FSM for each method pair: FSM(Method1, Method2)
  • Cover the required transitions in each FSM (a sketch of the "required" predicate follows)
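The two transition forms above can be read as a predicate over FSM transitions, reusing the CoverageState sketch from slide 18; this is illustrative, not the talk's implementation:

    final class CoverageGoal {
        // A transition p -> q must be covered when the method that stays
        // put has its pending bit set in p, i.e. a dependent action of the
        // other method moves while this method's action is still due.
        static boolean required(CoverageState p, CoverageState q) {
            boolean m1Moves = p.lx() != q.lx() && p.ly() == q.ly();
            boolean m2Moves = p.ly() != q.ly() && p.lx() == q.lx();
            return (m1Moves && p.pend1()) || (m2Moves && p.pend2());
        }
    }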

  21. Important Details
  • Action: an atomically executed code fragment
    • Defined by the language
  • Method calls:
    • Call action: the method call, plus all lock acquisitions
    • Return action: the total net effect of the method, executed atomically, plus lock releases
  • Separate coverage FSM for each method pair: FSM(Method1, Method2)
    • Cover the required transitions in each FSM
  • But what if there is interesting concurrency inside a called method?
    • Considered separately, when that method is itself one of the methods in a pair
    • If Method1 calls Method3: considered when FSM(Method3, Method2) is covered

  22. Outline
  • Runtime Refinement Checking
  • Examples of Refinement/Atomicity Violations
  • The "Location Pairs" Metric
  • Discussion, Ongoing Work

  23. Empirical Evidence
  • Does this metric correspond well with high-level concurrency errors?
  • Errors captured by the metric:
    • 100% coverage ⇒ the bug is guaranteed to be triggered
    • Triggered vs. detected: may need view-refinement checking to improve observability
  • Preliminary study:
    • Bugs in Java class libraries
    • Bug found in the Boxwood cache
    • Bug found in the Scan file system
    • Bug categories reported in E. Farchi, Y. Nir, S. Ur, "Concurrent Bug Patterns and How to Test Them", 17th Intl. Parallel and Distributed Processing Symposium (IPDPS '03)
  • How many are covered by random testing? How does coverage change over time?
    • Don't know yet; implementing a coverage measurement tool.

  24. Reducing the Coverage FSM
  • Method-local actions:
    • A basic block consisting of method-local actions is considered a single atomic action
  • Pure blocks [Flanagan & Qadeer, ISSTA '04]:
    • A "pure" execution of a pure block does not affect the global state
    • Example: acquire lock, read global variable, decide resource not free, release lock
    • Considered a "no-op"
    • Modeled by a "bypass transition" in the coverage FSM; does not need to be covered
  (A sketch of the pure-block pattern follows.)
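The example from the slide as code, using a hypothetical resource manager. When the failure branch is taken, the block acquires the lock, reads shared state, and releases the lock without any global write, so that execution is pure:

    public class ResourceManager {
        private boolean free = true;   // global state
        private final Object lock = new Object();

        public boolean tryAcquire() {
            synchronized (lock) {      // acquire lock
                if (!free) {
                    return false;      // pure execution: nothing global changed
                }
                free = false;          // impure execution: resource taken
                return true;
            }                          // release lock
        }
    }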

  25. Discussion
  • The metric is NOT for deciding when to stop testing/verification
  • Intended use:
    • Testing or runtime verification is applied to the program
    • A list of non-covered coverage targets is provided to the programmer
  • Intuition: given an unexercised scenario, the programmer must have a simple reason to believe that
    • the scenario is not possible, or
    • the scenario is safe
  • Given an uncovered coverage target, the programmer
    • either provides hints to the coverage tool to rule the target out,
    • or assumes that the coverage target is a possibility, and
      • writes a test to trigger it,
      • or makes sure that no concurrency error would result if the coverage target were to be exercised

  26. Future Work: Approximating the Reachable LP Set
  • Number of locations per method in Boxwood: ~10, after factoring out atomic and pure blocks
  • LP reachability is undecidable
    • The metric is only intended as an aid to the programmer: What have I tested? What should I try to test?
    • Make sure an LP does not lead to an error if it looks like it can be exercised
  • Future work: better approximate the reachable LP set
    • Do conservative reachability analysis of the coverage FSM using predicate abstraction
    • The programmer can add predicates for better FSM reduction


  28. Implementation: A Multiset
  • Multiset data structure M = {2, 3, 3, 3, 9, 8, 8, 5}
  • Has highly concurrent implementations of Insert, Delete, InsertPair, LookUp
  [Figure: array A of slots, each holding a content field (6, 2, 3, 9, 3, 3, 8, 5, 8, null) and a valid bit; M is the multiset of content values of the valid slots.]
  • LookUp, given as pseudocode on the slide, made concrete in Java:

    // Each slot is inspected under its own lock; no global lock is held,
    // so other operations can proceed on other slots concurrently.
    boolean lookUp(int x) {
        for (int i = 0; i < n; i++) {
            synchronized (A[i]) {                    // acquire(A[i])
                if (A[i].content == x && A[i].valid)
                    return true;                     // release(A[i]) on exit
            }                                        // release(A[i])
        }
        return false;
    }

  29. Testing the Multiset
  [Figure: the interleaved execution of LookUp(3), Insert(3), Delete(3), and Insert(4) from slide 5.]
  • Don't know which happened first: Insert(3) or Delete(3)?
    • Should 3 be in the multiset at the end?
    • Must accept both possibilities as correct
  • Common practice:
    • Run a long multi-threaded test
    • Perform sanity checks on the final state

  30. I/O Refinement (Multiset)
  [Figure: the Impl trace on the left, the Spec trace on the right. Each Impl method is committed in the Spec at its commit point, yielding a witness ordering. Starting from M = Ø:]

    Commit action        Spec update              M afterwards
    Commit Insert(3)     M = M ∪ {3}              {3}
    Commit LookUp(3)     check 3 ∈ M, "true"      {3}
    Commit Insert(4)     M = M ∪ {4}              {3, 4}
    Commit Delete(3)     M = M \ {3}              {4}

  31. View Refinement: View Variables
  [Figure: array A with content fields (3, 5, 3, 8, 8, 5, 9, 6, 5) and valid bits; viewImpl = {3, 3, 5, 5, 8, 8, 9}.]
  • State correspondence: hypothetical "view" variables must match at commit points
  • "View" variable:
    • Its value is the abstract data structure state
    • Updated atomically, once, by each method
  • For A[1..n]: extract content if valid == true (sketched below)
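A sketch of the view extraction in Java; the Slot record and the representation of the multiset as a sorted list are assumptions for illustration:

    import java.util.ArrayList;
    import java.util.List;

    // One array slot: a content value plus a valid bit.
    record Slot(Integer content, boolean valid) {}

    final class View {
        // viewImpl = multiset of content values of the valid slots.
        static List<Integer> viewImpl(Slot[] a) {
            List<Integer> view = new ArrayList<>();
            for (Slot s : a) {
                if (s.valid() && s.content() != null)
                    view.add(s.content());
            }
            view.sort(null);   // canonical order so views compare with equals()
            return view;
        }
    }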

  32. View Refinement (Multiset)
  [Figure: the Impl trace and the Spec trace side by side, with the witness ordering as on slide 30. At each commit point the view variables must match:]

    after Commit Insert(3):   viewImpl = {3}      viewSpec = {3}
    after Commit LookUp(3):   viewImpl = {3}      viewSpec = {3}
    after Commit Insert(4):   viewImpl = {3, 4}   viewSpec = {3, 4}
    after Commit Delete(3):   viewImpl = {4}      viewSpec = {4}
