## A Randomized Algorithm for Concurrency Testing


**A Randomized Algorithm for Concurrency Testing**
Madan Musuvathi
Research in Software Engineering, Microsoft Research

**The Concurrency Testing Problem**
• A closed program = program + test harness
• The test harness encodes both the concurrency scenario and the inputs
• The only nondeterminism is the thread interleavings

**Verification vs Testing**
• Verification:
  • Prove that the program is correct (free of bugs)
  • With the minimum amount of resources
• Testing:
  • Given a certain amount of resources, how close to a proof can you get?
  • Maximize the number of bugs that you can find
• In the limit: Verification == Testing

**Testing is More Important than Verification**
• Undecidability argument
  • There will always be programs large enough, and properties complex enough, for which verification cannot be done
• Economic argument
  • If the cost of a bug is less than the cost of finding the bug (or proving its absence), you are better off shipping buggy software
• Engineering argument
  • Make software only as reliable as the weakest link in the entire system

**Providing Probabilistic Guarantees**
• Problem we would like to solve:
  • Given a program, prove that it does not do something wrong with probability > 95%
• Problem we can hope to solve:
  • Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95%
  • Prove optimality: no testing algorithm can do better

**Cuzz: Concurrency Fuzzing**
• Disciplined randomization of schedules
• Probabilistic guarantees
  • Every run finds a bug with some (reasonably large) probability
  • Repeat runs to increase the chance of finding a bug
• Scalable in the number of threads and in program size
• Effective
  • Bugs in IE, Firefox, Office Communicator, Outlook, …
  • Bugs found in the first few runs

**Problem Formulation**
• P is a class of programs, B is a class of bugs
• Given P and B, you design a testing algorithm T
• Given T, the adversary picks a program p in P containing a bug b in B
• Given p, T generates an input in constant time
• Prove that T finds b in p with probability X(p, B)

**In Our Case**
• P is a class of closed, terminating concurrent programs
• B is a class of bugs
• Given P and B, you design a testing algorithm T
• Given T, the adversary picks a program p in P containing a bug b in B
• Given p, T generates an interleaving in constant time
• Prove that T finds b in p with probability X(p, B)

**Useful Parameters**
• For a closed, terminating concurrent program (a fancy way of saying: a program combined with a concurrency test)
  • n : maximum number of threads (~ tens)
  • k : maximum number of instructions executed (~ millions)

**What is a “Bug” – First Attempt**
• A bug is defined as a particular buggy interleaving
• With n threads and k instructions there are roughly n^k schedules
• No algorithm can find the bug with probability greater than 1/n^k

**A Deterministic Algorithm**
• Provides no guarantees: the adversary can always hide the bug in a schedule the algorithm never explores

**A Randomized Algorithm**
• Samples the schedule space with some probability distribution
• The adversary picks the schedule that is the least probable
• Probability of finding the bug ≤ 1/n^k
• 1/n^k is a mighty small number
• It is even hard to design algorithms that find the bug with probability == 1/n^k

**A Good Research Trick**
• When you can't solve a problem, change the problem definition

**Bugs are Not Adversarial**
• Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug
  • This is not true for program inputs
• The set of interleavings that find the bug shares the same root cause
• The root cause of real bugs is not complicated
  • Smart people make stupid mistakes

**Classifying Bugs**
• Classify concurrency bugs based on a suitable “depth” metric
  • The adversary can choose any bug, but within a given depth
  • The testing algorithm provides better guarantees for bugs with a smaller depth, even if the worst-case probability is less than 1/n^k
• We want real bugs to have small depth
• We want to be able to design effective sampling algorithms for finding bugs of a particular depth

**Our Bug-Depth Definition**
• Bug depth = the number of ordering constraints sufficient to find the bug
• Best explained through examples

**A Bug of Depth 1**

    Parent                      Child
    A: …                        F: …
    B: fork(child);             G: do_init();
    C: p = malloc();            H: p->f++;
    D: …                        I: …
    E: …                        J: …

• Possible schedules: A B C D E F G H I J; A B F G H C D E I J; A B F G C D E H I J; A B F G C H D E I J; A B F G H I J C D E; …
• A single ordering constraint, (H, C), is sufficient to trigger the bug: the child dereferences p before the parent allocates it

**A Bug of Depth 2**

    Parent                      Child
    A: …                        H: …
    B: p = malloc();            I: p = NULL;
    C: fork(child);             J: …
    D: …
    E: if (p != NULL)
    F:   p->f++;
    G: …

• Possible schedules: A B C D E F G H I J; A B C D E H I J F G; A B C H I D E G J; A B C D H E F I J G; A B C H D E I J F G; …
• Two ordering constraints suffice: (E, I) and (I, F), i.e. the child's p = NULL falls between the parent's null check and its dereference

**Another Bug of Depth 2**

    Parent                      Child
    A: …                        F: …
    B: Lock(A);                 G: Lock(B);
    C: …                        H: …
    D: Lock(B);                 I: Lock(A);
    E: …                        J: …

• Two ordering constraints suffice: (B, I) and (G, D), after which each thread holds one lock and blocks waiting for the other (deadlock)

**Hypothesis**
• Most concurrency bugs in practice have a very small depth
• What has been empirically validated: there are lots of bugs of small depth in real programs

**Defining a Bug**
• A schedule is a sequence of (dynamic) instructions
• S = the set of schedules of a closed program
• A concurrency bug B is a strict subset of S

**Ordering Constraints**
• A schedule satisfies an ordering constraint (a, b) if instruction a occurs before instruction b in the schedule
• Example: in the depth-1 program above, the schedule A B F G H C D E I J satisfies (H, C)

**Depth of a Bug**
• S(c1, c2, …, cn) = the set of schedules that satisfy the ordering constraints c1, c2, …, cn
• A bug B is of depth ≤ d if there exist constraints c1, c2, …, cd such that S(c1, c2, …, cd) ⊆ B

**What is the Depth of this Bug?**

    Parent                      Child
    A: …                        F: …
    B: p = malloc();            G: if (allocated)
    C: fork(child);             H:   p->f++;
    D: allocated = 1;           I: …
    E: p = NULL;                J: …

• Any buggy interleaving satisfies (D, G) && (E, H), so bug depth ≤ 2
• But any interleaving that satisfies (E, G) is buggy, so bug depth == 1
  • Even though there are buggy interleavings that don't satisfy (E, G)

**Let's Look at a More Complicated Bug**

    void AddToCache() {
      // ...
      A: x &= ~(FLAG_NOT_DELETED);
      B: x |= FLAG_CACHED;
      MemoryBarrier();
      // ...
    }
    AddToCache();
    assert( x & FLAG_CACHED );

**The Bit Operations are Not Atomic**

    void AddToCache() {
      A1: t = x & ~(FLAG_NOT_DELETED);
      A2: x = t;
      B1: u = x | FLAG_CACHED;
      B2: x = u;
    }
    AddToCache();
    assert( x & FLAG_CACHED );

**The Bug**
• Two threads execute AddToCache() concurrently, and each then asserts x & FLAG_CACHED
• With the non-atomic bit operations, one thread's stale x = t store can overwrite the FLAG_CACHED bit just set by the other thread, so the assertion can fail

**Cuzz Guarantee**
• Given a program that creates at most n threads and executes at most k instructions
• Cuzz finds every bug of depth d with probability ≥ 1/(n·k^(d-1)) in every run of the program

**A Bug of Depth 1**
• For the fork/do_init example above, the probability of finding the bug is ≥ 1/n
  • n: number of threads (~ tens)

**A Bug of Depth 2**
• For the null-check example above, the probability is ≥ 1/(n·k)
  • n: number of threads (~ tens), k: number of instructions (~ millions)

**Another Bug of Depth 2**
• Likewise, the deadlock example is found with probability ≥ 1/(n·k)

**Cuzz Algorithm**

    Inputs:
      n: estimated bound on the number of threads
      k: estimated bound on the number of steps
      d: target bug depth

    // 1. Assign random priorities >= d to threads
    for t in [1…n] do priority[t] = rand() + d;
    // 2. Choose d-1 lowering points at random
    for i in [1…d) do lowering[i] = rand() % k;
    steps = 0;
    while (some thread enabled) {
      // 3. Honor thread priorities
      Let t be the highest-priority enabled thread;
      schedule t for one step;
      steps++;
      // 4. At the i-th lowering point, set the priority to i
      if steps == lowering[i] for some i
        priority[t] = i;
    }

**A Bug of Depth 1**
• Found when the child gets a higher priority than the parent (probability = 1/2)
• With child priority 2 > parent priority 1, the child runs do_init() and p->f++ immediately after the fork, before the parent's p = malloc()

**A Bug of Depth 2**
• Found when the parent starts with the higher priority (parent pri = 3, child pri = 2) and a lowering point lands right after the branch condition (probability = 1/2 × 1/5 = 1/10)
• The parent runs p = malloc(), fork(child), if (p != NULL), then the lowering point drops its priority to 1; the child runs p = NULL; the parent resumes with p->f++

**In Practice, Cuzz Beats its Bound**
• Cuzz performs far better than the theoretical bound
  • The worst-case bound is based on a conservative analysis
  • We employ various optimizations
• Programs have LOTS of bugs
  • The probability of finding any of the bugs is (roughly) the sum of the probabilities of finding each
• The buggy code is executed LOTS of times

**For Some of our Benchmarks**
• Measured probability increases with n and stays the same with k
• In contrast, the worst-case bound 1/(n·k^(d-1)) shrinks with both

**Dimension Theory**
• Any partial order G can be expressed as the intersection of a set of total orders
• This set is called a realizer of G
• (diagram: a five-element partial order on a, b, c, d, e written as the intersection of two total orders)

**Property of Realizers**
• For any unordered pair a and b, a realizer contains two total orders, one satisfying (a, b) and one satisfying (b, a)

**Dimension of a Partial Order**
• The dimension of G is the size of the smallest realizer of G
• The dimension is 2 for the example above

**Why is it Called “Dimension”?**
• You can encode a partial order of dimension d as points in a d-dimensional space, ordered componentwise

**Why is it Relevant for Us?**
• P = the set of all partial orders, B = the set of all bugs of depth 1
• If you can uniformly sample the smallest realizer of a partial order p, then any bug of depth 1 is found with probability ≥ 1/dimension(p)

**All This is Good, But**
• Finding the dimension of a partial order is NP-complete
• Real programs are not static partial orders

**Width of a Partial Order**
• The width of a partial order G is the minimum number of total orders needed to cover G
• Width corresponds to the number of “threads” in G
• For all G, Dimension(G) ≤ Width(G)

**Cuzz Algorithm (restated)**
• Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G)
• Assign random priorities to “threads” and topologically sort based on the priorities

**Extension to Larger Depths**
• Note: a realizer of G covers all possible orderings of any one unordered pair
• Define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs
• The d-dimension of G is the size of the smallest d-realizer of G
• Theorem: d-Dimension(G) ≤ Dimension(G) · k^(d-1), where k is the number of nodes in G
• Cuzz is an online algorithm for uniformly sampling over a d-realizer of G
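The priority-based scheduling loop described in the deck can be checked against a small runnable sketch. The thread model below (threads as lists of instruction labels, with a `("fork", name)` marker that enables another thread) and the `cuzz_run` helper are my own simplifications for illustration, not the actual Cuzz implementation; the priority assignment, lowering points, and highest-priority-runs rule follow the pseudocode in the talk.

```python
import random

def cuzz_run(threads, entry, d, rng):
    """One run of the Cuzz/PCT-style randomized scheduler (a sketch).

    threads: dict mapping a thread name to its list of ops; an op is a
    plain instruction label, or ("fork", name), which enables thread `name`.
    Returns the schedule: the instruction labels in executed order.
    """
    k = sum(len(ops) for ops in threads.values())  # bound on total steps
    # 1. Assign each thread a random priority >= d.
    priority = {t: rng.random() + d for t in threads}
    # 2. Choose d-1 lowering points uniformly at random among the k steps.
    lowering = {rng.randrange(1, k + 1): i for i in range(1, d)}
    enabled = {entry}
    pc = {t: 0 for t in threads}
    schedule, steps = [], 0
    while any(pc[t] < len(threads[t]) for t in enabled):
        # 3. Always schedule the highest-priority enabled, unfinished thread.
        t = max((u for u in enabled if pc[u] < len(threads[u])),
                key=lambda u: priority[u])
        op = threads[t][pc[t]]
        pc[t] += 1
        steps += 1
        if isinstance(op, tuple) and op[0] == "fork":
            enabled.add(op[1])
            schedule.append("fork(%s)" % op[1])
        else:
            schedule.append(op)
        # 4. At the i-th lowering point, drop the current thread's priority
        #    to i, which is below every initial priority (those are >= d).
        if steps in lowering:
            priority[t] = lowering[steps]
    return schedule

# The depth-1 example from the talk: the bug occurs when the child's
# H (p->f++) runs before the parent's C (p = malloc()).
PARENT = ["A", ("fork", "child"), "C", "D", "E"]
CHILD = ["F", "G", "H", "I", "J"]

rng = random.Random(42)
trials = 2000
hits = sum(
    1 for _ in range(trials)
    if (s := cuzz_run({"parent": PARENT, "child": CHILD}, "parent", 1, rng))
    and s.index("H") < s.index("C")
)
print(hits / trials)  # close to the guaranteed 1/n = 1/2
```

With d = 1 there are no lowering points, so each run is decided entirely by which thread drew the higher priority at the fork, and the empirical bug-finding rate matches the 1/n bound exactly rather than merely exceeding it.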