
A Randomized Algorithm for Concurrency Testing


Presentation Transcript


  1. A Randomized Algorithm for Concurrency Testing Madan Musuvathi Research in Software Engineering Microsoft Research

  2. The Concurrency Testing Problem • A closed program = program + test harness • Test harness encodes both the concurrency scenario and the inputs • The only nondeterminism is the thread interleavings
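  To make "closed program = program + test harness" concrete, here is a minimal sketch of such a harness (my own illustration, not from the talk): the inputs and the number of threads are fixed by the harness, so the only nondeterminism left is how the two workers interleave.

  #include <pthread.h>
  #include <assert.h>

  static int shared_counter = 0;                 /* state touched by the code under test */
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  static void *worker(void *arg) {               /* the code under test */
      (void)arg;
      pthread_mutex_lock(&lock);
      shared_counter++;
      pthread_mutex_unlock(&lock);
      return NULL;
  }

  int main(void) {                               /* the harness: fixed scenario, fixed inputs */
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      assert(shared_counter == 2);               /* the property the test checks */
      return 0;
  }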

  3. Verification vs Testing • Verification: • Prove that the program is correct (free of bugs) • With the minimum amount of resources • Testing: ??

  4. Verification vs Testing • Verification: • Prove that the program is correct (free of bugs) • With the minimum amount of resources • Testing: • Given a certain amount of resources • How close to a proof can you get? • Maximize the number of bugs that you can find • In the limit: Verification == Testing

  5. Testing is more important than Verification • Undecidability argument • There are always going to be programs large enough and properties complex enough for which verification cannot be done • Economic argument • If the cost of a bug is less than the cost of finding the bug (or proving its absence) • You are better off shipping buggy software • Engineering argument • Make software only as reliable as the weakest link in the entire system

  6. Providing Probabilistic Guarantees • Problem we would like to solve: • Given a program, prove that it does not do something wrong with probability > 95% • Problem we can hope to solve: • Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95% • Prove optimality: no testing algorithm can do better

  7. Cuzz: Concurrency Fuzzing • Disciplined randomization of schedules • Probabilistic guarantees • Every run finds a bug with some (reasonably large) probability • Repeat runs to increase the chance of finding a bug • Scalable • In the no. of threads and program size • Effective • Bugs in IE, Firefox, Office Communicator, Outlook, … • Bugs found in the first few runs

  8. Cuzz Demo

  9. Problem Formulation • P is a class of programs • B is a class of bugs • Given P and B, you design a testing algorithm T • Given T, the adversary picks a program p in P containing a bug b in B • Given p, T generates an input in constant time • Prove that T finds b in p with a probability X(p,B)

  10. In our case • P is a class of closed terminating concurrent programs • B is a class of bugs • Given P and B, you design a testing algorithm T • Given T, the adversary picks a program p in P containing a bug b in B • Given p, T generates an interleaving in constant time • Prove that T finds b in p with a probability X(p,B)

  11. Useful parameters • For a closed terminating concurrent program • (Fancy way of saying, a program combined with a concurrency test) • n : maximum number of threads • k : maximum number of instructions executed

  12. What is a “Bug” – first attempt • Bug is defined as a particular buggy interleaving • No algorithm can find the bug with probability greater than 1/n^k • n threads (~ tens), k instructions (~ millions), up to n^k schedules

  13. A Deterministic Algorithm • Provides no guarantees • n threads (~ tens), k instructions (~ millions), up to n^k schedules

  14. Randomized Algorithm • Samples the schedule space with some probability distribution • Adversary picks the schedule that is the least probable • Probability of finding the bug <= 1/n^k • n threads (~ tens), k instructions (~ millions), up to n^k schedules

  15. Randomized Algorithm • 1/n^k is a mighty small number • Hard to design algorithms that find the bug even with probability 1/n^k • n threads (~ tens), k instructions (~ millions), up to n^k schedules
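  For a rough sense of scale (numbers mine, not from the slides): with n = 10 threads and k = 10^6 instructions there are up to n^k = 10^(10^6) distinct schedules, so a 1/n^k guarantee means an adversarially placed single-interleaving bug would take on the order of 10^(10^6) runs to hit. That is why the next slide changes the problem definition.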

  16. A Good Research Trick • When you can’t solve a problem, change the problem definition

  17. Bugs are not adversarial • Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug • This is not true for program inputs • The set of interleavings that find the bug share the same root cause • The root causes of real bugs are not complicated • Smart people make stupid mistakes

  18. Classifying Bugs • Classify concurrency bugs based on a suitable “depth” metric • Adversary can choose any bug, but within a given depth • Testing algorithm provides better guarantees for bugs with a smaller depth • Even if the worst-case probability is less than 1/n^k • We want real bugs to have small depth • We want to be able to design effective sampling algorithms for finding bugs of a particular depth

  19. Our Bug Depth Definition • Bug Depth = number of ordering constraints sufficient to find the bug • Best explained through examples

  20. A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug
  Parent:                     Child:
  A: …                        F: …
  B: fork (child);            G: do_init();
  C: p = malloc();            H: p->f ++;
  D: …                        I: …
  E: …                        J: …
  Possible schedules:
  A B C D E F G H I J (ok)
  A B F G H C D E I J (buggy: H runs before C)
  A B F G C D E H I J (ok)
  A B F G C H D E I J (ok)
  A B F G H I J C D E (buggy: H runs before C)
  …
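  A hedged, runnable rendering of this slide (my own code, with pthreads standing in for the slide's fork(child) and do_init() elided): the child may dereference p before the parent has allocated it, and the single ordering constraint "H before C" is enough to expose the crash, which is why the bug has depth 1.

  #include <pthread.h>
  #include <stdlib.h>

  struct obj { int f; };
  static struct obj *p;                          /* shared, initially NULL */

  static void *child(void *arg) {
      (void)arg;
      /* G: do_init(); */
      p->f++;                                    /* H: crashes if scheduled before the malloc below */
      return NULL;
  }

  int main(void) {
      pthread_t t;
      pthread_create(&t, NULL, child, NULL);     /* B: fork (child); */
      p = malloc(sizeof *p);                     /* C: p = malloc(); */
      pthread_join(t, NULL);
      free(p);
      return 0;
  }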

  21. A Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug
  Parent:                     Child:
  A: …                        H: …
  B: p = malloc();            I: p = NULL;
  C: fork (child);            J: …
  D: ….
  E: if (p != NULL)
  F:     p->f ++;
  G:
  Possible schedules:
  A B C D E F G H I J (ok)
  A B C D E H I J F G (buggy: I runs between the check E and the use F)
  A B C H I D E G J (ok: the check at E fails, so F is skipped)
  A B C D H E F I J G (ok)
  A B C H D E I J F G (buggy: I runs between the check E and the use F)
  …

  22. Another Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug
  Parent:                     Child:
  A: …                        F: …
  B: Lock (A);                G: Lock (B);
  C: …                        H: …
  D: Lock (B);                I: Lock (A);
  E: …                        J: …
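  A hedged, runnable rendering of this slide (my own code, using pthread mutexes in place of the slide's Lock): exactly two ordering constraints force the deadlock, the parent acquiring A before the child asks for it, and the child acquiring B before the parent asks for it, which is why the bug has depth 2.

  #include <pthread.h>

  static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

  static void *child(void *arg) {
      (void)arg;
      pthread_mutex_lock(&B);                    /* G: Lock (B); */
      pthread_mutex_lock(&A);                    /* I: Lock (A); blocks if the parent holds A */
      pthread_mutex_unlock(&A);
      pthread_mutex_unlock(&B);
      return NULL;
  }

  int main(void) {
      pthread_t t;
      pthread_create(&t, NULL, child, NULL);
      pthread_mutex_lock(&A);                    /* B: Lock (A); */
      pthread_mutex_lock(&B);                    /* D: Lock (B); blocks if the child holds B */
      pthread_mutex_unlock(&B);
      pthread_mutex_unlock(&A);
      pthread_join(t, NULL);
      return 0;
  }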

  23. Hypothesis • Most concurrency bugs in practice have a very small depth • What has been empirically validated: • There are lots of bugs of small depth in real programs

  24. Defining a Bug • A schedule is a sequence of (dynamic) instructions • S = set of schedules of a closed program • A concurrency bug B is a strict subset of S

  25. Ordering Constraints • A schedule satisfies an ordering constraint (a, b) if instruction a occurs before instruction b in the schedule
  Parent:                     Child:
  A: …                        F: …
  B: fork (child);            G: do_init();
  C: p = malloc();            H: p->f ++;
  D: …                        I: …
  E: …                        J: …
  Example: the schedule A B F G H C D E I J satisfies (H, C)

  26. Depth of a Bug • S(c1,c2,…,cn) = set of schedules that satisfy the ordering constraints c1,c2,…,cn • A bug B is of depth ≤ d if there exist constraints c1,c2,…,cd such that S(c1,c2,…,cd) ⊆ B

  27. A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug
  Parent:                     Child:
  A: …                        F: …
  B: fork (child);            G: do_init();
  C: p = malloc();            H: p->f ++;
  D: …                        I: …
  E: …                        J: …
  Possible schedules:
  A B C D E F G H I J (ok)
  A B F G H C D E I J (buggy: H runs before C)
  A B F G C D E H I J (ok)
  A B F G C H D E I J (ok)
  A B F G H I J C D E (buggy: H runs before C)
  …

  28. What is the Depth of this Bug?
  Parent:                     Child:
  A: …                        F: ….
  B: p = malloc();            G: if (allocated)
  C: fork (child);            H:     p->f++;
  D: allocated = 1;           I: …
  E: p = null;                J: …
  Any buggy interleaving satisfies (D, G) && (E, H)  Bug depth <= 2

  29. What is the Depth of this Bug?
  Parent:                     Child:
  A: …                        F: ….
  B: p = malloc();            G: if (allocated)
  C: fork (child);            H:     p->f++;
  D: allocated = 1;           I: …
  E: p = null;                J: …
  Any interleaving that satisfies (E, G) is buggy  Bug depth == 1
  Even though there are buggy interleavings that don’t satisfy (E, G)

  30. Let’s look at a more complicated bug
  void AddToCache() {
      // ...
      A: x &= ~(FLAG_NOT_DELETED);
      B: x |= FLAG_CACHED;
      MemoryBarrier();
      // ...
  }
  AddToCache();
  assert( x & FLAG_CACHED );

  31. The bit operations are not atomic
  void AddToCache() {
      A1: t = x & ~(FLAG_NOT_DELETED);
      A2: x = t;
      B1: u = x | FLAG_CACHED;
      B2: x = u;
  }
  AddToCache();
  assert( x & FLAG_CACHED );

  32. The bug: two threads run AddToCache() concurrently
  Thread 1:                                Thread 2:
  A1: t = x & ~(FLAG_NOT_DELETED);         A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;                               A2: x = t;
  B1: u = x | FLAG_CACHED;                 B1: u = x | FLAG_CACHED;
  B2: x = u;                               B2: x = u;
  assert( x & FLAG_CACHED );               assert( x & FLAG_CACHED );
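  One failing interleaving, walked through step by step (my own reading of the slide; assume x starts with FLAG_NOT_DELETED set and FLAG_CACHED clear): Thread 1 runs A1, so its local t has FLAG_CACHED clear. Thread 2 then runs its whole AddToCache body (A1 A2 B1 B2), so x now has FLAG_CACHED set. Thread 1 resumes with A2 and writes its stale t back, wiping FLAG_CACHED out of x. If Thread 2 now evaluates its assert(x & FLAG_CACHED), it fails, even though Thread 1 would have re-set the flag a couple of steps later at its own B2.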

  33. Cuzz Guarantee • Given a program that creates at most n threads and executes at most k instructions • Cuzz finds every bug of depth d with probability at least 1/(n·k^(d-1)) in every run of the program
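  To make the bound concrete (numbers mine, not from the slides): for a depth-1 bug the guarantee is 1/n, e.g. 1/10 per run with 10 threads; for a depth-2 bug with n = 10 and k = 10^6 it is 1/(10·10^6) = 10^-7 per run. Slide 40 below explains why Cuzz does far better than this conservative worst case in practice.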

  34. A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/n • n: no. of threads (~ tens)
  Parent:                     Child:
  A: …                        F: …
  B: fork (child);            G: do_init();
  C: p = malloc();            H: p->f ++;
  D: …                        I: …
  E: …                        J: …
  Possible schedules:
  A B C D E F G H I J (ok)
  A B F G H C D E I J (buggy)
  A B F G C D E H I J (ok)
  A B F G C H D E I J (ok)
  A B F G H I J C D E (buggy)
  …

  35. A Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/(n·k) • n: no. of threads (~ tens) • k: no. of instructions (~ millions)
  Parent:                     Child:
  A: …                        H: …
  B: p = malloc();            I: p = NULL;
  C: fork (child);            J: …
  D: ….
  E: if (p != NULL)
  F:     p->f ++;
  G:
  Possible schedules:
  A B C D E F G H I J (ok)
  A B C D E H I J F G (buggy)
  A B C H I D E G J (ok)
  A B C D H E F I J G (ok)
  A B C H D E I J F G (buggy)
  …

  36. Another Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/(n·k) • n: no. of threads (~ tens) • k: no. of instructions (~ millions)
  Parent:                     Child:
  A: …                        F: …
  B: Lock (A);                G: Lock (B);
  C: …                        H: …
  D: Lock (B);                I: Lock (A);
  E: …                        J: …

  37. Cuzz Algorithm
  Inputs:
      n: estimated bound on the number of threads
      k: estimated bound on the number of steps
      d: target bug depth

  // 1. assign random priorities >= d to threads
  for t in [1…n] do
      priority[t] = rand() + d;
  // 2. choose d-1 priority-lowering points at random
  for i in [1…d) do
      lowering[i] = rand() % k;
  steps = 0;
  while (some thread enabled) {
      // 3. honor thread priorities
      let t be the highest-priority enabled thread;
      schedule t for one step;
      steps++;
      // 4. at the i-th lowering point, set the priority to i
      if (steps == lowering[i] for some i)
          priority[t] = i;
  }
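  A self-contained sketch of the same scheduler (my own code, not the Cuzz implementation), simulated on abstract threads that each execute a fixed number of opaque steps; it assigns the distinct priorities d … d+n-1 in a random order, which serves the same purpose as the slide's rand() + d.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N 3              /* estimated bound on the number of threads */
  #define K 12             /* estimated bound on the number of steps   */
  #define D 2              /* target bug depth                         */

  int main(void) {
      int remaining[N] = { 4, 4, 4 };   /* steps left in each simulated thread */
      int priority[N];
      int lowering[D];                  /* lowering[1 .. D-1] are used         */
      srand((unsigned)time(NULL));

      /* 1. assign the distinct priorities D, D+1, ..., D+N-1 in a random order */
      for (int t = 0; t < N; t++)
          priority[t] = D + t;
      for (int t = N - 1; t > 0; t--) {
          int j = rand() % (t + 1);
          int tmp = priority[t]; priority[t] = priority[j]; priority[j] = tmp;
      }

      /* 2. choose D-1 priority-lowering points at random in [0, K) */
      for (int i = 1; i < D; i++)
          lowering[i] = rand() % K;

      int steps = 0;
      for (;;) {
          /* 3. honor thread priorities: run the highest-priority enabled thread */
          int t = -1;
          for (int u = 0; u < N; u++)
              if (remaining[u] > 0 && (t < 0 || priority[u] > priority[t]))
                  t = u;
          if (t < 0)
              break;                    /* every thread has finished */
          printf("step %2d: run thread %d (priority %d)\n", steps, t, priority[t]);
          remaining[t]--;
          steps++;

          /* 4. at the i-th lowering point, drop the running thread's priority to i */
          for (int i = 1; i < D; i++)
              if (steps == lowering[i])
                  priority[t] = i;
      }
      return 0;
  }

  Replacing the printf with a single instrumented step of the program under test (while keeping all other threads blocked) is what turns this simulation into the real scheduler.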

  38. A Bug of Depth 1 • Found when the child gets a higher priority than the parent (prob = 1/2)
  Parent (Pri = 1): fork (child); p = malloc();
  Child (Pri = 2): do_init(); p->f ++;
  With the child at the higher priority, do_init() and p->f ++ are scheduled as soon as the fork happens, before the parent’s malloc.

  39. A Bug of Depth 2 • Found when the parent starts with the higher priority and a lowering point lands right after the branch condition (prob = 1/(2·5) = 1/10)
  Parent (Pri = 3): p = malloc(); fork (child); if (p != NULL) … [lowering point: Pri drops to 1] … p->f ++;
  Child (Pri = 2): p = NULL;
  The parent runs at priority 3 through the check, its priority drops to 1 at the lowering point, the child (priority 2) then runs p = NULL, and the parent resumes with p->f ++.

  40. In Practice, Cuzz Beats its Bound • Cuzz performs far better than the theoretical bound • The worst-case bound is based on a conservative analysis • We employ various optimizations • Programs have LOTS of bugs • Probability of finding any of the bugs is (roughly) the sum of the probabilities of finding each • The buggy code is executed LOTS of times

  41. For Some of our Benchmarks • Probability increases with n, stays the same with k • In contrast, worst-case bound = 1/(n·k^(d-1))

  42. Dimension Theory • Any partial order G can be expressed as the intersection of a set of total orders • This set is called a realizer of G
  (Figure: a five-element partial order on {a, b, c, d, e} shown as the intersection of the two total orders a b c d e and a d b e c)

  43. Property of Realizers • For any unordered pair a and b, a realizer contains two total orders that satisfy (a, b) and (b, a)
  (Figure: the same partial order and its two-element realizer from the previous slide)

  44. Dimension of a Partial Order • The dimension of G is the size of the smallest realizer of G • The dimension is 2 for this example
  (Figure: the same partial order and its two-element realizer {a b c d e, a d b e c})

  45. Why is it called “dimension”? • You can encode a partial order of dimension d as points in a d-dimensional space
  (Figure: the five elements plotted as points in the plane, using each element’s position in the two total orders a b c d e and a d b e c as its two coordinates)

  46. Why is it relevant for us • P = set of all partial orders, B = set of all bugs of depth 1 • If you can uniformly sample the smallest realizer of a partial order p • Probability of any bug of depth 1 >= 1/dimension(p)
  (Figure: the same partial order and its two-element realizer)

  47. All this is good, but • Finding the dimension of a partial order is NP-complete • Real programs are not static partial orders

  48. Width of a Partial Order • The width of a partial order G is the minimum number of total orders needed to cover G • Width corresponds to the number of “threads” in G • For all G, Dimension(G) <= Width(G)
  (Figure: the same partial order covered by two chains, one per thread)

  49. Cuzz Algorithm • Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G) • Assign random priorities to “threads” and topologically sort based on the priorities
  (Figure: the same partial order and its two-element realizer)

  50. Extension to Larger Depths • Note: a realizer of G covers all possible orderings of an unordered pair • We define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs • The d-dimension of G is the size of the smallest d-realizer of G • Theorem: d-Dimension(G) <= Dimension(G) · k^(d-1), where k is the number of nodes in G • Cuzz is an online algorithm for uniformly sampling over a d-realizer of G
