
Cooperative Concurrency Bug Isolation

Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation. Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu, University of Wisconsin–Madison. Concurrency bugs are synchronization mistakes in multi-threaded programs.


Presentation Transcript


  1. Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation • Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu • University of Wisconsin–Madison

  2. Cooperative Concurrency Bug Isolation • Concurrency bugs are synchronization mistakes in multi-threaded programs. • Several types: • Atomicity violation • Data race • Deadlock, etc. • [Diagram: example interleavings of read(x) and write(x) between thread 1 and thread 2, marked as correct or failing]

  3. Concurrency bugs are common in the field • Developers are poor at parallel programming • Interleaving testing is inefficient • Applications with concurrency bugs are shipped to users

  4. Concurrency bugs lead to failures in the field • Disasters in the past • Therac-25, Northeast blackout of 2003 • More threats in the multi-core era

  5. Failure diagnosis is critical

  6. Concurrency Bug Failure Example • [Figure: a failure caused by a concurrency bug from Apache HTTP Server]

  7. Concurrency Bug Failure Example (correct run) • Shared variable: idx • Thread 1 runs first: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2 runs afterwards: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Concurrency bug from Apache HTTP Server

  8. Concurrency Bug Failure Example (failure run) • Shared variable: idx • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Both threads copy into buf[idx] before either one advances idx. • Concurrency bug from Apache HTTP Server
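The two interleavings on slides 7 and 8 can be replayed deterministically in a single thread. The following is a minimal sketch, not the Apache code: `struct log_state`, `copy_step`, `advance_step`, and `replay` are hypothetical names, and the buffer size is arbitrary.

```c
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical stand-in for the shared log buffer and its index. */
struct log_state {
    char buf[BUF_SIZE];
    size_t idx;
};

/* First half of log_writer(): copy the message at the current index. */
void copy_step(struct log_state *st, const char *s) {
    memcpy(&st->buf[st->idx], s, strlen(s));
}

/* Second half of log_writer(): temp = idx; idx = temp + strlen(s); */
void advance_step(struct log_state *st, const char *s) {
    size_t temp = st->idx;
    st->idx = temp + strlen(s);
}

/* Replays one of the two schedules from the slides.
 * interleaved == 0: thread 1 finishes before thread 2 starts (slide 7).
 * interleaved != 0: both threads copy before either advances idx (slide 8). */
void replay(struct log_state *st, const char *s1, const char *s2, int interleaved) {
    memset(st, 0, sizeof *st);
    if (!interleaved) {
        copy_step(st, s1); advance_step(st, s1);
        copy_step(st, s2); advance_step(st, s2);
    } else {
        copy_step(st, s1);      /* thread 1 writes at idx == 0 */
        copy_step(st, s2);      /* thread 2 overwrites the same bytes */
        advance_step(st, s1);
        advance_step(st, s2);
    }
}
```

With messages "aaa" and "bbb", the serial schedule leaves "aaabbb" in the buffer, while the interleaved schedule leaves only "bbb" at offset 0 even though idx still ends at 6.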

  9. Diagnosing Concurrency Bug Failures Is Challenging • The failure is non-deterministic and rare • Programmers have trouble reproducing the failure • The root cause involves more than one thread

  10. Existing work and its limitations • Failure replay • High runtime overhead • Developers need to manually locate faults • Run-time bug detection • (Mostly) high runtime overhead • Not guided by the failure • Many false positives • How to achieve low-overhead and accurate failure diagnosis?

  11. Our work: CCI • Goal: diagnosing production-run concurrency bug failures • Major components: • predicate instrumentor • sampler • statistical debugging • [Diagram: program source, predicates, and sampler feed the compiler; production runs report counts and success/failure outcomes; statistical debugging turns these into predictors] • Predictors: predicates that are true in most failure runs and false in most correct runs.

  12. CCI Overview • Three different types of predicates: Prev, Havoc, FunRe. • Each predicate has its supporting sampling strategy. • Same statistical debugging as in CBI. • Experiments show CCI is effective in diagnosing concurrency failures.
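The slides do not spell out the statistical debugging step beyond saying it is the same as in CBI. The following is a minimal sketch of one CBI-style ranking score, assuming the Failure/Context/Increase formulation described in the CBI literature. The struct and function names are illustrative and are not taken from the CCI or CBI code.

```c
/* Per-predicate counts aggregated over many runs (field names are illustrative). */
struct pred_counts {
    unsigned f_true; /* failing runs in which the predicate was observed true */
    unsigned s_true; /* successful runs in which it was observed true */
    unsigned f_obs;  /* failing runs in which its site was sampled at all */
    unsigned s_obs;  /* successful runs in which its site was sampled at all */
};

/* Failure(P): fraction of runs with P true that failed. */
double failure_score(const struct pred_counts *c) {
    unsigned total = c->f_true + c->s_true;
    return total ? (double)c->f_true / total : 0.0;
}

/* Context(P): fraction of runs that merely reached P's site and failed. */
double context_score(const struct pred_counts *c) {
    unsigned total = c->f_obs + c->s_obs;
    return total ? (double)c->f_obs / total : 0.0;
}

/* Increase(P): how much P being true raises the failure rate above the
 * background rate at its site. Higher values suggest better predictors. */
double increase_score(const struct pred_counts *c) {
    return failure_score(c) - context_score(c);
}
```

A predicate that is true in most failure runs and false in most correct runs, as slide 11 describes a good predictor, gets a Failure score near 1 and therefore a large Increase.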

  13. Outline • Motivation • CCI Overview • CCI Predicates and Sampling Strategies • CCI-Prev and its sampling strategy • CCI-Havoc and its sampling strategy • CCI-FunRe and its sampling strategy • Evaluation • Conclusion

  14. CCI-Prev Intuition • [Diagram: data race and atomicity violation examples, each shown as correct and failing interleavings of read(x) and write(x) between thread 1 and thread 2] • Just record which thread accessed the location last time.

  15. CCI-Prev Predicate • It tracks whether two successive accesses to a shared memory location were by two distinct threads or by the same thread.

  16. CCI-Prev Predicate on the Correct Run • Thread 1 runs first: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2 runs afterwards: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Concurrency bug from Apache HTTP Server

  17. CCI-Prev Predicate on the Failure Run • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Concurrency bug from Apache HTTP Server

  18. CCI-Prev Predicate Instrumentation • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … lock(glock); remote = test_and_insert(&idx, curTid); record(I, remote); temp = idx; (instrumented access I) unlock(glock); idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … log_writer() { … } … • test_and_insert() uses a global hash table • Concurrency bug from Apache HTTP Server
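The instrumentation on slide 18 can be sketched as follows. This is a minimal illustration, not the CCI implementation: the direct-mapped table, its size, `struct prev_site`, and `prev_read` are assumptions, and threads are identified by an explicit integer so the logic can be exercised without spawning threads. A first access with no recorded predecessor is counted as local here, which is a simplification.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define PREV_TABLE_SIZE 256

/* One slot of the global table: which thread last touched an address. */
struct prev_entry {
    const void *addr;
    int last_tid;
};

static struct prev_entry prev_table[PREV_TABLE_SIZE];
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true if the last recorded access to addr came from a different
 * thread, then records cur_tid as the latest accessor. Hash collisions
 * simply evict the old entry. The caller must hold glock. */
static bool test_and_insert(const void *addr, int cur_tid) {
    struct prev_entry *e = &prev_table[((uintptr_t)addr >> 3) % PREV_TABLE_SIZE];
    bool remote = (e->addr == addr) && (e->last_tid != cur_tid);
    e->addr = addr;
    e->last_tid = cur_tid;
    return remote;
}

/* Counters for the two predicate outcomes at one instrumented access. */
struct prev_site {
    unsigned remote;
    unsigned local;
};

/* Instrumented read: the lookup, the recording, and the access itself all
 * happen under the global lock, so the recorded "previous thread" matches
 * the real access order. */
int prev_read(int *shared, int cur_tid, struct prev_site *site) {
    pthread_mutex_lock(&glock);
    bool remote = test_and_insert(shared, cur_tid);
    if (remote) site->remote++; else site->local++;
    int value = *shared;
    pthread_mutex_unlock(&glock);
    return value;
}
```

Holding one global lock around every instrumented access is what makes this predicate comparatively expensive, which motivates the sampling strategy on the next slide.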

  19. CCI-Prev Sampling Strategy • Does traditional sampling work? • No. • CCI-Prev sampling is: • Thread-coordinated • Bursty • Example (failure run): • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } …

  20. Outline • Motivation • CCI Overview • CCI Predicates and Sampling Strategies • CCI-Prev and its sampling strategy • CCI-Havoc and its sampling strategy • CCI-FunRe and its sampling strategy • Evaluation • Conclusion

  21. CCI-Havoc Intuition • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Just record what value was observed during the last access.

  22. CCI-Havoc Predicate • It tracks whether the value of a given shared location changes between two consecutive accesses by one thread. • It uses only thread-local information.

  23. CCI-Havoc Predicate on the Correct Run • Thread 1 runs first: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2 runs afterwards: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Concurrency bug from Apache HTTP Server

  24. CCI-Havoc Predicate on the Failure Run • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; (instrumented access I) idx = temp + strlen(s); … return SUCCESS; … } … • Concurrency bug from Apache HTTP Server

  25. CCI-Havoc Predicate Instrumentation • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); … temp = idx; (instrumented access I) changed = test(&idx, temp); record(I, changed); insert(&idx, temp); idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … log_writer() { … } … • test() and insert() use a hash table for thread 1 • Concurrency bug from Apache HTTP Server
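The `test`/`record`/`insert` sequence on slide 25 can be sketched with a per-thread table that remembers the value each location held when this thread last accessed it. This is a minimal illustration under assumptions: the table layout, `struct havoc_site`, and `havoc_read` are hypothetical names, only reads of `int` locations are handled, and the per-thread table is passed explicitly rather than kept in thread-local storage.

```c
#include <stdbool.h>
#include <stdint.h>

#define HAVOC_TABLE_SIZE 256

/* One slot: the value this thread saw at addr during its last access. */
struct havoc_entry {
    const int *addr;
    int last_value;
};

/* Each thread owns one table, so no locking is required. */
struct havoc_table {
    struct havoc_entry slots[HAVOC_TABLE_SIZE];
};

static struct havoc_entry *havoc_slot(struct havoc_table *t, const int *addr) {
    return &t->slots[((uintptr_t)addr >> 2) % HAVOC_TABLE_SIZE];
}

/* test(): has the value changed since this thread's previous access? */
static bool havoc_test(struct havoc_table *t, const int *addr, int now) {
    struct havoc_entry *e = havoc_slot(t, addr);
    return e->addr == addr && e->last_value != now;
}

/* insert(): remember the value just observed. */
static void havoc_insert(struct havoc_table *t, const int *addr, int now) {
    struct havoc_entry *e = havoc_slot(t, addr);
    e->addr = addr;
    e->last_value = now;
}

/* Counters for the two predicate outcomes at one instrumented access. */
struct havoc_site {
    unsigned changed;
    unsigned unchanged;
};

/* Instrumented read, following the order on the slide: access, test,
 * record, insert. */
int havoc_read(struct havoc_table *t, int *shared, struct havoc_site *site) {
    int temp = *shared;
    bool changed = havoc_test(t, shared, temp);
    if (changed) site->changed++; else site->unchanged++;
    havoc_insert(t, shared, temp);
    return temp;
}
```

Because only the calling thread's table is touched, a change between two of its accesses can only have come from another thread, which is the signal the predicate is after.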

  26. CCI-Havoc Sampling Strategy • Bursty • Thread-independent • Example (failure run): • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } …

  27. Outline • Motivation • CCI Overview • CCI Predicates and Sampling Strategies • CCI-Prev and its sampling strategy • CCI-Havoc and its sampling strategy • CCI-FunRe and its sampling strategy • Evaluation • Conclusion

  28. CCI-FunRe Predicate • It tracks whether the execution of one function overlaps with the execution of the same function from a different thread.

  29. CCI-FunRe Predicate Example • [Diagram: two schedules of thread 1 and thread 2 each running … log_writer() { … return SUCCESS; } …, one marked as a correct run and one marked as a failure run]

  30. CCI-FunRe Predicate Instrumentation • Thread 1: … log_writer() { oldCount = atomic_inc(Count); record("log_writer", oldCount); … atomic_dec(Count); return SUCCESS; } … • Thread 2: … log_writer() { oldCount = atomic_inc(Count); record("log_writer", oldCount); … atomic_dec(Count); return SUCCESS; } …
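The entry and exit accounting on slide 30 maps naturally onto C11 atomics. This is a minimal sketch, assuming `atomic_inc` on the slide returns the counter value before the increment. `struct funre_site`, `funre_enter`, and `funre_exit` are hypothetical names.

```c
#include <stdatomic.h>

/* Per-function state: how many threads are inside right now, plus
 * counters for the two predicate outcomes. */
struct funre_site {
    atomic_int active;      /* threads currently executing the function */
    atomic_uint overlapped; /* entries that found another thread inside */
    atomic_uint alone;      /* entries that found the function idle */
};

/* Called at function entry. The value before the increment tells us
 * whether some other execution of the same function is still in flight. */
void funre_enter(struct funre_site *s) {
    int old_count = atomic_fetch_add(&s->active, 1);
    if (old_count > 0)
        atomic_fetch_add(&s->overlapped, 1);
    else
        atomic_fetch_add(&s->alone, 1);
}

/* Called on every return path of the function. */
void funre_exit(struct funre_site *s) {
    atomic_fetch_sub(&s->active, 1);
}
```

The increment and decrement must run on every call for `active` to stay accurate, which is why the slides that follow make the accounting unconditional and sample only the recording step.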

  31. CCI-FunRe Sampling Strategy • Thread 1 (shown without instrumentation): … log_writer() { … return SUCCESS; } … • Thread 2: … log_writer() { oldCount = atomic_inc(Count); record("log_writer", oldCount); … atomic_dec(Count); return SUCCESS; } … • Function execution accounting is not suitable for sampling, so this part is unconditional.

  32. CCI-FunRe Sampling Strategy • Function execution accounting: • unconditional • FunRe predicate recording: • thread-independent • non-bursty

  33. Outline • Motivation • CCI Overview • CCI Predicates and Sampling Strategies • CCI-Prev and its sampling strategy • CCI-Havoc and its sampling strategy • CCI-FunRe and its sampling strategy • Evaluation • Conclusion

  34. Experimental Evaluation • Implementation • Static instrumentor based on the CBI framework • Real-world concurrency bug failures from: • Apache HTTP Server, Cherokee • Mozilla-JS, PBZIP2 • SPLASH-2: FFT, LU • Parameters used • Roughly 1/100 sampling rate

  35. Failure Diagnosis Evaluation • Methodology • Use concurrency bug failures that occurred in the real world • Each application runs 3000 times on a multi-core machine • Add random sleeps to get some failure runs • Sampling is enabled • Statistical debugging then returns a list of predictors • Which predictor in the list can diagnose the failure?

  36. Failure Diagnosis Results (with sampling) • [Table: diagnosis capability of FunRe, Havoc, and Prev]

  37. Runtime Overhead • [Table: runtime overhead of FunRe, Havoc, and Prev]

  38. Conclusion • CCI is capable of, and suitable for, diagnosing many production-run concurrency bug failures. • Future predicates can leverage our effective sampling strategies. • Experiments confirm the design tradeoffs.

  39. Questions about CCI?

  40. Questions about CCI?

  41. CBI on Concurrency Bug Failures • Shared variable: idx • Thread 1: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 2: … log_writer() { … memcpy(&buf[idx], s, strlen(s)); • Thread 1: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • Thread 2: … temp = idx; idx = temp + strlen(s); … return SUCCESS; … } … • CBI does not work! • To diagnose production-run concurrency bug failures, interleaving-related events should be tracked. • Concurrency bug from Apache HTTP Server

  42. CCI-Prev Predicate Instrumentation with Sampling • if (gsample) { lock(glock); changed = test_and_insert(&cnt, curTid); record(I, changed); temp = cnt; unlock(glock); } else { temp = cnt; [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]? }

  43. CCI-Prev Predicate Instrumentation with Sampling • if (gsample) { lock(glock); changed = test_and_insert(&cnt, curTid, &stale); record(stale ? P1 : P2, changed); temp = cnt; unlock(glock); gLength++; lLength++; if ((iset == curTid && lLength > lMAX) || gLength > gMAX) { clear(); iset = unusedTid; gsample = false; } } else { temp = cnt; [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]? }
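The burst control on slide 43 can be isolated from the predicate itself. This is a minimal sketch with several assumptions: the `[[ … ]]?` block is treated as "run this with some probability" and modelled by a caller-supplied `start_burst` flag, `lLength` is kept in a small per-thread array instead of thread-local storage, and the limits `L_MAX` and `G_MAX` are arbitrary. Locking and the hash-table `clear()` are omitted.

```c
#include <stdbool.h>

#define L_MAX 2        /* burst limit for the initiating thread (arbitrary) */
#define G_MAX 5        /* burst limit across all threads (arbitrary) */
#define MAX_THREADS 8
#define UNUSED_TID (-1)

/* Shared sampling state; in the slide these are globals guarded by glock. */
struct burst_state {
    bool gsample;                   /* is a burst currently active? */
    int iset;                       /* thread that started the burst */
    unsigned g_length;              /* accesses sampled in this burst */
    unsigned l_length[MAX_THREADS]; /* per-thread sampled-access counts */
};

/* Returns true if this access would be recorded. cur_tid must be in
 * [0, MAX_THREADS). start_burst stands in for the random sampling choice. */
bool sampled_access(struct burst_state *b, int cur_tid, bool start_burst) {
    if (b->gsample) {
        /* Every thread samples while the burst is on (thread-coordinated). */
        b->g_length++;
        b->l_length[cur_tid]++;
        if ((b->iset == cur_tid && b->l_length[cur_tid] > L_MAX) ||
            b->g_length > G_MAX) {
            b->iset = UNUSED_TID;
            b->gsample = false;     /* burst over for all threads */
        }
        return true;
    }
    if (start_burst) {
        b->gsample = true;          /* turn sampling on for everyone */
        b->iset = cur_tid;
        b->g_length = 0;
        b->l_length[cur_tid] = 0;
    }
    return false;                   /* this access itself is not recorded */
}
```

The point of coordinating threads is that a Prev predicate needs two successive accesses, possibly from different threads, to both be observed. Sampling each access independently would rarely capture such a pair.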

  44. CCI-Havoc Predicate Instrumentation with Sampling • if (sample) { changed = test(&cnt, cnt, &stale); record(stale ? P1 : P2, changed); temp = cnt; insert(&cnt, cnt); length++; if (length > lMAX) { clear(); sample = false; } } else { temp = cnt; [[ sample = true; length = 0; ]]? } • No global lock used!
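The thread-independent variant on slide 44 needs only per-thread state. This is a minimal sketch in which the probabilistic `[[ … ]]?` decision is replaced by a fixed countdown so the behaviour is deterministic. That substitution, the `PERIOD` and `L_MAX` values, and the names `struct thread_sampler` and `havoc_should_record` are assumptions made for illustration, not details from the slides.

```c
#include <stdbool.h>

#define L_MAX 3      /* burst length limit (arbitrary) */
#define PERIOD 100   /* unsampled accesses between bursts (arbitrary) */

/* Sampling state owned by one thread; a real build would likely keep this
 * in thread-local storage instead of passing it around. */
struct thread_sampler {
    bool sample;        /* is this thread inside a burst? */
    unsigned length;    /* accesses recorded in the current burst */
    unsigned countdown; /* unsampled accesses left before the next burst */
};

/* Returns true if the current access should be tested and recorded. */
bool havoc_should_record(struct thread_sampler *ts) {
    if (ts->sample) {
        ts->length++;
        if (ts->length > L_MAX)
            ts->sample = false; /* the slide also clears the hash table here */
        return true;
    }
    if (ts->countdown == 0) {
        ts->sample = true;      /* start a burst at the next access */
        ts->length = 0;
        ts->countdown = PERIOD;
    } else {
        ts->countdown--;
    }
    return false;
}
```

No field here is shared between threads, so one thread starting or ending a burst has no effect on any other thread and no lock is needed.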

  45. Failure Diagnosis Results (with sampling) • [Table: diagnosis capability of FunRe, Havoc, and Prev]

  46. Failure diagnosis is critical
