
Lazy Diagnosis of In-Production Concurrency Bugs


Presentation Transcript


  1. Lazy Diagnosis of In-Production Concurrency Bugs Baris Kasikci, Weidong Cui, Xinyang Ge, Ben Niu

  2. Why Does In-Production Bug Diagnosis Matter? • Potential to fix bugs that impact users • Short release cycles make in-house testing challenging • Release cycles can be as frequent as a few times a day [1] [1] https://code.facebook.com/posts/270314900139291/rapid-release-at-massive-scale

  3. Concurrency Bug Diagnosis [Figure: atomicity violation. Thread 1 executes if (*x) { y = *x; } while Thread 2 executes free(x); x = NULL; Thread 2's write (W) lands between Thread 1's two reads (R, R)] Concurrency bug diagnosis requires knowing the order of key events (e.g., memory accesses)
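The check-then-use race on this slide can be sketched as a deterministic interleaving experiment. A minimal Python sketch (hypothetical names such as run, r1, w, r2; None stands in for a freed/NULL pointer), not the paper's tooling:

```python
# Python stand-in for the C snippet on the slide: a check-then-use
# atomicity violation. The three key accesses are R1 (Thread 1's check),
# W (Thread 2's free + NULL assignment), and R2 (Thread 1's dereference).

def run(schedule):
    """Execute the three accesses in the given order and report what
    Thread 1's dereference would do under that interleaving."""
    state = {"x": 42, "checked": False}
    outcome = None

    def r1():  # Thread 1: if (*x) ...
        state["checked"] = state["x"] is not None

    def w():   # Thread 2: free(x); x = NULL;
        state["x"] = None

    def r2():  # Thread 1: y = *x
        nonlocal outcome
        if not state["checked"]:
            outcome = "skipped"   # guard saw NULL, body never runs
        elif state["x"] is None:
            outcome = "crash"     # dereference of freed/NULL pointer
        else:
            outcome = "ok"

    steps = {"R1": r1, "W": w, "R2": r2}
    for s in schedule:
        steps[s]()
    return outcome

assert run(["R1", "R2", "W"]) == "ok"       # serial execution: no bug
assert run(["W", "R1", "R2"]) == "skipped"  # guard works when W comes first
assert run(["R1", "W", "R2"]) == "crash"    # W lands inside the check-use window
```

Only the schedule that puts the write inside the check-use window fails, which is why diagnosis hinges on recovering the order of these accesses.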

  4. Challenges of Concurrency Bug Diagnosis • Diagnosis requires reproducing bugs [PBI, ASPLOS’13] [Gist, SOSP’15] • Practitioners report that they can fix reproducible bugs [PLATEAU’14] • It may not be possible to reproduce in-production concurrency bugs • Inputs for reproducing bugs may not be available • Exposing bugs in production may incur high overhead [RaceMob, SOSP’13]

  5. Record/Replay • Tracing fine-grained interleavings incurs high overhead • State-of-the-art record/replay has 28% overhead [DoublePlay, ASPLOS’11] [Figure: atomicity violation timeline with gaps ΔT1 and ΔT2 between the reads and the write] In theory, ΔT can be on the order of a nanosecond

  6. Coarse Interleaving Hypothesis • Study with 54 bugs in 13 systems • Smallest ΔT is 91 microseconds, about 10^5 times the ~1 ns theoretical minimum [Figure: atomicity violation timeline with ΔT1 = 91 us] A lightweight, coarse-grained time tracking mechanism can help infer ordering
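The hypothesis can be illustrated numerically. A minimal Python sketch assuming a hypothetical 10 us timer tick (the talk only says a few tens of microseconds): two events whose true gap exceeds one tick always receive distinct coarse timestamps, so their order is recoverable.

```python
# Toy illustration of the coarse interleaving hypothesis. Assumption
# (hypothetical): the coarse timer ticks every 10 us.

def can_order(t1_us, t2_us, granularity_us=10):
    """True iff coarse timestamps place the two events in different ticks."""
    return t1_us // granularity_us != t2_us // granularity_us

# Accesses 91 us apart (the smallest gap in the 54-bug study) always
# straddle at least one 10 us tick boundary, so coarse timing orders them.
assert all(can_order(t, t + 91) for t in range(0, 1000))

# Accesses ~1 ns apart almost always share a tick, so only expensive
# fine-grained tracing could order them.
assert not can_order(1000.000, 1000.001)
```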

  7. Lazy Diagnosis • Leverages the coarse interleaving hypothesis • Hybrid dynamic/static root cause diagnosis technique • Snorlax: Lazy Diagnosis prototype • Fully accurate concurrency bug diagnosis (11 bugs in 7 systems) • Low overhead (always below 2%)

  8. Outline • Usage model • Design • Evaluation

  9. Current Bug Diagnosis Model Root cause diagnosis

  10. Lazy Diagnosis Usage Model [Figure: a control-flow trace and coarse timing info feed Lazy Diagnosis, which adds the root cause to the bug diagnosis workflow] • Control-flow trace speeds up static analysis • Coarse-grained timing information helps determine ordering

  11. Outline • Usage model • Design • Evaluation

  12. Lazy Diagnosis stages: Hybrid Points-to Analysis, Type-based Ranking, Bug Pattern Computation, Statistical Diagnosis


  14. Hybrid Points-to Analysis [Figure: at the FAILURE (CRASH) on load %Queue*, %fifo, candidate instructions I1: store i32* %21, %bufSize and I2: store %Queue* %1, %q are checked for aliasing] Finds instructions with operands pointing to the same location as the failing instruction’s operand

  15. Hybrid Points-To Analysis • Uses the control flow traces to limit the scope of static analysis • Runs fast, scales to large programs (e.g., httpd, MySQL) • Lazy • Control flow traces trigger the analysis • Interprocedural • Bug patterns may span multiple functions • Flow-insensitive • Discards execution order of instructions for scalability
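The filtering idea behind these two slides can be sketched as follows. A minimal Python sketch, not the paper's implementation: the points_to sets are assumed inputs that a flow-insensitive static analysis would compute for the traced instructions (LLVM-style instruction names taken from the slides).

```python
# Hybrid points-to filtering sketch: keep only the traced instructions
# whose pointer operand may point to the same abstract location as the
# failing instruction's operand.

def may_alias_candidates(failing, trace, points_to):
    """Return trace instructions that may touch the failing location."""
    target = points_to[failing]
    return [ins for ins in trace if points_to[ins] & target]

# Toy abstraction of the slide's example; the points-to sets are
# hypothetical stand-ins for the static analysis result.
points_to = {
    "load %Queue*, %fifo":      {"heap_queue"},
    "store %Queue* %1, %q":     {"heap_queue"},
    "store i32* %21, %bufSize": {"heap_size"},
}
trace = ["store i32* %21, %bufSize", "store %Queue* %1, %q"]

assert may_alias_candidates("load %Queue*, %fifo", trace, points_to) == [
    "store %Queue* %1, %q"
]
```

Restricting the analysis to instructions that actually appear in the control-flow trace is what keeps this step fast on large programs.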

  16. Lazy Diagnosis stages: Hybrid Points-to Analysis, Type-based Ranking, Bug Pattern Computation, Statistical Diagnosis


  18. Type-Based Ranking [Figure: for the failing load %Queue*, %fifo, the candidate stores are reordered so that store %Queue* %1, %q (matching %Queue* type) ranks 1 and store i32* %21, %bufSize ranks 2] Highly ranks instructions operating on types that match the failing instruction's operand type
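The ranking step on this slide can be sketched as a stable sort over the candidates. A minimal Python sketch with a hypothetical candidate representation (ins/type dicts), not the paper's implementation:

```python
# Type-based ranking sketch: candidates whose operand type matches the
# failing access's operand type are ranked above the rest; the stable
# sort preserves the original order within each group.

def rank_by_type(failing_type, candidates):
    """Return candidates with type matches first (stable order otherwise)."""
    return sorted(candidates, key=lambda c: c["type"] != failing_type)

candidates = [
    {"ins": "store i32* %21, %bufSize", "type": "i32*"},
    {"ins": "store %Queue* %1, %q",     "type": "Queue*"},
]

ranked = rank_by_type("Queue*", candidates)
assert ranked[0]["ins"] == "store %Queue* %1, %q"
assert ranked[1]["ins"] == "store i32* %21, %bufSize"
```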

  19. Lazy Diagnosis stages: Hybrid Points-to Analysis, Type-based Ranking, Bug Pattern Computation, Statistical Diagnosis


  21. Bug Pattern Computation [Figure: from the FAILURE on load %Queue*, %fifo, Bug Pattern Computation produces candidate interleavings across Thread 1 and Thread 2: Bug Pattern I with store %Queue* %1, %q and Bug Pattern II with store i32* %21, %bufSize]

  22. Bug Pattern Computation • Our implementation uses timing packets in Intel Processor Trace • Granularity of a few 10s of microseconds • We measured the smallest ΔT between key events as 91 microseconds Leverages the coarse interleaving hypothesis to establish instruction orders
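Under the stated assumptions (per-thread access logs already carry coarse timestamps, e.g., decoded from Intel PT timing packets, and timestamps that differ give the true order per the coarse interleaving hypothesis), pattern computation reduces to a timestamp merge. A minimal Python sketch with hypothetical log shapes:

```python
# Bug pattern computation sketch: merge per-thread logs of
# (timestamp_us, instruction) entries into one globally ordered
# interleaving pattern.

def compute_pattern(logs):
    """Merge {thread_id: [(timestamp_us, instruction), ...]} logs into a
    timestamp-ordered list of (thread_id, instruction) pairs."""
    merged = sorted((t, tid, ins)
                    for tid, entries in logs.items()
                    for t, ins in entries)
    return [(tid, ins) for _, tid, ins in merged]

# Hypothetical timestamps, well above the coarse-timer granularity.
logs = {
    1: [(100, "load %Queue*, %fifo"), (300, "load %Queue*, %fifo")],
    2: [(200, "store %Queue* %1, %q")],
}

assert compute_pattern(logs) == [
    (1, "load %Queue*, %fifo"),
    (2, "store %Queue* %1, %q"),
    (1, "load %Queue*, %fifo"),
]
```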

  23. Lazy Diagnosis stages: Hybrid Points-to Analysis, Type-based Ranking, Bug Pattern Computation, Statistical Diagnosis


  25. Statistical identification of failure-predicting patterns [Figure: across recurring executions of Thread 1 and Thread 2, the interleaving in which store %Queue* %1, %q falls between Thread 1's two loads of %fifo coincides with FAILURE (CRASH), while the other orders coincide with SUCCESS]
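The statistical step can be sketched as scoring each candidate pattern by how strongly its presence predicts failure across recurring executions. A simple precision-style score in Python; the paper's exact statistic may differ:

```python
# Statistical diagnosis sketch: a pattern that appears only in failing
# runs is a strong failure predictor (score 1.0); a pattern that also
# appears in successful runs scores lower.

def failure_correlation(runs, pattern):
    """Fraction of runs containing `pattern` that failed (0.0 if unseen)."""
    with_pattern = [r for r in runs if pattern in r["patterns"]]
    if not with_pattern:
        return 0.0
    return sum(r["failed"] for r in with_pattern) / len(with_pattern)

# Hypothetical runs: "load;store;load" stands for the interleaving with
# the store between the two loads; "store;load;load" for a benign order.
runs = [
    {"patterns": {"load;store;load"}, "failed": True},
    {"patterns": {"load;store;load"}, "failed": True},
    {"patterns": {"store;load;load"}, "failed": False},
    {"patterns": {"store;load;load"}, "failed": False},
]

assert failure_correlation(runs, "load;store;load") == 1.0
assert failure_correlation(runs, "store;load;load") == 0.0
```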

  26. Outline • Usage model • Design • Evaluation

  27. Evaluation of Snorlax • Is Snorlax effective? • Is Snorlax accurate? • Is Snorlax efficient? • How does Snorlax compare to its competition?

  28. Experimental Setup • Real-world C/C++ programs • 11 concurrency bugs • Workloads from the programs' test suites and from test cases written by other researchers

  29. Snorlax’s Effectiveness • Snorlax correctly identified the root causes of 11 bugs • Determined after manual investigation of developer fixes • A single failure recurrence is enough for root cause diagnosis • In practice, for concurrency bugs, “event orders” = “root cause” Snorlax can effectively diagnose concurrency bugs

  30. Snorlax’s Accuracy [Figure: accuracy contribution of each Lazy Diagnosis stage] All stages of Lazy Diagnosis are necessary for full accuracy

  31. Snorlax’s Efficiency [Figure: percentage overhead per benchmark; 0.97% shown] Snorlax has low runtime performance overhead (always below 2%)

  32. Snorlax vs. Gist [Figure: percentage overhead comparison; values shown: 39%, 3%, 1.9%, 0.9%] Snorlax scales better than Gist with the increasing number of application threads

  33. Lazy Diagnosis • Leverages the coarse interleaving hypothesis • Hybrid dynamic/static root cause diagnosis technique • Snorlax: Lazy Diagnosis prototype • Fully accurate concurrency bug diagnosis (11 bugs in 7 systems) • Low overhead (always below 2%) • Scales well with the increasing number of threads Michigan is hiring!
