Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics

Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics. Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou Appeared in ASPLOS’08. Presented by Michelle Goodstein LBA Reading Group 3/27/08. Introduction. Multi-core computers are common

Presentation Transcript


  1. Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics • Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou • Appeared in ASPLOS’08 • Presented by Michelle Goodstein, LBA Reading Group, 3/27/08

  2. Introduction • Multi-core computers are common • More programmers are having to write concurrent programs • Concurrent programs have different bugs than sequential programs • However, without a study, hard to know what those bugs are • First real-world study of concurrency bugs

  3. Introduction • Knowing the types of concurrency bugs that actually occur in software will: • Help create better bug detection schemes • Inform the testing process software goes through • Provide information to programming language designers

  4. Introduction • Current state of affairs • Reproducing concurrency bugs is difficult • Test cases are critical to being able to diagnose a bug • Most detection research focuses on: • data races • deadlock bugs • some new work on detecting atomicity violations • Few studies on real world concurrency bugs • Most use programs that were buggy by design for the study • Most studies on bug characteristics focus on non-concurrent bugs

  5. Methodology • 4 representative open-source applications: • MySQL • Apache • Mozilla • OpenOffice • Each application has • 9-13 years of development history • 1-4 million lines of code

  6. Methodology • Randomly selected bugs from bug databases that contained at least one keyword related to concurrency (e.g., “race”, “concurrency”, “deadlock”, “synchronization”) • From these, randomly chose 500 bugs that have • Root causes explained well and in detail • Source code available • Bug fix info available
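
The keyword-based selection step can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sample report texts, and the plain substring match are hypothetical and are not the authors' actual tooling; the keyword tuple holds only the examples named on the slide.

```python
import random

# Keywords named on the slide; they are given as examples, so the real list
# was presumably longer.
KEYWORDS = ("race", "concurrency", "deadlock", "synchronization")


def mentions_concurrency(report_text, keywords=KEYWORDS):
    """Return True if the report contains at least one keyword (case-insensitive)."""
    text = report_text.lower()
    return any(keyword in text for keyword in keywords)


def sample_candidates(reports, k, seed=0):
    """Keep the reports that match a keyword, then randomly pick up to k of them."""
    matching = [r for r in reports if mentions_concurrency(r)]
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(matching, min(k, len(matching)))


# Hypothetical report summaries, not taken from any real bug database.
reports = [
    "Crash caused by a race between the logger and the flush thread",
    "Typo in the installation guide",
    "Deadlock when two sessions drop the same table",
    "Button colour is wrong on the preferences page",
]
picked = sample_candidates(reports, k=2)
```

A substring match like this also accepts unrelated reports (for example, "stack trace" contains "race"), which is one reason a manual pass to remove bugs not truly caused by concurrency is still needed.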

  7. Methodology • Remove any bugs not truly caused by concurrency • Result: 105 concurrency bugs • Separate study of deadlock and non-deadlock bugs

  8. Methodology • Evaluated bugs in 3 dimensions • Bug pattern: {atomicity-violation, order-violation, other} • Manifestation: required conditions for bug to occur, # threads involved, # variables, # accesses • Bug fix strategy: Look at final patch, mistakes in intermediate patches, and whether transactional memory (TM) can help • Results organized as a collection of findings

  9. Motivation • 34/105 concurrency bugs cause program crashes • 37/105 concurrency bugs cause programs to hang • Concurrency bugs are important

  10. Bug Patterns

  11. Findings: Bug Patterns • Atomicity Violation • Order Violation

  12. Findings: Bug Patterns • Most (72/74) of the examined non-deadlock concurrency bugs are either atomicity-violations or order-violations • Focusing on atomicity and order-violations should detect most non-deadlock concurrency bugs • In fact, 24/74 are order violations • Since current tools don’t address order-violation, new tools must be developed
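
To make the atomicity-violation pattern concrete, here is a minimal sketch of a check-then-use bug. Everything in it is hypothetical (the `session` dictionary, the function name, the two `Event` objects); the events exist only to force the bad interleaving so the outcome is repeatable, and the 0.5 s timeout is an arbitrary test window.

```python
import contextlib
import threading


def run_check_then_use(use_lock):
    """Run a reader (check, then use) against a writer that clears the field."""
    session = {"info": "running"}  # shared state
    guard = threading.Lock() if use_lock else contextlib.nullcontext()
    checked = threading.Event()    # reader has passed its None check
    cleared = threading.Event()    # writer has cleared the field
    result = {}

    def reader():
        with guard:
            if session["info"] is not None:      # check
                checked.set()
                cleared.wait(timeout=0.5)        # window in which the writer may run
                try:
                    result["length"] = len(session["info"])  # use
                except TypeError:
                    result["error"] = "TypeError"

    def writer():
        checked.wait()
        with guard:
            session["info"] = None
        cleared.set()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


unprotected = run_check_then_use(use_lock=False)  # writer slips between check and use
protected = run_check_then_use(use_lock=True)     # check and use form one critical section
```

Without the lock, the writer runs between the check and the use and the reader fails; with the lock around both steps, the writer has to wait until the reader is done.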

  13. Bug Manifestations

  14. Findings: Bug Manifestations • Most (101/105) bugs involved ≤ 2 threads • Most communication happens among a small number of threads • Enforcing certain partial orderings among a small number of threads can expose bugs • Heavy workloads can increase competition for resources, and make it more likely to observe a partial ordering that causes a bug • Pairwise testing can find many bugs
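
Since almost all of the studied bugs need at most two threads, a test harness can get far by exercising operations two at a time. A minimal sketch of generating those pairs, assuming a hypothetical list of operation names (a real harness would then run each pair on two threads):

```python
from itertools import combinations_with_replacement


def thread_pairs(operations):
    """Return every unordered pair of operations, including an operation paired with itself."""
    return list(combinations_with_replacement(operations, 2))


# Hypothetical operation names for some server under test.
operations = ["insert", "delete", "flush", "shutdown"]
pairs = thread_pairs(operations)
```

For n operations this yields n(n+1)/2 pairs, so the number of two-thread test cases grows quadratically rather than exponentially with the number of operations.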

  15. Findings: Bug Manifestations • Some (7/31) deadlock bugs involve only 1 thread! • Easy to detect/avoid
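
One common way for a single thread to deadlock is to wait for a resource it already holds. The sketch below shows this with a non-reentrant lock; the helper name is hypothetical, and the timeout is there only so the example terminates instead of hanging.

```python
import threading


def reacquire(lock, timeout=0.1):
    """Acquire `lock` twice from one thread; return True if the second acquire succeeds."""
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)  # would block forever without the timeout
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()


self_deadlocks = not reacquire(threading.Lock())  # plain lock: second acquire times out
reentrant_ok = reacquire(threading.RLock())       # re-entrant lock: same thread may re-acquire
```

Because only one thread is involved, this kind of deadlock does not depend on scheduling, which fits the slide's point that such bugs are easy to detect or avoid.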

  16. Findings: Bug Manifestations • Many (49/74) non-deadlock bugs involve 1 variable. However, 34% involve ≥ 2 variables • Focusing on 1 variable is a good simplification • However, new tools also necessary to discover multivariable concurrency bugs
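
A multi-variable bug can occur even when every individual access is protected. In this minimal sketch (hypothetical `Buffer` class and function name; the events only force the interleaving), each access takes the lock, yet a reader still sees `count` and `items` disagree because the two updates are not in the same critical section.

```python
import threading


class Buffer:
    def __init__(self):
        self.lock = threading.Lock()
        self.items = [1, 2, 3]
        self.count = 3  # intended invariant: count == len(items)


def run_multivariable_race():
    buf = Buffer()
    half_done = threading.Event()  # writer has updated only the first variable
    observed = threading.Event()   # reader has taken its snapshot
    seen = {}

    def writer():
        with buf.lock:
            buf.items = []         # first variable updated
        half_done.set()
        observed.wait()            # forced window between the two updates
        with buf.lock:
            buf.count = 0          # second variable updated too late

    def reader():
        half_done.wait()
        with buf.lock:
            seen["count"] = buf.count
            seen["length"] = len(buf.items)
        observed.set()

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


snapshot = run_multivariable_race()
```

A detector that reasons about one variable at a time would find nothing wrong here, since neither `items` nor `count` is ever accessed without the lock.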

  17. Findings: Bug Manifestations • Most (30/31) deadlock bugs involved ≤ 2 resources • Pairwise testing of order among obtained and released resources should help reveal deadlocks
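
A pairwise check on acquisition order can be sketched as below. It assumes, hypothetically, that each thread's trace is a list of lock names in the order they were acquired and that all of them are held at the same time; the function name and trace format are illustrative only.

```python
def lock_order_conflicts(traces):
    """Return pairs of locks that two different threads acquire in opposite orders."""
    seen = {}  # (first, second) -> indices of threads that acquired in that order
    for tid, trace in enumerate(traces):
        for i, first in enumerate(trace):
            for second in trace[i + 1:]:
                if first != second:
                    seen.setdefault((first, second), set()).add(tid)

    conflicts = set()
    for (a, b), tids in seen.items():
        reverse = seen.get((b, a), set())
        # Report each pair once (a < b) and only if two distinct threads disagree.
        if a < b and any(i != j for i in tids for j in reverse):
            conflicts.add((a, b))
    return conflicts


conflicting = lock_order_conflicts([["A", "B"], ["B", "A"]])
consistent = lock_order_conflicts([["A", "B"], ["A", "B"]])
```

Only pairs of resources are compared, which matches the observation that nearly all of the studied deadlocks involve at most two resources.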

  18. Findings: Bug Manifestations • Most (92%) bugs are guaranteed to manifest if a certain partial ordering among ≤ 4 memory accesses is enforced • Testing small groups of accesses takes polynomial time and exposes most bugs
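
The idea of testing orderings over a small group of accesses can be sketched by enumerating every interleaving of two threads that each do a read followed by a write (4 accesses in total). The simulation below is a hypothetical stand-in for real instrumentation: it models two unsynchronised increments of a shared counter and reports which schedules lose an update.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every merge of thread A's n_a steps and thread B's n_b steps as a string of 'A'/'B'."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield "".join("A" if i in slots else "B" for i in range(total))


def run_schedule(schedule):
    """Simulate two unsynchronised increments of a shared x under the given schedule."""
    shared = {"x": 0}
    local = {"A": None, "B": None}
    step = {"A": 0, "B": 0}
    for who in schedule:
        if step[who] == 0:
            local[who] = shared["x"]      # access 1: read x
        else:
            shared["x"] = local[who] + 1  # access 2: write x
        step[who] += 1
    return shared["x"]


schedules = list(interleavings(2, 2))
lost_updates = [s for s in schedules if run_schedule(s) != 2]
```

With 4 accesses there are only 6 schedules to try. More generally, the number of 4-access groups among n accesses is C(n, 4), which grows as n^4, so covering all small groups stays polynomial in the number of accesses.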

  19. Bug Fixes

  20. Findings: Bug Fixes • Adding/changing locks is the fix for only a minority (20/74) of non-deadlock concurrency bugs • Locks aren’t enough to fix all concurrency bugs • Locks don’t promise ordering, just atomicity • Adding locks can hurt performance or create new deadlock bugs
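
The point that locks give atomicity but not ordering can be illustrated with an order violation in which every access is already locked. In this minimal sketch (hypothetical names throughout; `consumer_read` is a test hook that gives the consumer a head start, with an arbitrary 0.5 s timeout), the lock does nothing to stop the consumer from running before the initializer, while waiting on an event does.

```python
import threading


def run_init_then_use(use_event):
    shared = {"config": None}
    lock = threading.Lock()
    initialized = threading.Event()    # signals "initialization has happened"
    consumer_read = threading.Event()  # test hook only
    result = {}

    def initializer():
        consumer_read.wait(timeout=0.5)  # let the consumer go first if it can
        with lock:
            shared["config"] = {"workers": 4}
        initialized.set()

    def consumer():
        if use_event:
            initialized.wait()           # enforce the intended order
        with lock:                       # the lock alone does not order the two threads
            result["config"] = shared["config"]
        consumer_read.set()

    threads = [threading.Thread(target=consumer), threading.Thread(target=initializer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


lock_only = run_init_then_use(use_event=False)  # consumer sees the uninitialised value
with_event = run_init_then_use(use_event=True)  # consumer waits for initialization
```

The fix here is a synchronization primitive that expresses "A before B", such as an event or condition variable, rather than another lock.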

  21. Findings: Bug Fixes • Most common fix (19/31) to deadlock bugs lets 1 thread give up acquiring a resource, such as a lock • This may get rid of deadlock bugs, but create other non-deadlock bugs • Code may no longer be correct
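
One form this fix strategy can take is a try-lock: if the second resource is busy, release what is already held instead of waiting forever. The sketch below is an assumption about what such a fix might look like, not code from any of the studied patches; the function name and timeout are hypothetical.

```python
import threading


def acquire_both_or_give_up(first, second, timeout=0.05):
    """Take `first`, then try `second`; if `second` stays busy, release `first` and report failure."""
    first.acquire()
    if second.acquire(timeout=timeout):
        return True   # caller now holds both locks and must release them
    first.release()   # give up the held resource instead of waiting forever
    return False


lock_a, lock_b = threading.Lock(), threading.Lock()

lock_b.acquire()                                      # simulate another holder of lock_b
gave_up = not acquire_both_or_give_up(lock_a, lock_b)
a_released_after_give_up = not lock_a.locked()

lock_b.release()                                      # the other holder is done
got_both = acquire_both_or_give_up(lock_a, lock_b)
```

The deadlock is gone, but the caller now has to handle the failure path (retry, skip, or report), and if it proceeds as though it held both locks the code is no longer correct, which is the risk the slide points out.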

  22. Bug fixes: Buggy Patches • 17/57 Mozilla bugs have ≥ 1 buggy patch • On average, 0.4 buggy patches are released for every final correct patch • Of 23 distinct buggy patches for the 17 bugs: • 6 decrease probability of occurrence but do not eliminate original bug • 5 create new concurrency bugs • 12 create new non-concurrency bugs

  23. Findings: Bug fixes • In many (41/105) cases, TM can help avoid concurrency bugs

  24. Findings: Bug fixes • Also in many cases (44/105), TM might be able to help with concurrency bugs • Would need to handle long regions, rollback of I/O, and the strange “nature” of the code

  25. Findings: Bug fixes • In 20/105 cases, TM provides little help • TM cannot help with many order-violation bugs • While TM could be useful in preventing concurrency bugs, it will not fix all of them

  26. Conclusion • First real-world concurrency bug study • Multiple findings on • Type of concurrency bugs • Conditions for manifestation • Techniques for fixing concurrency bugs • Several heuristics proposed for: • Bug detection • Testing • Language design (i.e., TM) • Future work can focus on detecting common types of errors • Multi-variable bugs • Order-violation bugs • Multiple-access bugs
