Fuzzy Set and Cache-based Approach for Bug Triaging

Presentation Transcript


  1. Fuzzy Set and Cache-based Approach for Bug Triaging
     Ahmed Y. Tamrawi, Electrical and Computer Engineering Department, Iowa State University, 2011

  2. Software Bugs { Introduction }
     • Bugs can occur in any software, ranging from operating systems and flight autopilot software to a simple arithmetic program.
     • Software bugs cost roughly US$60 billion per year.
     Definition (Software Bug): A common term used to describe a flaw, mistake, or failure in a computer system that produces an incorrect or unexpected result, or causes it to behave in unintended ways.
     The term “Bug” (September 9, 1947)

  3. More Bugs { Introduction }

  4. Bug Repository { Introduction }
     • Software users and developers report bugs so that software developers can fix them.
     • Bugs are reported using bug reports, which are added to an issue tracking system or bug repository.
     [Figure: bugs are reported through an interface for the bugs repository and stored in the bugs repository.]

  5. Bug Triaging { Introduction }
     Definition (Bug Triaging): Assigning a bug to the most appropriate/capable developer who will fix it.
     • Manual bug triaging is a difficult, expensive, and lengthy process, because the bug triager must manually read and analyze each newly reported bug and assign a fixer to it.

  6. Bug Triaging { Introduction }
     [Figure: bug assignment. Labels: Bug Triager, Software Developers, Bugs Repository, New Bug Reports.]

  7. Bug Triaging { Introduction }
     • Challenges for the bug triager:
         • Knowledge about the system/project;
         • Descriptiveness of the bug report;
         • Rate of reporting bugs;
         • Many developers, different projects, and various expertise.
     • Why not automate the bug triaging process?
         • Improve software quality;
         • Reduce cost and time.
     [Figure: Eclipse, Feb 2011.]

  8. Example { Motivation }
     Bug report 1
       Assigned to: James Moody
       Summary: New Repository wizard follows implementation model, not user model.
       Description: The new CVS Repository Connection wizard's layout is confusing. This is because it follows the implementation model of the order of fields in the full CVS location path rather than the user model...
     Bug report 2
       Assigned to: James Moody
       Summary: Opening repository resources doesn't honor type.
       Description: Opening repository resource always open the default text editor and doesn't honor any mapping between resource types and editors. As a result it is not possible to view the contents of an image (*.gif file) in a sensible way....
     Technical aspect: Version Control Management (VCM). This aspect concerns the various Concurrent Versions System (CVS) repository features and operations within the Eclipse project. Both reports were assigned to James Moody.

  9. Technical Aspects & Terms { Motivation }
     • A software system has many technical aspects.
     • Technical aspects are described via the technical terms extracted from software artifacts.
     • A bug report describes issues related to technical aspects via its terms.

  10. Automatic Bug Triaging { Motivation }
     Key philosophy for automatic bug triaging: the developers who have the most bug-fixing capability/expertise with respect to the technical aspect(s) reported in a given bug report should be the fixer(s).

  11. Problem Definition { Bugzie Model }
     Problem (Automatic Bug Assignment): In a software system, given a bug report B and a set of developers D who have past fixing activity, find the developer(s) with the most fixing expertise with respect to the technical aspect(s) reported in B.
     [Figure labels: Software Developers, New Bug Report B, Bugs Repository.]

  12. Bugzie Overview { Bugzie Model }
     • Bugzie treats bug triaging as a ranking problem.
     • State-of-the-art approaches treat it as a classification problem.
     • For a bug report, Bugzie determines a ranked list of the developers most capable of handling the reported issue(s).

  13. Bugzie Overview { Bugzie Model }
     • Bugzie uses fuzzy set theory to rank the fixing expertise of developers toward the technical aspects.
     • Bugzie models the association between a developer and technical aspects.
     • A developer with a higher fixing association with a technical aspect has higher expertise, and therefore a higher rank, for that aspect.

  14. Association of Fixer & Term { Bugzie Model }
     Definition (Capable Fixer toward a Term): For a technical term t, a fuzzy set $C_t$, with associated membership function $\mu_t$, represents the set of developers who have the bug-fixing expertise relevant to the technical aspect(s) described by t.
     • A developer with a higher membership score in $C_t$ is more capable than one with a lower score in the issues related to t.

  15. Association of Fixer & Term { Bugzie Model }
     • The membership score of a developer d toward a term t is:
         $\mu_t(d) = \frac{|D_d \cap D_t|}{|D_d \cup D_t|}$
     • $D_d$: bug reports that d has fixed.
     • $D_t$: bug reports containing t.
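
The score on this slide can be sketched in a few lines. This is a minimal illustration that assumes the score is the size of the intersection of D_d and D_t divided by the size of their union; the function name `membership_score` and the bug-report IDs are hypothetical and do not come from the presentation.

```python
def membership_score(fixed_by_dev: set, containing_term: set) -> float:
    """Overlap between D_d (reports fixed by d) and D_t (reports containing t).

    Assumed form: |D_d & D_t| / |D_d | D_t|. Returns 0.0 when both sets are empty.
    """
    union = fixed_by_dev | containing_term
    if not union:
        return 0.0
    return len(fixed_by_dev & containing_term) / len(union)


# Hypothetical bug-report IDs, for illustration only.
D_d = {1, 2, 3, 4}  # reports fixed by developer d
D_t = {3, 4, 5}     # reports containing term t
score = membership_score(D_d, D_t)
print(score)  # 2 shared reports out of 5 distinct ones
```

The score is 1 only when the developer has fixed exactly the reports that contain the term, and 0 when the two sets do not overlap.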

  16. Association of Fixer & Bug Report { Bugzie Model }
     [Figure: a bug report B with terms t1, t2, ..., tn, and the fuzzy set $C_B$ associated with the report.]

  17. Association of Fixer & Bug Report { Bugzie Model }
     • In fuzzy set theory, union is a flexible way to combine sets.
     • Strong membership in one or more of the sub-fuzzy sets implies strong membership in the combined fuzzy set.
     • After calculating the membership score of each developer toward the bug report, Bugzie recommends the top-scored developers as fixers for the bug report.
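
The combination step can be sketched as follows. The transcript does not preserve which union operator is used, so this sketch assumes the algebraic sum, 1 - prod(1 - mu_t), a common fuzzy union in which one strong per-term score is enough to give a strong combined score. The names `union_score` and `rank_developers` and the per-term scores are hypothetical.

```python
from math import prod


def union_score(term_scores):
    """Algebraic-sum fuzzy union of per-term scores (assumed operator)."""
    return 1.0 - prod(1.0 - s for s in term_scores)


def rank_developers(scores_by_dev, top_n):
    """Return the top_n developers, highest combined score first."""
    combined = {dev: union_score(s) for dev, s in scores_by_dev.items()}
    return sorted(combined, key=combined.get, reverse=True)[:top_n]


# Hypothetical per-term membership scores for one bug report with three terms.
scores_by_dev = {
    "alice": [0.9, 0.1, 0.0],   # one strong term dominates: 1 - 0.1 * 0.9 * 1.0
    "bob": [0.3, 0.3, 0.3],     # several moderate terms: 1 - 0.7 ** 3
    "carol": [0.0, 0.05, 0.1],  # weak association overall
}
ranking = rank_developers(scores_by_dev, top_n=2)
print(ranking)
```

With this operator a developer strongly tied to a single term of the report still ranks high, which matches the slide's remark about strong membership in a sub-fuzzy set.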

  18. Bugzie Model { Bugzie Model }
     [Figure: Bugzie workflow. Labels: Bugs Repository, Initial Training, Bug Report (B), Pre-processing (terms t1, t2, ..., tn), Recommendation, Recommendation List (in descending score order), Updating.]

  19. Bugzie Caching { Bugzie Model }
     • Fixer candidates selection (Developers Caching).
     • Significant terms selection (Terms Caching).
     [Figure labels: Bugs Repository, Initial Training, Terms Cache T(k), Developers Cache F(x).]

  20. Data Collection { Bugzie Model }
     • We collected all fixed bug reports from 7 bug repositories.
     • For each bug report, we extracted and merged the summary and description.
     • For each system, we pre-processed these reports: stemming, stop-word removal, etc.

  21. Locality of Fixing Activity { Bugzie Model }
     [Figure: bug reports placed on a timeline from 2006 to 2010.]

  22. Locality of Fixing Activity { Bugzie Model }
     Hypothesis (Locality of Fixing Activity): Developers who have fixed bugs recently are likely to fix bug reports in the near future.
     • For a bug report B fixed by developer d, the developers cache F(x) holds the most recent x% of all developers who fixed bugs before B.
     • If d belongs to F(x), we count this as a hit.
     [Figure: fixing timeline from 2006 to 2010, with the recent x% of developers forming the developers cache F(x).]

  23. Locality of Fixing Activity { Bugzie Model }
     [Figure: highlighted ranges of 96%-99% and 94%-98%.]

  24. Selection of Fixer Candidates { Bugzie Model }
     • The locality of fixing activity suggests that the actual fixer for a given bug report is likely to be a developer with recent fixing activity.
     • For each bug report, Bugzie chooses the top x% of developers, sorted by their fixing time, as the fixer candidates F(x).
     [Figure: fixing timeline from 2006 to 2010, with the recent x% of developers forming the developers cache F(x).]
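
The candidate selection and the hit test from the locality slides can be sketched as below. This is an illustration under stated assumptions: each developer is represented by the time of their latest fix, the cache size is rounded up, and the cache always holds at least one developer. None of these details are given in the presentation, and the names and years are hypothetical.

```python
import math


def developers_cache(last_fix_time, x_percent):
    """Return F(x): the x% of developers with the most recent fixing activity.

    last_fix_time maps developer -> time of latest fix (any comparable value).
    Rounding up and keeping at least one developer are assumptions of this sketch.
    """
    if not last_fix_time:
        return set()
    by_recency = sorted(last_fix_time, key=last_fix_time.get, reverse=True)
    size = max(1, math.ceil(len(by_recency) * x_percent / 100))
    return set(by_recency[:size])


# Hypothetical developers and the year of their latest fix before bug report B.
last_fix_time = {"alice": 2010, "bob": 2006, "carol": 2009, "dave": 2007, "erin": 2008}
cache = developers_cache(last_fix_time, x_percent=40)

actual_fixer = "carol"
is_hit = actual_fixer in cache  # a hit means the actual fixer is in F(x)
print(sorted(cache), is_hit)
```

Restricting scoring to the cached developers is what lets the approach skip most of the developer pool for each new report.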

  25. Developers Caching { Bugzie Model }
     [Figure: Bugzie workflow with the developers cache. Labels: Bugs Repository, Initial Training, Developers Cache F(x), Bug Report (B), Pre-processing (terms t1, t2, ..., tn), Recommendation, Recommendation List (in descending score order), Updating.]

  26. Selection of Descriptive Terms { Bugzie Model }
     Recall: for a developer d and a term t, the higher their association score, the higher the significance of t in describing the technical aspects in which d has fixing expertise.

  27. Selection of Descriptive Terms { Bugzie Model }
     [Figure: all terms sorted in descending order of association score.]
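
Term selection can be sketched as keeping, for each developer, the k terms with the highest association scores. The function name `top_k_terms` and the scores are hypothetical; how the scores are stored and when the selection is refreshed are not described in the transcript.

```python
def top_k_terms(term_scores, k):
    """Return the k terms with the highest association score, best first."""
    ranked = sorted(term_scores, key=term_scores.get, reverse=True)
    return ranked[:k]


# Hypothetical association scores between one developer and four terms.
scores_for_dev = {"cvs": 0.62, "repository": 0.48, "wizard": 0.20, "editor": 0.05}
selected = top_k_terms(scores_for_dev, k=2)
print(selected)
```

Only the selected terms would then contribute to that developer's score for a new bug report.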

  28. Terms Caching { Bugzie Model }
     [Figure: Bugzie workflow with the terms cache. Labels: Bugs Repository, Initial Training, Terms Cache T(k), Bug Report (B), Pre-processing (terms t1, t2, ..., tn), Recommendation, Recommendation List (in descending score order), Updating.]

  29. Empirical Evaluation { Empirical Evaluation }
     • We evaluated Bugzie on our collected datasets.
     • Experiments:
         • Selection of fixer candidates;
         • Selection of terms;
         • Selection of developers and terms;
         • Comparison with state-of-the-art approaches.

  30. Experiment Setup { Empirical Evaluation }
     Bug reports are ordered on a creation timeline divided into frames 0 to 10.
     Step 1: Bugzie uses frame 0 for initial training.
     Step 2: Using the training data, Bugzie recommends the top-n developers to fix bug report B, as a recommendation list in descending score order.
     Step 3: Bugzie updates the training data with the tested bug report B and moves to the next bug report.
     Bugzie repeats steps 2 and 3 until it has consumed all bug reports.
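
The train, recommend, update loop can be sketched as below. To keep the example self-contained, the recommender here is a deliberately simple stand-in that ranks developers by how many reports they have fixed so far; it is not Bugzie's fuzzy-set scoring. The sketch also treats everything after the initial block as one test stream instead of reporting per-frame results. All names and data are hypothetical.

```python
from collections import Counter


def evaluate_incrementally(reports, initial_size, top_n):
    """reports: chronologically ordered (terms, fixer) pairs.

    Returns the fraction of tested reports whose actual fixer was recommended.
    """
    # Step 1: initial training on the first block of reports.
    fix_counts = Counter(fixer for _, fixer in reports[:initial_size])
    hits = tested = 0
    for _terms, fixer in reports[initial_size:]:
        # Step 2: recommend the top-n developers (stand-in: most fixes so far).
        recommended = [dev for dev, _ in fix_counts.most_common(top_n)]
        hits += fixer in recommended
        tested += 1
        # Step 3: update the training data with the report just tested.
        fix_counts[fixer] += 1
    return hits / tested if tested else 0.0


# Hypothetical chronologically ordered reports.
reports = [
    (["cvs"], "alice"), (["ui"], "bob"),  # initial training block
    (["cvs"], "alice"), (["ui"], "bob"), (["cvs"], "carol"), (["cvs"], "alice"),
]
accuracy = evaluate_incrementally(reports, initial_size=2, top_n=2)
print(accuracy)
```

Because each report is tested before it is added to the training data, the recommender never sees the answer for the report it is asked about.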

  31. Prediction Accuracy { Empirical Evaluation }
     • If the recommendation list for a bug report contains its actual fixer, we count this as a hit (i.e., a correct recommendation).
     • For each frame under test, we calculated the Prediction Accuracy (PA).
     • For example, if there are 100 bugs and for 60 of them the actual fixer is in our Top-2 list, then the Top-2 prediction accuracy is 60%.
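
The metric can be written down directly. This minimal sketch reproduces the slide's example of 60 hits out of 100 bugs with Top-2 lists; the function name `prediction_accuracy` and the developer names are hypothetical.

```python
def prediction_accuracy(recommendations, actual_fixers):
    """Fraction of bug reports whose actual fixer appears in its recommendation list."""
    if not actual_fixers:
        return 0.0
    hits = sum(actual in recs for recs, actual in zip(recommendations, actual_fixers))
    return hits / len(actual_fixers)


# 100 hypothetical bugs, all actually fixed by "bob"; 60 Top-2 lists contain him.
recommendations = [["alice", "bob"]] * 60 + [["carol", "dave"]] * 40
actual_fixers = ["bob"] * 100
pa = prediction_accuracy(recommendations, actual_fixers)
print(f"Top-2 prediction accuracy: {pa:.0%}")
```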

  32. Selection of Fixer Candidates { Empirical Evaluation }
     [Figure: Bugzie workflow with the developers cache F(x), built from the recent x% of developers on the 2006 to 2010 fixing timeline who fixed bugs before bug report B.]

  33. Selection of Fixer Candidates { Empirical Evaluation }
     Firefox: at x = 10%, PA = 72.4%; at x = 100%, PA = 70.7%.
     [Figure panels: Top-1 Prediction Accuracy, Top-5 Prediction Accuracy.]

  34. Selection of Fixer Candidates { Empirical Evaluation }
     • Selecting a suitable portion of recent fixers does not reduce accuracy much, and sometimes improves it, as in the cases of Firefox, Eclipse, etc.
     • Selecting only a portion of the available developers as candidates also improves time efficiency.

  35. Selection of Terms { Empirical Evaluation }
     [Figure: Bugzie workflow with the terms cache T(k).]

  36. Selection of Terms { Empirical Evaluation }
     Eclipse: at k = 16, PA = 80%; at k = all terms, PA = 72%.
     [Figure panels: Top-1 Prediction Accuracy, Top-5 Prediction Accuracy, each with a marked peak range.]

  37. Selection of Terms { Empirical Evaluation }
     • Selecting terms can substantially improve prediction accuracy.
     • The results suggest that a small yet significant set of terms per developer is enough to describe that developer's bug-fixing expertise.

  38. Selection of Developers & Terms { Empirical Evaluation }
     • We studied the combined impact of developer selection (x) and term selection (k).
     [Figure panels: Eclipse, Firefox.]

  39. Selection of Developers & Terms { Empirical Evaluation }
     Legend:
     • Base: base model with all developers and all terms.
     • C.S.: candidate selection.
     • T.S.: terms selection.
     • Both: the best PA when applying both C.S. and T.S.

  40. Comparison { Empirical Evaluation }
     • We compared Bugzie's results with those of state-of-the-art approaches.
     • We used Weka to re-implement those approaches.

  41. Comparison { Empirical Evaluation }
     • Some of the approaches (e.g., C4.5 decision trees) do not scale well to our dataset.
     • We therefore prepared a smaller dataset: 3-year histories taken from the full dataset.

  42. Comparison Results { Empirical Evaluation }
     [Table not preserved in the transcript. Time units: (d) days, (h) hours, (m) minutes, (s) seconds.]

  43. Conclusions { Conclusions }
     • Bugzie achieves higher accuracy and efficiency than state-of-the-art approaches.
     • Bugzie can accommodate the locality of fixing activity and software evolution through flexible caching of developers and terms.

  44. Thesis Contributions { Conclusions }
     • Bugzie, a scalable, fuzzy set and cache-based automatic bug triaging approach, which is significantly more efficient and accurate than existing state-of-the-art approaches.
     • The finding of the locality of fixing activity.
     • A comprehensive evaluation of the efficiency and correctness of Bugzie in comparison with state-of-the-art approaches.
     • An observation/method to capture a small and significant set of terms describing developers' bug-fixing expertise.

  45. Future Work { Conclusions }
     • Use different caching mechanisms for developers and terms.
     • Explore the use of other textual and non-textual contents of bug reports for bug triaging.
     • Use other software artifacts to measure developers' expertise more accurately.

  46. Thank You!
