Automatic Identification of Bug-Introducing Changes. Presenter: Haroon Malik
Abstract • Bug-fixes do not contain information about the change that initially introduced the bug. • Extracting bug-introducing changes is challenging. • Presents an algorithm to automatically and accurately identify bug-introducing changes. • The algorithm removes 38%~51% of false positives and 14%~15% of false negatives compared to the previous algorithm.
Introduction • Software projects control their changes using software configuration management (SCM) systems and capture bug reports using bug tracking software, e.g. Bugzilla. • They record which change in the SCM system fixes a specific bug in the change tracking system. • Bug progression: • Programmer makes a change • Bug-introducing change • Bug manifests itself in some undesirable external behavior • Bug is recorded in the bug tracking system • Developer modifies the code to fix the bug • Bug-fix change
Introduction (Cont’d) • With the widespread use of SCM, data concerning bug-fix changes is readily available. • It is easy to mine an SCM repository for changes that have repaired a bug • Linking keywords with a bug report reference • E.g.: “Bug” or “Fixed” #902340
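The keyword-linking step can be sketched in a few lines. This is a minimal illustration, not the heuristic used in the study: the keyword list, the `#`-prefixed bug number format, and the name `find_bug_ids` are all assumptions.

```python
import re

# Assumed heuristics: a fix-related keyword plus a '#'-prefixed bug number.
KEYWORD = re.compile(r"\b(bug|fix(e[sd])?)\b", re.IGNORECASE)
BUG_ID = re.compile(r"#(\d+)")


def find_bug_ids(log_message):
    """Return bug numbers cited in a log message that also uses a fix keyword."""
    if not KEYWORD.search(log_message):
        return []
    return [int(number) for number in BUG_ID.findall(log_message)]


print(find_bug_ids("Fixed #902340: null check in report printer"))  # [902340]
print(find_bug_ids("Refactor parser, see #12"))                     # []
```

A change whose log message yields at least one bug number would be treated as a bug-fix change; real projects usually also check the number against the bug tracking system.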
Major Problems with bug-fix data • It sheds no light on when a bug was injected • The person who fixes a bug is not always the one who caused it • It cannot determine where a bug occurred.
Background • SZZ Algorithm • Working: • First, locates keywords to mark bug-fix changes • Second, runs a diff tool to determine what changed in the bug-fix • Diff tool returns a “hunk” • Uses the annotate feature of the SCM to find, for the changed lines: • Most recent revision • Who made the change
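The diff-then-annotate idea can be sketched as follows. This is a simplified illustration rather than the SZZ implementation: Python's `difflib` stands in for the diff tool, and the `annotate` list (the revision that last touched each line) is a hypothetical input that a real tool would obtain from the SCM.

```python
import difflib


def changed_old_lines(before, after):
    """Return 0-based indices of lines in `before` that the fix modified or deleted."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.extend(range(i1, i2))
    return changed


def szz_candidates(before, after, annotate):
    """Return the revisions that last touched the lines changed by the fix.

    annotate[i] is the revision that most recently modified before[i].
    """
    return {annotate[i] for i in changed_old_lines(before, after)}


pre_fix = ["void foo() {", "  if (x == null) {", "    print(x);", "  }", "}"]
post_fix = ["void foo() {", "  if (x != null) {", "    print(x);", "  }", "}"]
annotate = [2, 1, 1, 1, 1]  # hypothetical annotate output for pre_fix

print(szz_candidates(pre_fix, post_fix, annotate))  # {1}
```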
Background (Cont’d) • Revision 1: Origin of bug (Line 3). • Revision 2: Function name changed (bar to foo). • Revision 3: Bug removal
SZZ Limitations • Blank spaces and Comments • Formatting changes (Line 3) • Name of function containing bug.
Proposed Approach • Applied at the method level to two Java open source projects • Columba and Eclipse • Two human judges manually verified all hunks in a series of bug-fix changes to ensure the corresponding hunks are real bug fixes.
Proposed Approach (Cont’d) • Steps 1-5 remove 38%~51% of false positives and 14%~15% of false negatives compared to SZZ.
Experimental setup • History Extraction • Used Kenyon to extract histories from SCM systems
Experimental setup (Cont’d) • Accuracy Measures • A bug-introducing change set consists of all the changes within specific project revisions that have been identified as bug-introducing • Assuming R is the more accurate bug-introducing change set, false positives and false negatives for a set P can be computed as follows:
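The formula itself is not preserved in these notes. One natural reading, assumed here, is that the false positive rate is the share of P that is not in R, |P - R| / |P|, and the false negative rate is the share of R that P misses, |R - P| / |R|. A small sketch under that assumption (function names are illustrative):

```python
def false_positive_rate(p, r):
    """Fraction of changes in P that the more accurate set R does not contain."""
    return len(p - r) / len(p) if p else 0.0


def false_negative_rate(p, r):
    """Fraction of changes in R that P failed to identify."""
    return len(r - p) / len(r) if r else 0.0


# Hypothetical change identifiers.
p = {"c1", "c2", "c3", "c4"}  # changes flagged by the less accurate approach
r = {"c2", "c3", "c5"}        # changes flagged by the more accurate approach

print(false_positive_rate(p, r))  # 0.5 (c1 and c4 are not in R)
print(false_negative_rate(p, r))  # one third (c5 is missing from P)
```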
Annotation Graph • A graph that contains information on the cross-revision mappings of individual lines. • Major improvement over SZZ
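A rough sketch of what such cross-revision line mappings might look like is given below. It is an illustration under stated assumptions, not the data structure from the study: `difflib` provides the line matching, equal-sized replaced blocks are assumed to correspond line by line, revisions are indexed from 0, and all function names are hypothetical.

```python
import difflib


def map_lines(old, new):
    """Map each line index in `new` to (index in `old`, was_modified).

    Lines that are newly added have no entry.
    """
    mapping = {}
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                mapping[j1 + k] = (i1 + k, False)
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Assumption: equal-sized replaced blocks correspond line by line.
            for k in range(i2 - i1):
                mapping[j1 + k] = (i1 + k, True)
    return mapping


def build_annotation_graph(revisions):
    """graph[r] maps line indices of revisions[r] back to revisions[r - 1]."""
    return {r: map_lines(revisions[r - 1], revisions[r])
            for r in range(1, len(revisions))}


def trace_modifications(graph, rev, line):
    """Walk backward from (rev, line); list revisions that added or modified it."""
    touched = []
    while rev > 0:
        if line not in graph[rev]:
            touched.append(rev)  # the line first appeared in this revision
            return touched
        line, modified = graph[rev][line]
        if modified:
            touched.append(rev)
        rev -= 1
    touched.append(0)  # reached the initial revision
    return touched


revisions = [
    ["void bar() {", "  if (x == null) {", "    print(x);", "  }", "}"],
    ["void foo() {", "  if (x == null) {", "    print(x);", "  }", "}"],
]
graph = build_annotation_graph(revisions)
print(trace_modifications(graph, 1, 1))  # [0]: the condition dates from revision 0
print(trace_modifications(graph, 1, 0))  # [1, 0]: the signature was renamed in revision 1
```

Because every line keeps its full history, a rename such as the one in revision 1 does not hide where the faulty condition originated.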
Experimental setup (Cont’d) • Non-behavioral changes • Code formatting, comments, and blank lines. • 14%~20% of false positives • Format changes
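One way to detect such non-behavioral changes is to compare code after discarding comments and whitespace. The sketch below is deliberately crude and only an assumption about how this could be done: it handles `//` line comments only, ignores string literals that contain `//`, and removes all whitespace, which could merge adjacent tokens in rare cases.

```python
import re


def normalize(lines):
    """Join lines after dropping '//' comments and all whitespace."""
    code_only = (line.split("//", 1)[0] for line in lines)
    return re.sub(r"\s+", "", "".join(code_only))


def is_behavioral_change(old_lines, new_lines):
    """Return True if the change alters more than formatting, comments, or blank lines."""
    return normalize(old_lines) != normalize(new_lines)


# Brace moved to its own line: formatting only.
print(is_behavioral_change(["if (x == null) {"], ["if (x == null)", "{"]))  # False
# Condition flipped: a real change.
print(is_behavioral_change(["if (x == null) {"], ["if (x != null) {"]))     # True
```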
Manual Verification • If a change log indicates that a revision is a bug-fix, it is assumed that all the hunks in the revision are bug fixes. • Two human judges marked each bug-fix hunk for both projects. • Used a bug-fix hunk verification tool
Validation Hurdles • Non-representative systems. • Open source. • Bug-fix data is incomplete. • Manual verification
Bug-Introduction Statistics • [Figure: panels for Eclipse and Columba]
Conclusion • Refined the SZZ approach by introducing the annotation graph. • Experiments showed removal of 38%~51% of false positives and 14%~15% of false negatives compared to SZZ