Looking for Bugs in all the Right Places

#GHC13, Elaine Weyuker, October 3, 2013

Goal
• To determine which files of a large software system are likely to contain the largest numbers of bugs in the future.

Why is this Important?
• Help testers prioritize testing efforts.
• Help developers decide when to do design and code reviews and what to re-implement.
• Help managers allocate resources.

Approach
• Verified that bugs were non-uniformly distributed among files.
• Identified properties that were likely to affect fault-proneness, and then built a statistical model and ultimately a tool to make predictions.

Information Needed for Predictions
• Size of file (KLOCs).
• Number of changes to the file in the previous 2 releases.
• Number of bugs in the file in the last release.
• Age of file (number of releases the file has been in the system).
• Language the file is written in.
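One way to picture the per-file inputs listed above is as a small record type. This is a minimal sketch only: the class name `FileFeatures`, its field names, and the helper `features_from_counts` are hypothetical and do not come from the tool described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileFeatures:
    """Hypothetical record holding the five predictors named on the slide."""
    path: str
    kloc: float                   # size of the file in thousands of lines of code
    changes_prev_2_releases: int  # changes to the file in the previous 2 releases
    bugs_last_release: int        # bugs in the file in the last release
    age_in_releases: int          # number of releases the file has been in the system
    language: str                 # language the file is written in


def features_from_counts(path, lines_of_code, changes, bugs, age, language):
    """Build a record from raw counts, converting lines of code to KLOC."""
    return FileFeatures(path, lines_of_code / 1000.0, changes, bugs, age, language)


example = features_from_counts("src/parser.c", 2400, 7, 2, 5, "C")
print(example)
```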

Data Source
• All of the systems we’ve studied to date use a configuration management system which integrates version control and change management functionality, including bug history.
• Data is automatically extracted from the associated data repository and passed to the prediction engine.

Making Predictions
• Used Negative Binomial Regression
• Also considered machine learning algorithms including:
  • Recursive Partitioning
  • Random Forests
  • BART (Bayesian Additive Regression Trees)
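The slides do not give the model specification, so the following is only a sketch of the standard form of negative binomial regression: a log link, mu = exp(b0 + b1*x1 + ...), and variance mu + alpha*mu^2. The function names and the `alpha` dispersion parameterization are assumptions made for illustration, and the coefficients in the example are made up, not fitted values from the study.

```python
import math


def predicted_mean(intercept, coefs, x):
    """Expected bug count under a log link: mu = exp(intercept + sum(coef * x))."""
    return math.exp(intercept + sum(c * v for c, v in zip(coefs, x)))


def nb_log_pmf(y, mu, alpha):
    """Log-probability of observing y bugs under a negative binomial
    distribution with mean mu and dispersion alpha (variance mu + alpha * mu**2)."""
    r = 1.0 / alpha      # "size" parameter
    p = r / (r + mu)     # success probability that gives mean mu
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log(1.0 - p))


# Illustrative (made-up) coefficients for two numeric predictors.
mu = predicted_mean(-1.0, [0.5, 0.3], [2.0, 1.0])
print(mu, math.exp(nb_log_pmf(0, mu, 0.5)))
```

Fitting the coefficients and `alpha` would normally be left to a statistics package; the sketch only shows how a fitted model turns a file's predictors into an expected bug count.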

Prediction Tool
• Consists of two parts.
• The back end extracts data needed to make the predictions.
• The front end makes the predictions and displays them.

Tool Functionality
• Extracts necessary data from the repository.
• Predicts how many bugs will be in each file in the next release of the system.
• Sorts the files in decreasing order of the number of predicted bugs.
• Displays results to the user.
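The sorting step above can be sketched in a few lines. The function name `rank_files` and the dictionary input are hypothetical, and breaking ties by file path is an assumption added only to make the ordering deterministic.

```python
def rank_files(predictions):
    """Return (path, predicted_bugs) pairs in decreasing order of predicted bugs.

    `predictions` maps a file path to its predicted bug count.
    Ties are broken alphabetically by path (an illustrative choice).
    """
    return sorted(predictions.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_files({"a.c": 0.4, "b.c": 3.1, "c.py": 1.7, "d.c": 1.7})
for path, count in ranked:
    print(f"{count:6.2f}  {path}")
```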

Assessing Success
• Percentage of actual bugs that occurred in the N% of the files predicted to have the largest number of bugs (N = 20).
• Considered other measures less sensitive to the specific value of N.
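The first measure can be computed as sketched below. The function name, the dictionary inputs, and the choice to round the number of top files up are assumptions for illustration; the talk does not say how the tool handles rounding or ties.

```python
import math


def percent_bugs_in_top_files(predicted, actual, top_percent=20.0):
    """Percentage of actual bugs found in the top `top_percent`% of files,
    where files are ranked by predicted bug count (largest first).

    `predicted` maps path -> predicted bugs; `actual` maps path -> observed bugs.
    """
    ranked = sorted(predicted, key=lambda path: (-predicted[path], path))
    n_top = math.ceil(len(ranked) * top_percent / 100.0)  # assumption: round up
    total = sum(actual.get(path, 0) for path in ranked)
    if total == 0:
        return 0.0
    found = sum(actual.get(path, 0) for path in ranked[:n_top])
    return 100.0 * found / total


predicted = {"a": 5.0, "b": 3.0, "c": 1.0, "d": 0.5, "e": 0.1}
actual = {"a": 6, "b": 2, "c": 1, "d": 1, "e": 0}
print(percent_bugs_in_top_files(predicted, actual))  # top 20% is file "a": 6 of 10 bugs
```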

Prediction Results for 9 Systems

What Are We Missing?

Fault Prediction Tool Overview
[Diagram labels: Version Mgmt/Fault Database (previous releases); Release to be predicted; User-supplied parameters; Prediction Engine; Statistical Analysis; Fault-proneness predictions]

• User specifies that all problems reported in System Test phase are faults.
• User enters system name.
• Available releases are found in the version mgmt database.
• User chooses the releases to analyze.
• User selects 4 file types.
• User asks for fault predictions for release “Bluestone2008.1”.

• User confirms configuration.
• User enters filename to save the configuration.
• User clicks Save & Run button to start the prediction process.

Initial prediction view for Bluestone2008.1
• All files are listed in decreasing order of predicted faults.

Listing is restricted to eC files

Listing is restricted to 10% of eC files

Current Status
• Prediction tool is fully operational.
• 750 lines of Python.
• 2150 lines of C, 75K bytes compiled.
• Current version’s back end is specific to the internal AT&T configuration management system but can be adapted to other configuration management systems. All that is needed is a source of the data required by the prediction model.

Other Factors We’ve Studied
• Developers
  • Counts – How many people worked on the code in the most recent release or all previous releases?
  • Individuals – Who worked on the code?
• Calling Structure
  • How many calls from/to a file?
  • Are the calling/called files new, changed, or faulty?
• Amount of Code Changed
  • How many lines were added, deleted, or changed?
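As an illustration of the "amount of code changed" factor, line counts between two versions of a file can be approximated with Python's standard `difflib`. This is not the method used in the study; the function name and the rule that a replaced block counts as "changed" up to the shorter side, with the remainder counted as added or deleted, are assumptions.

```python
import difflib


def line_change_counts(old_lines, new_lines):
    """Count lines added, deleted, and changed between two versions of a file."""
    added = deleted = changed = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_n, new_n = i2 - i1, j2 - j1
        if tag == "insert":
            added += new_n
        elif tag == "delete":
            deleted += old_n
        elif tag == "replace":
            # Assumption: pair up replaced lines; any surplus is added or deleted.
            common = min(old_n, new_n)
            changed += common
            added += new_n - common
            deleted += old_n - common
    return {"added": added, "deleted": deleted, "changed": changed}


print(line_change_counts(["a", "b", "c"], ["a", "x", "c", "d"]))
```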

What’s Ahead?
• Research
  • How well can we make predictions using attributes available for software systems using other bug reporting systems?
  • What are the most accurate models that can be built from those attributes?
  • What are the best ways to take advantage of fault predictions?
  • Can predictions be made for units smaller than files?
  • Can run-time attributes be used to make fault predictions? (execution time, execution frequency, memory use, …)
  • What is the most meaningful way to assess the effectiveness and accuracy of the predictions?
• Engineering
  • Build prediction models for different configuration management systems and bug databases.
  • Design and build a better user interface.
  • Integrate prediction tool into development and testing environments.

