Open Source Java Bug Study: Understanding where help is needed

Open Source Java Bug Study: Understanding where help is needed. Tim Halloran SSSG 6 Nov 2003 Carnegie Mellon University. Technology : Chains of evidence (CoE) Extra-linguistic program assurance (Lock, Uniqueness) Bureaucratic (mechanical).

Presentation Transcript

Open Source Java Bug Study: Understanding where help is needed

Tim Halloran

SSSG 6 Nov 2003

Carnegie Mellon University

Motivation: Why study open source Java bugs?

Technology: Chains of evidence (CoE)

Extra-linguistic program assurance (Lock, Uniqueness)

Bureaucratic (mechanical)

  • Question: Can this have a positive impact on practice?
    • What can be assured
    • Where is help needed

Goal: Determine, empirically, how useful CoE is (how common and “costly” are the defects CoE could help prevent)

Study defect reports on, and code changes (fixes) made to, widely deployed open source Java projects

Fluid Assurance Tool

This talk
  • Methodology
  • Data collection
    • Selected Java projects
    • Tool data and limitations (and solutions)
  • Variable creation
  • Variable reduction
  • Summary
  • Questions & discussion

Focus Today…

(1) Bug Selection

(2) Expert Analysis

Data collection

Results Analysis

Variable creation

Expert judgment: Is bug/fix bureaucratic? Could CoE have helped? Semantic category? Do we understand bug/fix?

Variable reduction/exploratory data analysis

Develop definitions (bureaucratic)

Data collection: 3 Java projects investigated
  • Ant (64 kSLOC)
    • A Java-based build tool
  • Struts (40 kSLOC)
    • Framework for building Java web applications based on a variation of the classic MVC design paradigm
  • Tomcat (65 kSLOC Java)
    • The official reference implementation of Java Servlet and JavaServer Pages technologies (web server)

Selection: Widely used Java software (external validity?)

Data collection: Tool data used
  • Software Defect (“Bug”) Data
    • Off-line copy of Apache Software Foundation (ASF) Bugzilla MySQL database
      • Ant: 2,230 bugs (7-Sep-00 to 16-May-03)
      • Struts: 1,473 bugs (19-Oct-00 to 16-May-03)
      • Tomcat: 4,052 bugs (26-Aug-00 to 16-May-03)
  • Code Changes
    • CVS commit logs
      • Ant: 9,565 commits (13-Jan-00 to 4-Jun-03)
      • Struts: 3,610 commits (31-May-00 to 4-Jun-03)
      • Tomcat: 14,833 commits (10-Oct-99 to 4-Jun-03)
Data collection: Limitations of ASF tool data

Goal: Link the code changes made for each bug to that bug, adding code change information to the bug information

  • Problems
    • No link from bug to commits
    • Informal links from commits to bugs
    • Informal identity management
Data examples

------------------ CVS commit log 1272 at 2001-02-01 15:37:28 by Nico Seessle ------------------

Fixed Bug #378.

ExecuteOn (and Apply) have a default-value of false for their parallel-attribute.

Problem: Informal links from commits to bugs

Identity fields: Commit Email, Real name, Bugzilla Id

Real names shown: Craig R. McClanahan, Rob Leland

Problem: Informal identity management

Solution: 1st manual identity determination
  • Manual building of project committer identity
    • 99 individuals identified
    • Used:
      • ASF web pages
      • Google, etc.
      • Dates of actions
      • Project mailing lists (headers noting real name)

Very Manual—High Confidence in Links: an “Anchor” for linking bugs to commit logs
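
One way to picture the hand-built identity table is as a lookup from each commit email and each Bugzilla id to a single real name. The sketch below is only an illustration under that assumption: the class `IdentityRegistrySketch`, its methods, and the example person and addresses are all hypothetical and do not come from the study's tooling.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: maps commit emails and Bugzilla ids to one real name.
final class IdentityRegistrySketch {
    private final Map<String, String> nameByCommitEmail = new HashMap<>();
    private final Map<String, String> nameByBugzillaId = new HashMap<>();

    // Normalize identifiers so that case and stray whitespace do not split one person in two.
    private static String key(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // Record a manually verified identity; call again to add further aliases for the same person.
    void register(String realName, String commitEmail, String bugzillaId) {
        nameByCommitEmail.put(key(commitEmail), realName);
        nameByBugzillaId.put(key(bugzillaId), realName);
    }

    Optional<String> committerName(String commitEmail) {
        return Optional.ofNullable(nameByCommitEmail.get(key(commitEmail)));
    }

    Optional<String> bugzillaName(String bugzillaId) {
        return Optional.ofNullable(nameByBugzillaId.get(key(bugzillaId)));
    }

    // True only when both identifiers are known and resolve to the same real name.
    boolean samePerson(String commitEmail, String bugzillaId) {
        Optional<String> committer = committerName(commitEmail);
        return committer.isPresent() && committer.equals(bugzillaName(bugzillaId));
    }

    public static void main(String[] args) {
        IdentityRegistrySketch registry = new IdentityRegistrySketch();
        // Placeholder person and addresses, not data from the study.
        registry.register("Jane Doe", "jdoe@example.org", "jane.doe@example.com");
        System.out.println(registry.samePerson("JDoe@example.org", "jane.doe@example.com"));
    }
}
```

Keying both maps on the real name keeps the manual judgment in one place: a person with several addresses is registered once per alias, and the linking step only ever asks whether two identifiers resolve to the same name.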

Solution: 2nd semi-automated linking of bugs to commits
  • Wrote Java code to assist linking CVS commits to individual Bugzilla bugs
    • Extracts all numbers from CVS commit log
    • Checks if number is a bug for the project
      • Becomes set of possible bugs
    • Checks if commit is within the duration of bug
    • Checks if committer was “involved” with the bug
      • Becomes inferred set of bugs

If the extracted set matches the inferred set, the entry is made automatically; otherwise the researcher is shown all the information and asked to correct the inferred set (if necessary)
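
The checks listed on this slide can be sketched roughly as follows. This is not the study's actual code: the class `BugLinkSketch`, its nested types, and its method names are hypothetical. Two assumptions are also made. First, a bug's "duration" is taken to run from its creation to its last recorded status change. Second, the "extracted set" is taken to mean the numbers in the log message that are bug ids for the project.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the commit-to-bug linking checks; all names are illustrative.
final class BugLinkSketch {

    // Minimal stand-in for a Bugzilla bug: id, assumed lifetime, and people involved.
    static final class Bug {
        final int id;
        final LocalDateTime opened;
        final LocalDateTime closed;
        final Set<String> involved;

        Bug(int id, LocalDateTime opened, LocalDateTime closed, Set<String> involved) {
            this.id = id;
            this.opened = opened;
            this.closed = closed;
            this.involved = involved;
        }
    }

    // Minimal stand-in for one CVS commit log entry.
    static final class Commit {
        final String committer;
        final LocalDateTime time;
        final String message;

        Commit(String committer, LocalDateTime time, String message) {
            this.committer = committer;
            this.time = time;
            this.message = message;
        }
    }

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Extract every number in the log message, in order of appearance.
    static List<Integer> extractNumbers(String message) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(message);
        while (m.find()) {
            if (m.group().length() <= 9) { // skip digit runs too long to fit an int id
                numbers.add(Integer.parseInt(m.group()));
            }
        }
        return numbers;
    }

    // Possible set: extracted numbers that are bug ids for the project.
    static Set<Integer> possibleBugs(Commit c, Map<Integer, Bug> projectBugs) {
        Set<Integer> possible = new LinkedHashSet<>();
        for (int n : extractNumbers(c.message)) {
            if (projectBugs.containsKey(n)) {
                possible.add(n);
            }
        }
        return possible;
    }

    // Inferred set: possible bugs whose lifetime contains the commit and whose group includes the committer.
    static Set<Integer> inferredBugs(Commit c, Map<Integer, Bug> projectBugs) {
        Set<Integer> inferred = new LinkedHashSet<>();
        for (int id : possibleBugs(c, projectBugs)) {
            Bug b = projectBugs.get(id);
            boolean inLifetime = !c.time.isBefore(b.opened) && !c.time.isAfter(b.closed);
            if (inLifetime && b.involved.contains(c.committer)) {
                inferred.add(id);
            }
        }
        return inferred;
    }

    // Link automatically only when the two sets agree; otherwise a researcher decides.
    static boolean linksAutomatically(Commit c, Map<Integer, Bug> projectBugs) {
        return possibleBugs(c, projectBugs).equals(inferredBugs(c, projectBugs));
    }

    public static void main(String[] args) {
        // Dates and names taken from the Struts bug 15799 example in this deck.
        Map<Integer, Bug> struts = new HashMap<>();
        struts.put(15799, new Bug(15799,
                LocalDateTime.of(2003, 1, 4, 15, 12, 17),
                LocalDateTime.of(2003, 2, 6, 0, 36, 48),
                Collections.singleton("Arron Bates")));
        Commit commit = new Commit("Arron Bates",
                LocalDateTime.of(2003, 2, 5, 16, 26, 11),
                "Committed patch Bug15799, reported and patched by David Morris.");
        System.out.println("inferred=" + inferredBugs(commit, struts)
                + " automatic=" + linksAutomatically(commit, struts));
    }
}
```

Comparing the two sets is what sends ambiguous commits to the researcher: any number in the log that is a valid bug id but fails the time or involvement check makes the sets differ, so no link is made silently.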

Example: Automatic Link

"struts" bug 15799 found : created 2003-01-04 15:12:17

(15799) Bugzilla description: Nested tags picks up wrong bean for values

(15799) 2003-01-05 22:13:43 David Morris 4 1.0 Beta 3 1.1 Beta 3

(15799) 2003-02-04 21:03:34 James Mitchell 4 1.1 Beta 3 Nightly Build

(15799) 2003-02-05 02:40:54 James Turner 15

(15799) 2003-02-05 03:36:34 Ted Husted 4 Nightly Build 1.1 Beta 3

(15799) 2003-02-06 00:36:48 Arron Bates 8 NEW RESOLVED

(15799) 2003-02-06 00:36:48 Arron Bates 11 FIXED

------------------ CVS commit log 27541 at 2003-02-05 16:26:11 by Arron Bates ------------------

Committed patch Bug15799, reported and patched by David Morris.

IDEA also told me to remove a redundant class cast

( ...a fashionable thing to do it seems :)

Inferred set [15799] = [15799]

No decision required by researcher

Example: Manual Link

"tomcat" bug 207 found : created 2000-10-28 11:58:02

(207) Bugzilla description: mod_jk.conf-auto is not generated when tomcat is started

BugRat Report#319

Not adding bug 207 to inferred set [:log time after bug lifetime:comitter not in bug group]

"tomcat" bug 660 found : created 2001-02-21 03:04:15

(660) Bugzilla description: Bad context on Authentication Form Page

Not adding bug 660 to inferred set [:log time after bug lifetime:comitter not in bug group]

"tomcat" bug 371 found : created 2000-12-22 20:24:31

(371) Bugzilla description: Webdav status code 207 not present in core/

BugRat Report#660

------------------ CVS commit log 13662 at 2001-03-15 12:15:21 by Marc Saegesser ------------------

Added 207 result code for WEBDAV.

PR: 660/Bugzilla 371

Submitted by: (David F. Sklar)

Inferred set [371]

Link bug ids (c to clear)[207, 660, 371] 371


Decision required by researcher: 207 is a result code (not a bug reference) and 660 is the id from the pre-Bugzilla Jakarta bug system

Noting and linking outside contribution: not done (yet)
  • Linking contribution by non-committers to bug fixes (or enhancements) between CVS and Bugzilla
    • Often committers commit code changes contributed by non-committers
    • No standard approach in CVS logs to indicate such a contribution (informal references to known contributors)
      • Obscuring of email address (to fight SPAM) has hit open source logs
    • Linking contributor names to Bugzilla Ids would face same issues noted for committers
      • Larger scale and less “context” to manually build up a case to link identity to identifiers

Testcase submitted by: Martijn Kruithof <martijn at>

Variable creation: Narrowing bug focus
  • total to fixed?
  • fixed to w/java?
    • Examined 20 bugs:
Variable reduction: (preliminary) Principal components analysis

Factor 1: Public interest
  • Public_LN (0.7)
  • COMMsize_LN (0.6)
  • DUPcount_BI (0.6)
  • STATUSchanges_BI (0.3)

Factor 2: Java code changed
  • JavaCUchange (0.9)
  • JavaPKchange (0.8)
  • JavaSLOCchange (0.7)

Factor 3: Committer interest
  • Pcommit_BI (0.9)
  • Pasf_BI (-0.9)

Factor 4: Effort/Time
  • Dtotal_nonLATER_LN (0.7)
  • STATUSchanges_BI (0.6)

  • We have a reasonable set of “synthetic” measures of some of the important characteristics of bugs and their fixes
    • How “costly” in several dimensions (time, public interest, etc.)
  • Next step: Identify, via expert judges, bugs for which CoE would have been effective
    • Combination with results so far will provide some understanding of how
Questions & Discussion
  • Questions?
  • Issues:
    • Approach to study
    • Definitions
      • bureaucratic (mechanical) vs. functional program properties
    • NetBeans data