Locating matching method calls by mining revision history data
Download
1 / 24

Locating Matching Method Calls by Mining Revision History Data - PowerPoint PPT Presentation


  • 193 Views
  • Uploaded on

Locating Matching Method Calls by Mining Revision History Data. Benjamin Livshits Stanford University and Thomas Zimmermann Saarland University. The Usual Suspects. Much bug-detection research in recent years Focus: generic patterns, sometimes language-specific NULL dereferences Security

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Locating Matching Method Calls by Mining Revision History Data' - edda


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Locating matching method calls by mining revision history data l.jpg

Locating Matching Method Calls by Mining Revision History Data

Benjamin Livshits

Stanford University

and

Thomas Zimmermann

Saarland University


The usual suspects l.jpg
The Usual Suspects Data

  • Much bug-detection research in recent years

  • Focus: generic patterns, sometimes language-specific

    • NULL dereferences

    • Security

      • Buffer overruns

      • Format string violations

    • Memory

      • Double-deletes

      • Memory leaks

    • Locking errors/threads

      • Deadlock/races/atomicity

  • Let’s look at the space of error patterns in more detail


Classification of error patterns l.jpg
Classification of Error Patterns Data

NULL dereferences

Buffer overruns

Double-deletes

Locks/threads

NULL dereferences

Buffer overruns

Double-deletes

Locking errors/threads

Generic patterns -- the usual suspects

Bugs in

J2EE servlets

App-specific patterns particular to a system or a set of APIs

Device

drivers

Bugs in

Linux code

Error Pattern Iceberg


Classification of error patterns4 l.jpg
Classification of Error Patterns Data

NULL dereferences

Buffer overruns

Double-deletes

Locking errors/threads

There are hundreds of WinAmp plugins out there

Generic patterns -- the usual suspects

Anybody knows any good error patterns specific to WinAmp plugins?

App-specific patterns particular to a system or a set of APIs

?

  • Intuition:

    • Many other application-specific patterns exist

    • Much of application-specific stuff remains a gray area so far

  • Goal: Let’s figure out what the patterns are


Focus matching method pairs l.jpg
Focus: Matching Method Pairs Data

  • Start small:

    • Matching method pairs

    • Only two methods

    • A very simple state machine

    • Calls have match perfectly

  • Very common, our inspiration is

    • System calls

      • fopen/fclose

      • lock/unlock

    • GUI operations

      • addNotify/removeNotify

      • addListener/removeListener

      • createWidget/destroyWidget

  • Want to find more of the same


Our insight l.jpg
Our Insight Data

  • Our problem:

    • Want to find matching method pairs that are error patterns

  • Our technique:

    • Look at revision histories (think CVS)

  • Crucial observation:

  • Use data mining techniques to find method that are often added at the same time

Things that are frequently

checked in together

often form a pattern


Our insight continued l.jpg
Our Insight (continued) Data

  • Now we know the potential patterns

  • “Profile” the patterns

    • Run the application

    • See how many times each pattern

      • hits – number of times a pattern is followed

      • misses – number of times a pattern is violated

  • Based on this statistics, classify the patterns

    • Usage patterns – almost always hold

    • Error patterns – violated a large number of the times, but still hold most of the time

    • Unlikely patterns – not validated enough times


System architecture l.jpg
System Architecture Data

sort and

filter

mine CVS

histories

patterns

instrument relevant

method calls

revision history mining

run the application

dynamic analysis

post-process

usage

patterns

error

patterns

unlikely

patterns

report

patterns

report

bugs

reporting


Mining basics l.jpg

Foo.java Data

1.12

o1.addListener

o1.removeListener

Bar.java

1.47

o2.addListener

o2.removeListener

System.out.println

Baz.java

1.23

o3.addListener

o3.removeListener

list.iterator

iter.hasNext

iter.next

Qux.java

1.41

o4.addListener

System.out.println

1.42

o4.removeListener

Mining Basics

  • Sequence of revisions

  • Files Foo.java, Bar.java, Baz.java, Qux.java

  • Simplification: look at method calls only

  • Look for interesting patterns in the way methods are called


Mining matching method calls l.jpg

Foo.java Data

1.12

o1.addListener

o1.removeListener

Bar.java

1.47

o2.addListener

o2.removeListener

System.out.println

Baz.java

1.23

o3.addListener

o3.removeListener

list.iterator

iter.hasNext

iter.next

Qux.java

1.41

o4.addListener

System.out.println

1.42

o4.removeListener

Mining Matching Method Calls

  • Use our observation:

    • Methods that are frequently added simultaneously often represent a use pattern

  • For instance: …addListener(…); … removeListener(…); …


Data mining summary l.jpg
Data Mining Summary Data

  • We consider method calls added in each check-in

    • We want to find patterns of method calls

  • Too many potential pairs to consider

    • Want to filter and rank them

    • Use support and confidence for that

  • Support and confidence of each pattern

    • Standard metrics used in data mining

    • Support reflects how many times each pair appears

    • Confidence reflects how strongly a particular pair is correlated

  • Refer to the paper for details


Improvements over the traditional approach l.jpg
Improvements Over the Traditional Approach Data

  • The default data mining approach doesn’t really work

    • Filters based on confidence and support

    • Still too many potential patterns!

  • Filtering:

    • Consider only pairs with the same initial subsequence as potential patterns

  • Ranking:

    • Use one-line “fixes” to find likely error patterns


Matching initial call sequences l.jpg

Foo.java Data

1.12

o1.addListener

o1.removeListener

Bar.java

1.47

o2.addListener

o2.removeListener

System.out.println

Baz.java

1.23

o3.addListener

o3.removeListener

list.iterator

iter.hasNext

iter.next

Qux.java

1.41

o4.addListener

System.out.println

1.42

o4.removeListener

Matching Initial Call Sequences

1 Pair

3 Pairs 1 Pair

10 Pairs 2 Pairs

1 Pair 0 Pairs

0 Pairs


Using fixes to rank patterns l.jpg

Foo.java Data

1.12

o1.addListener

o1.removeListener

Bar.java

1.47

o2.addListener

o2.removeListener

System.out.println

Baz.java

1.23

o3.addListener

o3.removeListener

list.iterator

iter.hasNext

iter.next

Qux.java

1.41

o4.addListener

System.out.println

1.42

o4.removeListener

Using Fixes to Rank Patterns

  • Look for one-call additions which likely indicate fixes.

  • Rank pairs for such methods higher.

This is a fix!Move pairs containing removeListener up


Mining setup l.jpg
Mining Setup Data

  • Apply these ideas to the revision history of Eclipse

    • Very large open-source project

    • Many people working on it

  • Perform filtering based on

    • Pattern support

    • Pattern strength

  • Get 32 strongly correlated method pairs in Eclipse



Some interesting method pairs 2 l.jpg
Some Interesting Method Pairs (2) Data

Begin applying a thread scheduling rule to a Java thread


Some interesting method pairs 3 l.jpg
Some Interesting Method Pairs (3) Data

Register/unregister the current widget with the parent display object for subsequent event forwarding


Some interesting method pairs 4 l.jpg
Some Interesting Method Pairs (4) Data

Add/remove listener for a particular kind of GUI events


Some interesting method pairs l.jpg
Some Interesting Method Pairs Data

Use OS native locking mechanism for resources such as icons, etc.


Dynamically check the patterns l.jpg
Dynamically Check the Patterns Data

  • Home-grown bytecode instrumenter

    • Get a list of matching method pairs

    • Instrument calls to any of the methods to dump parameters

  • Post-processing of the output

    • Process a stream of events

    • Find and count matches and mismatches

      o.register(d)

      o.deregister(d)

      o.deregister(d)

matched

???

mismatched


Experimental setup l.jpg
Experimental Setup Data

  • Applied our approach to Eclipse

    • One of the biggest Java applications

    • 2,900,000 lines of Java code

    • Included many Eclipse plugins consisting of lower quality code than the core

    • Chose 32 matching method pairs

  • Times:

    • 5 days to fetch and process CVS histories

    • 30 minutes to compute the patterns

    • An hour to instrument Eclipse

    • And we are done!


Experimental summary l.jpg
Experimental Summary Data

  • Pattern classification:

    • 5 are usage patterns

    • 5 are error patterns

    • 5 are unlikely patterns

    • 17 were not hit at runtime

  • Error patterns

    • Resulted in a total of 107 dynamically confirmed bugs

    • Results for a single run of Eclipse


A preview of coming attractions l.jpg
A Preview of Coming Attractions… Data

  • We have a paper in FSE 2005

  • Describes a tool called DynaMine

  • Provides various extensions:

    • More complex patterns:

      • State machines

      • Grammars

    • More applications analyzed

    • More bugs found


ad