DynaMine: Finding Common Error Patterns by Mining Software Revision Histories

Presentation Transcript


  1. DynaMine: Finding Common Error Patterns by Mining Software Revision Histories
  Benjamin Livshits, Stanford University
  Thomas Zimmermann, Saarland University

  2. A Box Full of Nails
  • A lot of promise, potential, and excitement
  • Not that many success stories
  • Not sure what to apply it to
  • Let’s try this particularly exciting idea: miners looking at their own tools
  • Promises, promises…
  • Interesting usage patterns found by CVS mining
  • Interesting error patterns found by CVS mining

  3. My Background
  • Tools for bug detection
  • Analysis: pointer analysis, etc.; mostly static, some dynamic
  • Applications:
    • Security: buffer overruns, format string violations, SQL injections, cross-site scripting, HTTP response splitting
    • Data lifetimes
    • J2EE patterns: bad session stores, lapsed listeners
    • Eclipse patterns: missing calls to dispose, not calling super, forgetting to deregister listeners

  4. Classification of Error Patterns
  • Generic patterns, the usual suspects: NULL dereferences, buffer overruns, double-deletes, locking errors/threads
  • App-specific patterns, particular to a system or a set of APIs: bugs in J2EE servlets, device drivers, bugs in Linux code
  (Figure: the Error Pattern Iceberg)

  5. Classification of Error Patterns
  • Generic patterns, the usual suspects: NULL dereferences, buffer overruns, double-deletes, locking errors/threads
  • App-specific patterns, particular to a system or a set of APIs: ?
  • There are hundreds of WinAmp plugins out there; does anybody know any good error patterns specific to WinAmp plugins?
  • Intuition: many other application-specific patterns exist, and much of the application-specific space remains a gray area so far
  • Goal: let’s figure out what the patterns are

  6. Motivation: Matching Method Pairs
  • Start small: matching method pairs
    • Only two methods; a very simple state machine
    • Calls must match perfectly, and order matters
  • Very common; our inspiration is
    • System calls: fopen/fclose, lock/unlock, …
    • GUI operations: addNotify/removeNotify, addListener/removeListener, createWidget/destroyWidget, …
  • Want to find more of the same
  • And, if we are lucky, more interesting patterns

  7. DynaMine: Our Insight
  • Our problem:
    • Want to find patterns whose violation causes errors
    • Want to find patterns for program understanding
  • Crucial observation: things that are frequently checked in together often form a pattern
  • Our technique: look at revision histories, using data mining techniques to find methods that are often added at the same time

  8. DynaMine: Our Insight (continued)
  • Now we know the potential patterns
  • “Profile” the patterns: run the application and count, for each pattern,
    • hits – the number of times the pattern is followed
    • misses – the number of times the pattern is violated
  • Based on these statistics, classify the patterns:
    • Usage patterns – almost always hold
    • Error patterns – violated a significant number of times, but still hold most of the time
    • Unlikely patterns – not validated enough times
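
  To make the classification step concrete, here is a minimal Java sketch; the class name, the validation threshold (10 dynamic events), and the miss-rate cutoffs are hypothetical stand-ins, not the exact values used by DynaMine.

    // A minimal sketch of pattern classification; thresholds are hypothetical.
    class PatternStats {
        final String name;
        final int hits;   // times the pattern was followed at runtime
        final int misses; // times the pattern was violated

        PatternStats(String name, int hits, int misses) {
            this.name = name; this.hits = hits; this.misses = misses;
        }

        String classify() {
            int total = hits + misses;
            if (total < 10) return "unlikely";   // not validated enough times
            double missRate = (double) misses / total;
            if (missRate < 0.05) return "usage"; // almost always holds
            if (missRate < 0.50) return "error"; // often violated, holds most of the time
            return "unlikely";                   // mostly violated: probably not a pattern
        }
    }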

  9. Architecture of DynaMine
  (Figure: the DynaMine pipeline)
  • Revision history mining: mine CVS histories, then sort and filter the resulting patterns
  • Dynamic analysis: instrument the relevant method calls, run the application, post-process the results
  • Reporting: classify into usage patterns (report patterns), error patterns (report bugs), and unlikely patterns

  10. Mining Approach

  11. Mining Basics
  • Rely on co-change
  • Simplification: look at method calls only
  • Look for interesting patterns in the way methods are called
  • Example: a sequence of revisions to the files Foo.java, Bar.java, Baz.java, Qux.java
    Foo.java 1.12: o1.addListener, o1.removeListener
    Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println
    Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
    Qux.java 1.41: o4.addListener, System.out.println
    Qux.java 1.42: o4.removeListener

  12. Mining Matching Method Calls (same example as Slide 11)
  • Use our observation: methods that are frequently added simultaneously often represent a usage pattern
  • For instance: … addListener(…); … removeListener(…); …
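
  To illustrate the mining input, the following simplified Java sketch pulls added method calls out of one revision’s unified diff. DynaMine’s actual extraction resolves calls against parsed revision data, so the diff-based approach, the regex, and the class name here are assumptions made for illustration only.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class AddedCallExtractor {
        // Matches receiver.method( on an added line, e.g. "o1.addListener(".
        private static final Pattern CALL = Pattern.compile("(\\w+)\\.(\\w+)\\s*\\(");

        static List<String> addedCalls(String unifiedDiff) {
            List<String> calls = new ArrayList<>();
            for (String line : unifiedDiff.split("\n")) {
                if (!line.startsWith("+") || line.startsWith("+++")) continue; // added lines only
                Matcher m = CALL.matcher(line);
                while (m.find()) calls.add(m.group(1) + "." + m.group(2));
            }
            return calls;
        }
    }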

  13. Data Mining Summary
  • We consider the method calls added in each check-in
  • We want to find patterns of method calls, but there are too many potential patterns to consider
  • Want to filter and rank them; we use support and confidence for that
  • Support and confidence of each pattern are standard metrics used in data mining:
    • Support reflects how many times each pair appears
    • Confidence reflects how strongly a particular pair is correlated
  • Refer to the paper for details
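
  The two metrics can be made concrete with a small sketch. It follows the standard association-rule definitions (support of a pair = number of check-ins that add both calls; confidence = support divided by the number of check-ins that add the first call); the paper’s exact formulation may differ, and all names and parameters here are hypothetical.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class PairMiner {
        // Each transaction is the set of method calls added in one check-in.
        static void mine(List<Set<String>> checkIns, int minSupport, double minConfidence) {
            Map<String, Integer> single = new HashMap<>();
            Map<String, Integer> pairs = new HashMap<>();
            for (Set<String> tx : checkIns) {
                for (String a : tx) single.merge(a, 1, Integer::sum);
                for (String a : tx)
                    for (String b : tx)
                        if (a.compareTo(b) < 0) pairs.merge(a + "|" + b, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : pairs.entrySet()) {
                String[] ab = e.getKey().split("\\|");
                int support = e.getValue();                       // co-additions of the pair
                double confidence = (double) support / single.get(ab[0]);
                if (support >= minSupport && confidence >= minConfidence)
                    System.out.println(ab[0] + " ~ " + ab[1]
                            + " support=" + support + " confidence=" + confidence);
            }
        }
    }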

  14. Improvements Over the Traditional Approach
  • The default data mining approach does not quite work: even with filters based on confidence and support, there are still too many potential patterns!
  • Filtering: consider only patterns with the same initial call sequence as potential patterns
  • Ranking: use one-line “fixes” to find likely error patterns (see the sketch after Slide 16)

  15. Matching Initial Call Sequences (same example as Slide 11)
  (Figure: candidate pairs per revision, before and after the initial-call-sequence filter)
    Foo.java 1.12: 1 pair
    Bar.java 1.47: 3 pairs → 1 pair
    Baz.java 1.23: 10 pairs → 2 pairs
    Qux.java 1.41: 1 pair → 0 pairs
    Qux.java 1.42: 0 pairs

  16. Using Fixes to Rank Patterns (same example as Slide 11)
  • Look for one-call additions, which likely indicate fixes
  • Rank patterns containing such methods higher
  • Revision 1.42 of Qux.java adds only o4.removeListener. This is a fix! Move patterns containing removeListener up
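
  A minimal sketch of this ranking heuristic, under the assumption that a check-in adding exactly one call is treated as a likely fix; the scoring function is a hypothetical stand-in for the actual ranking.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class FixRanker {
        // Count one-call additions per method; such check-ins likely indicate fixes.
        static Map<String, Integer> fixCounts(List<Set<String>> checkIns) {
            Map<String, Integer> fixes = new HashMap<>();
            for (Set<String> tx : checkIns)
                if (tx.size() == 1)
                    fixes.merge(tx.iterator().next(), 1, Integer::sum);
            return fixes;
        }

        // Patterns mentioning frequently "fixed" methods move up in the ranking.
        static int score(Set<String> patternMethods, Map<String, Integer> fixes) {
            int s = 0;
            for (String m : patternMethods) s += fixes.getOrDefault(m, 0);
            return s;
        }
    }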

  17. Applications under Study
  • Apply these ideas to the revision histories of Eclipse and jEdit, two very large open-source projects
  • Many people working on both, spread all over the planet: 122 on Eclipse, 92 on jEdit
  • Many check-ins: 2,837,854 for Eclipse, 144,495 for jEdit
  • Long histories: Eclipse since 2001, jEdit since 2000

  18. Some Patterns (as promised)

  19. Categories of Patterns
  • For method calls during execution, we may care about:
    • the methods themselves
    • the order of the calls
    • the parameters/return values
  • Here are some common cases:
    • Matching method pairs
    • State machines
    • More complex patterns

  20. Some Interesting Method Pairs (1)

  21. Some Interesting Method Pairs (2) Register/unregister the current widget with the parent display object for subsequent event forwarding

  22. Some Interesting Method Pairs (3) Add/remove listener for a particular kind of GUI events

  23. Some Interesting Method Pairs (4) Use OS native locking mechanism for resources such as icons, etc.

  24. State Machines
  • Order is captured by a state machine, the simplest formalism for describing an object’s life-cycle
  • Must be followed precisely: omitting or repeating a method call is a sign of error
  • Matching method pairs are a specific case
  • Very common in C (consider OS code); less common in Java, but…
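
  A minimal sketch of checking such a life-cycle at runtime, for a generic two-state enter/exit machine; the class, the states, and the transition rules are illustrative assumptions, not DynaMine’s actual checker.

    import java.util.IdentityHashMap;
    import java.util.Map;

    class LifecycleChecker {
        enum State { OUTSIDE, INSIDE, ERROR }

        private final Map<Object, State> states = new IdentityHashMap<>();

        void onEnter(Object o) {  // re-entering without an exit is a violation
            states.put(o, states.getOrDefault(o, State.OUTSIDE) == State.OUTSIDE
                    ? State.INSIDE : State.ERROR);
        }

        void onExit(Object o) {   // exiting without a matching enter is a violation
            states.put(o, states.getOrDefault(o, State.OUTSIDE) == State.INSIDE
                    ? State.OUTSIDE : State.ERROR);
        }

        boolean violated(Object o) { return states.get(o) == State.ERROR; }
    }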

  25. State Machines (1)
  o.enterAlignment [o.redoAlignment] o.exitAlignment
  • Part of org.eclipse.jdt.internal.formatter.Scribe, responsible for pretty-printing of code
  • enterAlignment/exitAlignment pairs must match
  • redoAlignment is invoked in exceptional cases

  26. State Machines (2)
  o.beginCompoundEdit() (o.insert(…) | o.remove(…))+ o.endCompoundEdit()
  • Compound edits within jEdit can be undone at once
  • beginCompoundEdit/endCompoundEdit act as brackets
  • Other operations go in between
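
  For example, a well-bracketed compound edit might look like the sketch below; the bracket methods come from the slide, while the surrounding class and the exact jEdit Buffer signatures are assumptions.

    import org.gjt.sp.jedit.Buffer;

    class CompoundEditExample {
        // Replace a range of text as a single undoable step.
        static void replaceRange(Buffer buffer, int offset, int length, String text) {
            buffer.beginCompoundEdit();
            try {
                buffer.remove(offset, length); // the edits between the brackets
                buffer.insert(offset, text);
            } finally {
                buffer.endCompoundEdit();      // always closed, even on exceptions
            }
        }
    }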

  27. State Machines (3)
  OS.PmMemCreateMC [OS.PmMemStart OS.PmMemFlush OS.PmMemStop] OS.PmMemReleaseMC
  • Memory context manipulation (like memory pools)
  • Wrappers around underlying OS functionality
  • The middle part of the pattern is optional

  28. More Complex Stuff (1)
    try {
        monitor.beginTask(null, Policy.totalWork);
        int depth = -1;
        try {
            workspace.prepareOperation(null, monitor);
            workspace.beginOperation(true);
            depth = workspace.getWorkManager().beginUnprotected();
            return runInWorkspace(Policy.subMonitorFor(monitor,
                    Policy.opWork, SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
        } catch (OperationCanceledException e) {
            workspace.getWorkManager().operationCanceled();
            return Status.CANCEL_STATUS;
        } finally {
            if (depth >= 0)
                workspace.getWorkManager().endUnprotected(depth);
            workspace.endOperation(null, false,
                    Policy.subMonitorFor(monitor, Policy.endOpWork));
        }
    } catch (CoreException e) {
        return e.getStatus();
    } finally {
        monitor.done();
    }


  31. Grammar for Workspace Transactions
  • Requires human intelligence, and a lot of it
  • Is actually an excellent pattern: we have not seen runtime violations
  S → O
  O → w.prepareOperation() w.beginOperation() U w.endOperation()
  U → w.getWorkManager().beginUnprotected() S [w.getWorkManager().operationCanceled()] w.getWorkManager().endUnprotected()

  32. Dynamic checking

  33. Dynamically Check the Patterns
  • Home-grown bytecode instrumentor
  • Get a list of matching patterns; instrument calls to any of their methods to dump parameters
  • Post-processing of the output: process a stream of events, finding and counting matches and mismatches
  • Example: in the stream … o.register(d) … o.deregister(d) … the pair is matched; a second o.deregister(d) with no preceding register is mismatched
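
  A minimal sketch of the post-processing matcher for one register/deregister pair pattern, assuming a simple (method name, receiver) event stream; the names and event shape are illustrative, not DynaMine’s actual implementation.

    import java.util.Collections;
    import java.util.IdentityHashMap;
    import java.util.Set;

    class PairPatternCounter {
        private final Set<Object> open =
                Collections.newSetFromMap(new IdentityHashMap<>());
        int matched, mismatched;

        void onEvent(String method, Object receiver) {
            if (method.equals("register")) {
                if (!open.add(receiver)) mismatched++; // registered twice
            } else if (method.equals("deregister")) {
                if (open.remove(receiver)) matched++;  // paired correctly
                else mismatched++;                     // deregister without register
            }
        }

        void onProgramExit() { mismatched += open.size(); } // never deregistered
    }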

  34. Experiments

  35. Experimental Setup
  • Applied to Eclipse and jEdit: 3,600,000 lines of Java code combined, including many plugins
  • Times:
    • 6 days to fetch and process CVS histories
    • 30 minutes to compute the patterns
    • An hour to instrument
    • 15 minutes to run
  • And we are done!

  36. Experimental Summary
  • Pattern classification: 56 patterns total
    • 13 are usage patterns
    • 8 are error patterns
    • 11 are unlikely patterns
    • 24 were not hit at runtime
  • The error patterns resulted in a total of 264 dynamically confirmed pattern violations

  37. Summary
  • Knowing code patterns is important
  • We explored using software revision histories:
    • Co-change often indicates patterns
    • Previous fixes (one-line changes) help rank likely error patterns
  • Found interesting patterns: matching method pairs, state machines, and more complex ones
  • Confirmed valid patterns and found pattern violations at runtime
  • We have a paper in FSE 2005
