Oops…

tim@menzies.us, fayolapeters@gmail.com, amarcus@wayne.edu (Andrian Marcus)

MSR ’13

Mistakes are inevitable, due to the complexity & novelty of our work

(But rarely reported, which is… suspicious)

What can we learn from those mistakes?

An MSR’13 paper: Cross-company learning. Can “us” learn from “them”?

  • Provided “us” selects the right data from “them”

    • Relevancy filtering: [Turhan09] (and any others)

    • Selection guided by structure of “us”

  • If “us” is small and “them” is many:

    • Selection guided using kernel functions learned from “them”

    • Result #1: out-performed [Turhan09].

  • Result #2: Result #1 was a coding error

Houston, we have a problem

  • Mar 15: paper accepted to MSR

    • “Better cross-company defect prediction”

  • Mar 29: camera-ready submitted

  • Apr 10 (?): pre-prints go online

  • Apr 29: Hyeongmin Jeon, graduate student at Pusan Natl. Univ.,

    • Emailed us: can’t reproduce result

  • May 4: Peters, checking code, found error

    • Manic week of experiments…

  • May 11: results definitely wrong

    • Emails to MSR organizers

Btw, < 3 weeks. Wow…

Coding error

  • Distance between test & training instance

    • Removed the classes

    • Ran a distance function

    • Re-inserted the classes

  • But… bad re-insert

    • Used the training class

    • Not the test class
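
The slides do not show the actual code, so what follows is only a hypothetical reconstruction of this kind of mistake, not the authors' implementation. The toy rows, the nearest-neighbour predictor, and every function name are assumptions made for illustration. It sketches one way that re-inserting the training class instead of the test class could make results look better than they are.

```python
import math

def distance(a, b):
    """Euclidean distance over feature values only (no class column)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(test_features, train_rows):
    """Return the (features, label) training row closest to test_features."""
    return min(train_rows, key=lambda row: distance(test_features, row[0]))

# Hypothetical toy data: each row is (features, class label).
train = [((1.0, 1.0), "defective"), ((5.0, 5.0), "clean")]
test = [((1.1, 0.9), "clean"), ((4.8, 5.2), "defective")]

def reinsert_buggy(test_rows, train_rows):
    # BUG: each test row gets the class of its nearest *training* row.
    return [(f, nearest(f, train_rows)[1]) for f, _ in test_rows]

def reinsert_correct(test_rows, train_rows):
    # Each test row keeps its own class.
    return [(f, label) for f, label in test_rows]

def accuracy(rows_with_truth, train_rows):
    """Fraction of rows whose nearest-neighbour prediction matches the label."""
    hits = sum(nearest(f, train_rows)[1] == truth for f, truth in rows_with_truth)
    return hits / len(rows_with_truth)

buggy_acc = accuracy(reinsert_buggy(test, train), train)
true_acc = accuracy(reinsert_correct(test, train), train)
print(buggy_acc, true_acc)
```

In this toy setting the buggy re-insert scores the predictor against labels copied from the training data, so it reports perfect accuracy even though every real test label is missed.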

Pull the paper?

  • In the internet age, is that even possible?

    • X people now have local copies of that paper

    • Which Google might easily stumble across

Old pre-print, found May 15

Authors: report your mistakes, openly and honestly

  • We need to expect, and allow, papers with sections: “clarifications”, “errata”, “retractions”

  • E.g. Murphy-Hill, Parnin, Black. IEEE TSE, Jan 2012

Conference organizers: encourage research honesty

  • Need CFPs with text that encourages

    • Repeating, testing, and challenging old results

Researchers: Share data, check each other’s conclusions

  • Reinhart & Rogoff [2010]

    • “countries with debt over 90% of GDP suffer notably lower economic growth.”

  • Thomas Herndon, 3rd-year Ph.D. student, U.Mass.

    • Unable to replicate with publicly available data

    • Asked Reinhart & Rogoff for their data

    • Got it (their spreadsheet)

    • Found errors in data on economic growth vs debt levels.

  • A triumph for open science

    • Sadly, reported in the media as a grave mistake

    • E.g. http://goo.gl/HGugL

    • Immature view of the nature of science

Supervisors: encourage a culture of research honesty

  • What will you tell others about this paper?

    • A failure? Or a success of the open science method?

    • It’s up to you, but understand the implications

  • If we don’t let grad students report mistakes

    • Then they won’t

  • Students graduate,

  • Leave you,

  • The error emerges

  • And you are left with the problem

Specific lessons

  • Data mining experiments are complex software prototypes

    • Version control (of code and data)

    • Code inspections

    • Trap and log your random number seeds

    • Rewrite data rarely

      • Pull out the class, process, put it back?

      • Fuhgeddaboudit

      • Have data headers of different types

        • So (say) distance measures can skip over classes
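
As a minimal sketch of the “data headers of different types” lesson above: the convention used here (a trailing “!” marks a class column) and all names and values are assumptions for illustration, not something the slides specify. The point is that the distance function reads the header and skips the class, so rows never need to be taken apart and rebuilt.

```python
import math

# Hypothetical header convention: a trailing "!" marks the class column.
header = ["loc", "complexity", "defective!"]
rows = [
    [100.0, 4.0, "yes"],
    [120.0, 5.0, "no"],
]

def feature_indexes(header):
    """Indexes of the columns that distance measures are allowed to use."""
    return [i for i, name in enumerate(header) if not name.endswith("!")]

def distance(row_a, row_b, header):
    """Euclidean distance that skips class columns; rows are never rewritten."""
    return math.sqrt(
        sum((row_a[i] - row_b[i]) ** 2 for i in feature_indexes(header))
    )

d = distance(rows[0], rows[1], header)
print(d)
```

Because the class stays in place, there is no re-insert step in which the wrong label could be attached.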

The above error does not affect Peters & Menzies ICSE’12 and TSE’13
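
For the “trap and log your random number seeds” lesson, one possible pattern is sketched below. The helper name `seeded_rng` and the choice of a clock-based default seed are assumptions for illustration only.

```python
import logging
import random
import time

logging.basicConfig(level=logging.INFO)

def seeded_rng(seed=None):
    """Return (seed, rng). If no seed is given, pick one; always log it."""
    if seed is None:
        seed = int(time.time() * 1000) % (2 ** 32)
    logging.info("random seed = %d", seed)
    return seed, random.Random(seed)

# First run: the seed is chosen for us, but it is trapped and logged.
seed, rng = seeded_rng()
sample = [rng.random() for _ in range(3)]

# Later: replaying with the logged seed reproduces the same numbers.
_, rng_again = seeded_rng(seed)
replay = [rng_again.random() for _ in range(3)]
print(sample == replay)
```

Passing the generator object around explicitly, rather than relying on global random state, keeps each experiment's randomness tied to one logged seed.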

Open access science

  • Repeatable, improvable,

    • and sometimes even refutable

  • We should not celebrate the failed paper

  • But we should celebrate

    • The open science community that finds such errors

      • MSR, PROMISE, etc.

    • The grad students that struggle to reproduce results

      • Hyeongmin Jeon

    • The integrity of grad students whose first response on finding an error was to report it

      • Fayola Peters

Was this a “useful” mistake?

  • Is there insight within this mistake?

  • What does it mean if using more experience makes the defect predictor worse?

  • International workshop on Transfer Learning in Software Engineering

    • Nov, ASE’13