Mining Software Repositories What to do? And where to get data?

Israel Herraiz <herraiz@uax.es>, Universidad Alfonso X el Sabio, June 18th 2010.



  1. Mining Software Repositories: What to do? And where to get data? Israel Herraiz <herraiz@uax.es>, Universidad Alfonso X el Sabio, June 18th 2010

  2. Outline
     - What is Mining Software Repositories? What are repositories?
     - Conferences and journals of interest, and some words about trending topics
     - Tools for Mining Software Repositories
     - Datasets for Mining Software Repositories, for replicable and verifiable empirical studies

  3. 1. What is Mining Software Repositories?

  4. What is Mining Software Repositories? MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects. It has been a popular topic since 2004, when MSR started as a workshop co-located with ICSE; it has been a Working Conference since 2008.

  5. What are repositories? Anything that leaves a trail of software development or maintenance activities, including any software artifact. Typically: version control systems, bug tracking systems, and public communication tools (mailing lists).

  6. Differences between artifact and repository
     Artifact (source code file), hello.c:
         #include <stdio.h>
         int main() {
             printf("Hello world");
             return 0;
         }
     Change to an artifact, hello.c.diff:
         - printf("Hello world");
         + printf("Hello world\n");
     Meta-information recorded in the repository:
         Author: rms
         Date: 20100618 04:34 UTC
         Change: +1 -1
         Log: Forgot to add new line

  7. 2. Conferences and journals of interest

  8. Working conferences of interest
     - IEEE Int. Working Conf. Mining Software Repositories (MSR), http://msr.uwaterloo.ca
       Deadline: January (February for the mining challenge). Acceptance rate: 19% (2008), 31% (2010). Journal possibilities: EMSE, IEEE TSE.
     - IEEE Int. Working Conf. Source Code Analysis & Manipulation (SCAM), http://www.ieee-scam.org
       Deadline: April. Acceptance rate: 26% (2007), 38% (2008), 45% (2009). Journal possibilities: JSS, SCP.

  9. Conferences of interest
     - IEEE Int. Conf. Software Engineering (ICSE), http://www.sbs.co.za/ICSE2010/
       Deadline: August/September. Acceptance rate: 15% (2008), 12% (2009), 14% (2010). Journal possibilities: no special issues.
     - IEEE Int. Conf. Software Maintenance (ICSM), http://icsm2010.upt.ro/
       Deadline: April. Acceptance rate: 21% (2007), 26% (2008), 22% (2009). Journal possibilities: no special issues.
     - Empirical Software Engineering & Measurement (ESEM), http://www.esem-conferences.org/
       Deadline: March. Acceptance rate: unknown. Journal possibilities: EMSE.

  10. Other interesting conferences
     - Working Conference on Reverse Engineering (WCRE), http://web.soccerlab.polymtl.ca/wcre2010/
     - International Conference on Predictive Models in Software Engineering (PROMISE), http://promisedata.org/
     - European Conference on Software Maintenance and Reengineering (CSMR), http://www.sait.escet.urjc.es/csmr2010/

  11. Journals of interest
     - IEEE Transactions on Software Engineering (TSE), http://www.computer.org/tse/
     - ACM Transactions on Software Engineering and Methodology (TOSEM), http://tosem.acm.org/
     - Empirical Software Engineering (EMSE), http://www.springerlink.com/content/1382-3256
     - Journal of Systems and Software (JSS), http://www.elsevier.com/locate/jss
     - Journal of Software Maintenance and Evolution (JSME), http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html

  12. Handy links
     - Software Engineering Conferences (also covering verification, formal methods, programming languages and compilers, web, and security), http://people.engr.ncsu.edu/txie/seconferences.htm
     - Upcoming Software Engineering Conferences Map, http://research.csc.ncsu.edu/ase/semap/

  13. Trending topics
     - Replication of empirical studies: the replication package
     - Recommendation systems
     - Automated software engineering

  14. 3. Tools for Mining Software Repositories

  15. Tools for Mining Software Repositories
     Mining tools:
     - Libresoft Tools, http://tools.libresoft.es/
       CVSAnaly: CVS/SVN/Git repository log parser
       MLStats: Mailman and mbox parser
       Bicho: Bugzilla and SF.net tracker parser
     - Software Architecture Group (SWAG), University of Waterloo, http://www.swag.uwaterloo.ca/tools.html

  16. 4. Datasets for Mining Software Repositories

  17. MSR Mining Challenge
     - 2008: mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse, http://msr.uwaterloo.ca/msr2008/challenge/
     - 2009: repository logs of over 500 GNOME projects, an XML dump of the bug databases, and the complete SVN repositories of 69 GNOME projects, http://msr.uwaterloo.ca/msr2009/challenge/

  18. Ultimate Debian Database: database with information about the packages and bug reports of Debian and Ubuntu. http://udd.debian.org/

  19. Eclipse bug database (Saarland University): datasheets, databases, and scripts with information about Eclipse bug reports for several releases. http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/

  20. FLOSSMetrics: databases about ~5000 open source projects, covering version control repositories, mailing list archives, and bug tracking databases. Distributed as MySQL dumps (not very user friendly) and obtained using the Libresoft Tools. http://www.flossmetrics.org/

  21. FLOSSMole: database with information about all SourceForge.net projects (~150,000 projects). Mainly meta-information, obtained by parsing the projects' web pages; no low-level or fine-grained information. http://flossmole.org

  22. PROMISE repository, http://promisedata.org/
     Authors of PROMISE papers must also submit a package with the data used in the paper.
     101 datasets: defect prediction (58), effort prediction (18), general (9), model-based SE (7), text mining (9).

  23. Defect repositories
     - Firefox: http://bugzilla.mozilla.org/
     - Eclipse: https://bugs.eclipse.org/bugs/
