
Process Mining Software Repositories



Presentation Transcript


  1. Process Mining Software Repositories
     Master project kickoff presentation
     Wouter Poncin, w.poncin@student.tue.nl

  2. Agenda
     • Introduction
     • Existing approaches
     • Project goal
     • Prototype
     • Design
     • Current work
     / Department of Mathematics and Computer Science

  3. Introduction
     • Software development teams
     • Software repositories
     • Analysis

  4. Existing approaches
     • NavTracks [Sin05]
     • eROSE [Zim05]
     • DynaMine [Liv05]
     • MarmoSet [Spa05]
     • projectWatcher [Gut04]
     • Traceability links [Kag07]
     • Improve bug finding [Wil05]
     • Predict change [Yin04]

  5. Existing approaches – multiple data sources
     • Hipikat: recommends relevant software artifacts based on the current context of a developer [Čub05]
     Images from: http://www.cs.ubc.ca/labs/spl/projects/hipikat/

  6. Existing approaches – multiple data sources
     • Alitheia Core: a platform for software engineering research [Gou09]
     Images from: http://www.sqo-oss.org/

  7. Existing approaches – multiple data sources
     • Other approaches:
       • Wolf et al. [Wol09]: Mining task-based social networks to explore collaboration in software teams
       • Bird et al. [Bir06]: Mining email social networks
       • Robles et al. [Rob05]: Developer identification methods for integrated data from various sources

  8. Existing approaches – problems
     • Mostly single data source
     • Problems with multiple data source approaches:
       • Provide artifact centered view (Hipikat)
       • Focus on metric calculation (Alitheia Core)
       • No analysis on global process overview
     • Example analysis questions:
       • How does the real (mined) organizational model relate to the ‘used’ organizational model?
       • How to classify developers of open source projects? [Nak02]
       • Does the project follow a given development process model? (waterfall / XP / …)

  9. Existing approaches – problems
     • Mostly single data source
     • No analysis on global process overview
     Solution: process mining

  10. Intermezzo: process mining
      Image from: http://prom.win.tue.nl/research/wiki/_detail/research/processmining.gif

  11. Intermezzo: process mining
      • Input: event log
      • Output: models
      Example from: [Med09]

  12. Project goal
      The goal of this project is to develop an application that facilitates process analysis of data from various software repositories in an easy manner.
      • Facilitate → export data to log
      • Various repositories → combine data
      • Various repositories → later add new types of data
      • Easy manner → add a data source by URL
      • Open source & closed source projects

  13. Prototype
      • Console application
      • Input: repository URLs
      • Output: MXML process log
      • Analysis: ProM
      • Simple developer matching
      • High level events
      • Case: originator
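A minimal sketch of what the export step on this slide might look like, assuming the repository data has already been reduced to (developer, activity, timestamp) tuples. The element names follow the MXML log format read by ProM; the function name `events_to_mxml`, the tuple layout, and the fixed event type "complete" are assumptions made for illustration, not the prototype's actual code. As on the slide, the originator is used as the case.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def events_to_mxml(events, process_id="repository-log"):
    """Build an MXML document from (developer, activity, iso_timestamp) tuples.

    Assumption: timestamps are ISO 8601 strings in one time zone, so that
    sorting them as plain strings gives chronological order.
    """
    # Case = originator: collect every event of one developer in one trace.
    by_case = defaultdict(list)
    for developer, activity, timestamp in events:
        by_case[developer].append((timestamp, activity))

    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    for developer in sorted(by_case):
        instance = ET.SubElement(process, "ProcessInstance", id=developer)
        for timestamp, activity in sorted(by_case[developer]):
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            ET.SubElement(entry, "EventType").text = "complete"
            ET.SubElement(entry, "Timestamp").text = timestamp
            ET.SubElement(entry, "Originator").text = developer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        ("alice", "SVN revision", "2009-01-02T10:00:00"),
        ("alice", "TRAC ticket", "2009-01-01T09:00:00"),
        ("bob", "Mail (devel)", "2009-01-03T08:00:00"),
    ]
    print(events_to_mxml(sample))
```

Using high level activity names such as "SVN revision" or "TRAC ticket" keeps the mined model small; a finer granularity would only require different activity strings.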

  14. Prototype
      • Project: Gallery (web based photo gallery software), http://sourceforge.net/projects/gallery/
      • Used data sources:
        • SVN repository (20740 revisions)
        • TRAC tickets (1028)
        • Mailing list archives: ‘devel’ (2867 messages), ‘translate’ (108 messages), ‘announce’ (69 messages)

  15. Prototype – analysis

  16. Prototype – analysis
      Legend:
      - yellow: TRAC ticket
      - white: SVN revision
      - red: Mail (translations)
      - blue: Mail (devel)
      - green: Mail (announce)

  17. Prototype – analysis
      Legend:
      - yellow: TRAC ticket
      - white: SVN revision
      - red: Mail (translations)
      - blue: Mail (devel)
      - green: Mail (announce)

  18. Prototype – analysis

  19. Design
      • Application requirements:
        • Support multiple data sources (software repositories)
        • Caching of data from data sources
        • Define data filters
        • Developer matching
        • Define mapping from data elements to log elements
        • Easy addition of new plugins for data source types / export types
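One way the plugin and caching requirements could fit together is sketched below. Everything here is hypothetical: the `PLUGINS` registry, the `register` decorator, the `DataSource` base class, and the `DummySource` plugin are illustrative names, not part of the application described in the slides.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: data source type name -> plugin class.
PLUGINS = {}


def register(kind):
    """Class decorator that makes a plugin available under the name `kind`."""
    def decorator(cls):
        PLUGINS[kind] = cls
        return cls
    return decorator


class DataSource(ABC):
    """Base class for data source plugins, identified by a repository URL."""

    def __init__(self, url):
        self.url = url
        self._cache = None

    @abstractmethod
    def fetch(self):
        """Return a list of raw records from the repository at self.url."""

    def records(self):
        # Fetch once, then serve every later request from the cache.
        if self._cache is None:
            self._cache = self.fetch()
        return self._cache


@register("dummy")
class DummySource(DataSource):
    """Stand-in plugin that returns fixed data instead of contacting a server."""

    calls = 0  # counts how often fetch() actually ran

    def fetch(self):
        DummySource.calls += 1
        return [{"author": "alice", "type": "SVN revision"}]


if __name__ == "__main__":
    source = PLUGINS["dummy"]("http://example.org/svn")
    source.records()
    source.records()
    print(DummySource.calls)  # the second call is served from the cache
```

A new data source type would then only need a subclass with its own `fetch` and a `@register(...)` line; export types could use a second registry of the same shape.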

  20. Design
      • Issues:
        • How to define a case
        • Level of granularity of events
        • How to define developer matching (manual/automatic)
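The developer matching issue can be illustrated with a small sketch that combines an automatic heuristic with manual overrides. The heuristic (lowercase the identity, drop the email domain, strip punctuation) is an assumption chosen for illustration and is much simpler than the identification methods surveyed in [Rob05]; the function names are hypothetical.

```python
import re


def normalize(identity):
    """Reduce a name, login, or email address to a comparable key."""
    local = identity.split("@", 1)[0]  # drop the email domain if present
    return re.sub(r"[^a-z0-9]", "", local.lower())


def match_developers(identities, manual=None):
    """Map each raw identity to a canonical developer key.

    manual: optional dict of raw identity -> canonical key. A manual entry
    takes precedence over the automatic heuristic.
    """
    manual = manual or {}
    return {raw: manual.get(raw, normalize(raw)) for raw in identities}


if __name__ == "__main__":
    raw_ids = ["J.Doe@example.org", "jdoe", "John Doe"]
    print(match_developers(raw_ids, manual={"John Doe": "jdoe"}))
```

The SVN login and the mail address collapse to the same key automatically, while the full name needs the manual entry; without it, "John Doe" would become a separate developer "johndoe".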

  21. Design
      • Data sources to support:
        • Subversion
        • CVS
        • Git (used for jQuery / mootools, for example)
        • Bugzilla
        • TRAC
        • Wiki articles (+ history)
        • SourceForge mailing lists
        • SourceForge thumbs up/down
        • Twitter

  22. Design
      • Analysis tools:
        • ProM: www.processmining.org (open source)
        • Futura Reflect: www.futuratech.nl
        • Interstage Business Process Manager
        • Fluxicon: www.fluxicon.com
        • And others…

  23. Current work
      • Finish application development
        • Developer matching
        • Case definition
        • Internal cache
        • Implement data source plugins
      • Analyze projects
        • (Large) open source projects
          • Like Firefox, WordPress, Filezilla for example
        • SEP / student projects

  24. Questions?

  25. References
      [Bir06] Bird, C., Gourley, A., Devanbu, P., Gertz, M., Swaminathan, A. Mining email social networks. In MSR '06: Proceedings of the 2006 international workshop on Mining software repositories, pages 137–143, New York, NY, USA, (2006). ACM.
      [Čub05] Čubranić, D., Murphy, G.C., Singer, J., Booth, K.S. Hipikat: A project memory for software development. IEEE Trans. Softw. Eng., 31(6):446–465, (2005).
      [Gou09] Gousios, G., Spinellis, D. Alitheia core: An extensible software quality monitoring platform. Software Engineering, International Conference on, pages 579–582, (2009).
      [Gut04] Gutwin, C., Penner, R., Schneider, K. Group awareness in distributed software development. In CSCW '04: Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81, New York, NY, USA, (2004).
      [Kag07] Kagdi, H., Maletic, J.I., Sharif, B. Mining software repositories for traceability links. In ICPC '07: Proceedings of the 15th IEEE International Conference on Program Comprehension, pages 145–154, Washington, DC, USA, (2007). IEEE Computer Society.
      [Liv05] Livshits, B., Zimmermann, T. DynaMine: finding common error patterns by mining software revision histories. In ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296–305, New York, NY, USA, (2005). ACM.
      [Med09] Medeiros, A.K.A. de, Aalst, W.M.P. van der. Process mining towards semantics. pages 35–80, (2009).
      [Moc00] Mockus, A., Fielding, R.T., Herbsleb, J. A case study of open source software development: the apache server. In ICSE '00: Proceedings of the 22nd international conference on Software engineering, pages 263–272, New York, NY, USA. ACM.

  26. References
      [Nak02] Nakakoji, K., Yamamoto, Y., Nishinaka, Y., Kishida, K., Ye, Y. Evolution patterns of open-source software systems and communities. In IWPSE '02: Proceedings of the International Workshop on Principles of Software Evolution, pages 76–85, New York, NY, USA, (2002). ACM.
      [Rob05] Robles, G., Gonzalez-Barahona, J.M. Developer identification methods for integrated data from various sources. In MSR '05: Proceedings of the 2005 international workshop on Mining software repositories, pages 1–5, New York, NY, USA, (2005). ACM.
      [Sin05] Singer, J., Elves, R., Storey, M. Navtracks: Supporting navigation in software maintenance. In ICSM '05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 325–334, Washington, DC, USA, (2005). IEEE Computer Society.
      [Spa05] Spacco, J., Strecker, J., Hovemeyer, D., Pugh, W. Software repository mining with marmoset: an automated programming project snapshot and testing system. SIGSOFT Softw. Eng. Notes, 30(4):1–5, (2005).
      [Wil05] Williams, C.C., Hollingsworth, J.K. Automatic mining of source code repositories to improve bug finding techniques. Software Engineering, IEEE Transactions on, 31(6):466–480, June 2005.
      [Wol09] Wolf, T., Schröter, A., Damian, D., Panjer, L.D., Nguyen, T.H.D. Mining task-based social networks to explore collaboration in software teams. IEEE Softw., 26(1):58–66, (2009).
      [Yin04] Ying, A.T.T., Murphy, G.C., Ng, R., Chu-Carroll, M.C. Predicting source code changes by mining change history. IEEE Transactions on Software Engineering, 30(9), (2004).
      [Zim05] Zimmermann, T., Dallmeier, V., Halachev, K., Zeller, A. eROSE: guiding programmers in eclipse. In OOPSLA '05: Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 186–187, New York, NY, USA, (2005). ACM.
