
Human Migration of Open-Source Contributors

Kick-off presentation by Erik Kouters. Graduation supervisor: A. Serebrenik. Graduation tutor: B. Vasilescu.


Presentation Transcript


  1. Human Migration of Open-Source Contributors. Kick-off presentation, Erik Kouters. Graduation supervisor: A. Serebrenik. Graduation tutor: B. Vasilescu

  2. What have I done so far? • Geographical Movement of Mailing List Participants • Seminar SET • Capita Selecta SET • Who’s who in GNOME: using LSA to merge software repository identities (ICSM 2012, ERA Track) / Software Engineering & Technology

  3. What are the main topics? • Human migration of open-source contributors • Identity matching • Case study: GNOME

  4. Why is human migration of open-source contributors interesting? • A passionate contributor is likely to visit conferences. • Don't program on Fridays! • Contributors who appear to be weekend commuters are less likely to introduce bugs on Fridays. • Translators who reside in a different country than the one where the target language is spoken are expected to deliver translations of lower quality.

  5. What’s so interesting about this human migration of open-source contributors? • What (geographical) patterns does the migration of open-source contributors follow? • Which patterns (source → destination) are most popular? ◦ Commute ◦ Conferences • What are the factors that influence this migration? • Which factors are most influential?
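Once migrations are available as source → destination pairs, the "most popular patterns" question reduces to counting. A minimal sketch, assuming migrations have already been detected and that each one is represented as a hypothetical `(source, destination)` tuple of location labels (the helper name `most_popular_patterns` is also illustrative):

```python
from collections import Counter

# Hypothetical representation: one (source, destination) pair per detected migration.
Pattern = tuple[str, str]


def most_popular_patterns(migrations: list[Pattern], top: int = 3) -> list[tuple[Pattern, int]]:
    """Return the `top` most frequent source -> destination pairs with their counts."""
    return Counter(migrations).most_common(top)


if __name__ == "__main__":
    # Illustrative input only, reusing the slides' placeholder location names.
    observed = [
        ("locationA", "locationB"),
        ("locationB", "locationA"),
        ("locationA", "locationB"),
        ("locationC", "locationB"),
    ]
    for (source, destination), count in most_popular_patterns(observed):
        print(f"{source} -> {destination}: {count}")
```

Keeping A → B and B → A as separate patterns preserves the direction of travel; a commute would plausibly show up as both directions being frequent between the same two locations.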

  6. How am I planning to trace these migrations? • Extract emails from mailing list archive • Resolve emails to location • Email A is sent from locationA at timestampA; Email B is sent from locationB at timestampB • <locationA, timestampA> + <locationB, timestampB> = migration! • But what if the contributor uses multiple email addresses?
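The pairing of consecutive emails described on this slide can be sketched as follows. This is a minimal illustration, not the author's tool: it assumes each email has already been resolved to a location string (the slides do not say how), and the names `Email`, `MigrationEvent` and `find_migrations` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby


@dataclass(frozen=True)
class Email:
    sender: str          # address as it appears in the archive
    timestamp: datetime  # send time taken from the message
    location: str        # assumed output of the (unspecified) email-to-location step


@dataclass(frozen=True)
class MigrationEvent:
    sender: str
    source: str
    left_after: datetime   # timestamp of the last email seen at the source
    destination: str
    arrived_by: datetime   # timestamp of the first email seen at the destination


def find_migrations(emails: list[Email]) -> list[MigrationEvent]:
    """Pair consecutive emails per sender; a change of location counts as a migration."""
    migrations: list[MigrationEvent] = []
    ordered = sorted(emails, key=lambda e: (e.sender, e.timestamp))
    for sender, group in groupby(ordered, key=lambda e: e.sender):
        messages = list(group)
        for earlier, later in zip(messages, messages[1:]):
            if earlier.location != later.location:
                migrations.append(MigrationEvent(
                    sender, earlier.location, earlier.timestamp,
                    later.location, later.timestamp,
                ))
    return migrations


if __name__ == "__main__":
    archive = [
        Email("contributor@domainA", datetime(2012, 5, 4, 9, 0), "locationA"),
        Email("contributor@domainA", datetime(2012, 5, 4, 18, 30), "locationB"),
        Email("contributor@domainA", datetime(2012, 5, 7, 8, 15), "locationB"),
    ]
    for m in find_migrations(archive):
        print(f"{m.sender}: {m.source} ({m.left_after}) -> {m.destination} ({m.arrived_by})")
```

Because emails are grouped by raw address, a contributor who writes from two aliases appears as two unrelated senders, and a move between aliases is missed. This is why identity matching has to come before migration tracing.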

  7. What exactly is Identity Matching? • Identifying which aliases belong to the same individual • Aliases commonly take the form <name, emailAddress>, e.g. ◦ <“George Stefanakis”, george.stefanakis@domainA> ◦ <“Stephanakis, George”, g.stephanakis@domainB> • Needs some similarity measure (e.g. edit distance)
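To illustrate the edit-distance idea on the George Stefanakis example, the sketch below normalises the name part of each alias and computes the Levenshtein distance with the standard dynamic-programming recurrence. The "Last, First" reordering rule, the `local_part` helper and all function names are assumptions for illustration, not the thesis implementation.

```python
import re


def normalise_name(name: str) -> str:
    """Lower-case a name and turn 'Last, First' into 'first last' (assumed convention)."""
    name = name.strip().lower()
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first.strip()} {last.strip()}"
    return re.sub(r"\s+", " ", name)


def local_part(address: str) -> str:
    """Part of an email address before '@', with '.', '_' and '-' turned into spaces."""
    return re.sub(r"[._-]+", " ", address.split("@", 1)[0].lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    alias_a = ("George Stefanakis", "george.stefanakis@domainA")
    alias_b = ("Stephanakis, George", "g.stephanakis@domainB")
    print(levenshtein(normalise_name(alias_a[0]), normalise_name(alias_b[0])))
    print(levenshtein(local_part(alias_a[1]), local_part(alias_b[1])))
```

After normalisation the two names differ only in "f" versus "ph", so their distance is 2; the reordering step is what lets the two aliases line up at all.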

  8. How am I going to match these identities?

  9. What will I be doing to improve the identity matching? • Increase confidence when merging email addresses ◦ Look at fellow recipients (mailing list) ◦ Look at coauthors (source code repository) • Use multiple similarity measures ◦ Currently Levenshtein and cosine similarity ◦ Compare performance with others (e.g. Jaccard, Jaro-Winkler, Dice’s coefficient) • Improve implementation ◦ Currently slow ◦ Data set limited to the system’s memory • Release the tool as open-source (e.g. on GitHub) • Compare to current implementations
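To compare several of the listed similarity measures side by side, the sketch below implements cosine similarity over character-bigram counts, plus the Jaccard and Dice coefficients over sets. Representing names as bigrams, using fellow-recipient sets as extra evidence, and the weighted `merge_confidence` combination are all assumptions for illustration, not the thesis's method; Jaro-Winkler is omitted for brevity.

```python
from __future__ import annotations

import math
from collections import Counter


def bigrams(text: str) -> Counter[str]:
    """Character bigrams of a lower-cased string, with their counts."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the bigram count vectors of two strings."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())  # missing grams count as 0
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def jaccard(x: set[str], y: set[str]) -> float:
    """|x ∩ y| / |x ∪ y|; 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def dice(x: set[str], y: set[str]) -> float:
    """2|x ∩ y| / (|x| + |y|); 0.0 when both sets are empty."""
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def merge_confidence(name_a: str, name_b: str,
                     peers_a: set[str], peers_b: set[str], weight: float = 0.5) -> float:
    """Hypothetical score: weighted mix of name similarity and overlap of fellow recipients."""
    return weight * cosine(name_a, name_b) + (1 - weight) * jaccard(peers_a, peers_b)


if __name__ == "__main__":
    names = ("george stefanakis", "george stephanakis")
    print(cosine(*names))
    print(dice(set(bigrams(names[0])), set(bigrams(names[1]))))
    print(merge_confidence(*names, {"alice@domainC", "bob@domainD"}, {"bob@domainD"}))
```

Keeping each measure as a separate function with the same shape makes it easy to swap them in and out when comparing their performance on the same alias pairs.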

  10. So, what will I be doing? • Improve the identity matching algorithm’s performance • Run the algorithm on the data from the mailing list archive • Send out a questionnaire to verify the results • While waiting for questionnaire responses, improve the algorithm with more advanced techniques • Once sufficient responses to the questionnaire have been received, analyse the data and look for patterns

  11. A questionnaire? What about privacy? • Only the individual can access their own data • Individuals participate by entering their email address • A unique URL (hash) is mailed to that email address • Data will not be made public • Research published based on the data will be anonymised
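The slide only says that each participant receives a unique URL based on a hash. One way to realise this, sketched here under assumptions, is a keyed hash (HMAC-SHA256) of the normalised address: an unkeyed hash could be recomputed by anyone who knows the address, whereas the HMAC token cannot be derived without the server-side secret. `BASE_URL` and `personal_url` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical base URL; the slides do not name the questionnaire host.
BASE_URL = "https://questionnaire.example.org/r/"


def personal_url(email: str, secret_key: bytes) -> str:
    """Derive a per-participant URL from an HMAC-SHA256 of the normalised address."""
    normalised = email.strip().lower().encode("utf-8")  # same address, same URL
    token = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    return BASE_URL + token


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # kept server-side only, never published
    print(personal_url("george.stefanakis@domainA", key))
```

Because the token is deterministic for a given key, re-entering the same address yields the same URL, so no per-participant table of random tokens needs to be stored.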

  12. How do I confirm the identity matching?

  13. How do I confirm the migrations?

  14. Looks promising…

  15. And what am I hoping to achieve? • A more advanced and better-performing identity matching algorithm than those that currently exist • A versatile, open-source tool • Insight into the patterns by which, and the reasons why, skilled workers (open-source contributors) migrate ◦ Works during holidays → hobbyist ◦ Visits conferences → high activity in the project • More publications!

  16. Thank you! Questions?
