Software Collaboration Networks

By Chris Zachor. This presentation uses network analysis to better understand the SourceForge and GitHub developer communities.


Presentation Transcript


  1. Software Collaboration Networks By Chris Zachor

  2. Overview • Introduction • Background • Changes • Methodology • Data Collection • Network Topologies • Measures • Tools • Conclusion • Questions

  3. Introduction • Use network analysis to better understand the SourceForge and GitHub developer communities • Identify key differences (if any) between the two communities • Examine the diversity of collaborations within these two communities

  4. Changes • The addition of GitHub to the study • GitHub exposes some of the same attributes as SourceForge, which allows for a comparison • Other communities were considered, but they either were not large enough or did not provide enough public data.

  5. Data Collection • Crawl the websites using a simple Perl script and regular expressions • Collect a project list from SourceForge • www.sourceforge.net/projects/projectTitle • No specified request limit • Check for duplicates
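
The extract-and-deduplicate step on this slide can be sketched roughly as follows. The author used a Perl script; Python is used here only as a stand-in. The regular expression, the sample HTML, and the function name `extract_projects` are assumptions made for illustration, based on the `/projects/projectTitle` URL shape shown on the slide, and are not the author's actual code.

```python
import re

# Hypothetical pattern: matches links shaped like /projects/<name>,
# following the URL form shown on the slide.
PROJECT_LINK = re.compile(
    r'href="(?:https?://(?:www\.)?sourceforge\.net)?/projects/([A-Za-z0-9_.-]+)/?"'
)


def extract_projects(html, seen):
    """Return project names found in html that are not already in seen."""
    new_names = []
    for name in PROJECT_LINK.findall(html):
        key = name.lower()  # treat names case-insensitively when deduplicating
        if key not in seen:
            seen.add(key)
            new_names.append(name)
    return new_names


# Made-up sample page fragment, for illustration only.
sample = '''
<a href="/projects/alpha/">Alpha</a>
<a href="https://www.sourceforge.net/projects/beta">Beta</a>
<a href="/projects/alpha/">Alpha again</a>
'''

seen = set()
print(extract_projects(sample, seen))  # ['alpha', 'beta']
```

Passing the same `seen` set across pages is what implements the "check for duplicates" bullet: a project listed on several pages is recorded only once.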

  6. SourceForge Project Page

  7. GitHub Crawling • The GitHub API provides our data • Limited to 60 API calls per minute • Use multiple computers to collect all 1.5 million projects
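
A quick back-of-the-envelope calculation shows why multiple computers are needed. The sketch below assumes one API call per project and that the 60-calls-per-minute limit applies to each machine separately; neither assumption is stated on the slide, and the function name `crawl_days` is hypothetical.

```python
def crawl_days(total_calls, calls_per_minute, machines=1):
    """Days needed to issue total_calls at the given per-machine rate."""
    minutes = total_calls / (calls_per_minute * machines)
    return minutes / (60 * 24)


projects = 1_500_000   # project count from the slide
limit = 60             # API calls per minute, from the slide

# Assumption: one call per project, limit enforced per machine.
for machines in (1, 4, 10):
    print(machines, "machine(s):", round(crawl_days(projects, limit, machines), 1), "days")
# 1 machine(s): 17.4 days
# 4 machine(s): 4.3 days
# 10 machine(s): 1.7 days
```

If each project needs more than one call (for example, a second call for its contributor list), the totals scale up proportionally.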

  8. GitHub Project Page

  9. GitHub API

  10. Developer/Project Network

  11. Project-Developer Network

  12. Measures and Metrics • Degree • Clustering Coefficient • Modularity • Power Law • Small World Phenomenon

  13. Degree • Average number of projects worked on by a developer • Average number of collaborations • Average number of developers on a project
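
The three averages on this slide can be computed from a list of (developer, project) pairs. The sketch below is illustrative only: the membership data is made up, and "collaborations" is interpreted as the number of distinct co-developers a developer shares at least one project with, which is an assumption about the slide's meaning.

```python
from collections import defaultdict
from itertools import combinations

# Made-up (developer, project) pairs, for illustration only.
memberships = [
    ("alice", "p1"), ("bob", "p1"), ("carol", "p1"),
    ("alice", "p2"), ("dave", "p2"),
    ("erin", "p3"),
]


def degree_summary(memberships):
    """Compute the three degree averages from developer-project pairs."""
    projects_of = defaultdict(set)     # developer -> projects
    developers_on = defaultdict(set)   # project -> developers
    for dev, proj in memberships:
        projects_of[dev].add(proj)
        developers_on[proj].add(dev)

    # Project the two-mode network onto developers:
    # two developers are linked if they share at least one project.
    collaborators = {dev: set() for dev in projects_of}
    for devs in developers_on.values():
        for a, b in combinations(devs, 2):
            collaborators[a].add(b)
            collaborators[b].add(a)

    n_dev = len(projects_of)
    n_proj = len(developers_on)
    return {
        "avg_projects_per_developer":
            sum(len(p) for p in projects_of.values()) / n_dev,
        "avg_collaborators_per_developer":
            sum(len(c) for c in collaborators.values()) / n_dev,
        "avg_developers_per_project":
            sum(len(d) for d in developers_on.values()) / n_proj,
    }


print(degree_summary(memberships))
```

For the sample data this gives 1.2 projects per developer, 1.6 collaborators per developer, and 2.0 developers per project.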

  14. Clustering Coefficient • Examine how likely developers are to stick together in groups • Examine both the average clustering coefficient for the entire network and the local clustering coefficient for nodes of interest
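
As a minimal sketch of the two quantities named on this slide: the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves connected, and the network average is the mean of the local values. The adjacency data is made up, and nodes with fewer than two neighbors are given a coefficient of 0 here, which is one common convention rather than something the slide specifies.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for degree 0 or 1
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Made-up undirected developer network (adjacency sets).
adj = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}

print(local_clustering(adj, "alice"))   # 1 of 3 neighbor pairs linked, about 0.333
print(average_clustering(adj))          # (1/3 + 1 + 1 + 0) / 4, about 0.583
```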

  15. Modularity • Provides a measure of how diverse developer collaborations are • Range: -1 < Q < 1 • Values closer to one show less diversity in collaboration choices • Values closer to negative one show more diversity in collaboration choices
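
The slide does not say which definition of Q is used. The sketch below assumes the widely used Newman modularity for an undirected, unweighted network, written per community as Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the total number of edges, L_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. Under this definition Q stays between -1/2 and 1, which falls inside the range quoted on the slide. The edge list and community labels are made up.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}     # community -> number of edges fully inside it (L_c)
    degree_sum = {}   # community -> sum of member degrees (d_c)
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single edge (made-up example).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

print(modularity(edges, two_groups))  # about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph into its two triangles scores higher than lumping everyone together, matching the slide's reading that a high Q indicates developers who mostly collaborate within their own group.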

  16. Power Law • Previous studies have found that the SourceForge community follows a power law • No such study has been done on the GitHub community • Few developers should be a part of many projects, while many developers should be involved with only one project
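
One rough way to check for the pattern described here is to count how many developers have each degree and look at the slope of that distribution on log-log axes, since a power law appears as a straight line there. The sketch below uses an ordinary least-squares fit, which is only a quick visual-style check and not a rigorous power-law test; the degree data is synthetic and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Return sorted (degree, number_of_nodes_with_that_degree) pairs."""
    return sorted(Counter(degrees).items())


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree)."""
    points = [(math.log(k), math.log(c)) for k, c in histogram if k > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# Synthetic data: the number of developers with k projects falls off as k^-2.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1

hist = degree_histogram(degrees)
print(hist)                # [(1, 64), (2, 16), (4, 4), (8, 1)]
print(loglog_slope(hist))  # about -2.0
```

Many developers with one project and a handful with many projects, as the slide predicts, shows up as a clearly negative slope.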

  17. Small World Phenomenon • Previous studies have shown that the SourceForge community exhibits small world properties • Once again, no study has been done on the GitHub community • Using Pajek, I will create a random network with the same number of nodes and edges • Then, compare the clustering coefficient and the average shortest path
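
The comparison described on this slide can be sketched in a few lines. The author plans to use Pajek; the Python below is only a stand-in that builds a random graph with the same number of nodes and edges and computes the two quantities being compared. The ring-lattice "observed" network is made up, and enumerating every node pair is practical only for small examples like this one.

```python
import random
from collections import deque
from itertools import combinations


def build_adj(nodes, edges):
    """Build adjacency sets for an undirected graph."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def random_graph(nodes, edge_count, seed=0):
    """Random graph with the given nodes and exactly edge_count edges."""
    rng = random.Random(seed)
    all_pairs = list(combinations(nodes, 2))  # fine for small graphs only
    return build_adj(nodes, rng.sample(all_pairs, edge_count))


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path(adj):
    """Mean hop count over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Made-up "observed" network: 12 nodes in a ring, each linked to the
# two nearest neighbors on either side (24 edges).
nodes = list(range(12))
ring_edges = [(i, (i + step) % 12) for i in nodes for step in (1, 2)]
observed = build_adj(nodes, ring_edges)
baseline = random_graph(nodes, len(ring_edges), seed=0)

for name, g in (("observed", observed), ("random", baseline)):
    print(name, average_clustering(g), average_shortest_path(g))
```

A network is usually called small-world when its clustering coefficient is much higher than the random baseline's while its average shortest path stays close to the baseline's.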

  18. Tools • Perl • Pajek • cURL • wget • GUESS

  19. Conclusion • Through the use of network analysis, we hope to gain a better understanding of the developers in the SourceForge and GitHub communities.

  20. Questions? Suggestions? Comments?
