
Analysis and Modeling of the Open Source Software Community

Yongqin Gao, Greg Madey

Computer Science & Engineering

University of Notre Dame

Vincent Freeh

Computer Science Dept.

NCSU

NAACSOS Conference

Pittsburgh, PA

June 25, 2003

Supported in part by the National Science Foundation – Digital Science & Technology

Outline
  • Overview
  • Data collection
  • Network modeling
  • Topological statistical analysis
  • Conclusion
Overview
  • What is OSS?
    • Free to use and distribute
    • No limits on users or usage
    • Source code available and modifiable
  • Potential advantages over commercial software
    • Higher quality
    • Faster development
    • Lower cost
  • Our goal
    • Understanding the OSS phenomenon
  • Approach
    • SourceForge is the source of our empirical data
    • Modeling the community as a social network
    • Analysis of topological statistics
Data Collection — Monthly
  • Web crawler (scripts)
    • Python
    • Perl
    • AWK
    • Sed
  • Collected monthly since Jan 2001
  • Fields collected: ProjectID, DeveloperID
  • Almost 2 million records
  • Stored in a relational database

PROJ|DEVELOPER
8001|dev348
8001|dev8972
8001|dev9922
8002|dev27650
8005|dev31351
8006|dev12409
8007|dev19935
8007|dev4262
8007|dev36711
8008|dev8972
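The sample rows above can be loaded into two lookup tables, one per direction of the project/developer relation. This is a minimal sketch, assuming the records are available as pipe-delimited text with a header row as shown on the slide; the function name `load_memberships` and the variable names are hypothetical and not taken from the authors' scripts.

```python
from collections import defaultdict

# Sample rows copied from the slide; the full data set has almost 2 million records.
SAMPLE = """PROJ|DEVELOPER
8001|dev348
8001|dev8972
8001|dev9922
8002|dev27650
8005|dev31351
8006|dev12409
8007|dev19935
8007|dev4262
8007|dev36711
8008|dev8972
"""


def load_memberships(text):
    """Return (project -> developers, developer -> projects) from pipe-delimited rows."""
    devs_of = defaultdict(set)
    projects_of = defaultdict(set)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the PROJ|DEVELOPER header row
        project_id, developer_id = line.split("|")
        devs_of[int(project_id)].add(developer_id)
        projects_of[developer_id].add(int(project_id))
    return devs_of, projects_of


devs_of, projects_of = load_memberships(SAMPLE)
print(sorted(devs_of[8001]))           # developers on project 8001
print(sorted(projects_of["dev8972"]))  # projects that dev8972 belongs to
```

Keeping both directions makes it cheap to ask either "who works on this project" or "which projects does this developer join", which is what the network construction needs.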

Modeling as a collaboration network
  • What is a collaboration network?
    • A social network representing collaborative relationships
    • Examples: the movie actor network and the scientist collaboration network
  • How the SourceForge collaboration network differs
    • Detachment
    • Virtual collaboration
    • Voluntary
    • Global
  • Bipartite property of the collaboration network (two node types: developers and projects)
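The bipartite property means the same membership data can be projected into two one-mode networks: a developer network (developers linked when they share a project) and a project network (projects linked when they share a developer). The sketch below illustrates that projection under the assumption that memberships are held as a plain dictionary; the function names are hypothetical, and the three projects are taken from the sample rows in the data collection slide.

```python
from itertools import combinations

# Project ID -> set of developer IDs (a few rows from the sample data).
devs_of = {
    8001: {"dev348", "dev8972", "dev9922"},
    8007: {"dev19935", "dev4262", "dev36711"},
    8008: {"dev8972"},
}


def developer_network(devs_of):
    """Developers are nodes; two developers are linked if they share a project."""
    adj = {}
    for members in devs_of.values():
        for dev in members:
            adj.setdefault(dev, set())  # keep developers with no collaborators
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def project_network(devs_of):
    """Projects are nodes; two projects are linked if they share a developer."""
    projects_of = {}
    for project, members in devs_of.items():
        for dev in members:
            projects_of.setdefault(dev, set()).add(project)
    adj = {project: set() for project in devs_of}
    for projects in projects_of.values():
        for p, q in combinations(sorted(projects), 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


dev_net = developer_network(devs_of)
proj_net = project_network(devs_of)
print(sorted(dev_net["dev8972"]))  # collaborators of dev8972
print(sorted(proj_net[8001]))      # projects sharing a developer with 8001
```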
SourceForge developer network
  • Figure: OSS Developer Network (Part)
  • Developers are nodes / Projects are links
  • The part shown contains 24 developers, 5 projects, 2 hub developers, and 1 cluster
  • Projects shown: 7597, 6882, 7028, 9859, 15850

Topological analysis
  • Statistics inspected
    • Diameter
    • Average degree
    • Clustering coefficient
    • Degree distribution
    • Cluster size distribution
    • Relative size of major cluster
    • Fitness and life cycle
  • Evolution of these statistics
Diameter of developer network vs. time
  • Diameter here means the average shortest-path length over all pairs of vertices
  • For the developer network (30,000 – 70,000), the values are between 6 and 8
Diameter of project network vs. time
  • For the project network (20,000 – 50,000), the values are between 6 and 7
  • Diameter decreases over time for both the developer network and the project network
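The diameter used in these slides, the average shortest-path length, can be computed with a breadth-first search from every vertex. The sketch below is one way to do that on an unweighted adjacency dictionary. It averages only over pairs that are actually connected, which is an assumption, since the slides do not say how pairs in different clusters were treated.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop counts from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path(adj):
    """Average shortest-path length over all connected ordered pairs."""
    total = 0
    pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy path graph a - b - c - d (hypothetical data).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
avg = average_shortest_path(path)
print(avg)
```

One BFS per vertex is costly for networks with tens of thousands of vertices; running the search from a random sample of sources is a common approximation.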
Average degree vs. time
  • For the developer network, the values are between 7 and 8
  • For the project network, the values are only between 3 and 4
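As a small illustration of the statistic on this slide, average degree is the mean number of edges per vertex, which for an undirected network equals twice the edge count divided by the vertex count. The adjacency data below is hypothetical.

```python
# Hypothetical undirected network as vertex -> set of neighbours.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),  # isolated vertex, degree 0
}


def average_degree(adj):
    """Mean number of edges per vertex."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def edge_count(adj):
    """Each undirected edge appears in two neighbour sets."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


avg_deg = average_degree(adj)
print(avg_deg)
print(2 * edge_count(adj) / len(adj))  # same value via 2E / N
```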
Degree distribution (developers)
  • The developer degree distribution follows a power law
  • R² = 0.9714
Degree distribution (projects)
  • The project degree distribution follows a power law
  • R² = 0.9838
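The slides report R² values for the power-law fits but not the fitting procedure. A common approach, assumed here rather than confirmed by the source, is a least-squares line through the degree histogram on log-log axes: a power law count ∝ degree^slope appears as a straight line, and R² measures how well that line fits. The function name and the synthetic data are hypothetical.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit log10(count) = intercept + slope * log10(degree).

    Returns (slope, intercept, r_squared). Vertices of degree 0 are skipped
    because log10(0) is undefined.
    """
    hist = Counter(d for d in degrees if d > 0)
    xs = [math.log10(k) for k in hist]
    ys = [math.log10(c) for c in hist.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic degrees whose counts follow 1000 / k**2 exactly (not SourceForge data).
degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
slope, intercept, r2 = fit_power_law(degrees)
print(slope, intercept, r2)
```

Log-log regression is simple but sensitive to the noisy tail of the histogram, so the exponent it yields is best read as a rough estimate.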
Cluster size distribution
  • Cluster size distribution of the developer network
  • R² of the power-law fit with the major cluster included: 0.7426
  • R² of the power-law fit with the major cluster excluded: 0.9799
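Clusters are the connected components of the network, so the cluster size distribution can be obtained by a graph traversal that counts the vertices in each component. The sketch below, on hypothetical data, builds the size histogram with and without the largest (major) cluster, which is the comparison made on this slide.

```python
from collections import Counter


def component_sizes(adj):
    """Return the sizes of all connected components, largest first."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical network: one cluster of 4, two clusters of 2, one isolated vertex.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
    "g": {"h"}, "h": {"g"},
    "i": set(),
}

sizes = component_sizes(adj)
with_major = Counter(sizes)         # cluster size -> number of clusters
without_major = Counter(sizes[1:])  # same, dropping the largest cluster
print(sizes, with_major, without_major)
```

The major cluster is a single outlier point far from the rest of the histogram, which is consistent with the fit improving once it is excluded.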
Relative size of major cluster vs. time
  • The relative size of the major cluster increases steadily
  • It appears to converge slowly toward a fixed share of around 35%
  • This may be an indicator of how the network evolves
Existence of fitness
  • Tracking the development of individual projects can verify the existence of the “young upcomer” phenomenon
  • We tracked every new project from July 2001 until now (1,660 projects in total)
  • The maximum monthly growth of a project is 13, while the average monthly growth per project is only 0.3639
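The slide does not define how growth was measured. The sketch below assumes growth means the month-over-month change in a project's developer count, and that the average is taken first per project and then across projects; both are assumptions, and the snapshot data is hypothetical.

```python
# Hypothetical monthly snapshots: project ID -> developer count in each month.
sizes_by_month = {
    101: [1, 1, 2, 2],
    102: [1, 4, 9, 10],
    103: [2, 2, 2, 2],
}


def monthly_growth(series):
    """Month-over-month change in developer count."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def growth_summary(sizes_by_month):
    """Return (largest single-month growth, mean of per-project average growth)."""
    growth = {p: monthly_growth(s) for p, s in sizes_by_month.items()}
    max_growth = max(max(g) for g in growth.values())
    per_project_avg = [sum(g) / len(g) for g in growth.values()]
    return max_growth, sum(per_project_avg) / len(per_project_avg)


max_growth, avg_growth = growth_summary(sizes_by_month)
print(max_growth, avg_growth)
```

A large gap between the maximum and the average, as reported on the slide, is what one would expect if a few projects attract developers much faster than the rest.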
Summary of results
  • Power law rules
    • Degree distributions, cluster distribution
  • Average degree increasing with time
  • Diameter decreasing with time
  • Clustering coefficient decreasing with time
  • Fitness exists in SourceForge
  • Projects show life-cycle behavior
Conclusion
  • Studying the SourceForge collaboration network can help us understand the OSS community
  • We investigate not only the topological statistics but also the evolution of these statistics
  • Simulation is needed to further investigate the SourceForge collaboration network
Terminology
  • Degree
    • The number of edges connected to a given vertex
  • Degree distribution
    • The distribution of degrees throughout a network
  • Cluster
    • A connected component of the network
  • Diameter
    • Average length of shortest paths between all pairs of vertices
  • Clustering coefficient (CC)
    • CCi: for vertex i, the number of links actually present among the vertices in its neighborhood, divided by the total possible number of such links
    • CC: the average of all CCi in a network
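The clustering coefficient definition above translates directly into code: for each vertex, count the links among its neighbours and divide by the number of neighbour pairs, then average over all vertices. The sketch below uses hypothetical data and assumes that a vertex with fewer than two neighbours gets CCi = 0, a common convention that the slides do not state.

```python
from itertools import combinations


def local_clustering(adj, node):
    """CCi: links present among the neighbours of node / links possible."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs, so CCi = 0
    present = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    possible = k * (k - 1) / 2
    return present / possible


def clustering_coefficient(adj):
    """CC: average of CCi over all vertices."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


# Hypothetical network: triangle a-b-c plus a pendant vertex d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

cc_a = local_clustering(adj, "a")
cc = clustering_coefficient(adj)
print(cc_a, cc)
```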