
Characterizing the Open Source Software Process: a Horizontal Study



  1. Characterizing the Open Source Software Process: a Horizontal Study A. Capiluppi, P. Lago, M. Morisio

  2. Outline • Rationale behind the current study • Methodology • Conclusions • Current and future work

  3. Rationale • Most Open Source analyses focus on a single, flagship project (Linux, Apache, GNOME) • Limitation: the conclusions rest on a ‘vertical’ study of one project • There is a lack of ‘horizontal’ studies, which cover • a pool of projects • a wider area of interest

  4. Methodology • Choice of projects • Attribute definition • Coding • Analysis

  5. Choice of projects: repository • We selected the FreshMeat repository • FreshMeat (http://freshmeat.net) has focused on Open Source development since 1996 • It gathers thousands of projects, either duplicated on the pages of SourceForge (http://sourceforge.net) or hosted on FreshMeat only • FreshMeat lists more than 24000 projects (many inactive)

  6. Choice of projects: sampling I • From 24000 projects down to 406: how? • FreshMeat organizes projects by filters and categories • Example: the “Topic” filter (i.e. application domain) has categories such as {“Internet”, “Database”, “Multimedia”, …} • Other filters: Programming language, Status of evolution, etc.

  7. Choice of projects: sampling II • We randomly picked projects through the “Status” filter • Rationale: it has a limited number of associated categories: {“Planning”, “PreAlpha”, “Alpha”, “Beta”, “Stable”, “Mature”} • The overall count is 406 projects

  8. Attribute definition • Age • Application domain • Programming language • Size [KB] • Number of developers • Stable and transient developers • Number of users • Modularity level • Documentation level • Popularity • Status • Success of project • Vitality • On the original slide, red marked the attributes defined by FreshMeat and black those defined by us

  9. Coding • Each attribute was coded twice, to capture evolutionary trends • First observation: January 2002 • Second observation: July 2002

  10. Analysis • Here we discuss: • Application domain issues • Developers [stable & transient] issues • Subscribers (as users) issues • Code size issues

  11. Application domain distribution

  12. Attributes: project’s developers • We evaluate how many people write code for each application • External contributions are always credited in special-purpose files or in the ChangeLog • We distinguish between • Stable developers • Transient developers • Core team: more than one stable developer • Extraction was done by manual inspection and pattern-recognition scripts
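The pattern-recognition step mentioned on this slide can be sketched roughly as follows. This is an illustrative sketch only: the paper does not publish its scripts, so the GNU-style ChangeLog entry format, the function names, and the stability threshold used here are all our assumptions.

```python
import re
from collections import Counter

# Hypothetical pattern: GNU-style ChangeLog entries begin with
# "YYYY-MM-DD  Author Name  <email>".
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2}\s+(.+?)\s+<[^>]+>")

def count_contributions(changelog_text):
    """Return a Counter mapping each credited author to their entry count."""
    counts = Counter()
    for line in changelog_text.splitlines():
        m = ENTRY.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def classify(counts, threshold=3):
    """Split authors into stable (>= threshold entries) and transient.

    The threshold is an assumption; the paper does not state its exact rule.
    """
    stable = {a for a, n in counts.items() if n >= threshold}
    transient = set(counts) - stable
    return stable, transient
```

In practice such a script would be complemented by the manual inspection the slide mentions, since credit files are far less uniform than ChangeLogs.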

  13. Developers over projects • We observe: • 72% of projects have a single stable developer • 80% of projects have at most 10 developers

  14. Developers distribution over projects

  15. Definition: clusters of developers • Cluster 1: 1 to 3 developers (64.5%) • Cluster 2: 4 to 10 developers (20%) • Cluster 3: 11 to 20 developers (9.5%) • “Average” nr. of stable dev: 2 • “Average” nr. of transient dev: 3 • Cluster 4: more than 20 developers (6%) • “Average” nr. of stable dev: 6 • “Average” nr. of transient dev: 19
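The cluster boundaries above translate into a simple lookup. Only the boundaries and percentages come from the slide; the function itself and its name are our illustration.

```python
def developer_cluster(n_developers):
    """Map a project's developer count to the paper's four clusters."""
    if n_developers <= 3:
        return 1  # Cluster 1: 1 to 3 developers (64.5% of projects)
    if n_developers <= 10:
        return 2  # Cluster 2: 4 to 10 developers (20%)
    if n_developers <= 20:
        return 3  # Cluster 3: 11 to 20 developers (9.5%)
    return 4      # Cluster 4: more than 20 developers (6%)
```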

  16. Productivity vs. ‘global’ developers

  17. Productivity vs. ‘stable’ developers

  18. Code variation over clusters

  19. Attributes: subscribers • We use publicly available data as a proxy for users • Users ~ mailing-list subscribers (a public datum) • It is not a monotonic measure: subscribers can join and leave • We thus have a measure of users at each of the two observations

  20. Distribution of subscribers over projects • Around 42% of projects have at most 1 subscriber-user

  21. Users evolution

  22. Attributes: project’s size • We measured the code of each project at both observations • The code measured is contained in packages; we exclude from the count: • Auxiliary files: documentation, configuration files, GIF files, etc. • Legacy code: inherited libraries (e.g. GNOME macros), internationalization code
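A size-counting pass with these exclusions might look like the sketch below. It is a hypothetical illustration: the paper does not publish its measurement scripts, and the specific extensions and directory names treated as auxiliary or legacy are our assumptions.

```python
import os

# Assumed exclusion lists; the paper only names the categories.
AUX_EXTENSIONS = {".txt", ".md", ".html", ".gif", ".png", ".cfg", ".ini"}
LEGACY_DIRS = {"intl", "po", "macros"}  # i18n code, inherited macros

def project_size_kb(root):
    """Sum the size in KB of source files under root, skipping
    auxiliary files and legacy/internationalization directories."""
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune legacy directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in LEGACY_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in AUX_EXTENSIONS:
                continue  # auxiliary file: not counted
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return total_bytes / 1024
```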

  23. Distribution of code size over projects

  24. Evolutionary observations of size changes

  25. Conclusions I • The vast majority of projects are developed by a single developer • Adding people to a project has a small effect on productivity (i.e. code per developer) • Open Source software is made by experts for experts (72% of horizontal projects have more than 10 developers) • 58% of projects did not change their size • 63% of projects had a size change within 1%

  26. Conclusions II • Java accounts for 8% of the projects, C/C++ for 56%, Perl and Python together for 20% • Observations from flagship projects (Apache, Linux, GNOME) do not hold for the average Open Source project • Several projects are white noise, to be filtered out • Public repositories hold a huge amount of data: empirical researchers have an invaluable resource of software data

  27. Current and future work • Eliminating white noise: only projects in clusters 3 and 4 have been selected • Deeper analysis: the whole history of each project is being studied • What can we say with respect to the conclusions drawn on bigger OS projects? • What can be said about OSS evolution compared with traditional software evolution?
