
Software Networks

Software Networks. Christian Bird, Computer Science Dept., UC Davis. A network like any other: a software network is made up of nodes (software artifacts) and edges (relationships between those artifacts, which may be directed or undirected).



  1. Software Networks • Christian Bird • Computer Science Dept., UC Davis

  2. A network like any other • A software network is made up of • Nodes: software artifacts • Edges: relationships between those artifacts (may be directed or undirected) • Example node types: module, function, file, class • Example edge types: imports, requires, co-committed, includes
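The node and edge structure on this slide can be sketched as a small adjacency structure. This is only an illustration: the class name `SoftwareNetwork`, its methods, and the specific edges between the example files are assumptions, not part of the presentation.

```python
from collections import defaultdict

class SoftwareNetwork:
    """Nodes are software artifacts; edges are typed relationships (hypothetical sketch)."""

    def __init__(self):
        self.nodes = {}                 # artifact name -> kind ("file", "function", ...)
        self.edges = defaultdict(set)   # source -> {(target, relation)}

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, src, dst, relation, directed=True):
        self.edges[src].add((dst, relation))
        if not directed:
            # An undirected edge is stored in both directions.
            self.edges[dst].add((src, relation))

    def neighbors(self, name, relation=None):
        return sorted(dst for dst, rel in self.edges[name]
                      if relation is None or rel == relation)

net = SoftwareNetwork()
for name in ("handle.c", "hash.c", "serve.c"):
    net.add_node(name, "file")
net.add_edge("handle.c", "hash.c", "includes")                        # directed
net.add_edge("handle.c", "serve.c", "co-committed", directed=False)   # undirected
```

Storing the relation type on each edge lets one structure hold several networks (includes, calls, co-commits) over the same set of artifacts.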

  3. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions • Classes • Files • Modules/Packages • Directories • Libraries

  4. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions (3000 in Apache) • Classes • Files • Modules/Packages • Directories • Libraries • Example: int add(int a, int b) { printf("%i + %i = ", a, b); int c = a + b; printf("%i\n", c); return c; }

  5. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: class Logger { int logItem(Object item, int level) { stuff… } int logError(String msg) { more stuff… } more functions… }

  6. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions • Classes • Files (300 in Apache) • Modules/Packages • Directories • Libraries • Example (math.c): float absoluteValue(float a) { return a > 0 ? a : -a; } void printName(char *name) { printf("Hello %s\n", name); } more functions…

  7. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: class Logger { stuff… } class LogMessage { stuff… } class LogError { stuff… } more classes…

  8. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions • Classes • Files • Modules/Packages • Directories (65 in Apache) • Libraries • Example: /apache/http-2.0/server/core/handle.c, /apache/http-2.0/server/core/serve.c, /apache/http-2.0/server/core/cgi.c, /apache/http-2.0/server/core/locking.c

  9. Nodes • The nodes in a software network usually represent software artifacts at various levels of granularity • Functions • Classes • Files • Modules/Packages • Directories • Libraries (25 in Apache) • Example: libkdeinit_konqueror.so, libkonq.so.4, libkutils.so.1, libkio.so.4, libkdeui.so.4, libkdesu.so.4, libkdecore.so.4, libDCOP.so.4, libdl.so.2, libresolv.so.2, libutil.so.1, libart_lgpl_2.so.2, libidn.so.11, libqt-mt.so.3, libpng12.so.0, libXext.so.6, libX11.so.6, libSM.so.6, libICE.so.6, libXrender.so.1

  10. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries

  11. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: int add(int a, int b) { printf("%i + %i = ", a, b); int c = a + b; printf("%i\n", c); return c; }

  12. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: class Logger inherits Writer { int logItem(LogMessage item, int level) { stuff… } int logError(String msg) { more stuff… } more functions… FileWriter w }

  13. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example (math.c): float absoluteValue(float a) { return max(a, -a); } void printName(char *name) { printf("Hello %s\n", name); } more functions…

  14. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: import java.lang.util; import edu.ucdavis.senses; class WirelessSensor { … }

  15. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: a function in /apache/http-2.0/server/core/handle.c may call a function in /apache/http-2.0/apr-util/hash.c

  16. Edges • Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc. • Functions • Classes • Files • Modules/Packages • Directories • Libraries • Example: library libkdecore.so may need to load libqt3-mt.so, which in turn may need to load libX11.so and libm.so, all of which need libc.so
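The library chain on this slide can be treated as a directed dependency graph and walked to find everything a library pulls in. The sketch below assumes that "all need libc.so" means each of the four other libraries depends on it directly. The function name `transitive_deps` is made up for illustration.

```python
# Direct dependencies, as read from the slide (libc.so edges are an assumption).
deps = {
    "libkdecore.so": ["libqt3-mt.so", "libc.so"],
    "libqt3-mt.so": ["libX11.so", "libm.so", "libc.so"],
    "libX11.so": ["libc.so"],
    "libm.so": ["libc.so"],
    "libc.so": [],
}

def transitive_deps(lib, deps):
    """Return every library reachable from `lib` by following dependency edges."""
    seen = set()
    stack = [lib]
    while stack:
        current = stack.pop()
        for needed in deps.get(current, []):
            if needed not in seen:
                seen.add(needed)
                stack.append(needed)
    return seen

print(sorted(transitive_deps("libkdecore.so", deps)))
```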

  17. Example Callgraph • void printInt(int a) { printf("the number is %i\n", a); } int add(int a, int b) { return a + b; } int multiply(int a, int b) { return a * b; } int factorial(int a) { if (a == 1) return a; return multiply(a, factorial(a-1)); } void main() { printf("calculating 6!\n"); printInt(factorial(6)); } • Callgraph nodes: main, printInt, factorial, printf, multiply, add (add is never called)
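The callgraph for this example can be written down by hand as an adjacency list and queried. The sketch below transcribes the calls visible in the slide's code. The helper names `reachable` and `never_called` are illustrative only.

```python
# Caller -> callees, transcribed from the example program on the slide.
callgraph = {
    "main": ["printf", "printInt", "factorial"],
    "printInt": ["printf"],
    "factorial": ["multiply", "factorial"],   # direct recursion
    "multiply": [],
    "add": [],
    "printf": [],
}

def reachable(graph, root):
    """Functions reachable from `root` by following call edges (root included)."""
    seen = {root}
    stack = [root]
    while stack:
        for callee in graph[stack.pop()]:
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def never_called(graph, root):
    """Functions with no incoming call edge, other than the root itself."""
    called = {callee for callees in graph.values() for callee in callees}
    return sorted(f for f in graph if f not in called and f != root)

print(never_called(callgraph, "main"))
```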

  18. Static versus Runtime Callgraphs • Static callgraphs are constructed by a syntactic analysis of the source code • Pros • Don’t have to build or run the program • Works in the presence of syntactic or semantic errors • Catches calls for exceptional situations • Fairly fast • Cons • Doesn’t give valued (weighted) information, i.e. how many times each function is called • Includes calls in dead code. Example: if (0 == 3) logError(…) • Doesn’t include calls through function pointers • Doesn’t include calls to functions in dynamically loaded libraries
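To make two of the cons concrete, here is a rough static extractor. It is a stand-in, not the tooling the slides refer to: it analyzes Python source with the standard `ast` module rather than C, and the sample program in `SOURCE` is invented. It records only calls whose target is a plain name defined in the same source.

```python
import ast

SOURCE = '''
def log_error(msg):
    print(msg)

def helper(x):
    return x + 1

def apply(fn, x):
    return fn(x)                  # call through a function value

def main():
    if 0 == 3:
        log_error("unreachable")  # dead code
    return apply(helper, 1)
'''

def static_callgraph(source):
    """Map each defined function to the defined functions it calls by name."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    graph = {}
    for func in functions:
        callees = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    callees.add(node.func.id)
        graph[func.name] = callees
    return graph

graph = static_callgraph(SOURCE)
```

The result contains the edge from `main` to `log_error` even though that call can never execute, and it has no edge from `apply` to `helper` because the call goes through a parameter. The program was never run to obtain either fact.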

  19. Static versus Runtime Callgraphs • Runtime callgraphs are constructed by running a piece of software one or more times and logging the number of function calls • Pros • Includes number of times function calls occur • Includes calls through function pointers and dynamically loaded libraries • Will not include calls in dead code • Cons • Requires building the software • Hard to get complete code coverage • Can take a long time • May require a test harness of some kind (especially for interactive applications) along with test data
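A runtime callgraph can be approximated by instrumenting functions and logging caller/callee pairs as the program runs. The sketch below uses a hypothetical `traced` decorator and a Python port of the factorial example. Real tools typically instrument at the compiler or profiler level instead.

```python
from collections import Counter
import functools

call_counts = Counter()   # (caller, callee) -> number of calls observed
_stack = ["<root>"]       # names of the functions currently executing

def traced(fn):
    """Record one (caller, callee) edge every time `fn` is called."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[(_stack[-1], fn.__name__)] += 1
        _stack.append(fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            _stack.pop()
    return wrapper

@traced
def multiply(a, b):
    return a * b

@traced
def factorial(a):
    if a == 1:
        return a
    return multiply(a, factorial(a - 1))

@traced
def log_error(msg):
    return msg

@traced
def main():
    if 0 == 3:
        log_error("never happens")   # dead code: no edge is ever recorded
    return factorial(6)

result = main()
```

Unlike a static graph, the edges carry counts and `log_error` never appears. The graph only reflects the inputs that were actually exercised, which is the coverage problem the slide lists.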

  20. Differences between callgraphs and other graphs we’ve seen • Callgraphs have a root and commonly form a tree-like structure • Few if any cycles in callgraphs (direct or indirect recursion is rare) • Reciprocity is not common due to levels of abstraction • Preferential attachment? • If a function is called by many functions, is it more likely to be called by other functions in the future? Maybe.
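Cycles and reciprocity can both be checked directly on a callgraph. The sketch below uses small made-up graphs. It takes reciprocity to be the fraction of directed edges whose reverse edge also exists, ignoring self-loops, which is one common definition and an assumption here.

```python
def reciprocity(graph):
    """Fraction of directed edges (self-loops excluded) that are reciprocated."""
    edges = {(u, v) for u, vs in graph.items() for v in vs if u != v}
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)

def has_cycle(graph):
    """Depth-first search; a back edge to a node on the current path is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, ()):
            state = color.get(m, WHITE)
            if state == GRAY:
                return True          # includes direct recursion (self-loop)
            if state == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

tree_like = {"main": ["parse", "run"], "parse": ["read"], "run": ["read"], "read": []}
recursive = {"main": ["factorial"], "factorial": ["factorial", "multiply"], "multiply": []}
mutual = {"a": ["b"], "b": ["a"]}
```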

  21. Software Repositories • Used in development of virtually any software project (commercial, personal, OSS, etc.) • Examples include RCS, CVS, Subversion, Perforce, BitKeeper, and SourceSafe • Keeps track of every change to the software, who made the change, time of change, comments associated with a change, etc. • Allows us to view the evolution of a piece of software • A developer makes changes to software code and then commits the changes to the software repository with a description of the changes

  22. Software Networks from Repositories • The software history allows us to relate different artifacts in the software • Create an edge between functions, files, or classes if they were modified in the same commit • Create an edge between artifacts if they were modified by the same developer
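Both kinds of edge can be derived from a list of commit records. The sketch below uses invented commit data and developer names. A real pipeline would parse the repository log, whose format depends on the version control system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit history: who changed which files together.
commits = [
    {"author": "alice", "files": ["handle.c", "serve.c"]},
    {"author": "bob", "files": ["serve.c", "cgi.c", "locking.c"]},
    {"author": "alice", "files": ["handle.c", "serve.c", "cgi.c"]},
]

def co_commit_edges(commits):
    """Undirected edge weights: number of commits that touched both files."""
    weights = Counter()
    for commit in commits:
        for a, b in combinations(sorted(set(commit["files"])), 2):
            weights[(a, b)] += 1
    return weights

def same_developer_edges(commits):
    """Undirected edges between files modified by the same developer."""
    by_author = {}
    for commit in commits:
        by_author.setdefault(commit["author"], set()).update(commit["files"])
    edges = set()
    for files in by_author.values():
        edges.update(combinations(sorted(files), 2))
    return edges

weights = co_commit_edges(commits)
shared = same_developer_edges(commits)
```

Sorting each file pair gives every undirected edge a single canonical key, so `("handle.c", "serve.c")` and its reverse are counted together.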

  23. Modularity: one use of a callgraph • The characteristic of a system that has been divided into smaller subsystems which interact with each other • Software that is modular has distinct subsystems (modules) with high levels of interaction within the subsystems and low levels of interaction between the subsystems • Software that is modular is easier to understand and maintain • Example: a modular OS with Kernel, Scheduler, Networking, Filesystem, Memory Management, and I/O devices subsystems
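One rough way to see whether a callgraph fits this description is to count calls that stay inside a module versus calls that cross module boundaries. The sketch below borrows three module names from the slide. The function names and call edges are invented.

```python
# Hypothetical assignment of functions to the slide's OS modules.
module_of = {
    "schedule": "Scheduler", "pick_next": "Scheduler",
    "send_packet": "Networking", "checksum": "Networking",
    "read_block": "Filesystem", "write_block": "Filesystem",
}

# Hypothetical call edges (caller, callee).
calls = [
    ("schedule", "pick_next"),
    ("send_packet", "checksum"),
    ("read_block", "write_block"),
    ("write_block", "read_block"),
    ("send_packet", "schedule"),     # the only cross-module call
]

def interaction_counts(calls, module_of):
    """Return (calls within a module, calls between modules)."""
    within = sum(1 for u, v in calls if module_of[u] == module_of[v])
    return within, len(calls) - within

within, between = interaction_counts(calls, module_of)
```

A modular design in the slide's sense would show `within` much larger than `between`.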

  24. Modularity Case Study using Callgraphs • “Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source” by Alan MacCormack, John Rusnak, and Carliss Baldwin • Created a “Design Structure Matrix” (DSM) at the file level using function calls as ties (i.e., if a function in foo.c calls a function in bar.c, then there is a tie from foo.c to bar.c; ties are non-symmetric) • Used static analysis to extract the file-level callgraph • Clustered the DSM using standard clustering techniques • Metrics used: • Clustering cost: measure of how many function calls are not within a cluster • Propagation cost: measure of how many functions will be affected if a particular function is modified
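The file-level DSM and a propagation-style metric can be sketched as follows. This rests on two assumptions the slide does not confirm: indirect dependencies come from the transitive closure of the DSM (a "visibility" matrix, with each file visible to itself), and propagation cost is that matrix's density. The file and function names are placeholders.

```python
def build_dsm(files, function_file, calls):
    """dsm[i][j] = 1 if some function in files[i] calls a function in files[j]."""
    index = {f: i for i, f in enumerate(files)}
    n = len(files)
    dsm = [[0] * n for _ in range(n)]
    for caller, callee in calls:
        i, j = index[function_file[caller]], index[function_file[callee]]
        if i != j:
            dsm[i][j] = 1
    return dsm

def visibility(dsm):
    """Transitive closure (Warshall's algorithm), with every file visible to itself."""
    n = len(dsm)
    vis = [[1 if i == j else dsm[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if vis[i][k] and vis[k][j]:
                    vis[i][j] = 1
    return vis

def propagation_cost(dsm):
    """Density of the visibility matrix (assumed formulation)."""
    vis = visibility(dsm)
    n = len(vis)
    return sum(map(sum, vis)) / (n * n)

files = ["foo.c", "bar.c", "baz.c"]
function_file = {"f": "foo.c", "g": "bar.c", "h": "baz.c"}
calls = [("f", "g"), ("g", "h")]      # foo.c -> bar.c -> baz.c
dsm = build_dsm(files, function_file, calls)
```

In this chain foo.c depends on baz.c only indirectly, so the tie appears in the visibility matrix but not in the DSM itself.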

  25. DSM examples • Figure: example system in graphical and dependency matrix form • Figure: a DSM with dependencies in an “idealized modular form” • A change to F propagates to E, C, and A, while a change to B only propagates to A • All calls are within clusters, so the clustering cost is 0

  26. Mozilla Project • Netscape open-sourced Navigator in March 1998 • The project was named Mozilla and eventually led to what Firefox is today • Initially the code was complex and tightly coupled, a common phenomenon in industry code • This formed a high barrier to entry for volunteers to contribute code • Architecture was re-designed in late 1998 due to increasing complexity

  27. DSMs for Mozilla

  28. Results of Mozilla Re-design

  29. More Results • After the re-design, volunteerism went up dramatically (critical for an OSS project to succeed) • Both functionality and performance increased • Both code size and number of files decreased (initially)

  30. What are we doing with software nets? • Thanks to CVS history, we can create a callgraph for a piece of software at any time during its evolution • Do certain parts of the callgraph stabilize before others? Why? • Are certain portions of the callgraph more bug-prone than others? • What does code ownership in the callgraph look like? • What is the relationship between callgraph network, co-commit network, and ownership network?

  31. More Questions • Does the software network bear any resemblance to the social network of the developers who work on it? (Conway’s Law) • Are callgraphs small-world networks? What is the distribution of in- and out-degrees? What would the answers mean (if anything)? • What partitioning techniques allow us to extract module structure from source code? • Is there a relationship between the co-committer social network and the email social network for developers?
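As a starting point for the degree-distribution question, in- and out-degrees can be tallied directly from an adjacency list. The sketch below reuses the small callgraph from the factorial example (slide 17) and assumes every function appears as a key. A real study would need a far larger graph before the distribution's shape means anything.

```python
from collections import Counter

# Caller -> callees for the factorial example program.
callgraph = {
    "main": ["printf", "printInt", "factorial"],
    "printInt": ["printf"],
    "factorial": ["multiply", "factorial"],
    "multiply": [],
    "add": [],
    "printf": [],
}

def degree_distributions(graph):
    """Return (in-degree histogram, out-degree histogram) as Counters."""
    out_deg = {n: len(set(callees)) for n, callees in graph.items()}
    in_deg = Counter()
    for n, callees in graph.items():
        in_deg[n] += 0                 # make sure uncalled functions are counted
        for callee in set(callees):
            in_deg[callee] += 1
    return Counter(in_deg.values()), Counter(out_deg.values())

in_hist, out_hist = degree_distributions(callgraph)
```

Each histogram maps a degree to the number of functions with that degree, which is the raw data behind a degree-distribution plot.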

  32. On with the show…
