Clustering Software Artefacts Based on Frequent common changes

Presented by Haroon Malik.

Presentation Transcript


  1. Clustering Software Artefacts Based on Frequent Common Changes • Presented by Haroon Malik

  2. Abstract • Clusters of artifacts that are frequently changed together are subsystem candidates. • A two-step method identifies the clusters: • Extract the co-change graph from the version control repository. • Compute a layout of the co-change graph, which reveals the clusters of frequently co-changed artifacts.

  3. Proposed Model • High-level descriptions can be recovered from source code and other low-level information through reverse engineering. • Software clustering divides software artifacts into subsystems that are as independent as possible with respect to comprehension, change, reuse, etc. • The co-change graph is proposed as a model for clustering software systems.

  4. Proposed Model (Cont’d) • Co-change graph: • An abstraction of version control repositories. • The vertices of this graph are: • Software artifacts (files or functions), and • Change transactions (commits, in CVS terms). • Edges connect change transactions with their participating artifacts.

  5. Proposed Model (Cont’d) • Presentation: • The result of the clustering is not a partition of the graph vertices but a layout of them. • A layout assigns each graph vertex a position in two- or three-dimensional space. • Heavily co-changed artifacts are placed closer together. • Rarely co-changed artifacts are placed at larger distances. • The layout is easy to comprehend and provides additional information: • How clearly the clusters are separated. • Whether an artifact lies at the center of a cluster or rather between two clusters.

  6. Proposed Model (Cont’d) • Contents: • Artifacts are not just arranged in some nice way; their positions have a well-defined interpretation with respect to their common changes. • Two artifacts are placed closer together to the degree that their common change is stronger than random.

  7. Co-Change Graph • The graph captures the common changes of artifacts in version repositories. • It can be easily extracted from version repositories. • This ensures that the clustering results have a clear interpretation in terms of the repositories. • Biases through arbitrary choices, e.g. of weight functions or of values of free parameters, are minimized.

  8. Co-Change Graph (Cont’d) • Software artifact: • An entity that belongs to a software system. • E.g. a package, a file, a line of code, or even a piece of documentation. • Version: • The state of a software artifact at a particular point in time. • A CVS system stores the versions of artifacts in a central repository. • Users of such systems modify local copies of the software artifacts and check in their changes to the central repository.

  9. Co-Change Graph (Cont’d) • Change transaction: • A coherent sequence of check-ins of several software artifacts. • Software artifacts that participate in the same change transaction are co-changed (commonly changed). • The co-change graph of a given version repository is an undirected graph (V,E). • The set of vertices V of the co-change graph contains all software artifacts and all change transactions of the version repository.

  10. Co-Change Graph (Cont’d) • The set of edges E contains the undirected edge {c,a} if the artifact a was changed by the transaction c. • Bipartite: • The graph contains no edges that connect two change transactions or two software artifacts.

  11. Co-Change Graph (Cont’d) • For a vertex v of a co-change graph, the number of its adjacent vertices is called the degree of v and is denoted by deg(v). • For transaction vertices, the degree gives the number of artifacts that participate in the transaction. • For artifacts, the degree gives the number of their changes.
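
The definitions on slides 9 to 11 can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the presentation: the transaction ids, the file names, and the helper names `build_co_change_graph` and `degree` are all made up for the example, and the input is assumed to be a mapping from each change transaction to the set of artifacts it touched.

```python
from collections import defaultdict


def build_co_change_graph(transactions):
    """Build the bipartite co-change graph as adjacency sets.

    `transactions` maps a (hypothetical) transaction id to the set of
    artifact names changed by that transaction. Vertices are tagged
    tuples, ("c", id) for transactions and ("a", name) for artifacts,
    so the two vertex kinds can never collide.
    """
    adjacency = defaultdict(set)
    for tx_id, artifacts in transactions.items():
        c = ("c", tx_id)
        for name in artifacts:
            a = ("a", name)
            # Undirected edge {c, a}: artifact a was changed by transaction c.
            adjacency[c].add(a)
            adjacency[a].add(c)
    return dict(adjacency)


def degree(adjacency, v):
    """deg(v): the number of vertices adjacent to v."""
    return len(adjacency[v])


# Hypothetical example data, not taken from any of the evaluated repositories.
transactions = {
    "c1": {"Parser.java", "Lexer.java"},
    "c2": {"Parser.java", "README"},
    "c3": {"Parser.java", "Lexer.java", "Makefile"},
}
graph = build_co_change_graph(transactions)

print(degree(graph, ("c", "c3")))           # 3 artifacts took part in c3
print(degree(graph, ("a", "Parser.java")))  # Parser.java was changed 3 times
print(degree(graph, ("a", "README")))       # README was changed once
```

Because every edge joins a "c" vertex to an "a" vertex, the graph built this way is bipartite by construction.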

  12. Weighted Co-Change Graph • A weight function w assigns a real number to each edge in the set of edges E. • The number assigned to an edge expresses the importance of the corresponding change. • Here, each edge is given the same weight.

  13. Condensed Co-Change Graph • It is a weighted, undirected graph (V,E,w) for a given repository. • The set of vertices V contains all software artifacts in the repository. • The set of edges E contains the edge {a,a’} if the artifacts a and a’ were commonly changed by a transaction.
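
A minimal sketch of how a condensed co-change graph could be derived from the transactions. The slide does not say how the weight w of an edge {a,a’} is defined, so the sketch assumes, purely for illustration, that w counts the transactions in which both artifacts took part. The function name `condense` and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def condense(transactions):
    """Return the edge weights of a condensed co-change graph.

    Keys are alphabetically sorted artifact pairs (a, b). The weight is
    ASSUMED here to be the number of transactions that changed both
    artifacts; the slides do not define the weight function.
    """
    weights = Counter()
    for artifacts in transactions.values():
        # Every pair of artifacts in one transaction was commonly changed.
        for a, b in combinations(sorted(artifacts), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical example data.
transactions = {
    "c1": {"Parser.java", "Lexer.java"},
    "c2": {"Parser.java", "README"},
    "c3": {"Parser.java", "Lexer.java", "Makefile"},
}
weights = condense(transactions)

print(weights[("Lexer.java", "Parser.java")])  # 2 (changed together in c1 and c3)
print(weights[("Parser.java", "README")])      # 1 (changed together in c2 only)
```

Transaction vertices disappear in this representation; only artifacts remain as vertices, which is what makes the graph "condensed".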

  14. Edge-Repulsion LinLog Energy Model • This model specifies what a good graph layout is. • The basic idea is that, in the co-change graph, edges cause both repulsion and attraction. • Every edge causes the same amount of repulsion and attraction. • The model helps in creating suitable, readable layouts.
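
The slides do not state the energy function itself. The sketch below assumes the form usually given for the edge-repulsion LinLog model in the graph-drawing literature: the sum of the distances over all edges (attraction) minus the sum, over all vertex pairs, of deg(u)·deg(v)·ln(distance) (repulsion). Treat the formula, the function name `edge_repulsion_linlog_energy`, and the example layout as assumptions; the code only evaluates the energy of a given layout and does not minimize it.

```python
import math
from itertools import combinations


def edge_repulsion_linlog_energy(positions, edges):
    """Energy of a layout under the ASSUMED edge-repulsion LinLog form.

    positions: dict mapping each vertex to a coordinate tuple
    edges:     list of (u, v) pairs, one per undirected edge
    Lower energy means a better layout. Vertices must not share a
    position, because ln(0) is undefined.
    """
    degree = {v: 0 for v in positions}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Attraction: every edge pulls its two end vertices together.
    attraction = sum(math.dist(positions[u], positions[v]) for u, v in edges)

    # Repulsion: every vertex pair pushes apart, scaled by the vertex
    # degrees, so that each edge contributes equally to the repulsion.
    repulsion = sum(
        degree[u] * degree[v] * math.log(math.dist(positions[u], positions[v]))
        for u, v in combinations(positions, 2)
    )
    return attraction - repulsion


# One transaction "c" that changed one artifact "a", at three distances.
edges = [("c", "a")]
for d in (0.5, 1.0, 2.0):
    layout = {"c": (0.0, 0.0), "a": (d, 0.0)}
    print(d, edge_repulsion_linlog_energy(layout, edges))
```

For this single edge the energy reduces to d − ln d, which is smallest at d = 1, so attraction and repulsion balance at unit distance. A real layout tool would search for the positions that minimize this energy over the whole graph.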

  15. Evaluation • The software systems were chosen based on: • Size, number of developers, project duration, and artifacts in different programming languages. • Familiarity.

  16. Evaluation • The co-change graphs were extracted at file level. • The tool cvs2cl was used to recover the change transactions from the CVS repository. • CrocoPat, a calculator for relations, generated the co-change graph from the transactions. • Duration, total changes, and indeed all numbers were obtained with the tool StatCVS. • The layout was computed automatically using the edge-repulsion LinLog energy model.

  17. Artifacts in the CrocoPat repository

  18. Artifacts in the Rabbit repository

  19. Artifacts in the Blast repository

  20. Conclusions • Introduced a new method for clustering software artifacts. • Defined the co-change graph as the underlying formal model. • Evaluated the method on three example software systems with different types of documents and source code in several programming languages.
