

Presentation Transcript


  1. Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements Reverse Engineering (Evaluation of Clustering Algorithms)

  2. Motivation Using module dependencies when determining the similarity between two decompositions is a good idea…

  3. Clustering the Structure of a System (1) Given the structure of a system…

  4. Clustering the Structure of a System (2) The goal is to partition the system structure graph into clusters… The clusters should represent the subsystems

  5. Clustering the Structure of a System (3) But how do we know that the clustering result is good?

  6. Ways to Evaluate Software Clustering Results… Given a software clustering result, we can: • Assess it against a mental model • Assess it against a benchmark standard • Techniques: • Subjective Opinions • Similarity Measurements

  7. Example: How “Similar” are these Decompositions? [Figure: decompositions PA and PB over modules M1–M8] Blue edges: similarity still the same… Green edges: similarity still the same… Red edges: not as similar… Conclusion: once we add the red edges, the similarity between PA and PB decreases

  8. Observations • Edges are important for determining the similarity between decompositions • Existing measurements don’t consider edges: • Precision / Recall (similarity) • MoJo (distance) • Our idea: Use the edges to determine similarity

  9. Research Objectives • Create new similarity measurements that use dependencies (edges) • EdgeSim (similarity) • MeCl (distance) • Evaluate the new similarity measurements against MoJo & Precision/Recall • Use similarity measurements to support evaluation of software clustering results.

  10. Example: How “Similar” are these Decompositions? [Figure: decompositions PA and PB over modules M1–M8] Add blue edges: PR, MoJo, MeCl & EdgeSim unchanged. Add green edges: PR, MoJo, MeCl & EdgeSim unchanged. Add red edges: PR, MoJo unchanged; EdgeSim, MeCl reduced.

  11. Definitions [Figure: modules M1–M4 split across two clusters] Internal/Intra-Edge: edge within a cluster. External/Inter-Edge: edge between two clusters.
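
The intra-/inter-edge distinction is easy to express in code. Below is a minimal Python sketch, assuming the MDG is a list of (source, target) edges and a decomposition maps each module to a cluster id; all names and the toy data are illustrative, not from the presentation:

```python
def classify_edges(edges, decomposition):
    """Split (source, target) edges into intra-edges (both endpoints in
    the same cluster) and inter-edges (endpoints in different clusters)."""
    intra, inter = [], []
    for u, v in edges:
        if decomposition[u] == decomposition[v]:
            intra.append((u, v))
        else:
            inter.append((u, v))
    return intra, inter

# Modules M1/M2 share a cluster, as do M3/M4; the M2-M3 edge crosses them.
edges = [("M1", "M2"), ("M2", "M3"), ("M3", "M4")]
decomposition = {"M1": "C1", "M2": "C1", "M3": "C2", "M4": "C2"}
intra, inter = classify_edges(edges, decomposition)
```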

  12. EdgeSim Example [Figure: MDG over modules a–l, with decompositions PA and PB]

  13. EdgeSim Example [Figure: MDG, PA, and PB] Step 1: Find common inter- and intra-edges

  14. EdgeSim Example [Figure: MDG, PA, and PB with common edges highlighted] EdgeSim = Common Edge Weight / Total Edge Weight = 10 / 19 = 53%
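
The EdgeSim ratio (common edge weight over total edge weight) can be sketched as follows. The weighted-MDG representation and the toy decompositions are my own illustration, not the slide's example; an edge counts as "common" when both decompositions classify it the same way, intra or inter:

```python
def edge_sim(edges, pa, pb):
    """EdgeSim = common edge weight / total edge weight."""
    common = total = 0
    for (u, v), w in edges.items():
        total += w
        intra_a = pa[u] == pa[v]   # intra-edge under PA?
        intra_b = pb[u] == pb[v]   # intra-edge under PB?
        if intra_a == intra_b:     # classified the same way in both
            common += w
    return common / total

# PA keeps M1 and M2 together; PB splits everything apart, so only the
# (M2, M3) edge is classified alike (inter in both decompositions).
edges = {("M1", "M2"): 1, ("M2", "M3"): 1}
pa = {"M1": "C1", "M2": "C1", "M3": "C2"}
pb = {"M1": "C1", "M2": "C2", "M3": "C3"}
sim = edge_sim(edges, pa, pb)
```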

  15. MeCl Example [Figure: MDG over modules a–l, with decompositions PA and PB]

  16. MeCl Example (A→B) [Figure: PA partitioned into clusters A1–A3; PB into clusters B1 and B2]

  17. MeCl Example (A1 ∩ B1) [Figure: subcluster A1,1 = A1 ∩ B1]

  18. MeCl Example (A2 ∩ B1) [Figure: subcluster A2,1 = A2 ∩ B1]

  19. MeCl Example (A1 ∩ B2) [Figure: subcluster A1,2 = A1 ∩ B2]

  20. MeCl Example (A2 ∩ B2) [Figure: subcluster A2,2 = A2 ∩ B2]

  21. MeCl Example (A3 ∩ B2) [Figure: subcluster A3,2 = A3 ∩ B2]

  22. MeCl Example (A→B) [Figure: subclusters A1,1, A2,1, A1,2, A2,2, A3,2 grouped under B1 and B2]

  23. MeCl Example (A→B) [Figure: newly introduced inter-edges among the A→B subclusters]

  24. MeCl Example (B→A) [Figure: PB refined into subclusters B1,1, B1,2, B2,1, B2,2, B2,3]

  25. MeCl Example (B→A) [Figure: subclusters grouped under A1, A2, and A3]

  26. MeCl Example (B→A) [Figure: newly introduced inter-edges among the B→A subclusters]

  27. MeCl Calculation [Figure: PA and PB with the newly introduced inter-edges] Inter-edges introduced: MeCl(A→B) = ({b,e}, {e,c}, {g,h}, {f,h}); MeCl(B→A) = ({e,i}, {h,j}, {b,f}, {c,f}, {h,e}). MeCl = 1 - maxW(M_A→B, M_B→A) / Total Edge Weight = 1 - 5/19 = 73.7%
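
The subcluster refinement on slides 16–26 admits a compact formulation: an edge becomes a newly introduced inter-edge in the A→B direction exactly when it is an intra-edge in PA but an inter-edge in PB (its endpoints fall into different subclusters once PA's clusters are split along PB), and symmetrically for B→A. A Python sketch under that reading, with toy unit-weight data rather than the slide's MDG:

```python
def mecl(edges, pa, pb):
    """MeCl sketch: 1 - maxW(M_AB, M_BA) / total edge weight.

    M_AB weighs edges that are intra-edges in PA but inter-edges in PB
    (newly introduced inter-edges when refining A toward B); M_BA is
    the symmetric quantity for the other direction.
    """
    w_ab = w_ba = total = 0
    for (u, v), w in edges.items():
        total += w
        intra_a = pa[u] == pa[v]
        intra_b = pb[u] == pb[v]
        if intra_a and not intra_b:
            w_ab += w        # newly introduced inter-edge, A -> B
        elif intra_b and not intra_a:
            w_ba += w        # newly introduced inter-edge, B -> A
    return 1 - max(w_ab, w_ba) / total

# Toy MDG with unit weights (illustrative, not the slide's example).
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1}
pa = {"a": "A1", "b": "A1", "c": "A2", "d": "A2"}
pb = {"a": "B1", "b": "B2", "c": "B2", "d": "B3"}
dist = mecl(edges, pa, pb)
```

Here two of the three edges are intra in PA but inter in PB, so the A→B direction dominates and MeCl = 1 - 2/3.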

  28. Similarity Measurement Recap [Figure: decomposition pairs P1 and P2 built from A1, B1, A2, and B2 over modules M1–M8] MoJo(P1) = MoJo(P2) = 87.5%. PR(P1) = PR(P2) = P: 84.6%, R: 68.7%, AVGPR = 76.7%. Conclusion… P1 is equally similar to P2

  29. Similarity Measurement Recap [Figure: the same decomposition pairs P1 and P2] EdgeSim(P1) = 77.8%, EdgeSim(P2) = 58.3%. MeCl(P1) = 88.9%, MeCl(P2) = 66.7%. Conclusion… P1 is more similar than P2

  30. Summary: EdgeSim & MeCl • EdgeSim: • Rewards clustering algorithms for preserving the edge types • Penalizes clustering algorithms for changing the edge types • MeCl: • Rewards the clustering algorithm for creating cohesive “subclusters”

  31. Special Modules [Figure: decompositions PA and PB with clusters A1–A3, B1–B2] Omnipresent Modules: “strong” connection to other modules. Library Modules: always used by other modules, never use other modules. Isomorphic Modules: modules equally connected to other subsystems.

  32. Special Modules [Figure: PA and PB after special-module treatment] Special treatment of special modules helps to determine the similarity: Omnipresent Modules: removed. Library Modules: removed. Isomorphic Modules: replicated.
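
One way to implement the removal step is to detect library modules (only ever the target of dependencies) and omnipresent modules (unusually high degree), then drop their edges before comparing decompositions. The detection heuristics and the degree threshold below are illustrative assumptions, not the presentation's exact rules:

```python
def find_special_modules(edges, omnipresent_threshold=4):
    """Heuristic detection of special modules in a directed MDG.

    Library modules: used by other modules, never use other modules.
    Omnipresent modules: total degree at or above a threshold
    (the threshold value here is an illustrative assumption).
    """
    out_deg, in_deg, modules = {}, {}, set()
    for u, v in edges:
        modules.update((u, v))
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    library = {m for m in modules
               if in_deg.get(m, 0) > 0 and out_deg.get(m, 0) == 0}
    omnipresent = {m for m in modules
                   if in_deg.get(m, 0) + out_deg.get(m, 0) >= omnipresent_threshold}
    return library, omnipresent

def remove_modules(edges, to_remove):
    """Drop every edge incident to a removed module before comparing."""
    return [(u, v) for u, v in edges if u not in to_remove and v not in to_remove]

# "LIB" is only ever a target, so it is flagged as a library module.
edges = [("M1", "LIB"), ("M2", "LIB"), ("M1", "M2")]
library, omnipresent = find_special_modules(edges)
filtered = remove_modules(edges, library | omnipresent)
```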

  33. Case Study Overview [Figure: pipeline from source code through clustering algorithms to a clustered result, then through a similarity evaluation tool to similarity analysis] Similarity measurements: Precision/Recall, MoJo, EdgeSim, MeCl. Average, variance, etc. based on 100 clustering runs… (4,950 evaluations)

  34. Case Study Observations • All similarity measurements exhibit consistent behavior for the systems studied • Removal of “special” modules improved all similarity measurements • Treating isomorphic modules specially only improved similarity slightly • EdgeSim and MeCl produced higher and less variable similarity values than Precision/Recall and MoJo For all systems examined: If MeCl(SA) < MeCl(SB) then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB)

  35. Results: Bunch System [Figure: PA versus PB similarity distributions; sample size = 100]

  36. Results: Swing System [Figure: PA versus PB similarity distributions; sample size = 100]

  37. Results: RCS System [Figure: SimMIN versus SimMAX similarity distributions; sample size = 100]

  38. Results: Dot System [Figure: similarity distributions; sample size = 100]

  39. CRAFT: A Framework for Evaluating Software Clustering Results in the Absence of Benchmark Decompositions

  40. Reference Decompositions • A Reference Decomposition (a.k.a. Gold Standard, or Benchmark) is a “good” clustering result • Difficult to create a de facto reference decomposition • Our approach: If no benchmark is available, we create one based on commonalities in various clustering results

  41. Why Reference Decompositions? • Assists Software Maintainers • Simplifies program comprehension • Provides an important architectural view • Provides a reference for comparing and evaluating the results of clustering tools

  42. Obtaining a Reference Decomposition Manual Approach: • Designer knowledge • Manual techniques Automatic Approach: • Automated Tooling • Source Code Analysis • Clustering

  43. Research Objectives • Develop an infrastructure to create reference decompositions automatically • Highlight common trends produced by different clustering algorithms • Show areas of strong, weak and marginal agreement between different clustering algorithms

  44. CRAFT Architecture [Figure: CRAFT User Interface and Application Programming Interface (API) over Clustering Services, Data Analysis, Visualization Services, the Clustering Results Repository, Clustering Drivers, and Analysis Tools; services are marked as provided or user-developed] CRAFT = Clustering Results Analysis Framework and Tools

  45. CRAFT Services • Confidence Analysis • Impact Analysis

  46. Building the Repository [Figure: a clustering driver executes clustering algorithms and persists [M1, M2, Frequency] records, which the CRAFT framework loads from the Clustering Results Repository] • The clustering driver bean is loaded by CRAFT • The bean uses various algorithms to cluster the desired system many times • The CRAFT framework includes 2 clustering drivers that use Bunch
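
The repository's [M1, M2, Frequency] records can be produced by counting, over many clustering runs, how often each module pair lands in the same cluster. A sketch, assuming each run is a dict mapping module to cluster id (the representation is my own, not CRAFT's API):

```python
from collections import Counter
from itertools import combinations

def build_repository(clustering_runs):
    """Count how often each module pair ends up in the same cluster
    across many runs -- the [M1, M2, Frequency] records the clustering
    driver persists to the repository."""
    freq = Counter()
    for run in clustering_runs:                    # run: module -> cluster id
        by_cluster = {}
        for module, cluster in run.items():
            by_cluster.setdefault(cluster, []).append(module)
        for members in by_cluster.values():
            for m1, m2 in combinations(sorted(members), 2):
                freq[(m1, m2)] += 1                # canonical sorted pair
    return freq

# Two illustrative runs: M1/M2 co-cluster once, M2/M3 co-cluster once.
runs = [
    {"M1": "A", "M2": "A", "M3": "B"},
    {"M1": "A", "M2": "B", "M3": "B"},
]
freq = build_repository(runs)
```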

  47. Confidence Analysis Used to create a benchmark decomposition. [Figure: the MDG and repository records [M1, M2, 100], [M2, M4, 100], [M2, M5, 100], [M1, M6, 100], [M7, M8, 100], [M1, M3, 57], [M3, M8, 43] yield a “benchmark” at T = 75%] A user-supplied threshold determines the grouping criteria.
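
Grouping modules whose co-clustering frequency meets the user-supplied threshold T is a connected-components computation. A union-find sketch over the slide's example records (the function and data layout are illustrative, not CRAFT's implementation):

```python
def benchmark_from_repository(freq, num_runs, threshold=0.75):
    """Confidence-analysis sketch: union modules whose co-clustering
    frequency meets the user-supplied threshold T. Modules that never
    reach the threshold (M3 below, at 57% and 43%) stay ungrouped."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]          # path halving
            x = parent[x]
        return x
    for (m1, m2), count in freq.items():
        if count / num_runs >= threshold:
            parent[find(m1)] = find(m2)            # union the two groups
    clusters = {}
    for m in list(parent):
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())

# The slide's repository records as (pair -> frequency) out of 100 runs.
freq = {("M1", "M2"): 100, ("M2", "M4"): 100, ("M2", "M5"): 100,
        ("M1", "M6"): 100, ("M7", "M8"): 100, ("M1", "M3"): 57,
        ("M3", "M8"): 43}
clusters = benchmark_from_repository(freq, num_runs=100, threshold=0.75)
```

At T = 75% this yields two groups, {M1, M2, M4, M5, M6} and {M7, M8}, with M3 left out because neither of its links reaches the threshold.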

  48. Validating the Benchmark Decomposition [Figure: a comparator (using similarity measurements) queries the repository and compares each result against the generated “benchmark”; assessment: Good / Fair / Poor]

  49. Impact Analysis Used to understand the impact of a structural change. [Figure: “Impact on M2” derived from the MDG and repository records, with TL = 15% and TH = 75%] A user-supplied threshold determines the grouping criteria.
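
A possible reading of the two thresholds is to band a module's neighbours by co-clustering frequency: strong association at or above TH, marginal between TL and TH, weak below TL. The banding rule and the example frequencies are illustrative assumptions; only the TL = 15% / TH = 75% defaults come from the slide:

```python
def impact_on(module, freq, num_runs, t_low=0.15, t_high=0.75):
    """Impact-analysis sketch: classify a module's neighbours by how
    often they co-cluster with it across runs."""
    strong, marginal, weak = set(), set(), set()
    for (m1, m2), count in freq.items():
        if module not in (m1, m2):
            continue
        other = m2 if m1 == module else m1
        ratio = count / num_runs
        if ratio >= t_high:
            strong.add(other)      # almost always clustered with module
        elif ratio >= t_low:
            marginal.add(other)    # sometimes clustered with module
        else:
            weak.add(other)        # rarely clustered with module
    return strong, marginal, weak

# Illustrative frequencies out of 100 runs (not all from the slide).
freq = {("M2", "M1"): 100, ("M2", "M4"): 100, ("M2", "M5"): 50,
        ("M3", "M2"): 10}
strong, marginal, weak = impact_on("M2", freq, num_runs=100)
```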

  50. Removing “Special” Modules • Certain modules influence clustering algorithms too much • Library Modules • Omnipresent Modules
