This study by Marc Eaddy delves into the challenges of crosscutting concerns in software maintenance and development. The research offers insights into concern location, impact assessment, and defect analysis to enhance software modularity and evolution.
An Empirical Assessment of the Crosscutting Concern Problem Marc Eaddy Department of Computer Science Columbia University
Motivation • Maintenance dominates software costs • 50–90% of total software cost • 3–4× development costs
Motivation • >50% of maintenance time spent understanding the program • Where are the features, requirements, etc. in the code? • What is this code for? • Why is it hard to understand and change the program?
Main Contributions • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns • Tools: ConcernTagger, Cerberus, PDA
Improving concern location • Statement Annotations for Fine-Grained Advising • ECOOP Workshop on Reflection, AOP, and Meta-Data for Software Evolution (2006) • Eaddy and Aho • Demo: Wicca 2.0 - Dynamic Weaving using the .NET 2.0 Debugging APIs • Aspect-Oriented Software Development (2007) • Eaddy • Identifying, Assigning, and Quantifying Crosscutting Concerns • ICSE Workshop on Assessment of Contemporary Modularization Techniques (2007) • Eaddy, Aho, and Murphy • Cerberus: Tracing Requirements to Source Code Using Information Retrieval, Dynamic Analysis, and Program Analysis • IEEE International Conference on Program Comprehension (2008) • Eaddy, Aho, Antoniol, and Guéhéneuc
Innovative metrics & methodology • Towards Assessing the Impact of Crosscutting Concerns on Modularity • AOSD Workshop on Assessment of Aspect Techniques (2007) • Eaddy and Aho • Do Crosscutting Concerns Cause Defects? • IEEE Transactions on Software Engineering (2008) • Eaddy, Zimmermann, Sherwood, Garg, Murphy, Nagappan, and Aho
Dangers of crosscutting • Do Crosscutting Concerns Cause Defects? • IEEE Transactions on Software Engineering (2008) • Eaddy, Zimmermann, Sherwood, Garg, Murphy, Nagappan, and Aho
Roadmap • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns • Tools: ConcernTagger, Cerberus, PDA
What is a “concern?” Anything that affects the implementation of a program • Feature, requirement, design pattern, code idiom, etc. • Raison d'être for code • Every line of code exists to satisfy some concern • Existing definitions are poor • Concern domain must be “well-defined set”
Concern location problem • Concern–code relationship hard to obtain • Concern–code relationship undocumented • Reverse engineer the relationship (Diagram: mapping between concerns and program elements)
Manual concern location • Concern–code relationship determined by a human • Existing techniques too subjective • Inaccurate, unreliable • Ideal • Code affected when concern is changed • My insight • Prune dependency rule [ACOM’07] • Code affected when concern is pruned (removed) • i.e., software pruning • Practical approximation
Prune dependency rule • Code is prune dependent on a concern if • Pruning the concern requires the code to be removed or altered • Distinguish between removing and altering code • Easily determine change impact of removing code • Code dependent on removed code must be altered (to prevent compile errors) • Easy for a human to approximate
Manual concern location • Concern–code relationship determined by a human • Existing tools impractical for analyzing all concerns of a real system • Many concerns (>100) • Many concern–code links (>10K) • Hierarchical concerns • My solution: ConcernTagger [TSE’08]
Automated concern location • Concern–code relationship predicted by an “expert” • Experts look for clues in docs and code • Existing techniques only consult 1 or 2 experts • My solution: Cerberus [ICPC’08] • Information retrieval • Execution tracing • Prune dependency analysis
IR-based concern location • i.e., Google for code • Program entities are documents • Requirements are queries • Example: requirement “Array.join” retrieves source code entities join, Id_join, js_join()
Vector space model [Salton] • Parse code and requirements document to extract term vectors • NativeArray.js_join() method → “native,” “array,” “join” • “Array.join” requirement → “array,” “join” • My contributions • Expand abbreviations • numconns → number, connections, numberconnections • Index fields • Weight terms (tf · idf) • Term frequency (tf) • Inverse document frequency (idf) • Similarity = cosine of the angle between document and query vectors
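The vector space model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Cerberus implementation: the tokenizer, the `log(N/df) + 1` idf smoothing, and the entity names other than `NativeArray.js_join` are hypothetical, and abbreviation expansion and field indexing are left out.

```python
import math
import re
from collections import Counter


def tokenize(identifier):
    """Split an identifier or phrase into lowercase terms (camelCase, '_' and '.' aware)."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def build_index(entities):
    """Return ({entity: {term: tf * idf}}, idf table) for a list of entity names."""
    docs = {name: tokenize(name) for name in entities}
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    # Assumed smoothing: log(N / df) + 1 keeps terms found in every document non-zero.
    idf = {term: math.log(len(docs) / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in Counter(terms).items()}
               for name, terms in docs.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def rank_entities(requirement, entities):
    """Rank program entities (documents) against a requirement (query)."""
    vectors, idf = build_index(entities)
    query = {t: tf * idf[t] for t, tf in Counter(tokenize(requirement)).items() if t in idf}
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical entity names; only NativeArray.js_join appears in the slides.
ENTITIES = ["NativeArray.js_join", "NativeArray.js_construct", "NativeString.js_concat"]
ranking = rank_entities("Array.join", ENTITIES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy index the rare term "join" receives a higher idf than "array" or "native", so the entity that contains it ranks first for the “Array.join” query.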
Tracing-based concern location • Observe elements activated when concern is exercised • Unit tests for each concern • e.g., find elements uniquely activated by a concern
Unit test for “Array.join”:
    var a = new Array(1, 2);
    if (a.join(',') == "1,2") {
        print("Test passed");
    } else {
        print("Test failed");
    }
Call graph: js_construct, js_join
Tracing-based concern location • Elements often activated by multiple concerns • What is “information content” of element activation? • Element Frequency–Inverse Concern Frequency [ICPC’08]
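A rough sketch of the two tracing ideas: elements uniquely activated by a concern, and a weight that discounts elements activated by many concerns. The weighting below is written by analogy with tf · idf and is an assumption; the exact Element Frequency–Inverse Concern Frequency formula in [ICPC’08] may differ. The trace data, the `Array.reverse` concern, and `js_reverse` are hypothetical.

```python
import math

# Hypothetical traces: concern -> {element: number of activations in its unit tests}.
TRACES = {
    "Array.join": {"js_construct": 1, "js_join": 1},
    "Array.reverse": {"js_construct": 1, "js_reverse": 1},
}


def unique_elements(traces, concern):
    """Elements activated by `concern` and by no other concern."""
    others = set()
    for name, activations in traces.items():
        if name != concern:
            others.update(activations)
    return set(traces[concern]) - others


def ef_icf_scores(traces, concern):
    """Assumed tf-idf-style weight: activations * log(#concerns / #concerns activating e)."""
    n_concerns = len(traces)
    scores = {}
    for element, count in traces[concern].items():
        activating = sum(1 for acts in traces.values() if element in acts)
        scores[element] = count * math.log(n_concerns / activating)
    return scores


print(unique_elements(TRACES, "Array.join"))
print(ef_icf_scores(TRACES, "Array.join"))
```

Here `js_construct` runs for every concern that creates an array, so its weight drops to zero, while `js_join` is activated only by the “Array.join” test and keeps a positive weight.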
Prune dependency analysis • Infer relevant elements based on structural relationship to relevant element e (seed) • Assumes we already have some seeds • Prune dependency analysis [ICPC’08] • Automates prune dependency rule [ACOM’07] • Find references to e • Find superclasses and subclasses of e
PDA example
Source code:
    interface A {
        public void foo();
    }
    public class B implements A {
        public void foo() { ... }
        public void bar() { ... }
    }
    public class C {
        public static void main() {
            B b = new B();
            b.bar();
        }
    }
Program dependency graph: B inherits A; A contains foo; B contains foo and bar; C contains main; main refs B; main calls bar
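The two PDA steps named earlier (find references to the seed, find its superclasses and subclasses) can be sketched over the dependency graph of this example. This is a one-step illustration only: the edge encoding and the function name are hypothetical, and the actual analysis in Cerberus may apply further rules or propagate over several steps.

```python
# Dependency graph of the A/B/C example as (source, relation, target) triples.
EDGES = [
    ("B", "inherits", "A"),
    ("A", "contains", "A.foo"),
    ("B", "contains", "B.foo"),
    ("B", "contains", "B.bar"),
    ("C", "contains", "C.main"),
    ("C.main", "refs", "B"),
    ("C.main", "calls", "B.bar"),
]


def infer_from_seed(seed, edges):
    """Return elements structurally tied to `seed` by references or inheritance."""
    related = set()
    for source, relation, target in edges:
        if relation in ("refs", "calls") and target == seed:
            related.add(source)  # source references the seed
        elif relation == "inherits" and source == seed:
            related.add(target)  # superclass of the seed
        elif relation == "inherits" and target == seed:
            related.add(source)  # subclass of the seed
    return related


print(infer_from_seed("B.bar", EDGES))
print(infer_from_seed("B", EDGES))
```

With `B.bar` as the seed, the sketch reports `C.main` because it calls `bar`. With `B` as the seed, it reports both `C.main` (which references `B`) and the interface `A`.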
Cerberus effectiveness • Cerberus most effective • PDA improves IR by 155% • PDA improves tracing by 104%
Roadmap • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns • Tools: ConcernTagger, Cerberus, PDA
The crosscutting concern problem • Some concerns difficult to modularize • Code related to the concern is… • Scattered across (crosscuts) multiple files • Often tangled with other concern code (Diagram: mapping between concerns and program elements)
Example: Pathfinding in Goblin • Pathfinding is modularized
Example: Collision detection • Collision detection not modularized
How to measure scattering? • Existing metrics inadequate • My solution • Degree of scattering [ASAT’07] • Degree of tangling [ASAT’07]
Degree of scattering (DOS) • Measures concern modularity, i.e., distribution of concern code across multiple classes • Average DOS – Overall modularity of concerns • Summarizes amount of crosscutting present • More insightful than traditional metrics • “class A is highly coupled” vs. “feature A is hard to change” [Wong, et al.] [ACOM’07]
Insight behind DOS • More descriptive than class count • Consider two different concern implementations • Implementation 1: DOS = 1.00, #Classes = 4 • Implementation 2: DOS = 0.08, #Classes = 4
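The slides do not state the DOS formula. The sketch below assumes a normalized-variance form, DOS = 1 − |T| · Σ (Conc_t − 1/|T|)² / (|T| − 1), where Conc_t is the fraction of the concern's code in class t and |T| is the number of classes considered; check [ASAT’07] for the authoritative definition. The line counts are hypothetical and were chosen only to reproduce the two values shown above.

```python
def degree_of_scattering(lines_per_class):
    """Assumed DOS: 0 when the concern sits in one class, 1 when spread evenly.

    `lines_per_class` lists the concern's lines of code in every class considered,
    including zeros for classes the concern does not touch.
    """
    total = sum(lines_per_class)
    n = len(lines_per_class)
    if n < 2 or total == 0:
        return 0.0
    spread = sum((lines / total - 1 / n) ** 2 for lines in lines_per_class)
    return 1 - n * spread / (n - 1)


# Hypothetical distributions over 4 classes.
even = degree_of_scattering([25, 25, 25, 25])
lopsided = degree_of_scattering([97, 1, 1, 1])
print(round(even, 2), round(lopsided, 2))
```

Both concerns touch four classes, so class count cannot tell them apart, but under this formula the evenly spread concern scores 1.00 and the one concentrated in a single class scores about 0.08. Swapping the roles (one class's code distributed over concerns) would give an analogous tangling measure, again as an assumption.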
Degree of tangling (DOT) • Distribution of class code across multiple concerns • Average DOT – Overall separation of concerns [Wong, et al.] [ACOM’07]
Roadmap • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns • Tools: ConcernTagger, Cerberus, PDA
Do crosscutting concerns cause defects? [TSE’08] • Created mappings • Requirement–code map (via ConcernTagger) • Bug–code map (via BugTagger) • Bug–requirement map (inferred)
Do crosscutting concerns cause defects? [TSE’08] • Correlated scattering and bug count • Spearman correlation • Found moderate to strong correlation between scattering and defects • As scattering increases so do defects (Plot: bugs vs. scattering)
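For readers unfamiliar with Spearman correlation, here is a minimal pure-Python sketch: rank both variables (ties share the average rank) and compute the Pearson correlation of the ranks. The scattering and bug values are made-up illustrative numbers, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-concern values, for illustration only.
scattering = [0.0, 0.2, 0.5, 0.7, 0.9]
bug_count = [0, 1, 1, 3, 5]
rho = spearman(scattering, bug_count)
print(f"rho = {rho:.2f}")
```

Because it works on ranks, the coefficient captures any monotonic relationship (more scattering, more bugs) without assuming the relationship is linear.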
How widespread is the problem? • 5 case studies of OO programs • Scattering • Concerns related to 6 classes on average • OO unsuitable for representing these problem domains? • Most (86%) concerns are crosscutting to some extent • Dispels “modular base” notion • General-purpose solution needed • Tangling • Classes related to 10 concerns on average • Poor separation of concerns • Classes doing too much • Crosscutting concerns severely limit modularity
Main Contributions • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns • Tools: ConcernTagger, Cerberus, PDA
Future work • Further explore new concern analysis field • Techniques to reduce crosscutting • Improve concern location • Improve PDA generality, precision, and heuristics • Use machine learning to combine judgments • Incorporate smart “grep” and PDA into IDE • Gather empirical evidence • Impact of reducing crosscutting • Impact of crosscutting on maintenance effort • Impact of code tangling on quality
Acknowledgements • Alfred Aho • ConcernTagger/Mapper • Vibhav Garg • Jason Scherer • John Gallagher • Martin Robillard • Frédéric Weigand-Warr • BugTagger • Thomas Zimmermann • Cerberus • Giuliano Antoniol • Yann-Gaël Guéhéneuc • Andrew Howard • Goblin • Erik Petterson • John Waugh • Hrvoje Benko • Wicca • Boriana Ditcheva • Rajesh Ramakrishnan • Adam Vartanian • Microsoft Phoenix Team
Questions? Marc Eaddy Columbia University eaddy@cs.columbia.edu