Issues of consistency in defining slices for slicing metrics: ensuring comparability in research findings

Tracy Hall, Brunel University • David Bowes, University of Hertfordshire • Andrew Kerr, University of Hertfordshire

Schedule • Why are we interested in replicating slices? • What are slice-based coupling and cohesion metrics? • What did Meyers & Binkley do in their study? • What did we do in our replication of M&B’s study? • How do our results compare to M&B’s? • Do slice results matter? • What are the implications of our findings?

Why are we interested in replicating slices? • We aimed to investigate whether slice-based metrics can predict fault-prone code. • We needed to validate that we were collecting slice-based metrics data correctly. • We tried to reproduce Meyers and Binkley's (2004, 2007) metrics values exactly. • Our replication highlights many ways in which the identification of program slices can vary. • Our results identify a need for consistency and/or full specification of slicing variables.

What are slice-based metrics? • The original set of cohesion metrics was proposed by Weiser in 1981 and extended by Ott et al. in the 1990s. • Harman et al. (1997) introduced slice-based coupling. • Green et al. (2009) present a detailed overview showing the evolution of slice-based coupling and cohesion metrics.

Slice-based coupling metrics • Meyers and Binkley (2007, p. 8) use Harman et al.'s (1997) definition of coupling to define the coupling of a function f as a weighted average of its coupling to all other functions in the program.

Slice-based cohesion metrics Cohesion metric definition (Ott & Thuss, 1993)
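As an illustration, the five cohesion metrics usually attributed to this line of work (Tightness, MinCoverage, Coverage, MaxCoverage, Overlap) can be sketched as below. This is a minimal sketch, not the scripts used in either study: it assumes each slice is already available as a set of vertex ids restricted to the module, and the function name `cohesion_metrics` and its input format are hypothetical.

```python
from typing import Dict, Set


def cohesion_metrics(slices: Dict[str, Set[int]], module_length: int) -> Dict[str, float]:
    """Compute slice-based cohesion metrics for one module.

    slices: maps each output variable to the set of vertex ids in its slice
            (hypothetical input format, already restricted to the module).
    module_length: number of vertices in the module.
    """
    if not slices or module_length <= 0:
        raise ValueError("need at least one slice and a positive module length")
    if any(len(s) == 0 for s in slices.values()):
        # Overlap divides by each slice size, so empty slices must be pruned first.
        raise ValueError("empty slices must be removed before computing metrics")

    sizes = [len(s) for s in slices.values()]
    # Vertices common to every slice of the module.
    intersection = set.intersection(*slices.values())
    n = len(sizes)
    return {
        "tightness": len(intersection) / module_length,
        "min_coverage": min(sizes) / module_length,
        "coverage": sum(sizes) / (n * module_length),
        "max_coverage": max(sizes) / module_length,
        "overlap": sum(len(intersection) / size for size in sizes) / n,
    }


if __name__ == "__main__":
    example = {"a": {1, 2, 3, 4}, "b": {3, 4, 5, 6, 7, 8}}
    print(cohesion_metrics(example, module_length=10))
```

Note how every metric depends on which vertices count towards a slice and towards the module length, which is why small differences in slice definition change the reported values.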

What did Meyers & Binkley do in their study? • Meyers and Binkley (2004, 2007) were the first to collect and analyse large-scale slice-based metrics data. • Collected slice-based metrics data on 63 open source C projects. • Produced a longitudinal study showing the evolution of coupling and cohesion over many releases of the Barcode and Gnugo projects. • Used CodeSurfer to slice. • Wrote scripts to collect slice-based metrics data.

Fermat's Last Theorem: the problem in replicating studies • Insufficient space in a published paper to describe the methods to allow for replication… • "I have discovered a truly marvelous proof that it is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second into two like powers. This margin is too narrow to contain it." (Fermat, 1637) • Replicated: Wiles, A. (1995)

What did we do in our replication? • Replicated only M&B's longitudinal results for the evolution of cohesion in Barcode. • Barcode has 65 functions and 49 releases. • The highest preset build option was used in CodeSurfer. • We tried to replicate the method reported by M&B. • We discussed unclear methodological issues with Dave Binkley. • We wrote our own Scheme scripts (and were provided with scripts from CREST (Youssef)).

Longitudinal cohesion • [Figure: Barcode, M&B results alongside our results]

Longitudinal cohesion • [Figure: Barcode, M&B results alongside our results with full vertex removal]

Trying to understand where we were going wrong… • Looked in detail at one data point (release 0.98). • Tried to examine all variations in the way that this data point could be calculated. • We sliced both on files and on projects. • We varied the way lines of code are included in slices (based on the variations reported in previous studies analysed by Green et al. 2009) using: • Formal Ins: input parameters for the function specified in the module declaration. • Formal Outs: return variables. • Globals: variables used by or affected by the module. • Printf: variables which appear as Formal Outs in the list of parameters in an output statement.

Combinations of slicing settings tested • [Table: combinations of slicing settings; some combinations are marked "Not possible"] • NB: all these settings were sliced on both a file and a project basis.
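The space of settings described above can be enumerated mechanically. The sketch below is illustrative only: it lists every non-empty combination of the four slicing criteria (I, O, G, pF) for both scopes, and it does not encode which combinations were marked "Not possible" or which were actually run. The names `CRITERIA`, `SCOPES` and `slicing_configurations` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# Abbreviations as used in the slides: Formal Ins, Formal Outs, Globals, printf.
CRITERIA = ("I", "O", "G", "pF")
SCOPES = ("file", "project")


def slicing_configurations() -> List[Tuple[Tuple[str, ...], str]]:
    """Return every non-empty subset of criteria paired with each slicing scope."""
    configs = []
    for size in range(1, len(CRITERIA) + 1):
        for combo in combinations(CRITERIA, size):
            for scope in SCOPES:
                configs.append((combo, scope))
    return configs


if __name__ == "__main__":
    for combo, scope in slicing_configurations():
        print("+".join(combo), scope)
```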

Average module metrics for different combinations of variables • Meyers & Binkley results: Overlap = 0.51, Tightness = 0.26, Coverage = 0.54, MinCoverage = 0.30, MaxCoverage = 0.71. • Key to slicing settings: I = Formal Ins, O = Formal Outs, G = Globals, pF = printf. • NB: both forward and backward slices were used in all cases.

What issues impact on slice-based data? • Only use PDGs which are 'user-defined' and remove PDGs with zero vertices. • Keep globals identified n times? • String constants considered as output variables (?) • Slices are based on both data and control edges. • Slices of length zero are removed (this would have a significant impact on tightness). • Intersect all slices with the PDG vertices to remove vertices found outside of the PDG. • Remove vertices with an identifier < 1. • Remove vertices associated with the body '{' and '}'. • Declaration vertices are removed, as they are not consistently included in forward and backward slices. • Return has an auto-generated value, so if a variable is output via a global, or written as well as returned, the script may catch the same (source code) variable twice. • Global outputs from a function f include globals modified transitively by calls from f ("outgoing variables"), resulting in numerous slices. • Selection of actual inputs to output functions is naïve; sometimes we may want the format string in printf statements. • Dealing with placeholder functions: if they have size zero after vertices are pruned, they are ignored. • Should some types of variables, e.g. strings, be excluded from slicing criteria? • Should forward slices use may-kill or declaration vertices? • Time for variant performance analysis?
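Several of the pruning decisions listed above can be expressed as a small filter applied before any metric is computed. This is a sketch under stated assumptions, not the Scheme scripts used in the study: slices are modelled as sets of integer vertex ids, and the vertex kind labels ("body-brace", "declaration") and the function name `prune_slices` are hypothetical.

```python
from typing import Dict, Set

# Assumed kind labels for vertices that are dropped; real tools use their own names.
EXCLUDED_KINDS = {"body-brace", "declaration"}


def prune_slices(
    slices: Dict[str, Set[int]],
    pdg_vertices: Set[int],
    vertex_kinds: Dict[int, str],
) -> Dict[str, Set[int]]:
    """Apply the pruning rules to every slice of one PDG.

    - keep only vertices that belong to the PDG,
    - drop vertices with an identifier < 1,
    - drop body-brace and declaration vertices,
    - drop slices that end up with length zero.
    """
    allowed = {
        v
        for v in pdg_vertices
        if v >= 1 and vertex_kinds.get(v) not in EXCLUDED_KINDS
    }
    pruned = {}
    for criterion, vertices in slices.items():
        kept = vertices & allowed
        if kept:  # zero-length slices are removed
            pruned[criterion] = kept
    return pruned


if __name__ == "__main__":
    slices = {"x": {0, 1, 2, 9}, "y": {3}, "z": {9}}
    kinds = {1: "expression", 2: "declaration", 3: "body-brace"}
    print(prune_slices(slices, {0, 1, 2, 3}, kinds))
```

Each rule changes slice sizes or the number of slices, so toggling any one of them changes Tightness, Coverage and Overlap for the same source code.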

What are the implications of our findings? • For slice-based metrics: • Specifying precisely all parameters of a slice and a metric is important but difficult. • Identifying the 'best' variant of a metric may be useful. • For replicating studies: • Studies need to publish basic information that allows replication. • For Software Engineering: • We need to build bodies of evidence and this must include replicated studies.

References • Green, P., Lane, P., Rainer, A., Scholz, S.-B. (2009). An Introduction to Slice-Based Cohesion and Coupling Metrics. Technical Report No. 488, University of Hertfordshire, School of Computer Science. • Harman, M., Okulawon, M., Sivagurunathan, B., Danicic, S. (1997). Slice-Based Measurement of Coupling. IEEE/ACM ICSE Workshop on Process Modelling and Empirical Studies of Software Evolution (pp. 28-32). Boston, Massachusetts. • Meyers, T. M., Binkley, D. (2004). A Longitudinal and Comparative Study of Slice-Based Metrics. International Software Metrics Symposium, Chicago, USA. IEEE. • Meyers, T. M., Binkley, D. (2007). An Empirical Study of Slice-Based Cohesion and Coupling Metrics. ACM Transactions on Software Engineering and Methodology, 17(1), pp. 1-25. • Ott, L. M., Thuss, J. J. (1993). Slice Based Metrics for Estimating Cohesion. In Proceedings of the International Software Metrics Symposium, IEEE-CS, pp. 71-81.

Any questions? Tracy Hall Reader in Software Engineering Brunel University Uxbridge, UK tracy.hall@brunel.ac.uk David Bowes Senior Lecturer in Computing University of Hertfordshire Hatfield, UK d.h.bowes@herts.ac.uk

The impact of slice variants • Some variants have a better relationship with fault-prone code than other variants…

Normalised Hamming Distance • Another cohesion metric, proposed by Counsell et al. (2006). • Adapted for program slices, where: l = number of slices; k = number of vertices in the module; c_j = number of vertices in the slice based on <variable, locus>_j.
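A minimal sketch of how such a value could be computed, assuming the slice adaptation keeps the form of Counsell et al.'s original metric, NHD = 1 - 2 / (l k (k - 1)) * sum over j of c_j (k - c_j). That form is an assumption here, since the slide's own formula is not reproduced in the text, and the function name `normalised_hamming_distance` is hypothetical.

```python
from typing import Sequence


def normalised_hamming_distance(slice_sizes: Sequence[int], k: int) -> float:
    """NHD over l slices of a module with k vertices.

    slice_sizes[j] is c_j, the number of module vertices in slice j.
    Assumes the form 1 - 2 / (l * k * (k - 1)) * sum(c_j * (k - c_j)).
    """
    l = len(slice_sizes)
    if l == 0:
        raise ValueError("at least one slice is required")
    if k < 2:
        raise ValueError("the module needs at least two vertices")
    if any(c < 0 or c > k for c in slice_sizes):
        raise ValueError("each slice size must lie between 0 and k")

    # c_j * (k - c_j) counts vertex pairs that disagree on membership of slice j.
    disagreement = sum(c * (k - c) for c in slice_sizes)
    return 1.0 - 2.0 * disagreement / (l * k * (k - 1))


if __name__ == "__main__":
    print(normalised_hamming_distance([2, 2], k=4))
    print(normalised_hamming_distance([4, 4], k=4))
```

Under this form, a module whose slices each contain every vertex scores 1, and the score falls as slices split the module's vertices more evenly.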
