Co-Slicing for Program Comprehension and Reuse March 2008

Co-Slicing for Program Comprehension and Reuse • March 2008 • Ran Ettinger, Software Asset Management Group • In Advanced SW Tools Seminar, TAU

Agenda • Introduction to Program Slicing • Debugging aid, program comprehension tool, and much more • PAINLESS demo • Co-Slicing for Reuse: Program Sliding • A provably-correct code-motion untangling transformation of slice extraction • Co-Slicing for Program Comprehension • Problem of large slices • Novel solution: Interactive program exploration with the slice-inclusion relation and co-slicing • A Co-Slicing Algorithm • Back to Sliding • Related Work • CodeSurfer’s single-step slice browsing, thin slicing, Dijkstra’s projections, and method-extraction algorithms • Further Challenges

Introduction to Program Slicing • Slicing is the study of meaningful subprograms • “When debugging unfamiliar programs programmers use program pieces called slices which are sets of statements related by their flow of data. The statements in a slice are not necessarily textually contiguous, but may be scattered through a program” [Mark Weiser, CACM82] • Given a program and a variable (at a point) of interest, a slice of the program on that variable is a subprogram that preserves the original behavior with respect to that variable • Demo 1: Slicing HLASM code (Program Analysis INfrastructure for Legacy Enterprise Software Systems project [PAINLESS] at IBM HRL) • A wide variety of potential applications • Debugging, program comprehension, testing, refactoring, componentization, parallelization, and more
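The slice definition above can be made concrete with the sales example used later in these slides. The following is a minimal sketch, not the talk's tooling: the function names `original` and `sliceOnProfit` are hypothetical, the sales are passed in as a vector instead of being read from `cin`, and the slice was written by hand.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Results { double pay = 0; double profit = 0; };

// The tangled computation: pay and profit are accumulated in one loop.
Results original(const std::vector<double>& sale, bool shouldProcess, double cost) {
    Results r;
    const int days = static_cast<int>(sale.size());
    if (shouldProcess) {
        int i = 0;
        double totalSale = 0, totalPay = 0;
        while (i < days) {
            totalSale = totalSale + sale[i];
            totalPay = totalPay + 0.1 * sale[i];
            if (sale[i] > 1000) totalPay = totalPay + 50;
            i = i + 1;
        }
        r.pay = totalPay / days + 100;
        r.profit = 0.9 * totalSale - cost;
    }
    return r;
}

// A hand-written slice on profit: only the statements that can affect profit remain.
double sliceOnProfit(const std::vector<double>& sale, bool shouldProcess, double cost) {
    double profit = 0;
    const int days = static_cast<int>(sale.size());
    if (shouldProcess) {
        int i = 0;
        double totalSale = 0;
        while (i < days) {
            totalSale = totalSale + sale[i];
            i = i + 1;
        }
        profit = 0.9 * totalSale - cost;
    }
    return profit;
}
```

For any input on which `original` terminates, `sliceOnProfit` terminates too and yields the same value of profit; everything that only feeds pay has been dropped.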

Co-Slicing for Reuse: Program Sliding • Extract the computation of profit i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; while (i < days) { totalSale = totalSale+sale[i]; i = i+1; } profit = 0.9*totalSale-cost; } if (shouldProcess) { i = 0; totalPay = 0; while (i < days) { totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; } i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; while (i < days) { totalSale = totalSale+sale[i]; i = i+1; } profit = 0.9*totalSale-cost; } The extracted slice if (shouldProcess) { i = 0; totalPay = 0; while (i < days) { totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; } The complement (no unnecessary duplication of the loop for reading sales; hence, no need to reject!) Example source: Lakhotia and Deprez [IST98]

Co-Slicing for Reuse: Program Sliding • A provably-correct code-motion untangling transformation of slice extraction • My doctoral thesis “Refactoring via Program Slicing and Sliding” [Ett] • Automated slice extraction • Combines statement reordering with code duplication • A sequential composition of a slice with its complement (i.e. co-slice) • Adding some compensatory code, for correctness • Enables automation of advanced refactorings • Split/Merge Loops • Separate Query from Modifier • Command/Query separation • Arbitrary Method Extraction • Advanced versions of Extract Method • Sliding thesis [Ett] and Raghavan Komondoor’s thesis [Kom] • Replace Temp with Query • By slice extraction • Demo 2: Nate, an Eclipse plugin, prototype slice-extraction refactoring for a small subset of Java, developed by Mathieu Verbaere and myself at Oxford, 2003/2004, supported by an Eclipse Innovation Grant by IBM
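A minimal sketch of sliding as "a slice followed by its complement", using the profit/pay example. It assumes the sales have already been read into a vector; the names `Input`, `State`, `tangled`, `extractedSlice`, `complement`, `slid` and `slidingAgrees` are hypothetical, and both halves were derived by hand rather than by the sliding algorithm itself.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Input { std::vector<double> sale; bool shouldProcess; double cost; };
struct State { int i = 0; double totalSale = 0, totalPay = 0, pay = 0, profit = 0; };

// Original code: both computations tangled in one loop.
void tangled(const Input& in, State& s) {
    const int days = static_cast<int>(in.sale.size());
    if (in.shouldProcess) {
        s.i = 0; s.totalSale = 0; s.totalPay = 0;
        while (s.i < days) {
            s.totalSale = s.totalSale + in.sale[s.i];
            s.totalPay = s.totalPay + 0.1 * in.sale[s.i];
            if (in.sale[s.i] > 1000) s.totalPay = s.totalPay + 50;
            s.i = s.i + 1;
        }
        s.pay = s.totalPay / days + 100;
        s.profit = 0.9 * s.totalSale - in.cost;
    }
}

// The extracted slice: everything needed for profit.
void extractedSlice(const Input& in, State& s) {
    const int days = static_cast<int>(in.sale.size());
    if (in.shouldProcess) {
        s.i = 0; s.totalSale = 0;
        while (s.i < days) { s.totalSale = s.totalSale + in.sale[s.i]; s.i = s.i + 1; }
        s.profit = 0.9 * s.totalSale - in.cost;
    }
}

// The complement: the remaining computation (here, pay), with the loop duplicated.
void complement(const Input& in, State& s) {
    const int days = static_cast<int>(in.sale.size());
    if (in.shouldProcess) {
        s.i = 0; s.totalPay = 0;
        while (s.i < days) {
            s.totalPay = s.totalPay + 0.1 * in.sale[s.i];
            if (in.sale[s.i] > 1000) s.totalPay = s.totalPay + 50;
            s.i = s.i + 1;
        }
        s.pay = s.totalPay / days + 100;
    }
}

// Sliding result: sequential composition of the slice with its complement.
void slid(const Input& in, State& s) { extractedSlice(in, s); complement(in, s); }

// Checks that both versions end in the same state for one input.
bool slidingAgrees(const Input& in) {
    State a, b;
    tangled(in, a);
    slid(in, b);
    auto close = [](double x, double y) { return std::fabs(x - y) < 1e-9; };
    return a.i == b.i && close(a.totalSale, b.totalSale) && close(a.totalPay, b.totalPay)
        && close(a.pay, b.pay) && close(a.profit, b.profit);
}
```

In this particular example no compensatory code is needed, since neither half overwrites a value the other half still requires; in general the transformation may have to add backup variables.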

Co-Slicing for Program Comprehension: Problem of Large Slices • Slices (especially from the end) tend to grow too large to be effective • Why is the typical end-slice so large? • The slice must produce correct values • It hence includes all statements that may contribute to the value of any used variable, at any point in the slice (i.e., it follows all data-flow dependences) • Demo 3: indirect (i.e. base register) data dependence • The slice must be executable • It hence includes all statements with conditions and jumps, if those control whether to execute (or not) any other statement in the slice • Demo 4: following control dependences
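To see how following every data and control dependence inflates a slice, here is a minimal sketch that computes a backward slice as reachability over a dependence graph. This is the standard graph-reachability formulation, not necessarily what the demos use. The statement numbering (1 to 15) and the dependence edges for the sales example are hand-written assumptions; `exampleDeps` and `backwardSlice` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// deps[n] lists the statements that n depends on (data or control).
using Deps = std::map<int, std::vector<int>>;

// Assumed numbering of the sales example:
//  1: i = 0                 2: while (i<days)          3: cin >> sale[i++]
//  4: if (shouldProcess)    5: i = 0                   6: totalSale = 0
//  7: totalPay = 0          8: while (i < days)        9: totalSale = totalSale+sale[i]
// 10: totalPay = totalPay+0.1*sale[i]                 11: if (sale[i]>1000)
// 12: totalPay = totalPay+50                          13: i = i+1
// 14: pay = totalPay/days+100                         15: profit = 0.9*totalSale-cost
Deps exampleDeps() {
    return {
        {1, {}}, {2, {1, 3}}, {3, {1, 2, 3}}, {4, {}},
        {5, {4}}, {6, {4}}, {7, {4}}, {8, {4, 5, 13}},
        {9, {3, 5, 6, 8, 9, 13}}, {10, {3, 5, 7, 8, 10, 12, 13}},
        {11, {3, 5, 8, 13}}, {12, {10, 11}}, {13, {5, 8, 13}},
        {14, {4, 7, 10, 12}}, {15, {4, 6, 9}},
    };
}

// Collects every statement reachable from the criterion along dependence edges.
std::set<int> backwardSlice(const Deps& deps, int criterion) {
    std::set<int> slice;
    std::vector<int> work{criterion};
    while (!work.empty()) {
        int n = work.back();
        work.pop_back();
        if (!slice.insert(n).second) continue;  // already visited
        auto it = deps.find(n);
        if (it == deps.end()) continue;
        for (int d : it->second) work.push_back(d);
    }
    return slice;
}
```

Under these assumed edges, slicing from statement 14 (pay) pulls in 12 of the 15 statements, and slicing from statement 15 (profit) pulls in 10: the loop control, the index variable and the input loop are shared by both.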

Problem of Large Slices: End-Slice of Variable pay i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay totalPay sale i cin

Novel Solution: A Slice-Inclusion Relation and Co-Slicing • Interactive program exploration • Guided by a slice-inclusion diagram • First introduced by Gallagher and Lyle [ToSE91] • For software maintenance, supporting a process of change, avoiding the need for regression testing • The diagram is a directed graph, representing a given program (or subprogram) S, and including: • A node for each (defined) variable x • Stands for the slice of S on x, from the end • A directed edge from x to y whenever • The slice of x is fully included in that of y, and • There is no other variable z whose slice both includes the slice of x and is included in the slice of y
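The edge rule above amounts to keeping only the direct (covering) pairs of the slice-inclusion order. A minimal sketch follows, assuming each end-slice is already available as a set of statement numbers and reading "fully included" as strict inclusion, so that two variables with identical slices get no edge between them. The slices in `exampleSlices` are hand-derived assumptions for the sales example (statements numbered 1 to 15 in textual order), and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Slice = std::set<int>;
using Slices = std::map<std::string, Slice>;
using Edge = std::pair<std::string, std::string>;

// True if a is a proper subset of b.
bool strictlyIncluded(const Slice& a, const Slice& b) {
    return a.size() < b.size() && std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Edge x -> y iff slice(x) is strictly inside slice(y) and no slice(z) lies strictly between.
std::vector<Edge> inclusionEdges(const Slices& slices) {
    std::vector<Edge> edges;
    for (const auto& x : slices) {
        for (const auto& y : slices) {
            if (!strictlyIncluded(x.second, y.second)) continue;
            bool direct = true;
            for (const auto& z : slices) {
                if (strictlyIncluded(x.second, z.second) && strictlyIncluded(z.second, y.second)) {
                    direct = false;
                    break;
                }
            }
            if (direct) edges.emplace_back(x.first, y.first);
        }
    }
    return edges;
}

bool hasEdge(const std::vector<Edge>& edges, const std::string& from, const std::string& to) {
    return std::find(edges.begin(), edges.end(), Edge(from, to)) != edges.end();
}

// Assumed end-slices for the sales example (statement numbers in textual order).
Slices exampleSlices() {
    return {
        {"sale", {1, 2, 3}},
        {"i", {1, 2, 3, 4, 5, 8, 13}},
        {"totalSale", {1, 2, 3, 4, 5, 6, 8, 9, 13}},
        {"totalPay", {1, 2, 3, 4, 5, 7, 8, 10, 11, 12, 13}},
        {"pay", {1, 2, 3, 4, 5, 7, 8, 10, 11, 12, 13, 14}},
        {"profit", {1, 2, 3, 4, 5, 6, 8, 9, 13, 15}},
    };
}
```

With these assumed slices the diagram has five edges: sale to i, i to totalSale, i to totalPay, totalSale to profit, and totalPay to pay. There is no edge from sale to totalSale, because the slice of i sits in between.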

Example: Slice-Inclusion Diagram i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

Interactive Program Exploration: An On-Demand Approach i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

On-Demand Exploration i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

On-Demand Exploration: Back to End-Slice of profit • …the problem of large slices is not yet solved i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

Interactive Program Exploration with Slices and Co-Slices: A Bottom-Up Approach i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

Bottom-Up Exploration with Slices and Co-Slices i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

What’s in a Co-Slice then? • Is it the complementary set of statements? [too small!] i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

What’s in a Co-Slice then? • Is it the union of slices of all remaining variables? [too large!] i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

What’s in a Co-Slice then? • Assume results of selected variables are available and reuse them i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

Illustration of a Co-Slicing algorithm (1): Assume results of selected variables are available and reuse them i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+fSale[i]; totalPay = totalPay+0.1*fSale[i]; if (fSale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = fTotalPay/days+100; profit = 0.9*fTotalSale-cost; } pay profit totalPay totalSale sale i cin

Illustration of a Co-Slicing algorithm (2): Slice now… and get a smaller slice i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+fSale[i]; totalPay = totalPay+0.1*fSale[i]; if (fSale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = fTotalPay/days+100; profit = 0.9*fTotalSale-cost; } pay profit totalPay totalSale sale i cin

Illustration of a Co-Slicing algorithm (3): Undo the variable renaming… wherever possible i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+fSale[i]; totalPay = totalPay+0.1*fSale[i]; if (fSale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = fTotalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

Back to Sliding: Separate non-maximal from maximal i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; totalPay = 0; while (i < days) { totalSale = totalSale+sale[i]; totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } } if (shouldProcess) { pay = totalPay/days+100; profit = 0.9*totalSale-cost; } pay profit totalPay totalSale sale i cin

Another Sliding Example: Separate profit and all included variables i = 0; while (i<days) cin >> sale[i++]; if (shouldProcess) { i = 0; totalSale = 0; while (i < days) { totalSale = totalSale+sale[i]; i = i+1; } profit = 0.9*totalSale-cost; } if (shouldProcess) { i = 0; totalPay = 0; while (i < days) { totalPay = totalPay+0.1*sale[i]; if (sale[i]>1000) totalPay = totalPay+50; i = i+1; } pay = totalPay/days+100; } pay profit totalPay totalSale sale i cin

The Promise of Sliding • Mainly good for: • Enhancing reusability of tangled (non-contiguous) existing code • Refactoring (e.g. Replace Temp with Query) • Componentization • Parallelization • Particularly strong in: • Correctness, i.e. behavior preservation • Maximizing reuse (of extracted computation’s results, in the complement) • Minimizing code duplication, i.e. yielding a small complement • Minimizing the necessary compensation, i.e. fewer backup variables • Improving applicability, i.e. fewer reasons to reject a request

Related Work: Enhance Reuse by Method Extraction • Slice extraction • Tucking by Lakhotia and Deprez [IST98] • Complement is union of slices from all non-extracted points • No data flow from slice to complement • Block-based slicing by Maruyama [SSR01] • A rudimentary approach (no proof of correctness) • Untangling: A Slice Extraction Refactoring [AOSD04] • Arbitrary method extraction: Extract any selection of (not-necessarily contiguous) code • Tucking [IST98] • Procedure extraction by Komondoor and Horwitz [POPL00,IWPC03,Kom] • Allows data flow from extracted code to the complement • Inspired invention of co-slices • However, does not support duplication of assignments • Hence, no untangling of loops; instead, may extract more code than actually selected

Related Work: Program Comprehension • Thin slicing (by Sridharan, Fink and Bodik [PLDI07]) • Focus on direct (value, not pointer) data dependences • Ignore control dependences • Ignore data dependences carrying pointers (base and index registers, in the context of HLASM) • A thin slice can be expanded with other thin slices • Yielding the full traditional slice, in the limit • CodeSurfer’s slice browsing [CodeSurfer] • One step at a time, jumping forward or backward in a slice, following data or control dependences • Dijkstra’s projections (in his Smoothsort article [SoCP82]) • Explaining an algorithm stepwise, one variable/projection at a time

Some Further Challenges • Implement the co-slicing and sliding algorithms • Extend to “real” languages • Collect empirical results on length and usefulness of co-slices • Extend the slice-inclusion diagram to slices from internal program points • Apply sliding to more refactorings (e.g. “Separate Query from Modifier” [Fow], arbitrary method extraction) • Apply the sliding-related refactorings in bigger reengineering challenges (e.g. Convert Procedural Design to Objects [Fow], componentization, conversion to SOA) • Sliding beyond refactoring (e.g. in optimizing compilers, code obfuscation)

Thanks!

References • [CACM82] Programmers use slices when debugging, M. Weiser, 1982 • [SoCP82] Smoothsort, an Alternative for Sorting In Situ, E. W. Dijkstra, 1982 • [ToSE91] Using Program Slicing in Software Maintenance, K. Gallagher and J. Lyle, 1991 • [IST98] Restructuring programs by tucking statements into functions, A. Lakhotia and J.-C. Deprez, 1998 • [Fow] Refactoring: Improving the Design of Existing Code, M. Fowler, 2000 • [POPL00] Semantics-preserving procedure extraction, R. Komondoor and S. Horwitz, 2000 • [SSR01] Automated method-extraction refactoring by using block-based slicing, K. Maruyama, 2001 • [IWPC03] Effective automatic procedure extraction, R. Komondoor and S. Horwitz, 2003 • [Kom] Automated Duplicated-Code Detection and Procedure Extraction, R. Komondoor, PhD thesis, University of Wisconsin-Madison, 2003 • [AOSD04] Untangling: a slice extraction refactoring, R. Ettinger and M. Verbaere, 2004 • [Ett] Refactoring via Program Slicing and Sliding, R. Ettinger, DPhil thesis, 2006 • http://progtools.comlab.ox.ac.uk/members/rani/sliding_thesis.pdf • [PLDI07] Thin Slicing, M. Sridharan, S. J. Fink, R. Bodik, 2007 • [CodeSurfer] CodeSurfer from GrammaTech • http://www.grammatech.com/products/codesurfer/ • [PAINLESS] The Program Analysis INfrastructure for Legacy Enterprise Software Systems project • http://www.haifa.il.ibm.com/projects/services/painless

Backup

A Definition of (Slices and) Co-Slices • Definition of a slice: • Let S be a given statement and let V be a set of variables of interest. • A statement S’ is a slice of S on V if, for any input on which S terminates, S’ terminates too, with the same result held in all program variables in V. • In a similar manner, the novel concept of a co-slice can be defined as follows: • Let S be a given statement and let V be a set of variables of NO interest. • That is, the final value of each variable in V, and the code for computing it in S, can be removed, provided it does not contribute to any other result. • A statement S’ is a co-slice of S on V if, for any input on which S terminates, S’ terminates too, with the same result held in all program variables outside V. • Moreover, suppose the result of V is available for reuse through the corresponding set of fresh variables fV. A co-slice S’ on V with fV is free to use the final value of variables in V through the corresponding elements of fV (or even directly from V, if only such final-value references are present in S’).
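A minimal sketch of the co-slice definition on the sales example, taking V = {totalPay} with the fresh variable fTotalPay. The co-slice below was written by hand for illustration and is not the output of any tool; the names `Input`, `State`, `original`, `coSliceOnTotalPay` and `coSliceAgrees` are hypothetical, and the sales are passed in rather than read from `cin`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Input { std::vector<double> sale; bool shouldProcess; double cost; };
struct State { int i = 0; double totalSale = 0, totalPay = 0, pay = 0, profit = 0; };

// The statement S: the full tangled computation.
void original(const Input& in, State& s) {
    const int days = static_cast<int>(in.sale.size());
    if (in.shouldProcess) {
        s.i = 0; s.totalSale = 0; s.totalPay = 0;
        while (s.i < days) {
            s.totalSale = s.totalSale + in.sale[s.i];
            s.totalPay = s.totalPay + 0.1 * in.sale[s.i];
            if (in.sale[s.i] > 1000) s.totalPay = s.totalPay + 50;
            s.i = s.i + 1;
        }
        s.pay = s.totalPay / days + 100;
        s.profit = 0.9 * s.totalSale - in.cost;
    }
}

// A co-slice of S on {totalPay} with fTotalPay: it no longer computes totalPay,
// but reuses its final value, supplied from outside, to compute pay.
void coSliceOnTotalPay(const Input& in, double fTotalPay, State& s) {
    const int days = static_cast<int>(in.sale.size());
    if (in.shouldProcess) {
        s.i = 0; s.totalSale = 0;
        while (s.i < days) {
            s.totalSale = s.totalSale + in.sale[s.i];
            s.i = s.i + 1;
        }
        s.pay = fTotalPay / days + 100;
        s.profit = 0.9 * s.totalSale - in.cost;
    }
}

// Checks the definition for one input: same results in all variables outside V.
bool coSliceAgrees(const Input& in) {
    State full, co;
    original(in, full);
    coSliceOnTotalPay(in, full.totalPay, co);  // fTotalPay := final value of totalPay in S
    auto close = [](double x, double y) { return std::fabs(x - y) < 1e-9; };
    return full.i == co.i && close(full.totalSale, co.totalSale)
        && close(full.pay, co.pay) && close(full.profit, co.profit);
}
```

The co-slice is smaller than the union of the slices of pay, profit, totalSale and i, because the statements that accumulate totalPay are gone; that is the saving the reuse of fTotalPay buys.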

A Co-Slicing Algorithm: Rationale • The goal of the algorithm is to maximize reuse of the available final values before slicing for the complementary set of variables • However, a simplistic approach of substituting all uses of co-sliced variables will not do • Some of the uses make reference to intermediate (i.e. non-final) values • The final-value references must be identified and substituted, ahead of slicing • Finally, after slicing, some substitutions may be undone

Final-Use Substitution • A final use of a variable x is a reference to x at a program point p at which x is guaranteed to hold its final value • That is, no path from p to the exit, in terms of control flow, includes a definition of x • Or, equivalently, an assertion of the form assert x == fx, where fx is a fresh variable, can be correctly propagated backwards (against the flow of control) from the exit to the program point p • A definition of final-use substitution: • Given a program statement S, a set of variables X, and a corresponding set of fresh variables fX, the final-use substitution of X with fX, on statement S, yields a new statement S’ by replacing all final-use references of each member of X with a reference to the corresponding member of fX
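The "no path from p to the exit includes a definition of x" condition can be checked by a reachability search on a control-flow graph. A minimal sketch follows, under two assumptions that are not from the slides: each node carries its defined and used variables as sets, and a node that both uses and defines x (such as i = i+1) reads x before writing it, so its use is not final. The graph in `exampleCfg` encodes only the body of the `if (shouldProcess)` block of the sales example; all names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Node {
    std::set<std::string> defs;  // variables assigned by this statement
    std::set<std::string> uses;  // variables read by this statement
    std::vector<int> succ;       // control-flow successors
};
using Cfg = std::vector<Node>;

// True if node p uses x and no definition of x can execute at or after p.
bool isFinalUse(const Cfg& g, int p, const std::string& x) {
    const Node& n = g[p];
    if (!n.uses.count(x) || n.defs.count(x)) return false;
    std::vector<bool> seen(g.size(), false);
    std::vector<int> work(n.succ.begin(), n.succ.end());
    while (!work.empty()) {
        int q = work.back();
        work.pop_back();
        if (seen[q]) continue;
        seen[q] = true;
        if (g[q].defs.count(x)) return false;  // x is redefined on some path to the exit
        for (int s : g[q].succ) work.push_back(s);
    }
    return true;
}

// Body of the if-block in the sales example, one node per statement.
Cfg exampleCfg() {
    return {
        Node{{"i"}, {}, {1}},                                 // 0: i = 0
        Node{{"totalSale"}, {}, {2}},                         // 1: totalSale = 0
        Node{{"totalPay"}, {}, {3}},                          // 2: totalPay = 0
        Node{{}, {"i", "days"}, {4, 9}},                      // 3: while (i < days)
        Node{{"totalSale"}, {"totalSale", "sale", "i"}, {5}}, // 4: totalSale = totalSale+sale[i]
        Node{{"totalPay"}, {"totalPay", "sale", "i"}, {6}},   // 5: totalPay = totalPay+0.1*sale[i]
        Node{{}, {"sale", "i"}, {7, 8}},                      // 6: if (sale[i]>1000)
        Node{{"totalPay"}, {"totalPay"}, {8}},                // 7: totalPay = totalPay+50
        Node{{"i"}, {"i"}, {3}},                              // 8: i = i+1
        Node{{"pay"}, {"totalPay", "days"}, {10}},            // 9: pay = totalPay/days+100
        Node{{"profit"}, {"totalSale", "cost"}, {}},          // 10: profit = 0.9*totalSale-cost
    };
}
```

On this graph the uses of sale inside the loop, of totalPay in the pay assignment, and of totalSale in the profit assignment come out as final, which agrees with the renamings shown in the co-slicing illustration slides (fSale, fTotalPay, fTotalSale); the uses of i and the in-loop uses of totalPay and totalSale do not.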

A Co-Slicing Algorithm • Given a statement S, a set of variables of no-interest V, and a corresponding set of final values fV, compute the co-slice of S on V with fV as follows: • Reuse fV wherever possible • Let S’ be S with final-use substitution of V by fV • Slice for all remaining variables • Determine the complementary set of variables, coV, as all possibly-modified variables in S that are not in V • Let S’’ be the slice of S’ on coV • Undo the earlier substitutions wherever possible • Let V1 be the set of all variables in V that are not referenced (i.e. neither used nor defined) in S’’ • Let fV1 be the subset of fV corresponding to the subset V1 of V • Let S’’’ be the statement S’’ after normal substitution of all variables fV1 with the corresponding program variables V1 • Return S’’’ as the co-slice of S on V with fV
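The bookkeeping around the slicing step (computing coV, and undoing substitutions afterwards) can be sketched on an abstract statement representation. This is a partial illustration only: the slicer and the final-use substitution are assumed to exist elsewhere, statements are reduced to their defined and used variable sets, and `Stmt`, `complementaryVars`, `undoSubstitution`, `exampleSlice` and `exampleFresh` are hypothetical names. The example data is an abbreviated, hand-written S’’ for V = {sale, totalSale}.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Stmt {
    std::set<std::string> defs;  // variables possibly modified
    std::set<std::string> uses;  // variables read (may include fresh names)
};
using Program = std::vector<Stmt>;

// coV: all possibly-modified variables of s that are not in v.
std::set<std::string> complementaryVars(const Program& s, const std::set<std::string>& v) {
    std::set<std::string> coV;
    for (const Stmt& st : s)
        for (const std::string& d : st.defs)
            if (!v.count(d)) coV.insert(d);
    return coV;
}

// Undo step: for every v in V that is neither used nor defined in the slice,
// rename its fresh counterpart back to v. `fresh` maps each v to its fresh name.
Program undoSubstitution(Program slice, const std::map<std::string, std::string>& fresh) {
    std::set<std::string> referenced;
    for (const Stmt& st : slice) {
        referenced.insert(st.defs.begin(), st.defs.end());
        referenced.insert(st.uses.begin(), st.uses.end());
    }
    for (const auto& entry : fresh) {
        if (referenced.count(entry.first)) continue;  // v still referenced: keep the fresh name
        for (Stmt& st : slice)
            if (st.uses.erase(entry.second)) st.uses.insert(entry.first);
    }
    return slice;
}

// Abbreviated S'' (assumed): the input loop still defines sale, totalSale is gone.
Program exampleSlice() {
    return {
        Stmt{{"sale", "i"}, {"i"}},                     // cin >> sale[i++]
        Stmt{{"totalPay"}, {"totalPay", "fSale", "i"}}, // totalPay = totalPay+0.1*fSale[i]
        Stmt{{"pay"}, {"totalPay", "days"}},            // pay = totalPay/days+100
        Stmt{{"profit"}, {"fTotalSale", "cost"}},       // profit = 0.9*fTotalSale-cost
    };
}

std::map<std::string, std::string> exampleFresh() {
    return {{"sale", "fSale"}, {"totalSale", "fTotalSale"}};
}
```

In this example fTotalSale is renamed back to totalSale, since totalSale is no longer referenced in the slice, while fSale must stay: the slice still defines sale, so a plain reference to sale would no longer be guaranteed to mean its final value.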
