Scalable Clone Detection and Elimination for Erlang Programs - PowerPoint PPT Presentation

Presentation Transcript

  1. Scalable Clone Detection and Elimination for Erlang Programs. Huiqing Li, Simon Thompson, University of Kent, Canterbury, UK

  2. Overview • Erlang • Wrangler • Clone detection • Clone elimination • Case studies • Conclusions and future work

  3. Erlang • Dynamically typed functional programming language. • Built-in support for concurrency, distribution and fault-tolerance. • Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ...

        %% Factorial in Erlang.
        -module(fac).
        -export([fac/1]).

        fac(0) -> 1;
        fac(N) when N > 0 -> N * fac(N-1).

  4. Wrangler • Clone detection + removal • Improve module structure • Basic refactorings: structural, macro, process and test-framework related

  5. Clone Detection

  6. Clone Detection • The Wrangler clone detector • Report clone classes whose members are identical or similar • No false positives • High recall rate • Scalable.

  7. What is ‘identical’ code? Example: X+4 and Y+5 both have the shape variable+number. Code fragments are identical if the values of literals and the names of variables are ignored, while the binding structure is respected.
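
The notion of ‘identical’ on this slide can be sketched in a few lines of Erlang. This is a minimal illustration, not Wrangler's implementation: the module name `clone_identical` and its functions are hypothetical, only integer, float, char and string literals are blanked, and "respecting binding structure" is approximated by numbering variables in order of first occurrence.

```erlang
-module(clone_identical).
-export([identical/2, normalise/1]).

%% Parse a single expression from a string such as "X+4".
parse(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str ++ "."),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Normal form: annotations dropped, literal values blanked, and each
%% variable replaced by the index of its first occurrence.
normalise(Str) ->
    {Norm, _Env} = norm(parse(Str), #{}),
    Norm.

norm({var, _Anno, Name}, Env) ->
    case maps:find(Name, Env) of
        {ok, Index} ->
            {{var, Index}, Env};
        error ->
            Index = maps:size(Env) + 1,
            {{var, Index}, Env#{Name => Index}}
    end;
norm({Type, _Anno, _Value}, Env)
  when Type =:= integer; Type =:= float; Type =:= char; Type =:= string ->
    {{literal, Type}, Env};
norm(Node, Env) when is_tuple(Node), tuple_size(Node) >= 2 ->
    %% Generic AST node: keep the tag, drop the annotation, recurse.
    [Tag, _Anno | Args] = tuple_to_list(Node),
    {NormArgs, Env1} = lists:mapfoldl(fun norm/2, Env, Args),
    {list_to_tuple([Tag | NormArgs]), Env1};
norm(List, Env) when is_list(List) ->
    lists:mapfoldl(fun norm/2, Env, List);
norm(Other, Env) ->
    {Other, Env}.

%% Two expressions are 'identical' when their normal forms coincide.
identical(Str1, Str2) ->
    normalise(Str1) =:= normalise(Str2).
```

With this sketch, `clone_identical:identical("X+4", "Y+5")` holds, while `"X+X"` and `"X+Y"` are told apart because the second variable occurrence refers to a different binding.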

  8. What is ‘similar’ code? Example: (X+3)+4 and 4+(5-(3*X)) anti-unify to X+Y. The anti-unification gives the (most specific) common generalisation. $\mathrm{Similarity} = \min\left( \frac{\|X+Y\|}{\|(X+3)+4\|}, \frac{\|X+Y\|}{\|4+(5-(3*X))\|} \right)$
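
A small Erlang sketch of anti-unification and the similarity score may make the slide's example concrete. It rests on assumptions that are not in the slides: the module `clone_similarity` is hypothetical, only variables, integers and arithmetic operators are handled, and the size ||E|| is taken to be the number of AST nodes, which may differ from the measure Wrangler uses.

```erlang
-module(clone_similarity).
-export([generalise/2, similarity/2]).

%% Parse a single expression from a string such as "(X+3)+4".
parse(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str ++ "."),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Convert the abstract format into {Label, Children} trees.
%% Only variables, integers and unary/binary operators are covered.
tree({var, _, Name}) -> {{var, Name}, []};
tree({integer, _, Value}) -> {{integer, Value}, []};
tree({op, _, Op, Arg}) -> {{op, Op}, [tree(Arg)]};
tree({op, _, Op, Left, Right}) -> {{op, Op}, [tree(Left), tree(Right)]}.

%% Anti-unify two trees. Subtrees with the same label are kept; a
%% differing pair becomes a hole, and the same pair reuses the same hole.
au({Label, Kids1}, {Label, Kids2}, Subst)
  when length(Kids1) =:= length(Kids2) ->
    {Kids, Subst1} = au_list(Kids1, Kids2, Subst),
    {{Label, Kids}, Subst1};
au(T1, T2, Subst) ->
    case maps:find({T1, T2}, Subst) of
        {ok, Hole} ->
            {Hole, Subst};
        error ->
            Hole = {{hole, maps:size(Subst) + 1}, []},
            {Hole, Subst#{{T1, T2} => Hole}}
    end.

au_list([], [], Subst) ->
    {[], Subst};
au_list([A | As], [B | Bs], Subst0) ->
    {G, Subst1} = au(A, B, Subst0),
    {Gs, Subst2} = au_list(As, Bs, Subst1),
    {[G | Gs], Subst2}.

%% Size of a tree, measured in nodes (an assumption of this sketch).
node_count({_Label, Kids}) ->
    1 + lists:sum([node_count(K) || K <- Kids]).

%% Common generalisation of two expressions given as strings.
generalise(Str1, Str2) ->
    {Gen, _Subst} = au(tree(parse(Str1)), tree(parse(Str2)), #{}),
    Gen.

%% Similarity = min(||G|| / ||E1||, ||G|| / ||E2||).
similarity(Str1, Str2) ->
    T1 = tree(parse(Str1)),
    T2 = tree(parse(Str2)),
    {Gen, _Subst} = au(T1, T2, #{}),
    G = node_count(Gen),
    min(G / node_count(T1), G / node_count(T2)).
```

For `"(X+3)+4"` and `"4+(5-(3*X))"` the generalisation is a `+` node with two holes, the counterpart of X+Y. Under the node-count assumption the sizes are 3, 5 and 7, so the score is min(3/5, 3/7) = 3/7.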

  9. Clone Detection • All clones in a project meeting the threshold parameters. • Thresholds: • minimum number of expressions, • minimum number of tokens, • minimum number of duplications, • maximum number of new parameters, and • minimum similarity score.
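
The five thresholds can be pictured as a filter over candidate clone classes. The sketch below is hypothetical throughout: the module `clone_thresholds`, the map keys, and the description of a clone class as a map of measurements are all invented for illustration, and it assumes that every threshold must hold at once.

```erlang
-module(clone_thresholds).
-export([example/0, meets/2]).

%% One possible setting of the five thresholds listed on the slide.
example() ->
    #{min_exprs => 1, min_tokens => 40, min_dups => 2,
      max_new_params => 4, min_similarity => 0.8}.

%% Class is a map of measurements for one candidate clone class.
%% All thresholds are assumed to apply together (logical AND).
meets(Class, Thresholds) ->
    #{exprs := Exprs, tokens := Tokens, dups := Dups,
      new_params := Params, similarity := Sim} = Class,
    #{min_exprs := MinExprs, min_tokens := MinTokens,
      min_dups := MinDups, max_new_params := MaxParams,
      min_similarity := MinSim} = Thresholds,
    Exprs >= MinExprs andalso Tokens >= MinTokens
        andalso Dups >= MinDups andalso Params =< MaxParams
        andalso Sim >= MinSim.
```

A candidate with 3 expressions, 55 tokens, 2 duplications, 1 new parameter and similarity 0.9 passes `example/0`; the same candidate with 5 new parameters does not.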

  10. Clone result with threshold values: 1, 40, 2, 4, 0.8:

  11. Clone result with threshold values: 3, 20, 2, 2, 0.8:

  12. Implementation

  13. Implementation • Clone detection in an incremental way. • Initial clone detection. • Incremental clone detection. • AST-based two-phase clone detection.

  14. The Initial Detection Algorithm
      Pipeline: Source Erlang programs → Parse program, annotate and serialise AST → Serialised AAST → Generalise and hash expression → Hashed expression sequences → Clone detection using generalised suffix tree → Initial clone candidates → Examination of clone candidates using anti-unification → Final clones
      Parse program, annotate and serialise AST: • Bypasses the Erlang pre-processor; • Location information included in the AST; • Static semantic information added to the AST; • AAST traversed, and expression sequences collected.
      Generalise and hash expression: • Capture structural similarity between expressions while keeping a structural skeleton of the original; • Replace certain subtrees with a placeholder, but only if sensible to do so; • Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.
      Examination of clone candidates using anti-unification: • Checks a candidate clone class by anti-unification, returning none, one or more clone classes; • Generation of the anti_unifier function; • Generation of application instances.
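
The "generalise and hash" stage can be sketched as follows. This is an illustration under simplifying assumptions, not Wrangler's code: the module `clone_hash` is hypothetical, every variable and every integer, float, char or string literal is replaced by a placeholder (the slide says Wrangler is more selective), and a naive search over fixed-length windows stands in for the generalised suffix tree.

```erlang
-module(clone_hash).
-export([hash_sequence/1, repeated_windows/2]).

%% Parse a comma-separated expression sequence, e.g. "A = f(X), B = A + 1".
parse_seq(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str ++ "."),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

%% Keep the structural skeleton; replace variables and literals by
%% placeholders and drop the location annotations.
generalise({var, _, _}) ->
    var;
generalise({Type, _, _})
  when Type =:= integer; Type =:= float; Type =:= char; Type =:= string ->
    lit;
generalise(Node) when is_tuple(Node), tuple_size(Node) >= 2 ->
    [Tag, _Anno | Args] = tuple_to_list(Node),
    list_to_tuple([Tag | [generalise(A) || A <- Args]]);
generalise(List) when is_list(List) ->
    [generalise(E) || E <- List];
generalise(Other) ->
    Other.

%% Map an expression sequence to a sequence of integers.
hash_sequence(Str) ->
    [erlang:phash2(generalise(E)) || E <- parse_seq(Str)].

%% Naive stand-in for the suffix tree: report every window of Len
%% consecutive hashes that occurs at two or more positions.
repeated_windows(Hashes, Len) when Len >= 1 ->
    Grouped = lists:foldl(
                fun({Window, Pos}, Acc) ->
                        maps:update_with(Window,
                                         fun(Ps) -> Ps ++ [Pos] end,
                                         [Pos], Acc)
                end, #{}, windows(Hashes, Len, 1)),
    [{W, Ps} || {W, Ps} <- maps:to_list(Grouped), length(Ps) >= 2].

windows(Hashes, Len, Pos) when length(Hashes) >= Len ->
    [{lists:sublist(Hashes, Len), Pos} | windows(tl(Hashes), Len, Pos + 1)];
windows(_Hashes, _Len, _Pos) ->
    [].
```

Sequences that differ only in variable names and literal values hash to the same integers, so `"A = f(X), B = A + 1"` and `"P = f(Q), R = P + 2"` become candidates. Candidates found this way would still need the anti-unification check, because the hashing ignores which variable is which.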

  15. The Initial Detection Algorithm • Designed with incremental clone detection in mind. • Uses relative locations: every function starts from location {1, 1}. • Intermediate information cached: AAST, static semantic information, hash information, clone table.

  16. The Incremental Detection Algorithm • Follows the same steps as the initial detection algorithm, but reuses and incrementally updates the information cached from the previous run of the clone detection. • Takes a function, instead of a file, as the unit for tracking changes. • Tracks the change of clones, marking each clone class as new, unchanged, change+, change-, or change+-.
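
Tracking changes per function rather than per file can be sketched like this. The module `clone_incremental` and its functions are hypothetical, and the sketch makes hashes independent of position by dropping location annotations, which is a simpler device than the relative locations Wrangler is described as using. It does not attempt to classify clone classes.

```erlang
-module(clone_incremental).
-export([snapshot/1, diff/2]).

%% Drop location annotations so that moving or re-indenting a function
%% does not change its hash.
strip(Node) when is_tuple(Node), tuple_size(Node) >= 2 ->
    [Tag, _Anno | Args] = tuple_to_list(Node),
    list_to_tuple([Tag | [strip(A) || A <- Args]]);
strip(List) when is_list(List) ->
    [strip(E) || E <- List];
strip(Other) ->
    Other.

%% Hash one function definition given as source text ending in a dot.
function_hash(Src) ->
    {ok, Tokens, _End} = erl_scan:string(Src),
    {ok, {function, _Anno, Name, Arity, Clauses}} =
        erl_parse:parse_form(Tokens),
    {{Name, Arity}, erlang:phash2(strip(Clauses))}.

%% Cacheable snapshot: a map from {Name, Arity} to a hash.
snapshot(Sources) ->
    maps:from_list([function_hash(S) || S <- Sources]).

%% Compare the cached snapshot with a fresh one; only the functions
%% reported here would need to be re-examined.
diff(Old, New) ->
    Changed = [K || {K, H} <- maps:to_list(New),
                    maps:is_key(K, Old), maps:get(K, Old) =/= H],
    Added = [K || K <- maps:keys(New), not maps:is_key(K, Old)],
    Removed = [K || K <- maps:keys(Old), not maps:is_key(K, New)],
    #{changed => lists:sort(Changed),
      added => lists:sort(Added),
      removed => lists:sort(Removed)}.
```

Reformatting `f/1` across several lines leaves its hash untouched, while editing a literal in `g/1` or adding `h/0` shows up in the `changed` and `added` lists.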

  17. Clone Elimination • Fully automatic clone elimination is not desirable in practice. • The clones to remove need to be chosen. • The functionality of the clone needs to be examined. • The anti-unification function of a clone class, and its parameters, need to be renamed. • A host module for the anti-unification function needs to be selected.

  18. Clone Elimination with Wrangler • Copy and paste the anti_unification function into a suitable Erlang module. • Modify the anti_unification function if necessary: rename the function, rename variables, re-order function parameters. • Apply ‘fold expressions against a function definition’ to the new function.
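
The effect of these steps can be illustrated with a hand-written before/after pair. This is not Wrangler output: the module `clone_elim_demo` and all function names are invented, and the "after" code shows what the result might look like once the generalised function has been renamed and the clone instances folded against it.

```erlang
-module(clone_elim_demo).
-export([net_price_before/1, gross_price_before/1,
         net_price_after/1, gross_price_after/1]).

%% Before: the two bodies are clones that differ only in one literal.
net_price_before(Prices) ->
    Total = lists:sum(Prices),
    Rounded = round(Total * 100) / 100,
    Rounded * 0.8.

gross_price_before(Prices) ->
    Total = lists:sum(Prices),
    Rounded = round(Total * 100) / 100,
    Rounded * 1.2.

%% After: the common generalisation lives in one function. The differing
%% literal has become the parameter Factor, and both the function and the
%% parameter carry names chosen by the programmer.
scaled_total(Prices, Factor) ->
    Total = lists:sum(Prices),
    Rounded = round(Total * 100) / 100,
    Rounded * Factor.

%% Each clone instance is folded into a call to the new function.
net_price_after(Prices) ->
    scaled_total(Prices, 0.8).

gross_price_after(Prices) ->
    scaled_total(Prices, 1.2).
```

The before and after versions compute the same values, which is the property a refactoring must preserve. Choosing the name `scaled_total` and the parameter name `Factor` is the kind of decision the slides leave to the programmer.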

  19. Case Study 1

  20. Incremental vs. Standalone Clone Detection

  21. Case Study 2

  22. SIP case study Session Initiation Protocol SIP message processing allows rewriting rules to transform messages. SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.

  23. Clone detection

  24. Clone detection

  25. Reducing the case study

  26. Case Study 3

  27. Conclusions • Efficient clone detection on medium-sized projects. • Possible to improve code using these techniques, but only with expert involvement. • A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case-study for this with LambdaStream.

  28. Future Work • To extend the tool to detect expression sequences which are similar up to the insertion or deletion of some expressions. • To check client code against libraries.

  29. Thank you!