Scalable Clone Detection and Elimination for Erlang Programs

Scalable Clone Detection and Elimination for Erlang Programs. Huiqing Li, Simon Thompson, University of Kent, Canterbury, UK. Overview: Erlang, Wrangler, clone detection, clone elimination, case studies, conclusions and future work. Erlang: a dynamically typed functional programming language.


Presentation Transcript


  1. Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK

  2. Overview • Erlang • Wrangler • Clone detection • Clone elimination • Case studies • Conclusions and future work

  3. Erlang • Dynamically typed functional programming language. • Built-in support for concurrency, distribution and fault-tolerance. • Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ....

     %% Factorial in Erlang.
     -module(fac).
     -export([fac/1]).

     fac(0) -> 1;
     fac(N) when N > 0 -> N * fac(N-1).

  4. Wrangler • Clone detection + removal • Improve module structure • Basic refactorings: structural, macro, process and test-framework related

  5. Clone Detection

  6. Clone Detection • The Wrangler clone detector • Report clone classes whose members are identical or similar • No false positives • High recall rate • Scalable.

  7. What is ‘identical’ code? X+4 and Y+5 are both instances of variable+number. Two pieces of code are identical if the values of literals and the names of variables are ignored, while the binding structure is respected.
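The notion of ‘identical’ above can be illustrated with a minimal sketch. It assumes a toy expression representation (`{var, Name}`, `{int, N}`, `{op, Operator, Left, Right}`) and a hypothetical module name `identical_sketch`; neither is Wrangler's actual AST or API. Literal values are dropped and variables are renamed by order of first occurrence, so the binding structure survives the comparison.

```erlang
-module(identical_sketch).
-export([normalise/1, identical/2]).

%% Toy expression terms (an assumption, not Wrangler's AST):
%%   {var, Name} | {int, Value} | {op, Operator, Left, Right}

%% Replace literal values by 'int' and variable names by their
%% order of first occurrence.
normalise(Expr) ->
    {Norm, _Seen} = norm(Expr, #{}),
    Norm.

norm({var, Name}, Seen) ->
    case maps:find(Name, Seen) of
        {ok, Index} ->
            {{var, Index}, Seen};
        error ->
            Index = maps:size(Seen) + 1,
            {{var, Index}, Seen#{Name => Index}}
    end;
norm({int, _Value}, Seen) ->
    {int, Seen};
norm({op, Op, L, R}, Seen0) ->
    {NL, Seen1} = norm(L, Seen0),
    {NR, Seen2} = norm(R, Seen1),
    {{op, Op, NL, NR}, Seen2}.

%% Two expressions are 'identical' when their normal forms agree.
identical(E1, E2) ->
    normalise(E1) =:= normalise(E2).
```

Under this sketch X+4 and Y+5 compare as identical, while X+X and X+Y do not, because the second pair differs in which occurrences refer to the same variable.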

  8. What is ‘similar’ code? The expressions (X+3)+4 and 4+(5-(3*X)) anti-unify to X+Y. The anti-unification gives the (most specific) common generalisation. $\text{Similarity} = \min\left(\frac{\|X+Y\|}{\|(X+3)+4\|},\ \frac{\|X+Y\|}{\|4+(5-(3*X))\|}\right)$
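A rough sketch of anti-unification and the similarity score, under stated assumptions: expressions are toy terms (`{var, Name}`, `{int, N}`, `{op, Operator, Left, Right}`), the size ||E|| is taken to be the number of AST nodes, and every mismatching pair of subterms becomes the atom `hole` rather than a freshly named variable such as X or Y. The module name `antiunify_sketch` is hypothetical, and the slides do not say how Wrangler measures size.

```erlang
-module(antiunify_sketch).
-export([anti_unify/2, node_count/1, similarity/2]).

%% Common generalisation of two toy expressions: identical subterms
%% are kept, binary operations with the same operator are descended
%% into, and everything else becomes a placeholder.
anti_unify(E, E) ->
    E;
anti_unify({op, Op, L1, R1}, {op, Op, L2, R2}) ->
    {op, Op, anti_unify(L1, L2), anti_unify(R1, R2)};
anti_unify(_E1, _E2) ->
    hole.

%% Size of an expression, assumed here to be its number of nodes.
node_count({op, _Op, L, R}) ->
    1 + node_count(L) + node_count(R);
node_count(_Leaf) ->
    1.

%% Smaller of the two ratios |generalisation| / |original|.
similarity(E1, E2) ->
    G = node_count(anti_unify(E1, E2)),
    min(G / node_count(E1), G / node_count(E2)).
```

For the two expressions on the slide, the generalisation has 3 nodes against 5 and 7 nodes in the originals, so this particular size measure gives min(3/5, 3/7) = 3/7. A different size measure would give a different score.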

  9. Clone Detection • All clones in a project meeting the threshold parameters. • Thresholds: • minimum number of expressions, • minimum number of tokens, • minimum number of duplications, • maximum number of new parameters, and • minimum similarity score.

  10. Clone result with threshold values: 1, 40, 2, 4, 0.8:

  11. Clone result with threshold values: 3, 20, 2, 2, 0.8:

  12. Implementation

  13. Implementation • Clone detection in an incremental way. • Initial clone detection. • Incremental clone detection. • AST-based two-phase clone detection.

  14. The Initial Detection Algorithm
      Pipeline: Source Erlang programs → Serialised AAST (annotated AST) → Hashed expression sequences → Initial clone candidates → Final clones.
      Step 1. Parse program, annotate and serialise AST (source Erlang programs → serialised AAST):
      • Bypasses the Erlang pre-processor;
      • Location information included in AST;
      • Static semantic information added to AST;
      • AAST traversed, and expression sequences collected.
      Step 2. Generalise and hash expressions (serialised AAST → hashed expression sequences):
      • Capture structural similarity between expressions while keeping a structural skeleton of the original;
      • Replace certain subtrees with a placeholder, but only if sensible to do so;
      • Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.
      Step 3. Clone detection using a generalised suffix tree (hashed expression sequences → initial clone candidates).
      Step 4. Examination of clone candidates using anti-unification (initial clone candidates → final clones):
      • Check a candidate clone class for anti-unification, returning none, one or more clone classes;
      • Generation of the anti_unifier function;
      • Generation of application instances.
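The hashing and candidate-detection stages can be sketched as follows. This is a simplified stand-in, not the tool's algorithm: instead of a generalised suffix tree it groups fixed-length windows of hash values, which is enough to show how repeated integer subsequences become clone candidates. The module name `hashseq_sketch`, the function names, and the `{Name, HashSeq}` input shape are all assumptions made for illustration.

```erlang
-module(hashseq_sketch).
-export([hash_seq/1, windows/2, candidates/2]).

%% Map each (already generalised) expression to an integer, so an
%% expression sequence becomes a sequence of integers.
hash_seq(Exprs) ->
    [erlang:phash2(E) || E <- Exprs].

%% All contiguous windows of length N, paired with their start index.
windows(Seq, N) when N > 0, length(Seq) >= N ->
    [{lists:sublist(Seq, I, N), I}
     || I <- lists:seq(1, length(Seq) - N + 1)];
windows(_Seq, _N) ->
    [].

%% Group windows by content and keep those occurring at least twice.
%% NamedSeqs is a list of {Name, HashSeq}; each result is a list of
%% {Name, StartIndex} locations sharing the same window.
candidates(NamedSeqs, N) ->
    Occs = [{W, {Name, I}} || {Name, Seq} <- NamedSeqs,
                              {W, I} <- windows(Seq, N)],
    Grouped = lists:foldl(
                fun({W, Loc}, Acc) ->
                        maps:update_with(W, fun(Ls) -> [Loc | Ls] end,
                                         [Loc], Acc)
                end, #{}, Occs),
    [lists:reverse(Locs) || Locs <- maps:values(Grouped),
                            length(Locs) >= 2].
```

With sequences `[1,2,3,9]` for `f` and `[7,1,2,3]` for `g` and a window length of 3, the only candidate is the window `[1,2,3]` at position 1 of `f` and position 2 of `g`. Overlapping windows within one sequence are not filtered out here, and candidates would still need the anti-unification check of step 4.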

  15. The Initial Detection Algorithm • Designed with incremental clone detection in mind. • Uses relative locations: every function starts from location {1, 1}. • Intermediate information is cached: AAST, static semantic information, hash information, clone table.

  16. The Incremental Detection Algorithm • Follows the same steps as the initial detection algorithm, but reuses and incrementally updates the information cached from the previous run of the clone detection. • Takes a function, instead of a file, as the unit for tracking changes. • Tracks the change of clones, marking each clone class as new, unchanged, change+, changed-, or change+-.
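Function-level change tracking of this kind can be sketched with a comparison of cached hashes. The sketch assumes the cache is a map from `{Module, Function, Arity}` to a hash of the function body; the module name `incremental_sketch` and the status atoms `unchanged`, `changed`, `new` and `removed` are hypothetical. It classifies functions only and does not model the clone-class labels listed on the slide.

```erlang
-module(incremental_sketch).
-export([diff/2]).

%% Old and New map {Module, Function, Arity} to a hash of the body.
%% Returns a map from the same keys to a status atom.
diff(Old, New) ->
    Current = maps:map(fun(Key, Hash) -> status(Key, Hash, Old) end, New),
    Removed = maps:map(fun(_Key, _Hash) -> removed end,
                       maps:without(maps:keys(New), Old)),
    maps:merge(Removed, Current).

%% Hash is already bound, so {ok, Hash} matches only an equal hash.
status(Key, Hash, Old) ->
    case maps:find(Key, Old) of
        {ok, Hash}       -> unchanged;
        {ok, _OtherHash} -> changed;
        error            -> new
    end.
```

Only functions reported as `changed` or `new` would need to be re-parsed and re-hashed in such a scheme; cached results for the others could be reused.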

  17. Clone Elimination • Fully automatic clone elimination is not desirable in practice. • The clones to remove have to be chosen. • The functionality of the clone needs to be examined. • The anti-unification function of a clone class, and its parameters, need to be renamed. • A host module for the anti-unification function needs to be selected.

  18. Clone Elimination with Wrangler • Copy and paste the anti_unification function into a suitable Erlang module. • Modify the anti_unification function if necessary. • Rename the function. • Rename variables. • Re-order function parameters. • Apply ‘fold expressions against a function definition’ to the new function.
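The effect of these steps can be illustrated on a made-up example. Everything here is hypothetical (the module `fold_sketch`, the function names, and the code itself); it only shows the shape of the result: two clone instances, a generalised function after renaming and parameter re-ordering, and the instances folded into calls to it.

```erlang
-module(fold_sketch).
-export([area_report_before/1, volume_report_before/1,
         area_report_after/1, volume_report_after/1]).

%% Before: two clone instances differing only in a literal and an atom.
area_report_before(Xs) ->
    Total = lists:sum(Xs),
    Scaled = Total * 2,
    {area, Scaled}.

volume_report_before(Xs) ->
    Total = lists:sum(Xs),
    Scaled = Total * 3,
    {volume, Scaled}.

%% The common generalisation, with the function and its new
%% parameters given meaningful names by hand.
scaled_total(Xs, Factor, Tag) ->
    Total = lists:sum(Xs),
    Scaled = Total * Factor,
    {Tag, Scaled}.

%% After: each clone instance folded into a call to the new function.
area_report_after(Xs) -> scaled_total(Xs, 2, area).
volume_report_after(Xs) -> scaled_total(Xs, 3, volume).
```

The before and after versions return the same results for the same input, which is the property the refactoring has to preserve.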

  19. Case Study 1

  20. Incremental vs. Standalone Clone Detection

  21. Case Study 2

  22. SIP case study • SIP: Session Initiation Protocol. • SIP message processing allows rewriting rules to transform messages. • SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.

  23. Clone detection

  24. Clone detection

  25. Reducing the case study

  26. Case Study 3

  27. Conclusions • Efficient clone detection on medium-sized projects. • Possible to improve code using these techniques, but only with expert involvement. • A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case-study for this with LambdaStream.

  28. Future Work • To extend the tool to detect expression sequences which are similar up to insertion, or deletion of some expressions. • To check client code against libraries.

  29. http://www.cs.kent.ac.uk/projects/wrangler/ Thank you!
