Duplicate code detection using Clone Digger

Peter Bulychev

Lomonosov Moscow State University

CS department


Outline

  • Theoretical part

    • Clone detection problem in general

    • The theory behind the tool

  • Practical part

    • Clone Digger and the results of its application to several Python open-source projects

  • Other ongoing projects


What is a software clone?

  • Two fragments of code form a clone if they are similar enough (according to a given measure of similarity)


Why is it important to detect code clones?

  • 5% to 20% of the code in software systems consists of clones [1]

  • Why do programmers produce clones? [2]

    • Development strategy

    • Maintenance benefits

    • Overcoming underlying limitations

    • Cloning by accident

  • Why is the presence of code clones bad?

    • Errors in the original must be fixed in every clone

      1. I.D. Baxter et al. Clone Detection Using Abstract Syntax Trees, 1998.

      2. C.K. Roy and J.R. Cordy. A Survey on Software Clone Detection Research, 2007.


Our definition of clone

  • Different clone definitions can be classified according to the level of granularity:

    • List of strings

    • Sequence of tokens

    • Abstract syntax trees (AST)

    • Semantic information

  • We work on the AST level

  • We consider two sequences of statements to be a clone if one of them can be obtained from the other by replacing some subtrees
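The AST-level view can be illustrated with Python's standard `ast` module. This is a minimal sketch, not Clone Digger's implementation: it covers only the special case where the replaced subtrees are leaves (names and constants), and the helper `shape` and the sample fragments are hypothetical.

```python
import ast

def shape(node):
    """Return the tree of node type names, ignoring identifiers and constants."""
    return (type(node).__name__,
            tuple(shape(child) for child in ast.iter_child_nodes(node)))

# Two statement sequences that differ only in names and constants.
first = "x = a + 1\ny = f(x)\nprint(y)"
second = "i = b + 2\nj = f(i)\nprint(j)"
# A sequence with a different structure.
third = "x = a + 1\nprint(x)"

same_shape = shape(ast.parse(first)) == shape(ast.parse(second))
different_shape = shape(ast.parse(first)) != shape(ast.parse(third))
```

Comparing shapes this way treats every name or constant as replaceable; replacing larger subtrees requires a real similarity measure.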


Example

[Figure: syntax trees of two code blocks, each containing two assignments followed by a print statement]


The sketch of the algorithm

  • Partition similar statements into clusters

  • Find pairs of identical cluster sequences

  • Refine by examining identified code sequences for structural similarity

[Figure: example statements illustrating the steps: i=0, f(i), i+=1, i=0, f(k), k+=1, k=0, f(k)]
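The first two steps can be sketched in Python under strong simplifying assumptions: statements are placed in the same cluster only when their AST shapes are identical (a stand-in for a real similarity measure), and the refinement step is left out. The function names and the sample source are hypothetical and are not taken from Clone Digger.

```python
import ast

def shape(node):
    """Tree of node type names, ignoring identifiers and constants."""
    return (type(node).__name__,
            tuple(shape(child) for child in ast.iter_child_nodes(node)))

def cluster_ids(statements):
    """Step 1 (simplified): statements with the same shape share a cluster id."""
    ids = {}
    return [ids.setdefault(shape(stmt), len(ids)) for stmt in statements]

def repeated_sequences(ids, length):
    """Step 2: find pairs of non-overlapping windows with identical cluster ids."""
    pairs = []
    for i in range(len(ids) - length + 1):
        for j in range(i + length, len(ids) - length + 1):
            if ids[i:i + length] == ids[j:j + length]:
                pairs.append((i, j))
    return pairs

# Statements modeled on the slide's example.
source = "i = 0\nf(i)\ni += 1\nk = 0\nf(k)\nk += 1\n"
statements = ast.parse(source).body
ids = cluster_ids(statements)
pairs = repeated_sequences(ids, 3)
```

Each pair in `pairs` gives the start indices of two candidate sequences; a full implementation would then examine each candidate for structural similarity, as the third step describes.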


Main problems

  • How to compute similarity between two trees?

    • Use edit distance

  • How to compute similarity between a new tree and an existing tree cluster?

    • Comparing the new tree with each tree in the cluster is expensive

    • Compare the new tree with an average value stored for the cluster


Anti-unification

  • The anti-unifier of two trees is the most specific generalization that matches both of them

[Figure: two expression trees rooted at f and their anti-unifier, in which the differing subtrees are replaced by placeholders (?)]


Anti-unification features

  • The anti-unifier of a set of trees keeps their common features: the common upper part

  • Anti-unification can be used to compute the edit distance between two trees:

    θ1 and θ2 are substitutions such that E0 θ1 = E1 and E0 θ2 = E2, where E0 is the anti-unifier of trees E1 and E2

    distance = |θ1| + |θ2|
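The distance formula can be tried out with a small Python sketch. Several things here are assumptions rather than details from the slides: trees are nested tuples of the form `(label, child, ...)` with strings as leaves, the size of a substitution is taken to be the total number of nodes in the subtrees it substitutes, and all function names are hypothetical.

```python
def anti_unify(a, b, pairs):
    """Generalize trees a and b; differing subtrees become placeholders.

    `pairs` maps each (subtree_of_a, subtree_of_b) pair to its placeholder,
    so the same pair of differing subtrees reuses one placeholder.
    """
    if a == b:
        return a
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        return (a[0],) + tuple(anti_unify(x, y, pairs)
                               for x, y in zip(a[1:], b[1:]))
    return pairs.setdefault((a, b), "?%d" % len(pairs))

def size(tree):
    """Number of nodes in a tree (assumed measure of substitution size)."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1

def distance(e1, e2):
    """Return the anti-unifier of e1 and e2 and the sum of substitution sizes."""
    pairs = {}
    e0 = anti_unify(e1, e2, pairs)
    theta1 = {var: a for (a, _), var in pairs.items()}  # turns e0 into e1
    theta2 = {var: b for (_, b), var in pairs.items()}  # turns e0 into e2
    total = (sum(size(t) for t in theta1.values())
             + sum(size(t) for t in theta2.values()))
    return e0, total

e1 = ("f", ("+", "x", "y"), "x")
e2 = ("f", ("+", "x", ("*", "2", "z")), "x")
e0, d = distance(e1, e2)
```

For these two trees the only differing position is the second operand of `+`, so the anti-unifier keeps everything else and the distance counts one node for `y` plus three nodes for the product.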


Clone Digger

  • Is the first clone detection tool focused on Python (except Pylint)

  • Is provided under the GPL license

  • Writes information about the clones it finds to an HTML report in two-column format, with differences highlighted

  • http://clonedigger.sourceforge.net


Comparison with existing tools working with ASTs

  • CloneDR by Semantic Designs, I. Baxter, 1998

    • Hash functions on subtrees, some kind of editing distance

  • Asta by Microsoft Research, S. Evans et al., 2007

    • Subtree patterns (similar to anti-unification), hash functions on subtrees


Quick Start

  • $ easy_install clonedigger

  • $ clonedigger --recursive source_tree

  • $ firefox output.html

  • Additional parameters, such as thresholds, can also be set (use --help to learn more)


Running on real-life open-source projects

These numbers mean nothing …

… except that every large project has clones and they should be detected


What to do with found clones?

  • Remove clones by refactoring. The Extract Method and Pull Up Method refactorings can be used

  • Detect library candidates

  • Search for bugs
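As an illustration of removing a clone with Extract Method, here is a minimal Python sketch. All function names and the validation logic are hypothetical; they only show the shape of the refactoring.

```python
# Before: the same validation statements are cloned in two functions.
def register_user(name):
    name = name.strip()
    if not name:
        raise ValueError("empty name")
    return "registered " + name

def rename_user(name):
    name = name.strip()
    if not name:
        raise ValueError("empty name")
    return "renamed to " + name

# After Extract Method: the shared statements live in one helper,
# so a fix to the validation is made in a single place.
def clean_name(name):
    name = name.strip()
    if not name:
        raise ValueError("empty name")
    return name

def register_user_refactored(name):
    return "registered " + clean_name(name)

def rename_user_refactored(name):
    return "renamed to " + clean_name(name)
```

The refactored functions behave the same as the originals, but an error in the cloned statements now has to be fixed only once.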


