extracting code clones for refactoring using combinations of clone metrics n.
Download
Skip this Video
Download Presentation
Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

Loading in 2 Seconds...

play fullscreen
1 / 29

Extracting Code Clones for Refactoring Using Combinations of Clone Metrics - PowerPoint PPT Presentation


  • 91 Views
  • Uploaded on

Extracting Code Clones for Refactoring Using Combinations of Clone Metrics. Eunjong Choi † , Norihiro Yoshida ‡ , Takashi Ishio † , Katsuro Inoue † , and Tateki Sano*. †Osaka University, Japan ‡ Nara Institute of Science and Technology , Japan *NEC Corporation, Japan. Background: Clone Set.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Extracting Code Clones for Refactoring Using Combinations of Clone Metrics' - brenden-duncan


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
extracting code clones for refactoring using combinations of clone metrics

Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†,Katsuro Inoue†, and Tateki Sano*

†Osaka University, Japan‡Nara Institute of Science and Technology , Japan

*NEC Corporation, Japan

background clone set
Background: Clone Set
  • A set of code clones that is similar or identical to each other

Code Clone 1

similar

Code Clone 4

identical

Code Clone 2

Code Clone 5

Code Clone 3

Clone Set:

S1={Code Clone 1, Code Clone 3}

S2={Code Clone 2, Code Clone 4, Code Clone 5}

background refactoring code clone
Background: Refactoring Code Clone
  • Merge code clones into a single program unit

Code Clone 1

Code Clone’ 1

Refactoring

Code Clone 2

Code Clone 2

Code Clone 3

background language dependent code clone
Background: Language-dependent Code Clone
  • It is unavoidable to exist in source code
    • because of features of the used program language.

/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */

else {

// is the zip file in the cache

file);

if == null) {

(file);

;

}

/* Code Clone B */

def.setName(name);

def.setClassName(classname);

def.setClass(cl);

def.setAdapterClass(adapterClass);

def.setAdaptToClass(adaptToClass);

def.setClassLoader(al);

/* … */

/* Code Clone A */

replacement.setTaskType(taskType);

replacement.setTaskName(taskName);

replacement.setLocation(location);

replacement.setOwningTarget(target);

replacement.setRuntime (wrapper);

wrapper.setProxy(replacement);

/* … */

Example of the language-dependent code clone

(Consecutive setter invocations)

background clone metrics higo2007
Background: Clone Metrics [Higo2007]
  • Quantitative information on clone sets
    • E.g., LEN(S), RNR(S), POP(S)
  • Purposes
    • To check features of code clones in software
    • To extract code clones for several purposes
      • E.g., refactoring, defect-prone code clones

[Higo2007] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)

clone metrics len s
Clone Metrics: LEN(S)
  • The average length of token sequences of code clones in a clone set S

A token sequence [c c* ]

is detected as a code clone

from a token sequence

<c c* c* a b>

LEN(S) = 2

Superscript * indicated that the token is in a repeated token sequence

Clone set S

clone metrics rnr s
Clone Metrics: RNR(S)
  • The ratio of non-repeated token sequences of code clones in a clone set S

A token sequence [c c* ]

is detected as a code clone

from a token sequence

<c c* c* a b>

The length of non-repeated

token sequence

RNR(S) =• 100 = 50

2

The length of whole

token sequence

Clone set S

clone metrics pop s
Clone Metrics: POP(S)
  • The number of code clones in a clone set S

1

3

2

POP(S) = 6

4

5

6

Clone set S

single clone metric 1 2
Single Clone Metric (1/2)
  • Clone sets whose RNR(S) is higher
    • They do not organize a single semantic unit
      • semantic unit : many instructions forming a single functionality

/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */

else {

// is the zip file in the cache

ZipFilezipFile = (ZipFile) zipFiles.get(file);

if (zipFile == null) {

zipFile = new ZipFile(file);

zipFiles.put(file, zipFile);

}

ZipEntry entry = zipFile.getEntry(resourceName);

if (entry != null) {x

Not Appropriate for Refactoring!

a part of semantic unit

single clone metric 2 2
Single Clone Metric (2/2)
  • Clone sets whose POP(S) is higher
    • They Include many language-dependent code clones

/* Code Clone in a clone set whose POP(S) is the first highest in Ant 1.7.0 */

out.println("\">");

out.println("");

out.print("<!ELEMENT project (target | ");

out.print(TASKS);

out.print(" | ");

out.print(TYPES);

Not Appropriate for Refactoring!

key idea
Key Idea
  • It is not appropriate to extract refactorable code clones using just a single clone metric
    • According to our experiences
  • We propose a method based on combined clone metrics
    • To improve the weakness of single-metric-based extraction
combined clone metrics
Combined Clone Metrics
  • Clone sets whose RNR(S), POPS(S) are higher
    • Each code clone organizes a single semantic units

/* Code Clone in a clone set whose RNR(S), POP(S) are higher than others*/

if (ifProperty != null && p.getProperty(ifProperty) == null) {

return false;

} else if (unlessProperty != null

&& p.getProperty(unlessProperty) != null) {

return false;

}

return true;

}

Appropriate for Refactoring!

case study 1 2
Case Study (1/2)
  • Goal: validating our key idea
    • Using combined clone metrics is a feasible method to extract code clone for refactoring
  • Target System
    • Industrial Java software developed by NEC
    • 110KLOC, 736 clone sets
case study 2 2
Case Study (2/2)
  • Experimental Step
    • Selected 62 clone sets from CCFinder's output using clone metrics.
    • Conducted a survey about these clone sets and got feedback from a developer.

Survey

Feed back

CCFinder

Source files

Clone sets using clone metrics

subject code clones 1 2
Subject Code Clones (1/2)
  • Clone sets whose either clone metric value is high
    • Clone sets whose LEN(S) value is top 10 high
    • Clone sets whose RNR(S) value is top 10 high
    • Clone sets whose POP(S) value is top 10 high
subject code clones 2 2
Subject Code Clones (2/2)
  • Clone sets whose combined clone metrics values are high
    • 15 clone sets whose LEN(S) and RNR(S) values are high rank in the top 15
    • 7 clone sets whose LEN(S) and POP(S) values are high rank in the top 15
    • 18 clone sets whose RNR(S) and POP(S) values are high rank in the top 15
    • 1 clone set whose LEN(S), RNR(S) and POP(S) values are high rank in the top 15
results of case study 1 2
Results of Case Study (1/2)
  • #Selected Clone Sets: The number of selected clones
  • #Refactoring: The number of clone sets marked as “Perform refactoring“ in survey
results of case study 2 2
Results of Case Study (2/2)
  • Precision : “How many refactoring candidates were accepted by a developer?“

#Refactoring

Precision =

#Selected Clone Sets

Combined clone metrics is more accepted

as refactoring candidates by a developer

summary and future work
Summary and Future Work
  • Summary
    • Our Industrial case study shows that our key idea is appropriate.
  • Future Work
    • Investigate about recall
    • Conduct case studies of open source software
    • Suggest a new metric
clone sets whose rnr s is higher than others
Clone sets whose RNR(S) is higher than others
  • Each code clone in a clone set S consists of more non-repeated token sequences

/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */

else {

// is the zip file in the cache

ZipFilezipFile = (ZipFile) zipFiles.get(file);

if (zipFile == null) {

zipFile = new ZipFile(file);

zipFiles.put(file, zipFile);

}

ZipEntry entry = zipFile.getEntry(resourceName);

if (entry != null) {

/* … */

clone sets whose rnr s is lower than others
Clone sets whose RNR(S) is lower than others
  • Consists of more repeated token sequences
    • Involve in language-dependent code clone

/* Code Clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */

String sosCmdDir = null;

…… skip code….

private String filename = null;

private boolean noCompress = false;

private boolean noCache = false;

private boolean recursive = false;

private boolean verbose = false;

/* … */

Consecutive

variable declarations

survey format about clone set xxx
Survey Format: About Clone set XXX

(1) Do you think that this clone set need a practice?

[] Yes [] No(→Jump to next clone set)

(2) If you marked “Yes” in your answer to (1), what practice is appropriate for this clone set?

[] Refactoring

[] Write comments about code clones, but don’t perform refactoring.

[] Change nothing.

[] Others. (

(3) Write the reason why did you mark in your answer to (2)

Reason :

clone metric rnr s 1 2
Clone metric: RNR(S) (1/2)
  • File:
    • F1: a b c a b,
    • F2: c c* c* a b,
    • F3: d a b, e f
    • F4: c c* d e f
      • Superscript * indicated that the token is in a repeated token sequence
    • RNR(S1) of Clone Set S1 is

Clone Set:

S1: { , , , }

2 + 2 + 2 + 2

2 + 2 + 2 + 2

RNR(S1) = • 100 = 100

ab

ab

ab

ab

clone metric rnr s 2 2
Clone metric: RNR(S) (2/2)
  • File:
    • F1: a b c a b,
    • F2: c c* c* a b,
    • F3: d a b, e f
    • F4: c c* d e f
      • Superscript * indicated that the token is in a repeated token sequence
    • RNR(S2) of Clone Set S2 is

Clone Set:

S2: { , , }

c c*

c* c*

c c*

1 + 0 + 1

2 + 2 + 2

RNR(S2) = • 100 = 33.3

subject code clones
Subject Code Clones
  • 62 clone sets
    • clone sets whose individual clone metric value is high
      • SLEN Clone sets whose LEN(S) value is top 10 high.
      • SRNR Clone sets whose RNR(S) value is top 10 high.
      • SPOP Clone sets whose POP(S) value is top 10 high.
    • clone sets whose combined clone metrics values are high
      • SLEN∙RNR15 clone sets whose LEN(S) and RNR(S) values are high rank in the top 15.
      • SLEN∙POP7 clone sets whose LEN(S) and POP(S) values are high rank in the top 15.
      • SRNR∙POP18 clone sets whose RNR(S) and POP(S) values are high rank in the top 15.
      • SLEN∙RNR∙POP 1 clone set whose LEN(S), RNR(S) and POP(S) values are high rank in the top 15.
slide28

The Number of Duplicate Clone Set

  • | SRNR ∩ SPOP ∩ SRNR∙POP| = 1
  • | SRNR ∩ SRNR∙ POP| = 2
  • | SPOP ∩ SRNR∙ POP| = 2
  • | SLEN∙ RNR∩ SLEN∙ POP∩ SRNR∙ POP∩ SLEN ∙ RNR∙ POP| = 1

CSセミナー 2010/12/01

example of clone set that are not selected
Example of clone set that are not selected…
  • It is too short to organize a semantic unit.
  • RNR metric sometimes extract unintentional code clones
    • E.g., Language-dependent code clones

boolean isEqual(final DeweyDecimal other) {

final int max = Math.max(other.components.length, components.length);

for (int i = 0; i < max; i++) {

final int component1 = (i < components.length) ? components[ i ] : 0;

final int component2 = (i < other.components.length) ? other.components[ i ] : 0;

if (

ad