Finding File Clones in FreeBSD Ports Collection

Yusuke Sasaki

Tetsuo Yamamoto

Yasuhiro Hayase

Katsuro Inoue

File Clones
  • Two or more files with the same content
    • Comments and code indentation ignored
  • Inside a project or between different projects
  • Research on file clones is scarce
    • Goal: gain new knowledge about file clones

Example: the same file appears in both Project A and Project B.

    #include <stdio.h>

    int main() {
        printf("Hello msr!");
        return 0;
    }

FCFinder
  • Input
    • .c and .h files
  • Output
    • File-clone sets
  • Faster than other tools
  • Detection
    • Tokenization
    • MD5 Hash Calculation
    • Exact Matching
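
The three detection steps above can be sketched as follows. This is a minimal illustration, not FCFinder's actual implementation: the regular-expression tokenizer is a rough stand-in for a real C lexer, and the names `TOKEN_RE`, `fingerprint`, and `find_file_clones` are hypothetical. Comments and whitespace are discarded, so files that differ only in comments or indentation get the same MD5 digest.

```python
import hashlib
import re
from collections import defaultdict

# Rough C tokenizer (an assumption, not FCFinder's real lexer).
# Alternatives are tried in order: comments, string/char literals,
# identifiers and numbers, then any single non-space symbol.
TOKEN_RE = re.compile(r'''
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<word>\w+)
  | (?P<symbol>[^\s\w])
''', re.DOTALL | re.VERBOSE)


def fingerprint(source: str) -> str:
    """Return the MD5 digest of the token stream, ignoring comments and whitespace."""
    tokens = [m.group() for m in TOKEN_RE.finditer(source)
              if m.lastgroup != "comment"]
    # Join with a NUL separator so token boundaries stay unambiguous.
    return hashlib.md5("\x00".join(tokens).encode("utf-8")).hexdigest()


def find_file_clones(files: dict[str, str]) -> list[list[str]]:
    """Group file paths whose fingerprints match exactly (file-clone sets)."""
    groups = defaultdict(list)
    for path, source in files.items():
        groups[fingerprint(source)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Because matching is done on digests, each file is read and hashed once and clone sets come from a single dictionary lookup per file, which is one plausible reason an approach like this scales to millions of files.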
Experiment
  • Target
    • Only .c and .h files in the FreeBSD Ports Collection
    • ~1.4M files
    • ~12 GB
    • 17.16 hours
  • We measured:
    • File size
    • Number of files in each project
    • Size of each file-clone set
    • Number of file-clones in a project

These values follow the power law
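
One common way to check a power-law claim like this is to fit a straight line to the log-log frequency plot and report the coefficient of determination R^2. The slides do not say how the fit was computed, so the sketch below is an assumption: it uses ordinary least squares on (log10 value, log10 frequency), and the function name `fit_power_law` is hypothetical.

```python
import math
from collections import Counter


def fit_power_law(values):
    """Fit log10(frequency) = a + b * log10(value) by least squares.

    Returns (b, r_squared), where b is the estimated exponent.
    Requires at least two distinct positive values.
    """
    freq = Counter(values)
    xs = [math.log10(v) for v in freq]
    ys = [math.log10(n) for n in freq.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # For simple linear regression, R^2 equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, r_squared
```

For example, `fit_power_law(set_sizes)` would take the list of file-clone set sizes, one entry per set. A log-log regression is only a rough check; it shows that the points lie near a line, not that a power law is the best-fitting distribution.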

File-clone Set Size

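[Figure: distribution of file-clone set size, fit with R^2 = 0.8508. Horizontal axis: file-clone set size. Callouts D and E mark PHP-related sets: Left, used in PHP5 (650 sets, 61 file clones); Right, used in PHP4 (500 sets, 59 file clones); used in both PHP4 and PHP5 (419 sets, 120 file clones).]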

File-clones per Project

[Figure: distribution of file clones per project, fit with R^2 = 0.8263. Horizontal axis: number of file-clone sets. Callout G marks three groups: Left, PHP5 modules; Center, projects related to binutils; Right, PHP4 modules.]

File-clones Between Projects (1/3)

  • Nodes represent projects
  • Edges show the number of file clones shared between two projects
  • Example: gcc41 and gfortran share 7691 file clones
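
The edge weights of such a project graph can be derived from the file-clone sets. The slides do not define the counting rule precisely, so the sketch below assumes that an edge weight is the number of clone sets in which both projects have at least one file. The function name `project_edges` and the `(project, path)` input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_edges(clone_sets):
    """Count, for each unordered project pair, the clone sets both appear in.

    clone_sets: iterable of clone sets, each a list of (project, path) pairs.
    Returns a Counter keyed by (project_a, project_b) with project_a < project_b.
    """
    edges = Counter()
    for clone_set in clone_sets:
        # A project counts once per clone set, however many of its files match.
        projects = sorted({project for project, _path in clone_set})
        for pair in combinations(projects, 2):
            edges[pair] += 1
    return edges
```

Clone sets confined to a single project contribute no edge, so the result contains only clones shared between different projects.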
File-clones Between Projects (2/3)

  • Nodes represent projects
  • Edges show the number of file clones shared between two projects

File-clones Between Projects (3/3)

  • Nodes represent projects
  • Edges show the number of file clones shared between two projects

Conclusions & Future Work

Conclusions

  • Measured several features of the FreeBSD Ports collection.
  • Found that the measured features follow the power law

Future Work

  • Investigate logical coupling between projects