Code duplication detection using ASTs

Code duplication detection using ASTs. Sponsored by: Terence Parr. Do Te Kien, graduate student, University of San Francisco, dtkien@usfca.edu. Introduction: code duplication detection within a single program (not a cheater detector).

Presentation Transcript


  1. Code duplication detection using ASTs • Sponsored by: Terence Parr • Do Te Kien, graduate student, University of San Francisco, dtkien@usfca.edu

  2. Introduction • Code duplication detection within a single program (not a cheater detector). • A couple of reasons to care: • Errors and bugs get duplicated along with the code and end up in more than one place, so they are difficult to detect and fix. • Maintaining duplicated chunks of code is tedious.

  3. Examples of code duplication • Formatting characters (spaces, tabs, line breaks) don't matter!

  4. Examples of code duplication • Variable names don't matter!

  5. Overview • The program reads a directory tree of Java files and prints chains of duplicated code chunks, together with the files and line numbers where those chunks occur. • Measure of equality: • Exact match, compared on a normalized string based on the AST printout • "Fuzzy" match, obtained by replacing all variables with ID
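
The two measures of equality can be illustrated with a minimal sketch. It is not the presenter's implementation: the original tool builds its normalized string from an ANTLR AST printout, whereas this stand-in uses a small regex tokenizer, and the names `TOKEN`, `KEYWORDS`, and `normalize` are hypothetical. It also replaces every non-keyword identifier with `ID`, which is coarser than replacing only variables.

```python
import re

# ASSUMPTION: a regex tokenizer stands in for the AST printout used by the
# original tool. It recognizes string literals, identifiers, numbers, and
# single punctuation/operator characters.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+(?:\.\d+)?|[^\sA-Za-z_0-9]')
IDENT = re.compile(r"[A-Za-z_]\w*")

# A small, incomplete subset of Java keywords, enough for the demo.
KEYWORDS = {"if", "else", "for", "while", "return", "new",
            "int", "long", "double", "boolean", "void"}


def normalize(stmt, fuzzy=False):
    """Return a whitespace-independent string for one statement.

    With fuzzy=True, every non-keyword identifier becomes "ID", so two
    statements that differ only in names normalize to the same string.
    """
    tokens = TOKEN.findall(stmt)
    if fuzzy:
        tokens = ["ID" if IDENT.fullmatch(t) and t not in KEYWORDS else t
                  for t in tokens]
    return " ".join(tokens)


# Exact match: formatting differences disappear.
print(normalize("int   total=count+1 ;"))               # int total = count + 1 ;
# Fuzzy match: names disappear as well.
print(normalize("int sum = n + 1;", fuzzy=True))        # int ID = ID + 1 ;
```

Comparing normalized strings rather than raw source lines is what makes the detector insensitive to layout, and the fuzzy variant additionally makes it insensitive to renaming.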

  6. Examples • Normalized string in exact match • Normalized string in fuzzy match

  7. Demo • Exact match demo • Fuzzy match demo

  8. Algorithm • Step 1: Use ANTLR to normalize and categorize the statements • Step 2: Compare the statements in the list against each other and map the results onto the diagonal matrix • Step 3: Walk the matrix to detect and collect the chains of duplicated code • Step 4: Report the results to a file
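
Steps 2 and 3 can be sketched as follows, assuming Step 1 has already produced a flat list of normalized statement strings. The slides do not show the actual data structures, so `match_matrix`, `find_chains`, and the `min_len` threshold are hypothetical names and choices. List indices stand in for the file and line information the real tool reports.

```python
def match_matrix(stmts):
    """Step 2: cell [i][j] is True when statements i and j are equal.

    Only cells above the main diagonal (j > i) are filled, since equality
    is symmetric and a statement trivially equals itself.
    """
    n = len(stmts)
    return [[j > i and stmts[i] == stmts[j] for j in range(n)]
            for i in range(n)]


def find_chains(stmts, min_len=2):
    """Step 3: walk each diagonal and collect runs of consecutive matches.

    A run of k True cells starting at (i, j) means statements i..i+k-1
    duplicate statements j..j+k-1. Returns (first_start, second_start,
    length) tuples. Overlapping self-matches are not filtered here.
    """
    n = len(stmts)
    m = match_matrix(stmts)
    chains = []
    for offset in range(1, n):            # one diagonal per offset j - i
        i = 0
        while i + offset < n:
            length = 0
            while (i + length + offset < n
                   and m[i + length][i + length + offset]):
                length += 1
            if length >= min_len:
                chains.append((i, i + offset, length))
            i += max(length, 1)           # skip past the run just measured
    return chains


# Three copies of the same three statements, separated by unrelated ones.
stmts = ["a", "b", "c", "x", "a", "b", "c", "y", "a", "b", "c"]
print(find_chains(stmts))  # [(0, 4, 3), (4, 8, 3), (0, 8, 3)]
```

Each consecutive run along a diagonal corresponds to one chain of duplicated statements, which is why the walk in Step 3 proceeds diagonal by diagonal rather than row by row.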

  9. Diagonal Matrix • There are three duplicated code blocks: f1(st1->st3); f2(st1->st3); f3(st1->st3)
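
The slide's figure is not part of this transcript, so the following is only a guess at what such a matrix looks like for the three blocks named above. The sketch assumes f1, f2, and f3 each consist of the same statements st1 to st3 and nothing else; `render` and the label layout are hypothetical.

```python
def render(labels, stmts):
    """Draw the upper-triangular match matrix as text.

    "X" marks a pair of equal statements (row index < column index),
    "." marks everything else.
    """
    n = len(stmts)
    rows = []
    for i in range(n):
        cells = "".join("X" if j > i and stmts[i] == stmts[j] else "."
                        for j in range(n))
        rows.append(f"{labels[i]:<7}{cells}")
    return "\n".join(rows)


functions = ["f1", "f2", "f3"]
body = ["st1", "st2", "st3"]
labels = [f"{f}.{s}" for f in functions for s in body]
stmts = body * len(functions)   # every function repeats the same statements

print(render(labels, stmts))
# f1.st1 ...X..X..
# f1.st2 ....X..X.
# f1.st3 .....X..X
# f2.st1 ......X..
# f2.st2 .......X.
# f2.st3 ........X
# f3.st1 .........
# f3.st2 .........
# f3.st3 .........
```

The three diagonal runs of X correspond to the pairs f1/f2, f1/f3, and f2/f3, each three statements long.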

  10. Discuss • How does the ANTLR approach improve code duplication detection? • Speed • Matching is not only syntactic but also semantic

  11. Discuss

  12. My dream

  13. Questions

  14. Nice Words • I would like to express my gratitude to Professor Terence Parr, who inspired me to work on this project! • Thank you for listening