Clone Detection by Exploiting Assembler

Ian Davis, Mike Godfrey, University of Waterloo, Ontario, Canada.

Presentation Transcript


  1. Clone Detection by Exploiting Assembler. Ian Davis, Mike Godfrey, University of Waterloo, Ontario, Canada

  6. The Original Assembler

        .LC107: .string "merge "
        …
        pushl $.LC107
        pushl command_buf+8
        .LCFI378:
        call prefixcmp
        addl $16,%esp
        testl %eax,%eax
        jne .L485
        subl $8,%esp
        pushl $32
        pushl command_buf+8
        call strchr
        addl $16,%esp
        incl %eax
        movl %eax,-16(%ebp)
        subl $12,%esp
        pushl $24
        call xmalloc
        addl $16,%esp
        movl %eax,-8(%ebp)
        subl $12,%esp
        pushl -16(%ebp)
        call lookup_branch
        …
        .L485

     • Identify function boundaries
     • Relate assembler back to source
     • Remove comments, white space, etc.
     • Normalize instruction set if needed
     • Convert to relative addressing
     • Inline string constants
     • Reconstruct parameter names
     • Reconstruct local variable names
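The talk does not spell out how these cleanup steps are implemented. As a minimal sketch, assuming GNU as AT&T-syntax input where `#` starts a comment, the C function below strips comments and redundant whitespace from one line and inlines string constants such as `.LC107` using a label table collected beforehand from the `.string` directives. The names `normalize_line` and `struct str_const` are hypothetical, and the remaining steps (function boundaries, relating code to source, name reconstruction) are not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry pairing a string-literal label with its text,
 * gathered beforehand from directives such as: .LC107: .string "merge " */
struct str_const {
    const char *label; /* e.g. ".LC107" */
    const char *text;  /* e.g. "\"merge \"" (quotes included) */
};

/* Normalize one line of AT&T-syntax assembler into out (capacity outsz).
 * Returns 0 if nothing is left (blank or comment-only line), else 1. */
int normalize_line(const char *in, char *out, size_t outsz,
                   const struct str_const *consts, size_t nconsts)
{
    char buf[256];
    size_t n = 0, o = 0;
    int in_space = 1; /* suppresses leading whitespace */

    assert(outsz > 0);
    out[0] = '\0';

    /* 1. Cut '#' comments and collapse whitespace runs to a single space.
     *    (Simplification: a '#' inside a string literal is not handled.) */
    for (const char *p = in; *p && *p != '#' && n + 1 < sizeof buf; p++) {
        if (isspace((unsigned char)*p)) {
            if (!in_space)
                buf[n++] = ' ';
            in_space = 1;
        } else {
            buf[n++] = *p;
            in_space = 0;
        }
    }
    if (n > 0 && buf[n - 1] == ' ')
        n--;
    buf[n] = '\0';
    if (n == 0)
        return 0;

    /* 2. Inline string constants: "$.LC107" becomes "$\"merge \"". */
    for (const char *p = buf; *p && o + 1 < outsz;) {
        int replaced = 0;
        for (size_t i = 0; p[0] == '$' && i < nconsts; i++) {
            size_t len = strlen(consts[i].label);
            if (strncmp(p + 1, consts[i].label, len) != 0)
                continue;
            char after = p[1 + len]; /* reject prefix hits: .LC10 vs .LC107 */
            if (isalnum((unsigned char)after) || after == '_')
                continue;
            int w = snprintf(out + o, outsz - o, "$%s", consts[i].text);
            if (w > 0)
                o += ((size_t)w < outsz - o) ? (size_t)w : outsz - o - 1;
            p += 1 + len;
            replaced = 1;
            break;
        }
        if (!replaced)
            out[o++] = *p++;
    }
    out[o] = '\0';
    return 1;
}
```

For example, with a table mapping `.LC107` to `"merge "`, the raw line `pushl $.LC107` followed by a `#` comment normalizes to `pushl $"merge "`, matching the first line of the annotated listing.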

  7. The Annotated Assembler

        pushl $"merge "
        pushl command_buf+8
        call prefixcmp
        addl $16,%esp
        testl %eax,%eax
        jne +124
        subl $8,%esp
        pushl $32
        pushl command_buf+8
        call strchr
        addl $16,%esp
        incl %eax
        movl %eax,from(%ebp)
        subl $12,%esp
        pushl $24
        call xmalloc
        addl $16,%esp
        movl %eax,n(%ebp)
        subl $12,%esp
        pushl from(%ebp)
        call lookup_branch

     • Identify function boundaries
     • Relate assembler to source
     • Remove comments, white space, etc.
     • Normalize instruction set if needed
     • Convert to relative addressing
     • Inline string constants
     • Reconstruct parameter names
     • Reconstruct local variable names
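The annotated listing replaces `jne .L485` with `jne +124`, but the slides do not say what unit that offset is measured in. The sketch below assumes offsets counted in instructions: it resolves local labels within one function in two passes and rewrites jump targets as signed offsets, so two copies of the same code yield identical branch text regardless of how the compiler numbered their labels. `relativize`, `is_label`, and the buffer limits are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LABELS 64
#define INSN_LEN   64

/* A label line is a single token ending in ':', e.g. ".L485:". */
static int is_label(const char *line)
{
    size_t n = strlen(line);
    return n > 1 && line[n - 1] == ':' && strchr(line, ' ') == NULL;
}

/* Rewrite jumps to local labels as signed offsets counted in instructions
 * (an assumed unit), e.g. "jne .L485" -> "jne +3". Label lines are dropped.
 * lines: n normalized lines of one function; out: one row per instruction.
 * Returns the number of instructions written to out. */
size_t relativize(const char **lines, size_t n, char out[][INSN_LEN])
{
    const char *names[MAX_LABELS];
    size_t pos[MAX_LABELS], nlabels = 0, k = 0;

    /* Pass 1: map each label to the index of the instruction after it. */
    for (size_t i = 0; i < n; i++) {
        if (is_label(lines[i])) {
            assert(nlabels < MAX_LABELS);
            names[nlabels] = lines[i];
            pos[nlabels++] = k;
        } else {
            k++;
        }
    }

    /* Pass 2: copy instructions, replacing ".L..." jump targets by offsets. */
    k = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_label(lines[i]))
            continue;
        const char *sp = strchr(lines[i], ' ');
        int done = 0;
        if (lines[i][0] == 'j' && sp != NULL && strncmp(sp + 1, ".L", 2) == 0) {
            size_t tlen = strlen(sp + 1);
            for (size_t j = 0; j < nlabels; j++) {
                /* names[j] is "target:", so compare all but the colon. */
                if (strlen(names[j]) == tlen + 1 &&
                    strncmp(sp + 1, names[j], tlen) == 0) {
                    long off = (long)pos[j] - (long)k;
                    snprintf(out[k], INSN_LEN, "%.*s %+ld",
                             (int)(sp - lines[i]), lines[i], off);
                    done = 1;
                    break;
                }
            }
        }
        if (!done)
            snprintf(out[k], INSN_LEN, "%s", lines[i]);
        k++;
    }
    return k;
}
```

Renaming frame slots such as `-16(%ebp)` to `from` would follow the same substitution pattern, given some mapping from frame offsets to names; the slides do not describe where that mapping comes from, so it is not sketched here.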

  8. The Matching Algorithm

     • Scan the entire source once
     • Use hashing to find the first pairing
     • Ignore pairings inside already identified clones
     • Don’t cross function boundaries
     • Terminate a clone before it overlaps its later partner in the same function
     • Weight matches (+) and mismatches (-)
     • Special logic for matching branches
     • Advance greedily while weight ≥ 0
     • Then employ hill climbing
     • Continue while improvement is possible
     • Accept if clones satisfy the minimum length
     • Alternative minimum for matching functions
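A minimal sketch of the greedy phase only, under stated assumptions: instructions are compared by a 32-bit FNV-1a hash of their normalized text (the slides say hashing is used but not which hash), matches score +2 and mismatches -3 (invented weights), and the two copies advance in lock-step. The seed lookup, the branch-specific logic, and hill climbing are omitted; returning the best-scoring prefix is a simplification, not the authors' hill-climbing step. `hash_insn` and `extend_clone` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MATCH_WEIGHT    2 /* assumed reward for an identical instruction */
#define MISMATCH_WEIGHT 3 /* assumed penalty for a differing instruction */

typedef unsigned long insn_hash;

/* 32-bit FNV-1a hash of one normalized instruction. Equal hashes propose a
 * candidate pairing; a real matcher would also guard against collisions. */
insn_hash hash_insn(const char *s)
{
    insn_hash h = 2166136261UL;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h = (h * 16777619UL) & 0xffffffffUL;
    }
    return h;
}

/* Extend a clone from the seed pairing a[i] ~ b[j], where a and b each hold
 * the instructions of one function (so a clone cannot cross a function
 * boundary). Both sides advance in lock-step: the weight rises on a match,
 * falls on a mismatch, and extension stops once the weight would go negative.
 * Returns the length of the highest-weight prefix seen. */
size_t extend_clone(const insn_hash *a, size_t na, size_t i,
                    const insn_hash *b, size_t nb, size_t j)
{
    long weight = 0, best_weight = 0;
    size_t len = 0, best_len = 0;

    assert(i <= na && j <= nb);
    while (i + len < na && j + len < nb) {
        weight += (a[i + len] == b[j + len]) ? MATCH_WEIGHT : -MISMATCH_WEIGHT;
        if (weight < 0)
            break;
        len++;
        if (weight > best_weight) {
            best_weight = weight;
            best_len = len;
        }
    }
    return best_len;
}
```

A caller would accept the pairing only if the returned length reaches the minimum clone length, and would skip seeds that fall inside clones already reported. Because lock-step advance cannot absorb an inserted or deleted instruction, a gap-tolerant refinement such as the hill climbing named on the slide would be needed for real code.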

  9. Source Clone 1

        from = strchr(command_buf.buf, ' ') + 1;
        n = xmalloc(sizeof(*n));
        s = lookup_branch(from);
        if (s)
            hashcpy(n->sha1, s->sha1);
        else if (*from == ':') {
            uintmax_t idnum = strtoumax(from + 1, NULL, 10);
            struct object_entry *oe = find_mark(idnum);
            if (oe->type != OBJ_COMMIT)
                die("Mark :%" PRIuMAX " not a commit", idnum);
            hashcpy(n->sha1, oe->sha1);
        } else if (!get_sha1(from, n->sha1)) {
            unsigned long size;
            char *buf = read_object_with_reference(n->sha1,
                commit_type, &size, n->sha1);
            if (!buf || size < 46)
                die("Not a valid commit: %s", from);
            free(buf);
        } else
            die("Invalid ref name or SHA1 expression: %s", from);

  10. Source Clone 2

        from = strchr(command_buf.buf, ' ') + 1;
        s = lookup_branch(from);
        if (s)
            hashcpy(sha1, s->sha1);
        else if (*from == ':') {
            struct object_entry *oe;
            from_mark = strtoumax(from + 1, NULL, 10);
            oe = find_mark(from_mark);
            if (oe->type != OBJ_COMMIT)
                die("Mark :%" PRIuMAX " not a commit", from_mark);
            hashcpy(sha1, oe->sha1);
        } else if (!get_sha1(from, sha1)) {
            unsigned long size;
            char *buf;
            buf = read_object_with_reference(sha1, commit_type, &size, sha1);
            if (!buf || size < 46)
                die("Not a valid commit: %s", from);
            free(buf);
        } else
            die("Invalid ref name or SHA1 expression: %s", from);

  11. Benefits and Conclusions

     • Assembler is easy to derive from source, object code, or executables
     • Complements other clone detection approaches
     • The compiler performs useful normalization of the source for free
     • The analysis is semantic, not syntactic
     • Analysis is by function (forbidding overlapping clone pairs)
     • Can handle branching sensibly
     • Case statements are easier to handle
     • Can weight different assembler instructions differently
     • Can reason about assembler when performing detection
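One way to weight instructions differently, sketched here with invented values, is to key a weight on the mnemonic and let the matcher reward or penalize by that weight instead of a constant. The table contents, `insn_weight`, and `pair_score` are assumptions for illustration, not the authors' scheme, and the input is assumed to be normalized text with single spaces between tokens.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical weights: instructions that carry more program meaning
 * (calls, branches) count more than stack bookkeeping. Values are invented. */
struct op_weight {
    const char *mnemonic;
    int weight;
};

static const struct op_weight WEIGHTS[] = {
    {"call", 4},
    {"jne", 3},  {"je", 3},    {"jmp", 3},
    {"movl", 2}, {"testl", 2}, {"cmpl", 2},
    {"pushl", 1}, {"addl", 1}, {"subl", 1},
};

/* Weight of one normalized instruction, keyed on its mnemonic (first token).
 * Mnemonics missing from the table get a default weight of 1. */
int insn_weight(const char *insn)
{
    assert(insn != NULL);
    size_t len = strcspn(insn, " ");
    for (size_t i = 0; i < sizeof WEIGHTS / sizeof WEIGHTS[0]; i++) {
        if (strlen(WEIGHTS[i].mnemonic) == len &&
            strncmp(insn, WEIGHTS[i].mnemonic, len) == 0)
            return WEIGHTS[i].weight;
    }
    return 1;
}

/* Score change for aligning instruction x with y: a match adds x's weight,
 * a mismatch subtracts the heavier of the two weights. */
int pair_score(const char *x, const char *y)
{
    int wx = insn_weight(x), wy = insn_weight(y);
    if (strcmp(x, y) == 0)
        return wx;
    return -(wx > wy ? wx : wy);
}
```

Under these weights, two identical `call xmalloc` lines add 4 to a clone's score while a matched `addl $16,%esp` adds only 1, so clones are judged more by shared calls and control flow than by stack adjustments.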

  12. Thank You
