
Value-Based Program Characterization and Its Application to Software Plagiarism Detection

ICSE 2011. Yoon-Chan Jhi, Xinran Wang, Sencun Zhu, Peng Liu, Dinghao Wu (Penn State University); Xiaoqi Jia (State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences)

Presentation Transcript


  1. Value-Based Program Characterization and Its Application to Software Plagiarism Detection. ICSE 2011. Yoon-Chan Jhi, Xinran Wang, Sencun Zhu, Peng Liu, Dinghao Wu (Penn State University); Xiaoqi Jia (State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences). Embedded Lab., Park Yeongseong

  2. Contents • Introduction • State of the art • Core values • Design • Experiment • Discussion • Conclusion • Q&A

  3. Introduction • Identifying identical or similar code is important, e.g., for detecting software plagiarism • Previous work • Static source code comparison – C1 • Static executable code comparison – C2 • Dynamic control-flow-based methods – C3 • Dynamic API-based methods – C4

  4. Introduction • Three highly desired requirements • R1 – Resiliency • R2 – Ability to work directly on binary executables • R3 – Platform independence • However, each existing approach fails at least one requirement • Static source code comparison (C1): does not satisfy R1, R2 • Static executable code comparison (C2): does not satisfy R1 • Dynamic control-flow-based methods (C3): do not satisfy R1, R3 • Dynamic API-based methods (C4): do not satisfy R3

  5. Introduction • A new approach based on core values • Tested against program variants produced by: • 5 optimization options (-O0 to -O3, -Os) • 3 compilers (GCC, TCC, WCC) • Obfuscators: KlassMaster, Thicket, Loco/Diablo

  6. State of the art • Code obfuscation techniques • Data obfuscation, control obfuscation, layout obfuscation, and preventive transformations • Indirect branches, control-flow flattening, function-pointer aliasing • Static-analysis-based plagiarism detection • String-based • AST-based • Token-based • PDG-based • Birthmark-based

  7. State of the art • Dynamic-analysis-based plagiarism detection • Whole-program-path (WPP) based • Sequence-of-API-function-calls birthmark (EXESEQ) • Frequency-of-API-function-calls birthmark (EXEFREQ) • System-call-based birthmark

  8. Core values • Runtime values • The output operands of the executed machine instructions • Core values • Constructed from runtime values by eliminating non-core values • If is not derived from , is not a core value of • If is not in the set of runtime values of , is not a core value of
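The second elimination rule can be read as a set filter. Under that reading, a value of the original program survives only if it also appears among the runtime values of every transformed version. The Python sketch below illustrates this reading under stated assumptions and is not the authors' implementation. It assumes each trace is a list of integer output-operand values, and the function names are hypothetical. The first rule, about derivation, is not modeled.

```python
from typing import Iterable, Sequence, Set


def runtime_values(trace: Iterable[int]) -> Set[int]:
    """Collect the set of output-operand values seen in one execution trace."""
    return set(trace)


def candidate_core_values(original: Sequence[int],
                          variants: Sequence[Sequence[int]]) -> Set[int]:
    """Hypothetical filter: keep values of the original run that also appear
    in the runtime values of every transformed variant's run.

    Assumption: a value missing from any semantics-preserving variant is
    treated as non-core (a set-intersection reading of the rule).
    """
    core = runtime_values(original)
    for trace in variants:
        core &= runtime_values(trace)
    return core


if __name__ == "__main__":
    original = [3, 7, 0x8049F00, 42, 7]
    o2_build = [3, 42, 7, 0x8049F40]  # e.g. the same program built with -O2
    obfuscated = [9, 3, 7, 42]        # e.g. the output of an obfuscator
    print(sorted(candidate_core_values(original, [o2_build, obfuscated])))  # prints [3, 7, 42]
```

In this toy run, the pointer-like values differ between builds and drop out. The arithmetic results shared by all versions remain.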

  9. Core values

  10. Design – Value sequence extraction • Not all values associated with the execution of a program are core values • Value-updating instructions: their output values are related to the program's semantics
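Below is a minimal Python sketch of the extraction step. It assumes the trace is a list of (mnemonic, output value) pairs. It also assumes that "value-updating" means any instruction that writes an output operand and is not a pure data move. The `DATA_MOVEMENT_OPCODES` set and the `extract_value_sequence` function are hypothetical illustrations, not the paper's exact rule.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical: mnemonics treated as pure data movement (copying an existing
# value), so their outputs are NOT extracted. Illustrative, not the paper's list.
DATA_MOVEMENT_OPCODES = {"mov", "push", "pop", "xchg"}

# One executed instruction: (mnemonic, value written to its output operand,
# or None if the trace records no operand write, e.g. for a compare or jump).
Instruction = Tuple[str, Optional[int]]


def extract_value_sequence(trace: Iterable[Instruction]) -> List[int]:
    """Return output values of value-updating instructions in execution order."""
    sequence: List[int] = []
    for mnemonic, output in trace:
        if output is None:
            continue  # nothing written to an output operand
        if mnemonic.lower() in DATA_MOVEMENT_OPCODES:
            continue  # copies a value instead of computing a new one
        sequence.append(output)
    return sequence


if __name__ == "__main__":
    trace = [("mov", 5), ("add", 8), ("cmp", None), ("imul", 24),
             ("push", 24), ("sub", 23)]
    print(extract_value_sequence(trace))  # prints [8, 24, 23]
```

Keeping execution order turns the result into a value sequence rather than a set, which later steps can compare.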

  11. Design – Value sequence refinement and similarity metric • Value sequences are refined by: • Sequential refinement (reduction rate 16%–34%) • Optimization-based refinement (5 optimization options) • Address removal (excludes pointer values)
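The slide names the refinement steps but gives neither their exact rules nor the metric. The Python sketch below shows one plausible pipeline, built on three assumptions:

- Sequential refinement is stood in for by collapsing consecutive duplicate values.
- Address removal drops values that fall inside known mapped memory ranges.
- Similarity is the longest common subsequence (LCS) length divided by the length of the plaintiff's sequence.

Optimization-based refinement is not modeled, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed input: [start, end) bounds of mapped memory regions of the process.
AddressRange = Tuple[int, int]


def sequential_refine(values: Sequence[int]) -> List[int]:
    """Stand-in refinement (assumption): collapse runs of identical values."""
    refined: List[int] = []
    for v in values:
        if not refined or refined[-1] != v:
            refined.append(v)
    return refined


def remove_addresses(values: Sequence[int],
                     ranges: Sequence[AddressRange]) -> List[int]:
    """Drop values inside any mapped address range (treated as pointers)."""
    return [v for v in values if not any(lo <= v < hi for lo, hi in ranges)]


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Dynamic-programming LCS length, keeping only one previous row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(plaintiff: Sequence[int], suspect: Sequence[int]) -> float:
    """Assumed metric: share of the plaintiff's sequence found, in order, in the suspect's."""
    if not plaintiff:
        return 0.0
    return lcs_length(plaintiff, suspect) / len(plaintiff)


if __name__ == "__main__":
    maps = [(0x08048000, 0x08050000)]  # hypothetical mapped region
    a = remove_addresses(sequential_refine([5, 5, 0x08049000, 12, 12, 40, 7]), maps)
    b = remove_addresses(sequential_refine([5, 9, 0x0804A000, 40, 12, 7, 7]), maps)
    print(a, b, similarity(a, b))  # prints [5, 12, 40, 7] [5, 9, 40, 12, 7] 0.75
```

In this sketch, dividing by the plaintiff's length makes the score asymmetric. It measures how much of the original's value sequence survives in the suspect, rather than how alike the two sequences are overall.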

  12. Design – Overview

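  13. Experiment • Setup: Intel quad-core 2.00 GHz CPU, 4 GB RAM, Linux machine, QEMU 0.9.1 • Questions • Resiliency: is the method resilient to obfuscation and compiler transformations? • False accusation: does it avoid flagging similar but independently developed programs? • Credibility: does it report low similarity for different programs?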

  14. Experiment – Obfuscation tools (resiliency) • Obfuscators • SandMark, KlassMaster: Java bytecode obfuscators • Test application: JLex, a lexical analyzer generator

  15. Experiment – Similar programs (false accusation) • Test applications • Five independently developed XML parsers: expat, libxml2, Parsifal, rxp, xercesc

  16. Experiment – Different programs (credibility) • Test applications • bzip2, gzip, oggenc; 9 of 11 programs • Result • Similarity scores between 0 and 0.27 • zip vs. gzip: similarity score 1.0 (same compression algorithm: deflate) • zip vs. bzip2: similarity scores 0.01 to 0.03 (different compression algorithm: bzip2 uses block sorting)

  17. Conclusion • Introduces a novel approach to the dynamic characterization of executable programs • The value-based method successfully identifies 34 plagiarized programs produced by SandMark, KlassMaster, and Thicket

  18. Q&A
