Graphical Information on Plagiarism Activates


  1. Graphical Information on Plagiarism Activates Poon Yan Horn Jonathan

  2. Table of Contents • Background • Motivation • System structure • Pair-wise detection • Clustering • Demo • Q & A

  3. Background • Despite years of effort, plagiarism in student assignment submissions still causes considerable difficulties for course designers. • CAI (June 2005) – 40% of students admitted to engaging in plagiarism. • NUS FASS (AY 2008 – 2009) – 70 students were found guilty of committing plagiarism.

  4. Motivation • Many detection systems can detect similarities between the submissions for a single assignment. • Their results, however, do not provide sufficient information on how program code is exchanged among a group of students. • Most importantly, they do not show how plagiarism works within a group of students across all assignments.

  5. System Structure • Pair-wise plagiarism detection engine • Clustering engine (DBSCAN) • HTML / Graph generator • Database

  6. Pair-wise Detection • Tokenize each submission. • Construct an N-Gram representation for each submission. • Determine the matching sub-sequence pairs of N-Grams between each pair of submissions. • Compute the asymmetric similarities between submissions.

  7. Pair-wise Detection • Tokenize each submission: • Remove whitespace • Convert: • Keywords => ‘K’ • Identifiers => ‘V’ • Strings => ‘S’ • Constants => ‘C’ • Example: int main() { int a = 1; String b = “sb”;} => KV(){KV=C;KV=S;}
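The tokenization step can be sketched as below. This is a minimal illustration rather than the presenter's implementation: the `KEYWORDS` set, the regular expression, and the `tokenize` name are all assumptions, and `String` is treated as a keyword only because the slide's example maps it to ‘K’.

```python
import re

# Assumption: a tiny illustrative keyword set; a real engine would use
# the full keyword list of the language being checked.
KEYWORDS = {"int", "String", "void", "return", "if", "else", "for", "while"}

TOKEN_RE = re.compile(r'''
    (?P<S>"(?:\\.|[^"\\])*")   # string literal
  | (?P<C>\d+(?:\.\d+)?)       # numeric constant
  | (?P<W>[A-Za-z_]\w*)        # keyword or identifier
  | (?P<WS>\s+)                # whitespace (dropped)
  | (?P<O>.)                   # any other character, kept as-is
''', re.VERBOSE | re.DOTALL)

def tokenize(source):
    """Map source text to the compact K/V/S/C token string."""
    out = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        if kind == "WS":
            continue                      # remove whitespace
        if kind in ("S", "C"):
            out.append(kind)              # strings => S, constants => C
        elif kind == "W":
            out.append("K" if m.group() in KEYWORDS else "V")
        else:
            out.append(m.group())         # punctuation and operators
    return "".join(out)

print(tokenize('int main() { int a = 1; String b = "sb";}'))
# KV(){KV=C;KV=S;}
```

Because identifiers and literals collapse to single letters, renaming variables or changing constant values leaves the token string unchanged.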

  8. Pair-wise Detection • N-Gram construction • Compose the sequence of 4-grams over the token string • Example: KV(){KV=C;KV=S;} => KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;}
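A sliding window over the token string reproduces the 4-gram sequence shown on the slide. The helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n=4):
    """Return every contiguous n-character window of the token string."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

grams = ngrams("KV(){KV=C;KV=S;}")
print(" ".join(grams))
# KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;}
```

A token string of length m yields m - n + 1 grams, so the 16-token example gives 13 4-grams.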

  9. Pair-wise Detection • Determine the matching sub-sequence pairs between two N-Gram sequences, A and B: • Check whether each N-Gram in A can be found in B. • If a matched sub-sequence is longer than the minimum matching requirement, report it as a match. • The minimum matching requirement is 2 statements. • Example N-Gram sequences: (1) KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;K S;KV ;KV= KV=C V=C; =C;} (2) KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;} (3) KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;}
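One way to read the matching step is sketched below: for each position in A, find the longest run of consecutive N-Grams that also appears contiguously in B, and keep runs of at least `min_len` grams. The function name, the greedy longest-run strategy, and `min_len=7` are assumptions for illustration. The slide states the threshold as 2 statements, and 7 is used here only because two `KV=C;` statements (10 tokens) span seven 4-grams.

```python
def ngrams(tokens, n=4):
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def find_matches(a, b, min_len=7):
    """Return (start_in_a, start_in_b, length) for each reported run."""
    positions = {}                       # gram -> indices where it occurs in b
    for j, gram in enumerate(b):
        positions.setdefault(gram, []).append(j)
    matches, i = [], 0
    while i < len(a):
        best_len, best_j = 0, -1
        for j in positions.get(a[i], []):
            k = 0                        # extend the run while grams agree
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_j = k, j
        if best_len >= min_len:          # long enough: report and skip past it
            matches.append((i, best_j, best_len))
            i += best_len
        else:
            i += 1
    return matches

a = ngrams("KV(){KV=C;KV=S;KV=C;}")      # same tokens as a slide example
b = ngrams("KV(){KV=C;KV=C;}")
print(find_matches(a, b))
# [(0, 0, 10)]
```

Here the first ten 4-grams of A line up with the start of B, while the shorter run at the tail of A falls below the threshold and is not reported.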

  10. Pair-wise Detection • Compute the asymmetric similarity from file f1 to file f2
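The formula for this slide did not survive in the transcript, so the sketch below substitutes a common containment-style measure purely as an assumption: the fraction of f1's N-Grams that also occur in f2. It is meant only to show why such a measure is asymmetric, and `similarity` is a hypothetical name.

```python
def ngrams(tokens, n=4):
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def similarity(f1_grams, f2_grams):
    """Assumed measure: share of f1's n-grams that also appear in f2."""
    if not f1_grams:
        return 0.0
    f2_set = set(f2_grams)
    shared = sum(1 for gram in f1_grams if gram in f2_set)
    return shared / len(f1_grams)

long_file = ngrams("KV(){KV=C;KV=S;KV=C;}")
short_file = ngrams("KV(){KV=C;KV=C;}")

print(similarity(short_file, long_file))  # 1.0: all of the short file is in the long one
print(similarity(long_file, short_file))  # lower: the long file has extra material
```

Dividing by the size of f1 alone is what makes the two directions differ: a small file copied wholesale into a larger one scores high in one direction and lower in the other.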

  11. Clustering • DBSCAN • Advantages • Fast algorithm (O(n log n)) • The number of clusters is determined automatically • A node (submitter) in a low-density region, i.e. not similar enough to other submitters, is classified as noise and omitted • Two parameters • Eps – User-defined neighbourhood radius that serves as the grouping criterion • MinPts – Predefined by the system as 2
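A compact DBSCAN over a precomputed distance matrix illustrates how Eps and MinPts group submitters and mark outliers as noise. This is a sketch under assumptions: the `dbscan` function, the sample distances, and the idea of turning similarities into symmetric distances (for example, 1 minus the larger of the two directional similarities) are not taken from the slides.

```python
def dbscan(dist, eps, min_pts=2):
    """Label each point with a cluster id, or -1 for noise."""
    n = len(dist)
    labels = [None] * n                  # None = not yet visited

    def neighbours(i):
        # Points within eps of i (includes i itself, distance 0).
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1               # low-density point: noise for now
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbours(j)
            if len(nbj) >= min_pts:      # core point: keep expanding
                queue.extend(nbj)
    return labels

# Hypothetical distances between four submitters (0 = identical).
dist = [
    [0.00, 0.10, 0.40, 0.90],
    [0.10, 0.00, 0.15, 0.90],
    [0.40, 0.15, 0.00, 0.90],
    [0.90, 0.90, 0.90, 0.00],
]
print(dbscan(dist, eps=0.2))
# [0, 0, 0, -1]
```

Submitters 0 and 2 are farther apart than Eps but end up in the same cluster because both are close to submitter 1, which is how a chain of code exchange shows up as one group. This version scans every point for each neighbourhood query, so it runs in O(n²); the O(n log n) figure on the slide applies when neighbourhood queries use a spatial index.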

  12. Demo

  13. Q & A

  14. Thank you
