Graphical Information on Plagiarism Activates

Poon Yan Horn Jonathan

Table of Contents
  • Background
  • Motivation
  • System structure
  • Pair-wise detection
  • Clustering
  • Demo
  • Q & A
Background
  • Despite years of effort, plagiarism in student assignment submissions still causes considerable difficulties for course designers.
    • CAI (June 2005) – 40% of students admitted to engaging in plagiarism.
    • NUS FASS (AY 2008 – 2009) – 70 students were found guilty of plagiarism.
Motivation
  • Many detection systems can detect similarities between submissions for a single assignment.
  • Their results, however, do not provide sufficient information on how program code is exchanged among a group of students.
  • Most importantly, they do not show how plagiarism works within a group of students across all assignments.
System Structure
  • Pair-wise plagiarism detection engine
  • Clustering engine (DBSCAN)
  • Graph generator
Pair-wise Detection
  • Tokenize each submission.
  • Construct an N-gram representation of each submission.
  • Determine the matching sub-sequences of N-grams between each pair of submissions.
  • Compute the asymmetric similarities between each pair of submissions.
Pair-wise Detection
  • Tokenize each submission
    • Remove whitespace
    • Convert:
      • Keywords => ‘K’
      • Identifiers => ‘V’
      • Strings => ‘S’
      • Constants => ‘C’

int main() { int a = 1; String b = "sb"; }
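The conversion rules above can be sketched in Python as follows. This is an illustration only, not the presenter's tokenizer: the `KEYWORDS` set and the regular expression are assumptions chosen to cover the example line, and type names such as `String` are treated as keywords because the 4-gram sequence on the next slide implies it.

```python
import re

# Assumption: a tiny keyword set that covers the slide's example; a real
# tokenizer would use the full keyword list of the target language.
KEYWORDS = {"int", "String", "return", "if", "else", "for", "while"}

TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'   # string literal
    r"|\d+(?:\.\d+)?"      # numeric constant
    r"|[A-Za-z_]\w*"       # keyword or identifier
    r"|\S"                 # any other non-space character
)

def tokenize(source: str) -> str:
    """Map source text to a K/V/S/C token string; whitespace is dropped."""
    out = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme[0] == '"':
            out.append("S")
        elif lexeme[0].isdigit():
            out.append("C")
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            out.append("K" if lexeme in KEYWORDS else "V")
        else:
            out.append(lexeme)  # punctuation and operators are kept as-is
    return "".join(out)

print(tokenize('int main() { int a = 1; String b = "sb";}'))
# prints KV(){KV=C;KV=S;}
```

Because every identifier collapses to `V` and every constant to `C`, renaming variables or changing literal values leaves the token string unchanged.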


Pair-wise Detection
  • N-Gram construction
    • Compose the sequence of 4-grams from the token string


KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;}
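A minimal sketch of the 4-gram step, assuming each token is a single character so the token string can be sliced directly. The function name `ngrams` is hypothetical.

```python
def ngrams(tokens: str, n: int = 4) -> list[str]:
    """Return every window of n consecutive token characters, in order."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

grams = ngrams("KV(){KV=C;KV=S;}")
print(" ".join(grams))
# prints KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;}
```

A token string of length 16 yields 13 overlapping 4-grams, which is the sequence shown on the slide.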

Pair-wise Detection
  • Determine the matching sub-sequence pairs between two N-gram sequences, A and B:
    • Check whether each N-gram in A can be found in B.
    • If a matched sub-sequence is longer than the minimum matching requirement, report it as a match.
    • The minimum matching requirement is 2 statements.

First sequence:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;K S;KV ;KV= KV=C V=C; =C;}

Matched sub-sequences:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV=
KV=C V=C; =C;}

Second sequence:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;}
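One way to find such matches is a greedy scan, sketched below. This is not necessarily the engine's algorithm: the helper names are hypothetical, and `min_len` is measured in N-grams here, whereas the slide states the minimum in statements.

```python
def ngrams(tokens: str, n: int = 4) -> list[str]:
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def matched_runs(a: list[str], b: list[str], min_len: int) -> list[tuple[int, int]]:
    """Greedily report runs of consecutive N-grams of `a` that also occur
    consecutively in `b`, as (start index in a, run length) pairs."""
    runs = []
    i = 0
    while i < len(a):
        best = 0
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
        if best >= min_len:
            runs.append((i, best))
            i += best          # skip past the reported run
        else:
            i += 1
    return runs

a = ngrams("KV(){KV=C;KV=S;KV=C;}")   # token string of the first sequence
b = ngrams("KV(){KV=C;KV=C;KV=C;}")   # token string of the second sequence
print(matched_runs(a, b, min_len=3))
# prints [(0, 10), (14, 4)]
```

The scan is quadratic in the sequence lengths, which is acceptable for a sketch; the second run it reports also includes the `;KV=` gram that precedes the matched tail.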

Pair-wise Detection
  • Compute the asymmetric similarity for File f1 to File f2
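The similarity formula itself is not preserved in this transcript. The sketch below therefore uses an assumed definition, the fraction of f1's N-grams that also occur in f2, purely to illustrate why such a measure is asymmetric. The function name and the definition are both hypothetical.

```python
def asymmetric_similarity(f1_grams: list[str], f2_grams: list[str]) -> float:
    """Assumed definition: share of f1's N-grams that also appear in f2."""
    if not f1_grams:
        return 0.0
    f2_set = set(f2_grams)
    covered = sum(1 for g in f1_grams if g in f2_set)
    return covered / len(f1_grams)

short = ["KV=C", "V=C;"]
long_ = ["KV=C", "V=C;", "=C;K", "C;KV"]
print(asymmetric_similarity(short, long_))  # prints 1.0
print(asymmetric_similarity(long_, short))  # prints 0.5
```

Under this assumption a small file fully contained in a larger one scores 1.0 in one direction and less in the other, which hints at who may have copied from whom.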
Clustering
  • Advantages
    • Fast algorithm (O(n log n))
    • The number of clusters is determined automatically
    • A node (submitter) in a low-density region, i.e. one not very similar to other submitters, is classified as noise and omitted
  • Two parameters
    • Eps – user-defined grouping threshold (the neighbourhood radius)
    • MinPts – predefined by the system as 2
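A minimal DBSCAN sketch over a distance matrix shows how Eps and MinPts interact. Several things here are assumptions: the input is a symmetric distance matrix (for example, derived from the pair-wise similarities), the example distances are invented, and the neighbour search is a plain O(n²) scan rather than the indexed search needed for O(n log n).

```python
NOISE = -1

def dbscan(dist: list[list[float]], eps: float, min_pts: int = 2) -> list[int]:
    """Return one cluster label per node; NOISE marks low-density nodes."""
    n = len(dist)
    labels = [None] * n
    cluster = -1

    def neighbours(i):
        # Includes i itself, since dist[i][i] == 0.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border node joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:         # core node: keep expanding
                queue.extend(nj)
    return labels

# Hypothetical distances between four submitters; the last is far from all.
dist = [
    [0.0, 0.1, 0.6, 0.9],
    [0.1, 0.0, 0.2, 0.9],
    [0.6, 0.2, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
print(dbscan(dist, eps=0.3))
# prints [0, 0, 0, -1]
```

Submitters 0 and 2 are not within Eps of each other, yet both end up in the same cluster through submitter 1, while submitter 3 is labelled as noise.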