Graphical information on plagiarism activates


Poon Yan Horn Jonathan


Table of Content

  • Background

  • Motivation

  • System structure

  • Pair-wise detection

  • Clustering

  • Demo

  • Q & A


Background

  • In spite of years of effort, plagiarism in student assignment submissions still causes considerable difficulties for course designers.

    • CAI (June 2005) – 40% of students admitted to engaging in plagiarism.

    • NUS FASS (AY 2008 – 2009) – 70 students were found guilty of committing plagiarism.


Motivation

  • Many detection systems can detect similarities between submissions for a single assignment.

  • Their results, however, do not provide sufficient information on how program code is exchanged among a group of students.

  • Most importantly, they do not show how plagiarism works within a group of students across all assignments.


System Structure

  • Pair-wise plagiarism detection engine

  • Clustering engine (DBSCAN)

  • HTML / Graph generator

  • Database


Pair-wise Detection

  • Tokenize each submission.

  • Construct an N-Gram representation of each submission.

  • Determine the matching N-Gram sub-sequences between each pair of submissions.

  • Compute the asymmetric similarities between each pair of submissions.


Pair-wise Detection

  • Tokenize each submission

    • Remove whitespace

    • Convert:

      • Keywords => ‘K’

      • Identifiers => ‘V’

      • Strings => ‘S’

      • Constants => ‘C’

Input: int main() { int a = 1; String b = "sb"; }

Tokenized: KV(){KV=C;KV=S;}
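The tokenization step above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the `KEYWORDS` set and the regular expression are assumptions (the slides do not list which words count as keywords; `String` is included only because the slide example maps it to ‘K’).

```python
import re

# Hypothetical keyword set, chosen so the slide example is reproduced.
KEYWORDS = {"int", "String", "void", "return", "if", "else", "for", "while"}

# Alternatives are tried in order: strings, words, numeric constants, punctuation.
TOKEN_RE = re.compile(r'''
    (?P<S>"(?:\\.|[^"\\])*")   # string literal
  | (?P<W>[A-Za-z_]\w*)        # keyword or identifier
  | (?P<C>\d+(?:\.\d+)?)       # numeric constant
  | (?P<P>\S)                  # any other non-space character, kept as-is
''', re.VERBOSE)

def tokenize(source: str) -> str:
    """Map source code to a compact token string; whitespace is dropped."""
    out = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "S":
            out.append("S")
        elif kind == "W":
            out.append("K" if text in KEYWORDS else "V")
        elif kind == "C":
            out.append("C")
        else:
            out.append(text)
    return "".join(out)

print(tokenize('int main() { int a = 1; String b = "sb";}'))
```

Because every identifier collapses to ‘V’ and every constant to ‘C’, renaming variables or changing literal values leaves the token string unchanged.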


Pair-wise Detection

  • N-Gram construction

    • Build the sequence of overlapping 4-grams over the token string

KV(){KV=C;KV=S;}

KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;}
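A minimal sketch of the 4-gram construction, assuming the token string is treated as a plain character sequence and the window slides one position at a time (the function name `ngrams` is hypothetical):

```python
def ngrams(tokens: str, n: int = 4) -> list[str]:
    """Return all overlapping n-grams of the token string, in order."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

print(" ".join(ngrams("KV(){KV=C;KV=S;}")))
```

A token string of length L yields L - n + 1 n-grams, so the 16-character example produces 13 4-grams.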


Pair-wise Detection

  • Determine the matching sub-sequences between two N-Gram sequences, A and B:

    • Check whether each N-Gram in A can be found in B.

    • If a matched sub-sequence is at least as long as the minimum matching requirement, report it as a match.

    • The minimum matching requirement is 2 statements.

Sequence A:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;K S;KV ;KV= KV=C V=C; =C;}

Matched sub-sequences:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV=
KV=C V=C; =C;}

Sequence B:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;}
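One way to sketch the matching step is a greedy scan that, for each position in A, looks for the longest run of consecutive N-Grams that also appears consecutively in B. This is an illustration under assumptions, not the presenter's algorithm: `matched_runs` and `min_run` are hypothetical names, and the threshold is expressed in N-Grams rather than statements.

```python
def ngrams(tokens: str, n: int = 4) -> list[str]:
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def matched_runs(a: list[str], b: list[str], min_run: int = 3) -> list[tuple[int, int]]:
    """Return (start_in_a, length) for each run of a that also occurs in b."""
    runs = []
    i = 0
    while i < len(a):
        best = 0
        for j in range(len(b)):
            k = 0
            # Extend the run while consecutive n-grams keep agreeing.
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
        if best >= min_run:
            runs.append((i, best))
            i += best            # skip past the reported run
        else:
            i += 1
    return runs

a = ngrams("KV(){KV=C;KV=S;KV=C;}")
b = ngrams("KV(){KV=C;KV=C;KV=C;}")
print(matched_runs(a, b))
```

On the two sequences from the slide, this sketch reports a run of 10 N-Grams starting at position 0 and a run of 4 starting at position 14. The boundaries of the second run differ by one N-Gram from the segments shown on the slide, because the greedy scan also picks up the preceding `;KV=`.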


Pair-wise Detection

  • Compute the asymmetric similarity for File f1 to File f2
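The transcript does not give the similarity formula. One plausible definition, assumed here purely for illustration, is the fraction of f1's N-Grams that are covered by matches against f2. It is asymmetric because the denominator depends on which file is taken as f1. The function name and the `(start, length)` run format are hypothetical.

```python
def asymmetric_similarity(ngram_count_f1: int, runs: list[tuple[int, int]]) -> float:
    """Assumed definition: matched n-grams of f1 divided by all n-grams of f1.

    `runs` holds non-overlapping (start, length) matches found in f1 against f2.
    """
    if ngram_count_f1 == 0:
        return 0.0
    covered = sum(length for _, length in runs)
    return covered / ngram_count_f1

# The same 14 matched n-grams weigh differently for a short and a long file.
print(asymmetric_similarity(18, [(0, 10), (14, 4)]))
print(asymmetric_similarity(40, [(0, 10), (14, 4)]))
```

Under this assumption, a short file copied wholesale into a much longer one scores high in one direction and low in the other.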


Clustering

  • DBSCAN

    • Advantages

      • Fast algorithm (O(n log n) on average when region queries use a spatial index)

      • The number of clusters is determined automatically

      • A node (submitter) in a low-density region, i.e. one not sufficiently similar to other submitters, is classified as noise and omitted

    • Two properties

      • Eps – User-defined neighbourhood radius that serves as the grouping criterion

      • MinPts – System predefined as 2
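The clustering step can be illustrated with a small DBSCAN over a precomputed distance matrix. This is a sketch under assumptions: the slides do not say how asymmetric similarities become distances, so the symmetric matrix below is made-up example data, and the brute-force neighbour search runs in O(n²) rather than O(n log n).

```python
def dbscan(dist: list[list[float]], eps: float, min_pts: int = 2) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(dist)
    labels = [None] * n              # None means "not visited yet"
    cluster = -1

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # low-density point: noise for now
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts: # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical distances between five submitters (e.g. 1 - similarity).
d = [
    [0.00, 0.10, 0.20, 0.90, 0.80],
    [0.10, 0.00, 0.15, 0.90, 0.90],
    [0.20, 0.15, 0.00, 0.85, 0.90],
    [0.90, 0.90, 0.85, 0.00, 0.70],
    [0.80, 0.90, 0.90, 0.70, 0.00],
]
print(dbscan(d, eps=0.3))
```

With MinPts fixed at 2, any two submitters within Eps of each other already form a cluster, which suits the goal of surfacing even a single copying pair. In this example the first three submitters form one cluster and the last two are labelled noise.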




