Who Copied Who?





Gordon Lingard

School of Software

University of Technology, Sydney

[email protected]

The Problem

  • Students copying computer code from other students within a subject is a significant problem.

  • This differs from the problem of students copying from an external source.

  • Programs exist for determining whether code is a copy.

  • They don’t answer the question of who created the code and who copied it.

  • This presentation outlines a solution to this problem.

Presentation Outline

  • What is Computer Programming

  • Detection System

  • Assignment Submission System

  • Combining the Systems

  • Results and Conclusions

  • Questions

What Is Computer Programming? Computer Code

  • Computer programs are written in a formal programming language that looks like a cross between mathematics and natural language.

  • They have a very strict syntax structure.

  • The language is used to construct a large set of carefully orchestrated instructions that become the program.

  • Student programs are typically fewer than a thousand instructions. Commercial programs can range from tens of thousands to millions of instructions.

  • Larger programs are of staggering complexity.

What Is Computer Programming? C++ Code Example

What Is Computer Programming? Why Learning to Program is Hard

  • Learning issues students face

    • Learning the language.

    • Learning how to use the language to create a program to do a specified task.

    • Managing the complexity as programs grow in size.

  • In the face of these issues, many students are overwhelmed and resort to copying.

Detection System: Problems of Detection

  • Disguise

    • Simple transformations that change the look of the code without changing what it does.

  • Combinatorics

    • n assignments create p = n(n-1)/2 pairs.

    • 100 assignments give 4950 pairs.

  • Code Overlap

    • Two pieces of code designed to do the same thing – about 50% of the code will be common.

    • Boilerplate code creates many false positives.
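
The combinatorics point can be checked with a few lines of C++. This is only a sketch of the arithmetic; the name `pair_count` is illustrative and does not come from the presentation.

```cpp
#include <cstdint>

// Number of unordered pairs among n assignments: p = n(n-1)/2.
// `pair_count` is an illustrative name, not part of the original system.
constexpr std::uint64_t pair_count(std::uint64_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}
```

`pair_count(100)` gives 4950, as stated above, and doubling the class to 200 assignments gives 19900 pairs, so the comparison work roughly quadruples.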

Detection System: Complexity Numbers

  • Tokenise Code.

  • Generate Complexity Numbers.

Figure: each program instruction is tokenised, and each tokenised instruction yields a complexity number.

  Program Instructions    Tokenised        Complexity Numbers
  if (x > y) {            if(>) {
  a[x] = b[1][y];         [] = [][];
  foo(&x, *y);            (&, *);
  instr n-1               tokenised n-1    complex n-1
  instr n                 tokenised n      complex n
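
The tokenising step might look something like the sketch below. It assumes that tokenising means dropping identifiers, numeric literals and whitespace while keeping keywords and punctuation, which is what the examples in the deck suggest. The function name `tokenise` and the short keyword list are assumptions, and string literals and comments are not handled. The presentation does not say how a complexity number is derived from a tokenised instruction, so that step is not sketched.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Illustrative tokeniser: keeps keywords and punctuation, drops identifiers,
// numeric literals and whitespace. The keyword list is a small assumed subset.
std::string tokenise(const std::string& line) {
    static const std::set<std::string> keywords = {
        "if", "else", "for", "while", "do", "return", "switch", "case", "break"};
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isalpha(c) || c == '_') {
            // Read a whole word; keep it only if it is a keyword.
            const std::size_t start = i;
            while (i < line.size() &&
                   (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_')) {
                ++i;
            }
            const std::string word = line.substr(start, i - start);
            if (keywords.count(word) > 0) out += word;
        } else if (std::isdigit(c)) {
            // Skip a numeric literal such as 1, 42 or 3.14.
            while (i < line.size() &&
                   (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '.')) {
                ++i;
            }
        } else {
            // Keep punctuation and operators, drop whitespace.
            if (!std::isspace(c)) out += line[i];
            ++i;
        }
    }
    return out;
}
```

Because identifiers are discarded, renaming variables (one of the simple disguises mentioned earlier) leaves the result unchanged: `if (x > y) {` and `if (count > limit) {` both reduce to the same token string.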

Detection System: Comparing Complexity Numbers

  • Determine the percentage of numbers common between two programs.
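
One way to read "percentage of numbers common" is sketched below. The presentation does not define the measure precisely, so this version makes two assumptions: duplicate numbers are ignored, and the percentage is taken relative to the first program. The name `percent_common` is illustrative.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Percentage of distinct complexity numbers in `a` that also occur in `b`.
// Assumptions: duplicates are ignored and `a` is the reference program.
double percent_common(const std::vector<long>& a, const std::vector<long>& b) {
    const std::set<long> sa(a.begin(), a.end());
    const std::set<long> sb(b.begin(), b.end());
    if (sa.empty()) return 0.0;
    std::vector<long> shared;
    std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(),
                          std::back_inserter(shared));
    return 100.0 * static_cast<double>(shared.size()) /
           static_cast<double>(sa.size());
}
```

Under these assumptions the measure is asymmetric: a short program copied wholesale into a longer one scores 100% in one direction and less in the other. A real system might normalise differently.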

Submission System

  • Used for a number of years in parallel with the detection system.

  • A formative assessment tool.

    • Runs students' programs against a suite of tests.

    • Analyses their code for poor programming practices.

  • The students can use the results from the tests to refine their assignments and re-submit as often as they like.

  • The submission system becomes a development environment.

Combining the Systems: Overview

  • Extract information from the detection system to create a digital fingerprint of an assignment.

  • The fingerprint helps to uniquely identify a piece of code while being unaffected by minor changes to the code.

  • Append the fingerprint, along with time and date, to a log of submissions for each student.

  • Analyse the logs to see whether the same fingerprints appear across students, and use the date/time to determine the order of development.

Combining the Systems: Digital Fingerprints

  • A fingerprint is created by extracting the 6 largest, unique complexity numbers from all the numbers a piece of code generates.

  • These represent the 6 most complicated pieces of the code.

Figure: assignment code is reduced to complexity numbers, and the digital fingerprint is the 6 largest unique complexity numbers in sorted order.

  Assignment code (excerpt):
    if (x > y)
    x = x * 6;
    y = x + y;
    *z = a->b[x];

  Digital fingerprint: 68018 68682 72172 87219 97843 112103

Append fingerprint and date/time to log
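
Extracting a fingerprint from a list of complexity numbers can be sketched as follows. The function name `fingerprint` and the use of `long` for complexity numbers are assumptions. The ascending output order follows the example fingerprint on the slide.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Fingerprint sketch: the `count` largest unique complexity numbers,
// returned in ascending order (as in the example on the slide).
std::vector<long> fingerprint(const std::vector<long>& numbers,
                              std::size_t count = 6) {
    const std::set<long> unique(numbers.begin(), numbers.end());  // sorted, deduplicated
    std::vector<long> result;
    // Walk from the largest value downwards until `count` values are taken.
    for (auto it = unique.rbegin(); it != unique.rend() && result.size() < count; ++it) {
        result.push_back(*it);
    }
    std::reverse(result.begin(), result.end());  // back to ascending order
    return result;
}
```

Because only the largest numbers are kept, small edits to simple instructions leave the fingerprint unchanged. A program with fewer than six distinct numbers simply yields a shorter fingerprint in this sketch.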

Combining the Systems: Submission Logs





Combining the Systems: Comparing Logs

  • Compare summaries of the logs.

  • The time frames in the comparison make it clear who originated the code, who copied it, and when.
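
A log comparison along these lines could be sketched as below. The log format is not shown in the slides, so the `Submission` and `SharedFingerprint` structures, the integer timestamp, and the function `find_shared` are all hypothetical. The sketch also assumes that two submissions match only when their fingerprints are identical, and it treats the earliest submitter of a fingerprint as the likely originator.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// One log entry (hypothetical layout): who submitted, when, and the fingerprint.
struct Submission {
    std::string student;
    long long timestamp;            // e.g. seconds since some epoch
    std::vector<long> fingerprint;  // sorted complexity numbers
};

// A fingerprint that appears in more than one student's log.
struct SharedFingerprint {
    std::string first_student;                // earliest submitter
    long long first_seen;                     // time of that submission
    std::vector<std::string> later_students;  // others who submitted it afterwards
};

std::vector<SharedFingerprint> find_shared(std::vector<Submission> log) {
    // Process submissions in time order so the first hit is the earliest.
    std::stable_sort(log.begin(), log.end(),
                     [](const Submission& a, const Submission& b) {
                         return a.timestamp < b.timestamp;
                     });
    std::map<std::vector<long>, SharedFingerprint> seen;
    for (const Submission& s : log) {
        auto it = seen.find(s.fingerprint);
        if (it == seen.end()) {
            seen.emplace(s.fingerprint,
                         SharedFingerprint{s.student, s.timestamp, {}});
            continue;
        }
        SharedFingerprint& info = it->second;
        const bool already_listed =
            std::find(info.later_students.begin(), info.later_students.end(),
                      s.student) != info.later_students.end();
        if (s.student != info.first_student && !already_listed) {
            info.later_students.push_back(s.student);
        }
    }
    std::vector<SharedFingerprint> result;
    for (const auto& entry : seen) {
        if (!entry.second.later_students.empty()) result.push_back(entry.second);
    }
    return result;
}
```

A student resubmitting their own unchanged code does not count as a match here; only a fingerprint that turns up in a second student's log is reported, along with who had it first.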

Who Copied Who? Results

  • There is rarely collaboration; it is students copying from other students.

  • In cases of copying, the logs almost always make a very clear statement of what has happened and when.

  • The copying usually involves one student copying from another, sometimes two, but rarely more.

  • Frequently, it is not the final submission that gives away the copying, but earlier submissions. This can be seen in the logs and confirmed by examining the earlier submissions.

Who Copied Who? Conclusions

  • The system has proved extremely successful in presenting misconduct cases to the Faculty.

  • The sheer weight of evidence the logs produce often saves time, as students don’t try to bluff their way through the allegation.

  • This allows the Faculty to shift the focus away from penalties and towards remedial action.


