Who Copied Who?

Gordon Lingard

School of Software

University of Technology, Sydney

glingard@it.uts.edu.au

The Problem

  • Students copying computer code from other students within a subject is a significant problem.

  • This differs from the problem of students copying from an external source.

  • Programs exist for determining whether code is a copy.

  • They don’t answer the question of who created the code and who copied it.

  • This presentation outlines a solution to this problem.

Presentation Outline

  • What is Computer Programming

  • Detection System

  • Assignment Submission System

  • Combining the Systems

  • Results and Conclusions

  • Questions

What Is Computer Programming? Computer Code

  • Computer programs are written in a formal programming language that looks like a cross between mathematics and natural language.

  • They have a very strict syntax structure.

  • The language is used to construct a large set of carefully orchestrated instructions that become the program.

  • Student programs are typically less than a thousand instructions. Commercial programs can be tens of thousands to millions of instructions.

  • Larger programs are of staggering complexity.

What Is Computer Programming? Why Learning to Program is Hard

  • Learning issues students face

    • Learning the language.

    • Learning how to use the language to create a program to do a specified task.

    • Managing the complexity as programs grow in size.

  • In the face of these issues, many students are overwhelmed and resort to copying.

Detection System: Problems of Detection

  • Disguise

    • Simple transformations that change the look of the code without changing what it does.

  • Combinatorics

    • n assignments create p = n(n-1)/2 pairs.

    • 100 assignments give 4950 pairs.

  • Code Overlap

    • When two pieces of code are designed to do the same thing, about 50% of the code will be common.

    • Boilerplate code creates many false positives.
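
The pair count in the Combinatorics bullets can be checked with a short sketch. The function name `pair_count` and the placeholder assignment labels are illustrative assumptions, not part of the detection system described in the slides.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n assignments: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical labels standing in for 100 submitted assignments.
assignments = [f"assignment_{i}" for i in range(100)]

# Enumerating the pairs explicitly gives the same count as the formula.
pairs = list(combinations(assignments, 2))
print(pair_count(100), len(pairs))
```

The quadratic growth is why a cheap per-pair comparison matters: doubling the class size roughly quadruples the number of comparisons.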

Detection System: Complexity Numbers

  • Tokenise Code.

  • Generate Complexity Numbers.

Program Instructions      Tokenised           Complexity Numbers
if (x > y) {              if(>) {
a[x] = b[1][y];           [] = [][];
foo(&x, *y);              (&, *);
...                       ...                 ...
instr n-1                 tokenised n-1       complex n-1
instr n                   tokenised n         complex n
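
The tokenising step can be sketched roughly as follows. This is only an illustration under assumptions: the slides do not say how the real tokeniser works, the keyword list is a small hypothetical subset of C, and this version also drops all whitespace, so its output is spaced differently from the slide's examples. How complexity numbers are then derived from the tokens is not described in the slides and is not attempted here.

```python
import re

# Assumed subset of C keywords to keep; the real system's list is not given.
C_KEYWORDS = {"if", "else", "while", "for", "do", "return", "switch", "case"}


def tokenise(line: str) -> str:
    """Strip identifiers, numeric literals and whitespace, keeping keywords
    and punctuation, so that renaming variables does not change the result."""

    def keep_keyword(match: re.Match) -> str:
        word = match.group(0)
        return word if word in C_KEYWORDS else ""

    without_names = re.sub(r"[A-Za-z_]\w*", keep_keyword, line)
    without_numbers = re.sub(r"\d+", "", without_names)
    return re.sub(r"\s+", "", without_numbers)


print(tokenise("if (x > y) {"))
print(tokenise("if (count > limit) {"))  # renamed variables, same tokens
```

Because names and literals are discarded, the simple disguises listed earlier (renaming variables, changing spacing) leave the tokenised form unchanged.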

Detection System: Comparing Complexity Numbers

  • Determine the percentage of numbers common between two programs.
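
A minimal sketch of this comparison is given below. The slides do not state what the percentage is measured against, so dividing by the smaller program's count of unique numbers is an assumption, as is the function name `percent_common`.

```python
def percent_common(numbers_a, numbers_b) -> float:
    """Percentage of unique complexity numbers shared by two programs.

    Assumption: the denominator is the smaller program's unique count,
    so a short program fully contained in a longer one scores 100%.
    """
    set_a, set_b = set(numbers_a), set(numbers_b)
    if not set_a or not set_b:
        return 0.0
    shared = set_a & set_b
    return 100.0 * len(shared) / min(len(set_a), len(set_b))


# Made-up numbers for illustration only.
print(percent_common([1, 2, 3, 4], [3, 4, 5, 6]))
```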

Submission System

  • Used for a number of years in parallel with the detection system.

  • A formative assessment tool.

    • Runs students' programs with a suite of tests.

    • Analyses their code for poor programming practices.

  • The students can use the results from the tests to refine their assignments and re-submit as often as they like.

  • The submission system becomes a development environment.

Combining the Systems: Overview

  • Extract information from the detection system to create a digital fingerprint of an assignment.

  • The fingerprint helps to uniquely identify a piece of code while being unaffected by minor changes to the code.

  • Append the fingerprint, along with time and date, to a log of submissions for each student.

  • Analyse the logs to see whether the same fingerprints appear across students, and use the date/time to determine the order of development.

Combining the Systems: Digital Fingerprints

  • A fingerprint is created by extracting the 6 largest, unique complexity numbers from all the numbers a piece of code generates.

  • These represent the 6 most complicated pieces of the code.

Figure: Assignment Code, then Complexity Numbers, then Digital Fingerprint.

Assignment Code (excerpt):

    if (x > y)
        x = x * 6;
    y = x + y;
    *z = a->b[x];

Digital Fingerprint = 6 largest unique complexity numbers in sorted order:

    68018 68682 72172 87219 97843 112103

Append fingerprint and date/time to log.
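
The fingerprint rule (6 largest unique complexity numbers, in sorted order) and the log append can be sketched as follows. The function names, the list-of-tuples log layout, the timestamp, and the extra small numbers in the example input are hypothetical; only the six fingerprint values come from the slide.

```python
from datetime import datetime


def fingerprint(complexity_numbers, size: int = 6) -> tuple:
    """Return the `size` largest unique numbers in ascending order."""
    unique_sorted = sorted(set(complexity_numbers))
    return tuple(unique_sorted[-size:])


def append_to_log(log: list, complexity_numbers, when: datetime) -> list:
    """Append (date/time, fingerprint) to one student's submission log."""
    log.append((when, fingerprint(complexity_numbers)))
    return log


# The six large values are the slide's example; 500, 1200 and the repeated
# 87219 are invented to show that small and duplicate numbers are ignored.
numbers = [68018, 112103, 500, 68682, 97843, 72172, 87219, 87219, 1200]
student_log = append_to_log([], numbers, datetime(2024, 5, 1, 10, 0))
print(student_log[0][1])
```

Keeping only the largest numbers is what makes the fingerprint tolerant of minor edits: a change to a simple statement alters a small complexity number that was never part of the fingerprint.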

Combining the Systems: Submission Logs

Combining the Systems: Comparing Logs

  • Compare summaries of the logs.

  • The time frames in the comparison make it clear who originated the code, who copied it, and when.
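
One way such a comparison could work is sketched below. It assumes each student's log is a list of (date/time, fingerprint) entries and that a match means an identical fingerprint. The slides specify neither, and the student names and dates are invented. An earlier timestamp only suggests who had the code first; the earlier submissions themselves still need to be examined.

```python
from datetime import datetime


def first_seen(log):
    """Map each fingerprint to the earliest time it appears in one log."""
    earliest = {}
    for when, fp in log:
        if fp not in earliest or when < earliest[fp]:
            earliest[fp] = when
    return earliest


def compare_logs(logs):
    """Return (fingerprint, earlier_student, later_student) for every
    fingerprint that appears in two different students' logs."""
    seen = {student: first_seen(log) for student, log in logs.items()}
    students = sorted(seen)
    findings = []
    for i, a in enumerate(students):
        for b in students[i + 1:]:
            for fp in seen[a].keys() & seen[b].keys():
                earlier, later = sorted([a, b], key=lambda s: seen[s][fp])
                findings.append((fp, earlier, later))
    return findings


# Hypothetical logs: "bob" submits a fingerprint "alice" logged a day earlier.
logs = {
    "alice": [(datetime(2024, 5, 1, 10, 0), (1, 2, 3, 4, 5, 6)),
              (datetime(2024, 5, 2, 9, 0), (2, 3, 4, 5, 6, 7))],
    "bob": [(datetime(2024, 5, 3, 22, 0), (2, 3, 4, 5, 6, 7))],
}
print(compare_logs(logs))
```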

Who Copied Who? Results

  • Collaboration is rare; it is students copying from other students.

  • In cases of copying, the logs almost always make a very clear statement of what has happened and when.

  • The copying usually involves one student copying from another, sometimes two, but rarely more.

  • Frequently, it is not the final submission that gives away the copying, but earlier submissions. This can be seen in the logs and confirmed by examining the earlier submissions.

Who Copied Who? Conclusions

  • The system has proved extremely successful in presenting misconduct cases to the Faculty.

  • The sheer weight of evidence the logs produce often saves time, as students do not try to bluff their way through the allegation.

  • This allows the Faculty to shift the focus away from penalty and to remedial action.