Hidden Markov Models for Software Piracy Detection

  • Shabana Kazi

  • Mark Stamp

HMMs for Piracy Detection


Intro

  • Here, we apply metamorphic analysis to software piracy detection

  • Very similar to techniques used in malware detection

    • But the problem is completely different

    • Has nothing to do with malware

  • We show that such techniques have other applications


Software Piracy

  • Software piracy is a major problem

    • By a 2009 estimate, $3 to $4 is lost to piracy for every $1 in software sales

  • Usually, piracy consists of taking software without modification

  • In some cases, software is modified

    • Commercial theft of intellectual property

    • Thief really doesn’t want to get caught…


Software Piracy

  • We assume software is stolen

    • And modified, making it hard to detect

    • If completely rewritten from scratch, our approach will not detect it

  • Want to make life hard for the bad guys

    • Ideally, major modifications should be required to evade detection

  • How much modification is needed before we can no longer reliably detect it?


Goals

  • Technique applicable to any software

  • No special effort by developer

    • Nothing extra inserted into code

  • We only require access to the exe file

  • Not a watermarking scheme

    • More like software “birthmark” analysis

  • Also not plagiarism detection

    • Here, we want a “deeper” analysis
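The slides say only the executable is needed, but they do not say how the opcode sequence is obtained from it. As one possible approach, the sketch below pulls instruction mnemonics out of the text that GNU objdump -d prints for an x86 binary. The choice of objdump, the line layout it assumes, and the helper name extract_opcodes are assumptions made for illustration, not the authors' tooling.

```python
import re

# Matches an instruction line from GNU `objdump -d`:
#   "  <hex address>:\t<raw bytes>\t<mnemonic> <operands>"
# Assumption: x86 output in objdump's default layout.
_INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t[^\t]*\t(\S+)")


def extract_opcodes(objdump_text: str) -> list[str]:
    """Return the mnemonic sequence found in `objdump -d` output text.

    Section headers, symbol labels, and continuation lines that only carry
    the leftover raw bytes of a long instruction do not match and are skipped.
    """
    opcodes = []
    for line in objdump_text.splitlines():
        match = _INSN_LINE.match(line)
        if match:
            opcodes.append(match.group(1))
    return opcodes
```

In practice the input text could come from subprocess.run(["objdump", "-d", path], capture_output=True, text=True).stdout, where path is the suspect executable.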


Use Case

  • You work for Alice’s Software Company (ASC)

    • And you develop fancy software for ASC

  • Trudy’s Software Company (TSC) develops a suspiciously similar product

  • You suspect TSC of stealing your code

    • Not identical, but seems similar

  • What can you do?

    • We’ve got some ideas that might help…


Use Case

  • Using the technique discussed here

    • We can easily measure code similarity

  • Low similarity?

    • Then there is no hope of proving the code is stolen

  • High similarity?

    • Further (costly) analysis is warranted

  • High similarity does not prove the code is stolen

    • But it is a good reason to take a closer look


Background

  • Metamorphic software

    • Metamorphic techniques (dead code, permutation, substitution)

  • HMM

    • Basic ideas and notation

    • The 3 problems and their solutions (discussed at a high level)

  • We’ve seen all of this before
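As a reminder of how a trained model λ = (A, B, π) scores an opcode sequence (Problem 1), here is a minimal pure-Python sketch of the scaled forward algorithm. Observations are assumed to be integer indices into the opcode alphabet, and dividing the log-likelihood by the sequence length, so that files of different sizes are comparable, is our assumption about how scores are normalized.

```python
import math


def log_likelihood_per_symbol(obs, A, B, pi):
    """Scaled forward algorithm: returns log P(obs | lambda) / len(obs).

    A   -- N x N transition matrix, A[j][i] = P(state i at t+1 | state j at t)
    B   -- N x M emission matrix, B[i][k] = P(symbol k | state i)
    pi  -- length-N initial state distribution
    obs -- non-empty sequence of symbol indices in range(M)
    """
    n = len(pi)
    log_prob = 0.0
    alpha = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            # alpha_t(i) = sum_j alpha_{t-1}(j) * a_ji * b_i(o_t)
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                     for i in range(n)]
        c = sum(alpha)                   # scale factor for time t
        alpha = [a / c for a in alpha]   # normalize to avoid underflow
        log_prob += math.log(c)          # log P(O) = sum over t of log c_t
    return log_prob / len(obs)
```

Scaling at every step keeps the α values in a safe floating-point range even for opcode sequences with many thousands of symbols, where the unscaled probabilities would underflow.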


Overview

  • Training and scoring

  • Train HMM on slightly morphed copies of the given “base” software

    • Slight morphing to avoid overfitting

  • Score morphed copies and other files

    • Here, morphing serves to simulate modifications by attacker

  • We want to know how much morphing is required before detection fails


Metamorphic Generator

  • Built our own metamorphic generator

  • Morph based on extracted opcodes

    • Morphing consists of dead code insertion

    • Specify a dead code percentage and the number of blocks to insert

  • We do not require that the morphed code works

    • This makes detection more difficult, not easier

    • A worst-case scenario, detection-wise
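The generator is described only as inserting a given percentage of dead code, split into a given number of blocks. A minimal sketch driven by those two parameters follows. The hypothetical morph helper copies each block from a contiguous run of opcodes in a donor sequence and drops it at a random position in the base sequence. The slides do not say how blocks are sized or placed, so the even split and uniform placement here are assumptions.

```python
import random


def morph(base: list[str], donor: list[str], dead_pct: float,
          n_blocks: int, rng: random.Random) -> list[str]:
    """Return a copy of `base` with dead code inserted (hypothetical helper).

    dead_pct -- amount of dead code, as a percentage of len(base)
    n_blocks -- number of contiguous blocks the dead code is split into (>= 1)
    donor    -- opcode sequence the dead code is copied from; assumed to be
                at least as long as the largest block
    """
    total = round(len(base) * dead_pct / 100)
    # Split the dead code as evenly as possible across the blocks
    sizes = [total // n_blocks + (1 if i < total % n_blocks else 0)
             for i in range(n_blocks)]
    blocks = []
    for size in sizes:
        start = rng.randrange(len(donor) - size + 1)
        blocks.append(donor[start:start + size])
    # Pick insertion points uniformly; the result need not be executable
    points = sorted(rng.randrange(len(base) + 1) for _ in range(n_blocks))
    out, prev = [], 0
    for point, block in zip(points, blocks):
        out.extend(base[prev:point])
        out.extend(block)
        prev = point
    out.extend(base[prev:])
    return out
```

Because the morphed output never has to run, the sketch makes no attempt to keep control flow intact, which matches the worst-case framing on the slide. For example, morph(base, donor, 10, 1, random.Random(1)) would produce one 10% morph using a single block.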


Training

  • Given a base executable file…

  • Extract its opcode sequence

  • Generate 100 slightly morphed copies

    • Each copy morphed by 10%, using dead code extracted from a random “normal” file

  • Train HMM on morphed copies

    • Using 5-fold cross validation

    • Note: We train one model for each “fold”
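One reading of this slide is that the 100 morphed copies are split into 5 folds of 20, with each fold's HMM trained on the other 80 copies. The 20-file match set on the threshold slide is consistent with that reading. A sketch of the split follows; the helper name kfold_splits is hypothetical.

```python
def kfold_splits(copies, k=5):
    """Yield (train, test) pairs for k-fold cross validation.

    With 100 copies and k = 5, each fold holds out 20 copies for scoring and
    trains on the remaining 80. Any remainder when len(copies) is not a
    multiple of k always stays in the training portion.
    """
    fold_size = len(copies) // k
    for f in range(k):
        test = copies[f * fold_size:(f + 1) * fold_size]
        train = copies[:f * fold_size] + copies[(f + 1) * fold_size:]
        yield train, test
```

Each (train, test) pair corresponds to one of the per-fold models mentioned above.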


Training

  • Illustration of training process

    • Slightly morphed copies of base program


Determine Threshold

  • For each of the 5 folds

    • Train HMM

    • Score 20 morphed files (match set) and 15 normal files (nomatch set)

  • Determine threshold based on scores

    • Threshold is the highest score of any normal file

    • Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
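Setting the threshold to the highest nomatch score only gives FPR = 0 if a file must score strictly above it to be flagged. A small sketch making that rule explicit follows; the function names are illustrative, and higher scores are assumed to mean a closer match to the base file.

```python
def set_threshold(nomatch_scores):
    """Threshold = highest score among the normal (nomatch) files."""
    return max(nomatch_scores)


def classify(score, threshold):
    """Flag as a match only if the score is strictly above the threshold.

    The strict inequality keeps the highest-scoring normal file itself from
    being flagged, so FPR = 0 on the nomatch set used to set the threshold.
    """
    return score > threshold


def rates(match_scores, nomatch_scores, threshold):
    """Return (TPR, FPR) for one fold."""
    tpr = sum(classify(s, threshold) for s in match_scores) / len(match_scores)
    fpr = sum(classify(s, threshold) for s in nomatch_scores) / len(nomatch_scores)
    return tpr, fpr
```

With this rule, TNR = 1 - FPR = 1 on each fold by construction, and only the TPR on the match set remains to be measured.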


Setting a Threshold

Process used to set threshold


Experiments

  • Goal is to determine robustness

  • For each base file tested…

  • Train to obtain HMM and threshold

  • Morph base file at various percentages

    • Using various morphing strategies

    • Refer to this morphing as tampering

  • Score each tampered copy

    • Classify, based on threshold
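To answer the robustness question, the per-copy classifications can be summarized as a detection rate for each tamper percentage. The sketch below assumes scores have already been computed and grouped by percentage. The 50% cutoff in breaking_point is a hypothetical choice, since the slides do not define when detection counts as having failed.

```python
def detection_rates(scores_by_pct, threshold):
    """Map each tamper percentage to the fraction of copies still detected."""
    return {pct: sum(s > threshold for s in scores) / len(scores)
            for pct, scores in sorted(scores_by_pct.items())}


def breaking_point(rates, min_rate=0.5):
    """Smallest tamper percentage whose detection rate falls below min_rate.

    Returns None if detection never falls below min_rate.
    """
    for pct, rate in sorted(rates.items()):
        if rate < min_rate:
            return pct
    return None
```

Raising min_rate makes the breaking point more conservative, so it reports failure at a lower tamper percentage.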


Experiments

Scoring tampered files


Experiment Details

  • For each base file

    • 6 models

    • 10 tamper percentages for each

    • 100 files each

    • So, 6000 scores!


Experiment Details

  • Tested 10 base files for each data point

    • So 60,000 scores computed…


Experiment Details

  • Repeated the entire experiment 6 times

    • Using a different number of dead code blocks in the training phase

    • The number of training blocks made little difference to the scores

    • So, here we only give results where 1 block was used in the training phase

  • In total, 360,000 scores computed

    • And 360 “models” generated

    • That is, 1800 HMMs (one per fold)
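The score and model counts across these experiment-detail slides multiply out as follows. Treating the 360 models as 6 models per base file × 10 base files × 6 repetitions is our reading of the slides, and it agrees with the stated totals.

```python
# Counts taken from the experiment-details slides
models_per_base = 6        # models per base file
tamper_percentages = 10    # tampering levels per model
copies_per_pct = 100       # tampered copies scored per level
base_files = 10
repetitions = 6            # experiment repeated with different training block counts
folds = 5                  # one HMM per fold

scores_per_base = models_per_base * tamper_percentages * copies_per_pct
scores_per_run = scores_per_base * base_files
total_scores = scores_per_run * repetitions
total_models = models_per_base * base_files * repetitions
total_hmms = total_models * folds

print(scores_per_base, scores_per_run, total_scores, total_models, total_hmms)
# prints: 6000 60000 360000 360 1800
```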


Results: Bar Graph


Results: 3-d Plot


Conclusions

  • Results look very promising

    • Robust: a high degree of morphing is required before the base file goes undetected

    • Practical: only requires the exe, no special effort when developing

    • Applies to any exe, at any time

  • Overall, strong software “birthmark” strategy with practical implications


Future Work

  • Statistical analysis somewhat weak

    • Results may be stronger than they appear

  • Many other scores/combinations of scores can be tested

    • Results can only get better

  • Consider other morphing techniques

    • And other file types (e.g., bytecode)

    • And mitigations for 1-block morphing …


References

S. Kazi and M. Stamp, “Hidden Markov models for software piracy detection,” Information Security Journal: A Global Perspective, vol. 22, pp. 140-149, 2013.
