Metamorphic Malware Research


kordell

Presentation Transcript


  1. Metamorphic Malware Research Metamorphic Malware 1

  2. Metamorphic Malware
     • Metamorphic software changes “shape”
     • But each instance has the same function
     • In contrast, most software is “cloned”
     • Metamorphism is used by virus writers to evade signature detection
     • Lots of interesting research problems
     • We look at some here…

  3. Metamorphic Research
     • How metamorphic are hacker-produced generators?
     • How to detect metamorphic viruses?
     • The “ultimate” metamorphic generator?
     • How to make metamorphic code that “carries its own generator”?
     • Related questions/issues?

  4. Metamorphic Generators
     • To analyze metamorphic generators…
     • First problem is, how to compare code?
     • We developed a “similarity index”
     • Based on extracted opcodes
     • Can be represented graphically
     • Also gives a numerical score

  5. Similarity
     • Suppose we want to compare exe files
     • Say, file X and file Y
     • Extract opcodes from each
     • x0, x1, …, xn and y0, y1, …, ym
     • Compare all 3-opcode subsequences
     • If they agree (in any order), plot a point at the corresponding position on the axes
     • Filter noise with a window of length 5

  6. Similarity
     • That is, matches of length 5 or greater are added to the score
     • Lengths were determined experimentally
     • Scores range from 0 to 1, where 0 == no match, 1 == perfect match
     • Gives us a graphical view and a score
     • In the graph, what is a perfect match?
     • Main diagonal, or segments parallel to it
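
The comparison described on slides 5 and 6 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a "match of length 5" is assumed to mean five consecutive matching points along a diagonal, and the final score (average fraction of each file covered by surviving matches) is an assumption, since the slides do not give the exact formula.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[int, int]

def match_points(x: Sequence[str], y: Sequence[str], k: int = 3) -> Set[Point]:
    """All (i, j) where the k opcodes starting at x[i] and at y[j] agree in any order."""
    index: Dict[Tuple[str, ...], List[int]] = {}
    for j in range(len(y) - k + 1):
        index.setdefault(tuple(sorted(y[j:j + k])), []).append(j)
    points: Set[Point] = set()
    for i in range(len(x) - k + 1):
        for j in index.get(tuple(sorted(x[i:i + k])), []):
            points.add((i, j))
    return points

def filter_noise(points: Set[Point], min_run: int = 5) -> Set[Point]:
    """Keep only points lying on diagonal runs of at least min_run consecutive points."""
    kept: Set[Point] = set()
    for (i, j) in points:
        if (i - 1, j - 1) in points:
            continue  # not the first point of its diagonal run
        run = []
        while (i, j) in points:
            run.append((i, j))
            i, j = i + 1, j + 1
        if len(run) >= min_run:
            kept.update(run)
    return kept

def similarity(x: Sequence[str], y: Sequence[str], k: int = 3, min_run: int = 5) -> float:
    """Score in [0, 1]; the averaging rule is an assumption made for this sketch."""
    n_x, n_y = len(x) - k + 1, len(y) - k + 1
    if n_x <= 0 or n_y <= 0:
        return 0.0
    kept = filter_noise(match_points(x, y, k), min_run)
    covered_x = len({i for i, _ in kept}) / n_x
    covered_y = len({j for _, j in kept}) / n_y
    return (covered_x + covered_y) / 2
```

Plotting the surviving points with i on one axis and j on the other gives the graphical view: two identical files produce the main diagonal, and shared code blocks appear as segments parallel to it.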

  7. Normal Files
     • Similarity of typical “normal” files

  8. Metamorphic Generators
     • A typical “metamorphic” generator

  9. Metamorphic Generators
     • Highly metamorphic generator

  10. Metamorphic Generators
      • We measured metamorphism of metamorphic generators
      • What did we find?
      • Generally, not very metamorphic…
      • We did find one exception:
      • Next Generation Virus Creation Kit (NGVCK)
      • Can we detect NGVCK viruses?

  11. Metamorphic Detection
      • We “trained” a hidden Markov model
      • Based on a bunch of “family” viruses
      • Using extracted opcode sequences
      • Then trained a model for detection
      • Next, we discuss HMMs
      • Other techniques could be used
      • Neural nets, data mining, etc.

  12. Hidden Markov Models
      • HMMs: a machine learning technique
      • Widely used in speech recognition, bioinformatics, and other areas
      • We can train an HMM
      • Then use the resulting trained model to score unknown data
      • High score? Data matches training data
      • Low score? Does not match training data

  13. Hidden Markov Models
      • What are HMMs?
      • Consider an example…
      • Suppose we want to know average annual temperature in the past
      • We cannot go back in time
      • So what to do?
      • Suppose we know that tree ring size is related to temperature

  14. Hidden Markov Models
      • We consider 2 possible temperatures
      • Hot (H) and cold (C)
      • We consider 3 tree ring sizes
      • Small (S), medium (M), large (L)
      • Based on measurements, we find:

  15. HMM
      • Also, based on historical record:
      • Then transitions between hot and cold years form a Markov process (order 1)
      • For the past, we cannot observe temperature
      • But we can measure tree ring sizes

  16. HMMs
      • HMMs give us efficient algorithms to solve problems like:
      • Given a series of tree ring sizes, can we say anything about temperatures?

  17. HMMs
      • The generic picture is like this…
      • Note, there is a Markov process
      • And a series of observations

  18. HMMs
      • HMM model denoted as: λ = (A, B, π)
      • A is the state transition matrix
      • B gives probabilities of observations, depending on state of Markov process
      • π contains initial state probabilities
      • For HMMs there are efficient algorithms to solve 3 problems
      • Next slide…
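
To make the notation concrete, here is a small sketch of λ = (A, B, π) for the hot/cold tree ring example. The numeric values are illustrative assumptions, because the matrices shown on the slides are not reproduced in this transcript; the helper names are hypothetical as well.

```python
STATES = ["H", "C"]          # hidden states: hot, cold
SYMBOLS = ["S", "M", "L"]    # observations: small, medium, large tree rings

# Illustrative values only (assumed, not taken from the slides).
A = [[0.7, 0.3],             # A[i][j] = P(next state j | current state i)
     [0.4, 0.6]]
B = [[0.1, 0.4, 0.5],        # B[i][k] = P(observing symbol k | state i)
     [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]              # PI[i] = P(first state is i)

def is_row_stochastic(matrix, tol=1e-9):
    """Every row must be a probability distribution."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) < tol for row in matrix)

def joint_probability(states, observations):
    """P(X, O | lambda) for one particular hidden state sequence X."""
    x = [STATES.index(s) for s in states]
    o = [SYMBOLS.index(s) for s in observations]
    p = PI[x[0]] * B[x[0]][o[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1]][x[t]] * B[x[t]][o[t]]
    return p

# Probability that the years were hot, hot, cold, cold AND the rings were S, M, S, L.
print(joint_probability("HHCC", "SMSL"))
```

The product alternates between an entry of A (moving between hidden states) and an entry of B (emitting an observation), which is the structure that the efficient HMM algorithms exploit.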

  19. The 3 HMM Problems
      1. Given a model and observations, we can score the sequence of observations
         • How well does observed data fit model?
      2. Given model and observations, we can find optimal state sequence
         • Here, we uncover the hidden states
      3. Given observation sequence, we can train a model to best fit the data
         • Only assumption is size of the A matrix
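
Problem 1 is commonly solved with the forward algorithm. The sketch below assumes that approach and uses made-up model values for the hot/cold example; a brute-force sum over every hidden state sequence is included only to check the result on a short input.

```python
import math
from itertools import product

# Illustrative model (assumed values): states 0 = hot, 1 = cold; symbols 0 = S, 1 = M, 2 = L.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]

def forward_log_likelihood(A, B, pi, obs):
    """log P(O | lambda) via the forward algorithm, rescaled at each step to avoid underflow."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)
        if c == 0.0:
            return float("-inf")          # sequence is impossible under this model
        alpha = [a / c for a in alpha]    # normalise; the factor c is kept in log_prob
        log_prob += math.log(c)
    return log_prob

def brute_force_likelihood(A, B, pi, obs):
    """P(O | lambda) by summing over all N**T state sequences (feasible for tiny T only)."""
    total = 0.0
    for x in product(range(len(pi)), repeat=len(obs)):
        p = pi[x[0]] * B[x[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[x[t - 1]][x[t]] * B[x[t]][obs[t]]
        total += p
    return total

rings = [0, 1, 0, 2]  # S, M, S, L
print(math.exp(forward_log_likelihood(A, B, PI, rings)), brute_force_likelihood(A, B, PI, rings))
```

The forward pass costs on the order of N²T operations for N states and T observations, while the brute-force sum grows as Nᵀ, which is why the efficient algorithm matters for long opcode sequences.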

  20. HMM Training: English Text Example
      • Assuming 2 hidden states
      • Here, we show the B matrix…

  21. HMMs and Metamorphic Generators
      • So, what’s the game plan?
      • Extract opcodes from several metamorphic viruses from the same family
      • Train an HMM on these opcodes (problem 3 from slide 19)
      • Given an unknown file, score its extracted opcodes using the trained HMM (problem 1)
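
The train-then-score plan can be sketched with a small pure-Python Baum-Welch loop. Everything below is an assumption made for illustration: the opcode alphabet and the toy "family" sequence are invented, the number of states and iterations is arbitrary, and the score is taken to be log-likelihood per opcode so that files of different lengths are comparable. Training on several viruses could be approximated by concatenating their opcode sequences.

```python
import math
import random

def forward(A, B, pi, obs):
    """Scaled forward pass; log P(O | model) equals sum(log(c) for c in scale)."""
    n, alpha, scale = len(pi), [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [pi[j] * B[j][o] for j in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        c = sum(row)
        if c == 0.0:
            raise ValueError("sequence has zero probability under this model")
        alpha.append([v / c for v in row])
        scale.append(c)
    return alpha, scale

def backward(A, B, obs, scale):
    """Backward pass using the same scale factors as the forward pass."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o = obs[t + 1]
        beta[t] = [sum(A[i][j] * B[j][o] * beta[t + 1][j] for j in range(n)) / scale[t + 1]
                   for i in range(n)]
    return beta

def train_hmm(obs, n_states, n_symbols, iters=20, seed=0):
    """Baum-Welch re-estimation from a near-uniform random start (problem 3)."""
    rng = random.Random(seed)

    def near_uniform(k):
        row = [1.0 + 0.1 * rng.random() for _ in range(k)]
        return [v / sum(row) for v in row]

    A = [near_uniform(n_states) for _ in range(n_states)]
    B = [near_uniform(n_symbols) for _ in range(n_states)]
    pi = near_uniform(n_states)
    history, T, N = [], len(obs), range(n_states)
    for _ in range(iters):
        alpha, scale = forward(A, B, pi, obs)
        beta = backward(A, B, obs, scale)
        history.append(sum(math.log(c) for c in scale))  # log-likelihood before this update
        gamma = [[alpha[t][i] * beta[t][i] for i in N] for t in range(T)]
        xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
                       for t in range(T - 1)) for j in N] for i in N]
        emit = [[0.0] * n_symbols for _ in N]
        for t in range(T):
            for i in N:
                emit[i][obs[t]] += gamma[t][i]
        A = [[v / sum(row) for v in row] for row in xi_sum]
        B = [[v / sum(row) for v in row] for row in emit]
        pi = gamma[0][:]
    return (A, B, pi), history

def score(model, obs):
    """Log-likelihood per opcode (problem 1); higher means closer to the training family."""
    A, B, pi = model
    try:
        _, scale = forward(A, B, pi, obs)
    except ValueError:
        return float("-inf")
    return sum(math.log(c) for c in scale) / len(obs)

# Toy data: a hypothetical four-opcode alphabet and an invented repeating "family" pattern.
OPCODES = ["mov", "push", "call", "pop"]

def encode(ops):
    return [OPCODES.index(op) for op in ops]

family = encode(["mov", "push", "mov", "call", "mov", "push", "mov", "pop"] * 40)
model, history = train_hmm(family, n_states=2, n_symbols=len(OPCODES))
print(score(model, family), score(model, encode(["pop"] * 40)))
```

In practice a detection threshold on the score would be chosen empirically, by scoring known family members and known normal files and picking a value that separates the two groups.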

  22. HMM Detection of NGVCK
      • Trained model works for detection
      • Effective to the point of practical…

  23. Why Does this Work?
      • NGVCK viruses are highly metamorphic
      • But they have some common statistical properties
      • These are automatically extracted by the HMM
      • NGVCK differs from normal code
      • So the HMM can distinguish between the two
      • How to make a “better” metamorphic generator? Hold that thought…

  24. What Next?
      • Can we extract opcodes (or approximation) efficiently?
      • Are “profile hidden Markov models” better?
      • Similarity index for detection?
      • Better ways to measure similarity?
      • Statistical tests versus similarity?
      • HMMs to detect the “undetectable”?
      • HMM compared to other proposed methods?
      • Metamorphism for software watermarking?

  25. Ultimate Metamorphic?
      • How to evade signature detection and HMM detection?
      • Metamorphic code evades signature detection
      • But how to also evade HMM detection?
      • Make the code highly metamorphic and similar to normal code
      • Then trained HMM will confuse the two

  26. Ultimate Metamorphic?
      • Insert dead code from normal programs
      [Figure: “Before” and “After” panels]

  27. What Now?
      • How to detect the “ultimate” metamorphic generator?
      • Remove the dead code
      • How to remove dead code?
      • Emulation can help, but…
      • Can we “improve” the generator?
      • Can we improve the detection?
      • Can we say something more general?

  28. References
      • Revealing introduction to HMMs
      • Hunting for metamorphic engines
      • Profile hidden Markov models
      • Approximate disassembly
      • Detecting “undetectable” metamorphic viruses
      • Hunting for undetectable metamorphic viruses
      • And lots more work in progress…
