Mimimorphism: A New Approach to Binary Code Obfuscation

Zhenyu Wu, Steven Gianvecchio, Mengjun Xie

Advisor: Dr. Haining Wang

Malware Propagation & Detection
  • Internet & Ubiquitous Computing
    • Billions of networked computers
    • Playground for malware
  • Suppression Techniques
    • Static analysis
      • Low latency, high throughput
      • Widely used, IDS deployable
    • Dynamic analysis
The Game of Hide and Seek
  • Unique substring
    • Segments of the binary
  • Algorithmic detection
    • Built-in knowledge of the transformations
  • Statistical analysis
    • Anomalies in code body
  • Advanced pattern matching
    • N-gram signatures
  • Semantic analysis
    • Persistent high-level fingerprints
  • Un-obfuscated
    • Plain binary
  • Oligomorphism
    • Simple transformation (XOR)
  • Polymorphism
    • Compression and encryption
  • Metamorphism
    • Meta transformation (P-code)
  • State of the Art
    • Control-flow encryption
    • Byte frequency manipulation
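As a toy illustration of why simple transformations leave statistical traces (not taken from the slides; the key `0x5A` and the sample bytes are arbitrary), XOR with a single-byte key only relabels byte values, so the sorted byte counts, and with them statistics such as byte entropy, survive unchanged:

```python
from collections import Counter


def xor_transform(data: bytes, key: int) -> bytes:
    """Oligomorphism-style single-byte XOR (toy illustration, arbitrary key)."""
    return bytes(b ^ key for b in data)


def count_profile(data: bytes) -> list[int]:
    """Byte counts sorted in descending order, ignoring which byte value each belongs to."""
    return sorted(Counter(data).values(), reverse=True)


payload = b"AAAA BBBB CCCC 0000 1111 plain bytes"
encoded = xor_transform(payload, 0x5A)

# XOR with a constant is a bijection on byte values: the histogram is only relabeled.
print(count_profile(payload) == count_profile(encoded))  # True
# The same key reverses the transformation.
print(xor_transform(encoded, 0x5A) == payload)  # True
```

A detector that knows the transformation family can therefore still match the relabeled statistics, or simply undo the XOR.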
Fugitive On The Run

[Figure: “WANTED: $5,000,000” poster]

Fugitive On The Run
  • Polymorphism
    • Compression & Encryption

Nobody looks like a small dark box!

Fugitive On The Run
  • Metamorphism
    • Reordering Components

Cannot evade feature detection

Fugitive On The Run
  • Control Flow Encryption
    • Prevents feature analysis

Increases suspicion

Fugitive On The Run
  • The Real Player
    • Assumes other people’s identities (mimicry)
Fugitive On The Run
  • Lessons Learned:
    • Evasion without obfuscating features
    • Evasion by refusing inspection
    • Evasion by mimicking
      • Obfuscates the original features
      • Open to inspection, but disguised to evade detection
Binary Executable Mimicry
  • Mimimorphism:
    • A reversible transformation of an executable whose output statically resembles other benign programs
    • Characteristics:
      • Completely erases features of the original binary
      • High-order statistics match those of benign executables
      • The transformed payload consists of “meaningful” control flows that closely resemble those of benign executables
Mimic Functions
  • Text Steganography Technique
    • Transforms input data into mimicry output that assumes the statistical and grammatical (structural) properties of another type of data
    • Originally proposed by Peter Wayner as a means of transporting sensitive data under harsh surveillance
      • Novel use of Huffman coding
Mimic Functions
  • Huffman Coding
    • Digesting
      • Builds a Huffman tree according to the symbol frequencies
    • Encoding
      • Removes redundancies from the input data using a given Huffman tree
    • Decoding
      • Recovers the original data from the “condensed” data by emitting symbols according to the original Huffman tree

[Figure: Huffman tree with codes m = 00, a = 01, s = 1; encoding “mass” shrinks it from 32 bits to 6 bits (000111)]

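The three Huffman operations can be sketched in a few lines of Python. This is a minimal illustration assuming a textbook heap-based construction with insertion-order tie-breaking and at least two distinct symbols; the names `digest`, `encode` and `decode` mirror the slide and are not taken from any implementation.

```python
import heapq
from collections import Counter
from itertools import count

# A tree is either a leaf symbol (str) or a pair (bit-0 child, bit-1 child).


def digest(sample: str):
    """Build a Huffman tree from the symbol frequencies of `sample`."""
    tiebreak = count()  # unique tie-breaker so the heap never compares trees
    heap = [(freq, next(tiebreak), sym) for sym, freq in Counter(sample).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f0 + f1, next(tiebreak), (left, right)))
    return heap[0][2]


def code_table(tree, prefix=""):
    """Map each symbol to its bit string by walking the tree."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    return {**code_table(left, prefix + "0"), **code_table(right, prefix + "1")}


def encode(text: str, tree) -> str:
    table = code_table(tree)
    return "".join(table[ch] for ch in text)


def decode(bits: str, tree) -> str:
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):  # reached a leaf: emit it and restart at the root
            out.append(node)
            node = tree
    return "".join(out)


tree = digest("mass")
bits = encode("mass", tree)
print(bits, len(bits))  # 101100 6  (versus 4 x 8 = 32 bits of ASCII)
assert decode(bits, tree) == "mass"
```

With this tie-breaking the branches are labeled differently from the slide's tree, so “mass” becomes 101100 rather than 000111, but the code lengths (1 bit for s, 2 bits each for m and a) and the 6-bit total are the same.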
Mimic Functions
  • What if we decode a piece of random data?
    • It produces “meaningless” data, but
      • The output exhibits symbol frequencies similar to those of the digested sample, and
      • The input data can be recovered by Huffman encoding
  • Regular Mimic Function
    • Learn: Build a Huffman tree from sample text
    • Mimicry: Huffman decode on input (randomized)
    • Recover: Huffman encode
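A regular (order-0) mimic function follows directly from the Learn / Mimicry / Recover steps above. The Python sketch below is an illustration under its own assumptions, not Wayner's code: the input is already a random-looking bit string, a code word cut off at the end is padded with 0 bits, and the receiver knows the payload length so it can drop that padding.

```python
import heapq
import random
from collections import Counter
from itertools import count


def learn(sample: str):
    """Learn: Huffman tree (leaf = symbol, pair = (bit-0 child, bit-1 child))."""
    tiebreak = count()
    heap = [(f, next(tiebreak), s) for s, f in Counter(sample).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f0 + f1, next(tiebreak), (left, right)))
    return heap[0][2]


def codes(tree, prefix=""):
    if isinstance(tree, str):
        return {tree: prefix}
    return {**codes(tree[0], prefix + "0"), **codes(tree[1], prefix + "1")}


def mimic(bits: str, tree) -> str:
    """Mimicry: Huffman-decode the (randomized) input bits into symbols.
    A code word cut off at the end is completed with 0 bits (sketch assumption)."""
    out, node, i = [], tree, 0
    while i < len(bits) or node is not tree:
        node = node[int(bits[i]) if i < len(bits) else 0]
        i += 1
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)


def recover(text: str, tree, n_bits: int) -> str:
    """Recover: Huffman-encode the mimicry text, then drop the padding bits."""
    table = codes(tree)
    return "".join(table[ch] for ch in text)[:n_bits]


sample = "a mimic function makes random bits look like the text it was trained on " * 3
tree = learn(sample)
rng = random.Random(42)
payload = "".join(rng.choice("01") for _ in range(20000))
cover = mimic(payload, tree)
assert recover(cover, tree, len(payload)) == payload

# Compare observed symbol frequencies with 2 ** -(code length) for the shortest codes.
freq = Counter(cover)
for sym, code in sorted(codes(tree).items(), key=lambda kv: len(kv[1]))[:3]:
    print(repr(sym), round(freq[sym] / len(cover), 3), 2 ** -len(code))
```

Because a Huffman code is complete (its code lengths satisfy the Kraft equality), decoding uniformly random bits emits each symbol with probability exactly 2^-(code length), so the printed frequencies line up with powers of two rather than with the sample's true frequencies.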
Mimic Functions
  • Insufficiencies
    • Produces illegible, garbled text
    • Symbol frequencies follow a 2^-n distribution (a symbol with an n-bit code appears with probability 2^-n)
  • High-order Mimic Function
    • Captures interdependencies between symbols
      • Build multiple Huffman trees
      • One for each unique symbol prefix
    • Produces “sensible” text with much more “natural” symbol frequency distributions

[Figure: Huffman “forest” with one tree per prefix, e.g. “chi” → c / l / n, “rou” → t / n / g, “ins” → p / t]

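A high-order mimic function only changes which tree is used: the last k emitted symbols select one tree from the forest. The sketch below is one possible reading of the slide, with assumptions of its own: the sample text is treated as cyclic so every prefix has a successor, the first k sample symbols serve as a seed, and a prefix with a single successor becomes a one-leaf tree that emits its symbol without consuming any bits. With `k = 0` it reduces to the regular mimic function.

```python
import heapq
import random
from collections import Counter, defaultdict
from itertools import count


def huffman(freqs: Counter):
    """Huffman tree: a leaf is a symbol, an inner node is (bit-0 child, bit-1 child).
    A prefix with a single successor stays a bare leaf and so consumes no bits."""
    tiebreak = count()
    heap = [(f, next(tiebreak), s) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f0 + f1, next(tiebreak), (left, right)))
    return heap[0][2]


def codes(tree, prefix=""):
    if isinstance(tree, str):
        return {tree: prefix}
    return {**codes(tree[0], prefix + "0"), **codes(tree[1], prefix + "1")}


def learn(sample: str, k: int) -> dict:
    """One Huffman tree per unique k-symbol prefix (the "forest").
    The sample is treated as cyclic so every prefix has a successor."""
    follow = defaultdict(Counter)
    wrapped = sample + sample[:k]
    for i in range(len(sample)):
        follow[wrapped[i:i + k]][wrapped[i + k]] += 1
    if all(len(c) == 1 for c in follow.values()):
        raise ValueError("no prefix has more than one successor")
    return {ctx: huffman(c) for ctx, c in follow.items()}


def mimic(bits: str, forest: dict, seed: str) -> str:
    """Huffman-decode bits, always using the tree of the last k emitted symbols."""
    k, out, i = len(seed), list(seed), 0
    while i < len(bits):
        node = forest["".join(out[len(out) - k:])]
        while not isinstance(node, str):
            # Pad a code word cut off at the end with 0 bits (sketch assumption).
            node = node[int(bits[i]) if i < len(bits) else 0]
            i += 1
        out.append(node)
    return "".join(out)


def recover(text: str, forest: dict, k: int, n_bits: int) -> str:
    """Huffman-encode each symbol under its prefix's tree, then drop padding."""
    tables = {ctx: codes(tree) for ctx, tree in forest.items()}
    bits = "".join(tables[text[j - k:j]][text[j]] for j in range(k, len(text)))
    return bits[:n_bits]


sample = ("the mimic function learns which letter tends to follow each short prefix "
          "and then spends the hidden bits only where the sample offers a choice ")
k = 3
forest = learn(sample, k)
rng = random.Random(7)
payload = "".join(rng.choice("01") for _ in range(300))
cover = mimic(payload, forest, seed=sample[:k])
assert recover(cover, forest, k, len(payload)) == payload
print(cover[:80])
```

Spending bits only at prefixes that genuinely branch is what keeps the output close to the sample's own n-gram statistics. Since the cyclic sample is one closed walk through all prefixes, the mimicry walk cannot get stuck in a loop of single-successor prefixes as long as at least one prefix branches.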
Mimicry Text Sample
  • Mimicry of Peter Wayner’s paper
    • Produced by 6th order mimic function

Each of these historical reason, I don’t recommend using gA(t) to choose the safe. These one-to-one encoded with n leaves and punctuation. The starting every intended to find the same order mimic files. A Method is to break the trees by constructing the mimics the path down the most even though, offer no way that is, in this paper. Figure will not overflow memory. These produced by truncating letter. This need to handle n-th ordered compartment of nonsense words cannot bear any resemblance to B because this task is a Huffman showed in [1], [2], [3] among others.

Mimimorphism
  • The Challenge: Machine Language Mimicking
    • Consists of instructions and control flows
      • Each instruction has a strict format to follow
      • Machines never make “typos” or use the wrong “tense”!
    • The mimic function has no knowledge of instructions
      • Often makes mistakes generating instructions
      • Has a low success rate at creating mimicry control flows
  • Our Solution
    • Integrate a custom assembler / disassembler
    • Help the mimic function understand the language
Mimimorphism: Digesting
  • Digesting

[Figure: Mimicry target exec. binaries → Disassemble → High order instruction mimic function → Instruction Huffman forest + control flows → Mimicry digest]

[Figure: the instruction prefix (preceding instructions, e.g. XOR MOV MOV PUSH XOR DEC) selects an instruction Huffman tree over the next instruction (e.g. PUSH, DEC, INC, MOV); each instruction is recorded as a COMMON_INST structure: Inst. prefixes (atomic op., repeat, operand size, etc.), ModR/M (Mod / Reg. / R/M), SIB (Scale / Idx. / Base), Displacement]

[Figure: per-instruction Huffman trees over instruction encoding templates (e.g. 16bit, REP, 2x8+16, 3x4+0) and operands (e.g. EAX, ECX, EDX)]

[Figure: the resulting mimimorphic digest, a forest of instruction Huffman trees covering instructions such as PUSH, POP, MOV, XCHG, INC, DEC, XOR, CMP, JMP, CALL]

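To make the shape of a mimimorphic digest concrete, here is a toy Python sketch. It is not the authors' engine: it assumes an already disassembled trace, replaces COMMON_INST with a hypothetical two-field `CommonInst` (an opcode plus an encoding-template label borrowed from the figures), and builds code tables rather than explicit trees. Every name and the trace itself are made up for illustration.

```python
import heapq
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class CommonInst:
    """Hypothetical, heavily simplified stand-in for the COMMON_INST structure."""
    opcode: str    # e.g. "MOV"
    template: str  # instruction encoding template label, e.g. "16bit", "REP"


def huffman_codes(freqs: Counter) -> dict:
    """{symbol: bit string}; a lone symbol gets the empty code (it costs no bits)."""
    tiebreak = count()
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()} | {s: "1" + c for s, c in c1.items()}
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def digest(trace: list[CommonInst], k: int):
    """Opcode code table per k-instruction prefix, template code table per opcode."""
    next_op = defaultdict(Counter)
    templates = defaultdict(Counter)
    for i, inst in enumerate(trace):
        templates[inst.opcode][inst.template] += 1
        if i >= k:
            prefix = tuple(x.opcode for x in trace[i - k:i])
            next_op[prefix][inst.opcode] += 1
    forest = {p: huffman_codes(c) for p, c in next_op.items()}
    template_codes = {op: huffman_codes(c) for op, c in templates.items()}
    return forest, template_codes


# Made-up trace standing in for a disassembled benign binary.
trace = [CommonInst(op, t) for op, t in [
    ("PUSH", "16bit"), ("MOV", "2x8+16"), ("XOR", "16bit"), ("PUSH", "16bit"),
    ("MOV", "3x4+0"), ("DEC", "16bit"), ("PUSH", "REP"), ("MOV", "2x8+16"),
    ("INC", "16bit"),
]]
forest, template_codes = digest(trace, k=1)
print(forest[("PUSH",)])  # {'MOV': ''}: PUSH is always followed by MOV here
print(forest[("MOV",)])   # {'INC': '0', 'XOR': '10', 'DEC': '11'}
```

A prefix whose successor never varies in the training code gets an empty code word, so the mimic function reproduces it for free; data bits are spent only where benign code actually varies.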
Mimimorphism: Encoding
  • Encoding

[Figure: Binary data + PRNG → High order instruction mimic function (with mimicry digest) → Assemble → Mimicry binaries]

[Figure: bits of the binary data (e.g. 01001001100101010001010010001001) select a path through the instruction Huffman tree for the current instruction prefix, then through the instruction encoding template and operand trees (e.g. MOV, ECX, 16bit, 3x4+0), filling in a COMMON_INST structure; each emitted instruction extends the instruction prefix (e.g. XOR PUSH DEC MOV)]

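The PRNG in the encoding and decoding pipelines presumably plays the role of the “Huffman decode on input (randomized)” step of the regular mimic function: decoding highly redundant input (for example a long run of zero bits, which always takes the 0 branch) would keep emitting the same few instructions. A minimal sketch of such randomization, assuming a seed shared by encoder and decoder and using Python's non-cryptographic `random.Random` purely for illustration:

```python
import random


def keystream(seed: int, n: int) -> bytes:
    """n pseudo-random bytes from Python's Mersenne Twister (NOT cryptographically secure)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def whiten(data: bytes, seed: int) -> bytes:
    """XOR data with the keystream; applying it twice with the same seed undoes it."""
    return bytes(b ^ k for b, k in zip(data, keystream(seed, len(data))))


def ones_ratio(data: bytes) -> float:
    """Fraction of 1 bits in the data (about 0.5 for random-looking input)."""
    bits = "".join(f"{b:08b}" for b in data)
    return bits.count("1") / len(bits)


payload = b"\x00" * 64 + b"structured payload"
masked = whiten(payload, seed=2010)
print(round(ones_ratio(payload), 3), round(ones_ratio(masked), 3))
assert whiten(masked, seed=2010) == payload
```

Since XOR with the same keystream is its own inverse, the decoder can re-run `whiten` with the shared seed on the bits it recovers.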
Mimimorphism: Decoding
  • Decoding

[Figure: Mimicry binaries → Disassemble → High order instruction mimic function (with mimicry digest) + PRNG → Binary data]

Experimental Setup
  • Training
    • Selected 100 Windows XP system files as the mimicry target
      • They represent typical legitimate binaries
    • Trained using 7th and 8th order mimimorphic engines
      • Most control flow basic blocks have 7-8 instructions
  • Evaluations
    • Statistical Anomaly Tests
      • Kolmogorov-Smirnov Test & Entropy Test
    • Semantic Detection Test
      • Control Flow Fingerprinting
Evaluation Results
  • Statistical Tests
    • Kolmogorov-Smirnov Test
      • Maximum difference between the (cumulative) byte-frequency distributions
      • Legitimate: 0.074±0.045; Mimimorphic: 0.093±0.006
    • Entropy Test
      • Measures the predictability (or randomness) of the data, in bits per byte
      • Legitimate: 6.353±0.258; Mimimorphic: 6.528±0.021

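The slides do not spell out how the two statistics are computed. A plausible minimal version, assuming the entropy is the Shannon entropy of the byte histogram (bits per byte, 0 to 8) and the Kolmogorov-Smirnov statistic is the largest gap between two files' cumulative byte-value distributions:

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def ks_statistic(a: bytes, b: bytes) -> float:
    """Largest absolute gap between the empirical CDFs of byte values 0..255."""
    ca, cb = Counter(a), Counter(b)
    cum_a = cum_b = 0
    worst = 0.0
    for v in range(256):
        cum_a += ca[v]
        cum_b += cb[v]
        worst = max(worst, abs(cum_a / len(a) - cum_b / len(b)))
    return worst


print(byte_entropy(bytes(range(256))))  # 8.0 (every byte value equally likely)
print(byte_entropy(b"\x00" * 100))      # 0.0 (fully predictable)
print(ks_statistic(b"\x00", b"\xff"))   # 1.0 (disjoint distributions)
```

On this scale, the reported averages of about 6.4 to 6.5 bits per byte place both legitimate and mimimorphic files well below the 8-bit ceiling that compressed or encrypted payloads approach.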
Evaluation Results
  • Semantic Tests
    • Control Flow Fingerprinting
      • Statically analyze executables (with a special disassembler) and extract control flow patterns
      • Detect malware by matching its characteristic control flow patterns (i.e., shared fingerprints)
    • Between original binary and mimimorphic instances
      • Shared fingerprints: the lower the better
      • Only 1 out of 100 instances shares any fingerprint with the original, and only a single one (out of hundreds of thousands of fingerprints)
Evaluation Results
  • Semantic Tests
    • Between mimimorphic and legitimate binaries
      • Shared fingerprints: the higher the better
      • 7th order mimimorphic instances:
        • Average 1856.46±372.5 (72.93 benign files)
        • Minimum 1057 (44 files); Maximum 3321 (92 files)
      • 8th order mimimorphic instances:
        • Average 11407.99±912.42 (81.37 benign files)
        • Minimum 9606 (70 files); Maximum 14216 (91 files)
Evaluation Results
  • Semantic Tests
    • A sample mimicry control flow pattern
      • Reproduced by a 7th order mimimorphic instance
Limitations & Discussions
  • Application Constraints
    • Memory consumption: 600 MB for 7th order and 1.2 GB for 8th order mimimorphic transformation
      • Can be reduced with disk-based, on-demand digest storage
    • Size increase: 20x inflation for 7th order and 30x for 8th order mimimorphic transformation
      • Typical malware is smaller than 100 KB
      • Mimimorphism therefore results in files of about 2 to 3 MB (100 KB × 20 to 30)
Conclusion
  • We propose mimimorphism as a novel binary obfuscation technique
    • Enhances high-order mimic functions with a custom assembler / disassembler
    • Achieves evasion by disguise, not by refusing detection
    • Effective against both statistical anomaly detection and semantic fingerprinting tests
Limitations & Discussions
  • Robustness against other approaches
    • Automatic n-gram detection
      • Typical x86 instruction length: 2.1 to 2.8 bytes
      • An 8-instruction context spans roughly 8 × 2.1 to 2.8 ≈ 17 to 22 bytes, so 8th order mimimorphism can approach 16-gram mimicry
      • Existing n-gram detection algorithms can hardly scale up to such long n-grams
    • Static semantic analysis
      • Mimimorphism does not target specific detection techniques
      • Focuses on reproducing features from benign programs
      • Immune to lower-order signature detection
Limitations & Discussions
  • Robustness against other approaches
    • Deep syntactic analysis
      • Mimimorphism fails to reproduce high-level syntactic features exactly:
        • 45% of “functions” lack a matching prologue and epilogue
        • Many jump instructions cross function boundaries
      • Such program-level anomalies are detectable, but
        • Not all programs follow conventions
        • Could lead to false positives
Limitations & Discussions
  • The Problem of the Unpacker
    • Mimimorphic transformation does not provide a solution for hiding the unpacker
    • However, we believe unpackers do benefit from using mimimorphism
      • The unpacker is the weak point of polymorphism because it is easy to spot: the rest of the payload is not executable!
      • Because the whole mimimorphic payload is “executable”, separating the unpacker code from the payload becomes non-trivial
Mimimorphism: Decoding
  • Decoding

[Figure: each instruction disassembled from the mimicry binary is matched against the instruction Huffman tree for its instruction prefix, then against its COMMON_INST fields (Inst. prefixes, ModR/M, SIB, Displacement; e.g. MOV, ECX, 16bit, 3x4+0); the tree paths taken yield the decoded bits (e.g. 01001001 10010101)]