Self-Learning Anti-Virus Scanner

Arun Lakhotia, Professor

Andrew Walenstein, Assistant Professor

University of Louisiana at Lafayette

www.cacs.louisiana.edu/labs/SRL

2008 AVAR (New Delhi)

Introduction
  • Alumni in the AV industry
    • Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar: McAfee AVERT
    • Erik Uday Kumar: Authentium
    • Moinuddin Mohammed: Microsoft
    • Prashant Pathak: Ex-Symantec
  • Funded by: Louisiana Governor’s IT Initiative
  • Director, Software Research Lab
  • Lab’s focus: Malware Analysis
  • Graduate-level course on Malware Analysis
  • Six years of AV-related research
  • Issues investigated:
    • Metamorphism
    • Obfuscation

Outline
  • Attack of Variants
    • AV vulnerability: Exact match
  • Information Retrieval Techniques
    • Inexact match
  • Adapting IR to AV
    • Account for code permutation
  • Vilo: System using IR for AV
  • Integrating Vilo into AV Infrastructure
  • Self-Learning AV using Vilo

Attack of Variants

Variants vs Family

Source: Symantec Internet Threat Report, XI

Analysis of attacker strategy
  • Purpose of the attack of variants
    • Denial of service on the AV infrastructure
    • Increase the odds of passing through
  • Weakness exploited
    • AV systems use exact match over an extract
  • Attack strategy
    • Generate just enough variation to beat exact match
  • Attacker cost
    • Cost of generating and distributing variants

Analyzing attacker cost
  • Payload creation is expensive
    • Must reuse payload
  • Need thousands of variants
    • Must be automated
  • “General” transformers are expensive
    • Hence specialized, limited transformers (e.g., packers/unpackers)

Attacker vulnerability
  • Automated transformers
    • Limited capability
    • Machine-generated, so they must exhibit regular patterns
  • Exploiting attacker vulnerability
    • Detect patterns of similarities
    • Approach
      • Information Retrieval (this presentation)
      • Markov Analysis (other work)

Information Retrieval

IR Basics
  • Basis of Google and of bioinformatics search
  • Organizing very large corpora of data
  • Key idea
    • Inexact match over the whole document
    • Contrast with AV
      • Exact match over an extract

IR Problem

A query (keywords or a whole document) is run by the IR system against the document collection; related documents are returned.

IR Steps

Step 1: Convert documents to vectors

1a. Define a method to identify “features” (example: k consecutive words)

1b. Extract all features from all documents

Sample document: “Have you wondered / When is a rose a rose?”

1c. Count features, make a feature vector:

    Feature               Count
    Have you wondered       1
    You wondered when       1
    Wondered when rose      1
    When rose rose          1
    How about onions        0
    Onion smell stinks      0

Feature vector: [1, 1, 1, 1, 0, 0]
IR Steps
  • Step 2: Compute feature vectors
    • Take into account features in the entire corpus
    • Classical method
      • w = TF × IDF

TF = term frequency; DF = number of documents containing the feature; IDF = inverse of DF.

    Feature               TF(v1)   DF   IDF    w1 = TF × IDF (v1)
    You wondered when        1      5   1/5         1/5
    Wondered when rose       2      7   1/7         2/7
    When rose rose           5      8   1/8         5/8
    How about onions         3      6   1/6         3/6
    Onion smell stinks       0      3   1/3         0/3
IR Steps
  • Step 3: Compare vectors
    • Cosine similarity

w1 = [0.33, 0.25, 0.66, 0.50]

w2 = [0.33, 0.25, 0.66, 0.50]
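Step 3’s comparison is the standard cosine similarity, cos(w1, w2) = (w1 · w2) / (|w1| |w2|). A minimal sketch using the slide’s vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

w1 = [0.33, 0.25, 0.66, 0.50]
w2 = [0.33, 0.25, 0.66, 0.50]
print(round(cosine(w1, w2), 6))  # identical vectors -> 1.0
```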

IR Steps

  • Step 4: Document Ranking
    • Using the similarity measure

A new document is run against the document collection; matching documents are returned ranked by similarity score (e.g., 0.90, 0.82, 0.76, 0.30).
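Step 4 is then a sort over the similarity scores. The collection below is a hypothetical set of weight vectors for illustration only:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank(query, collection):
    """Return document names ordered by cosine similarity to the query."""
    return sorted(collection, key=lambda doc: cosine(query, collection[doc]),
                  reverse=True)

# Hypothetical weight vectors standing in for a document collection.
collection = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.5, 0.5, 0.5],
}
query = [1.0, 0.0, 0.0]
print(rank(query, collection))  # ['doc_a', 'doc_c', 'doc_b']
```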

Adapting IR for AV


Adapting IR for AV

Step 0: Mapping program to document: extract the sequence of operations.

Virus 1:

l2D2: push ecx
      push 4
      pop ecx
      push ecx
l2D7: rol edx, 8
      mov dl, al
      and dl, 3Fh
      shr eax, 6
      loop l2D7
      pop ecx
      call s319
      xchg eax, edx
      stosd
      xchg eax, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l305

Operation sequence: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz

Virus 2:

l144: push ecx
      push 4
      pop ecx
      push ecx
l149: mov dl, al
      and dl, 3Fh
      rol edx, 8
      shr ebx, 6
      loop l149
      pop ecx
      call s52F
      xchg ebx, edx
      stosd
      xchg ebx, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l18

Operation sequence: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
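Step 0 can be sketched as a small parser over a disassembly listing. The line format (an optional `label:` prefix, then the mnemonic, then operands) is an assumption taken from the listings on the slide:

```python
def op_sequence(listing):
    """Extract the mnemonic from each line of a disassembly listing.
    Lines look like 'l2D7: rol edx, 8' or 'stosd'; the optional label
    prefix and the operands are discarded."""
    ops = []
    for line in listing.strip().splitlines():
        line = line.strip()
        if ":" in line:                      # drop a leading 'label:' prefix
            line = line.split(":", 1)[1].strip()
        ops.append(line.split()[0])          # mnemonic is the first token
    return ops

# Virus 1 listing from the slide.
virus1 = """
l2D2: push ecx
push 4
pop ecx
push ecx
l2D7: rol edx, 8
mov dl, al
and dl, 3Fh
shr eax, 6
loop l2D7
pop ecx
call s319
xchg eax, edx
stosd
xchg eax, edx
inc [ebp+v4]
cmp [ebp+v4], 12h
jnz short l305
"""
print(op_sequence(virus1))
```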


Adapting IR for AV

Step 1a: Defining features: the k-perm

Virus 1: P P O P R M A S L O C X S X I C J

Virus 2: P P O P M A R S L O C X S X I C J

Feature = permutation of k operations. Note the two sequences differ only in the ordering R M A versus M A R.


Adapting IR for AV

Step 1: Example of 3-perms

Sequence: P P O P R M A S L O C X S X I C J

Virus 1: P | P O P | M A R S L | O C X S X | I C J
Virus 2: P | P O P | M A R S L | O C X S X | I C J
Virus 3: P | P O P | M A R S L | O C X S X | I C J

The 3-perm P O P occurs in all three variants.


Adapting IR for AV

Step 2: Construct feature vectors (4-perms)

Virus 1: P O P R M A S L
Virus 2: P O P M A R S L
Virus 3: M A R S L P O P

Feature vectors record the occurrence of 4-perms such as MARS and PMAR in each sequence.
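A sketch of the feature-vector construction, using the three sequences above. A k-perm treats any ordering of the same k consecutive operations as one feature; representing a feature as the sorted tuple of its window is an implementation choice made here, not something the slide specifies:

```python
from collections import Counter

def kperm_vector(ops, k):
    """Count k-perms: each window of k consecutive operations is a
    feature, with permutations of the same operations identified
    (implemented by sorting each window)."""
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

v1 = "P O P R M A S L".split()
v2 = "P O P M A R S L".split()
v3 = "M A R S L P O P".split()

MARS = tuple(sorted("MARS"))
PMAR = tuple(sorted("PMAR"))
for name, seq in [("Virus 1", v1), ("Virus 2", v2), ("Virus 3", v3)]:
    vec = kperm_vector(seq, 4)
    print(name, [vec[MARS], vec[PMAR]])
```

Under 4-perms, Virus 1 (R M A S) and Virus 2 (M A R S) yield the same features, which is exactly the permutation-resistance the adaptation is after.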

Adapting IR for AV
  • Step 3: Compare vectors
    • Cosine similarity (as before)
  • Step 4: Match new sample

Vilo: System using IR for AV

Vilo Functional View

A new sample is submitted to Vilo, which searches the malware collection and returns malware matches ranked by similarity (e.g., 0.90, 0.82, 0.76, 0.30).

Vilo in Action: Query Match

Vilo: Performance

Response time vs. database size: a search on a generic desktop completes in seconds, in contrast with behavior matching (minutes) and graph matching (minutes).

Vilo Match Accuracy

ROC Curve: True Positive vs False Positive


Vilo in AV Product

Vilo in AV Product

AV systems are composed of classifiers. Introduce Vilo into the AV scanner as one more classifier alongside the existing ones.
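The classifier-chain architecture can be sketched as follows. Every name and decision rule here is a hypothetical stand-in for illustration; the slide does not specify how the product combines classifier verdicts, so "first verdict wins" is an assumption:

```python
from typing import Callable, List, Optional

# A classifier takes a sample (bytes) and returns a verdict string or None.
Classifier = Callable[[bytes], Optional[str]]

def scan(sample: bytes, classifiers: List[Classifier]) -> str:
    """Run the sample through each classifier in turn; the first
    classifier to return a verdict decides (assumed combination rule)."""
    for classify in classifiers:
        verdict = classify(sample)
        if verdict is not None:
            return verdict
    return "clean"

# Hypothetical stand-ins for the product's classifiers.
def signature_classifier(sample: bytes) -> Optional[str]:
    return "known-malware" if b"EVIL" in sample else None

def vilo_classifier(sample: bytes) -> Optional[str]:
    # A real Vilo classifier would build the sample's k-perm feature
    # vector and report a family when cosine similarity to the
    # collection exceeds a threshold.
    return None

print(scan(b"...EVIL...", [signature_classifier, vilo_classifier]))  # known-malware
```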

Self-Learning AV Product

How to get the malware collection?

Solution 1: Collect the malware detected by the product’s own classifiers and add it to Vilo’s collection.

Self-Learning AV Product

How to get the malware collection?

Solution 2: Collect and learn in the cloud. The scanner’s classifiers report samples through the Internet cloud to a cloud-side Vilo, which feeds the product’s local Vilo.

Learning in the Cloud

Solution 2 (continued): In the cloud, a Vilo Learner builds the malware collection from the samples reported by the scanner’s classifiers; the learned data drives the local Vilo Classifier.

Experience with Vilo-Learning
  • Vilo-in-the-cloud holds promise
    • Can utilize cluster of workstations
      • Like Google
    • Take advantage of increasing bandwidth and compute power
  • Engineering issues to address
    • Control growth of database
      • Forget samples
      • Use “signature” feature vector(s) for family
      • Be “selective” about features to use

Summary
  • Weakness of current AV systems
    • Exact match over an extract
  • Exploited by creating large numbers of variants
  • Strength of Information Retrieval research
    • Inexact match over the whole
  • Vilo demonstrates that IR techniques have promise
  • Architecture of a Self-Learning AV System
    • Integrate Vilo into existing AV systems
    • Create a feedback mechanism to drive learning
