Self-Learning Anti-Virus Scanner

Arun Lakhotia, Professor

Andrew Walenstein, Assistant Professor
University of Louisiana at Lafayette

www.cacs.louisiana.edu/labs/SRL

AVAR 2008 (New Delhi)


Introduction

  • Alumni in AV Industry

  • Prabhat Singh

  • Nitin Jyoti

  • Aditya Kapoor

  • Rachit Kumar, McAfee AVERT

  • Erik Uday Kumar, Authentium

  • Moinuddin Mohammed, Microsoft

  • Prashant Pathak, Ex-Symantec

  • Funded by: Louisiana Governor’s IT Initiative

  • Director, Software Research Lab

  • Lab’s focus: Malware Analysis

  • Graduate level course on Malware Analysis

  • Six years of AV-related research

  • Issues investigated:

  • Metamorphism

  • Obfuscation



Outline

  • Attack of Variants

    • AV vulnerability: Exact match

  • Information Retrieval Techniques

    • Inexact match

  • Adapting IR to AV

    • Account for code permutation

  • Vilo: System using IR for AV

  • Integrating Vilo into AV Infrastructure

  • Self-Learning AV using Vilo



ATTACK OF VARIANTS



Variants vs Family

Source: Symantec Internet Threat Report, XI



Analysis of attacker strategy

  • Purpose of attack of variants

    • Denial of Service on AV infrastructure

    • Increase odds of passing through

  • Weakness exploited

    • AV systems use: exact match over extract

  • Attack strategy

    • Generate just enough variation to beat exact match

  • Attacker cost

    • Cost of generating and distributing variants



Analyzing attacker cost

  • Payload creation is expensive

    • Must reuse payload

  • Need thousands of variants

    • Must be automated

  • “General” transformers are expensive

    • Specialized, limited transformers

      • Hence packers/unpackers



Attacker vulnerability

  • Automated transformers

    • Limited capability

    • Machine-generated, so must exhibit regular patterns

  • Exploiting attacker vulnerability

    • Detect patterns of similarities

    • Approach

      • Information Retrieval (this presentation)

      • Markov Analysis (other work)



Information Retrieval



IR Basics

  • Basis of Google, Bioinformatics

  • Organizing very large corpus of data

  • Key idea

    • Inexact match over whole

    • Contrast with AV

      • Exact match over extract



IR Problem

[Diagram: a query (keywords or a whole document) goes to the IR engine, which searches the document collection and returns related documents]


IR Steps

Step 1: Convert documents to vectors

1a. Define a method to identify “features”
    Example: k consecutive words

1b. Extract all features from all documents
    Document: “Have you wondered / When is a rose a rose?”

1c. Count features, make the feature vector

    Feature               Count
    Have you wondered       1
    You wondered when       1
    Wondered when rose      1
    When rose rose          1
    How about onions        0
    Onion smell stinks      0

    Feature vector: [1, 1, 1, 1, 0, 0]
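Step 1 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the stop-word set is an assumption made so that the extracted 3-word features match the table above.

```python
from collections import Counter

# Assumed stop words: the slide's features skip "is" and "a".
STOPWORDS = {"is", "a"}

def features(text, k=3):
    """Steps 1a/1b: extract k-consecutive-word features from a document."""
    words = [w.strip("?,.").lower() for w in text.split()]
    words = [w for w in words if w not in STOPWORDS]
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]

# Step 1c: count features against a fixed feature order.
doc = "Have you wondered When is a rose a rose?"
counts = Counter(features(doc))
vocab = [("have", "you", "wondered"), ("you", "wondered", "when"),
         ("wondered", "when", "rose"), ("when", "rose", "rose"),
         ("how", "about", "onions"), ("onion", "smell", "stinks")]
vector = [counts[f] for f in vocab]
print(vector)  # [1, 1, 1, 1, 0, 0]
```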



IR Steps

  • Step 2: Compute feature vectors

    • Take into account features in the entire corpus

    • Classical method: W = TF × IDF

TF = Term Frequency
DF = # documents containing the feature
IDF = inverse of DF (here, 1/DF)

    Feature               TF(v1)   DF   IDF   w1 = TF × IDF (v1)
    You wondered when        1      5   1/5         1/5
    Wondered when rose       2      7   1/7         2/7
    When rose rose           5      8   1/8         5/8
    How about onions         3      6   1/6         3/6
    Onion smell stinks       0      3   1/3         0/3
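The weighting step can be sketched as follows. The slide uses the plain inverse IDF = 1/DF (classical IR systems often use log(N/DF) instead); the values mirror the table above.

```python
from fractions import Fraction

def weight(tf, df):
    """w = TF × IDF, with IDF taken as 1/DF as on the slide."""
    return Fraction(tf, df) if df else Fraction(0)

# (feature, TF in document v1, DF across the corpus)
rows = [("You wondered when",  1, 5),
        ("Wondered when rose", 2, 7),
        ("When rose rose",     5, 8),
        ("How about onions",   3, 6),
        ("Onion smell stinks", 0, 3)]
w1 = [weight(tf, df) for _, tf, df in rows]
print(w1)  # 1/5, 2/7, 5/8, 3/6, 0/3 (reduced)
```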



IR Steps

  • Step 3: Compare vectors

    • Cosine similarity

w1 = [0.33, 0.25, 0.66, 0.50]



IR Steps

  • Step 4: Document Ranking

    • Using the similarity measure

[Diagram: a new document is matched by IR against the document collection; matching documents are ranked by similarity, e.g. 0.90, 0.82, 0.76, 0.30]
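Steps 3 and 4 together, as a minimal sketch. The cosine formula is standard; the collection's weight vectors here are made-up illustrative values, not taken from the slides.

```python
import math

def cosine(u, v):
    """Step 3: cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(collection, query):
    """Step 4: rank documents in the collection by similarity to a query vector."""
    return sorted(((cosine(vec, query), name) for name, vec in collection.items()),
                  reverse=True)

# Toy collection of weight vectors (illustrative values).
docs = {"d1": [0.33, 0.25, 0.66, 0.50],
        "d2": [0.00, 0.90, 0.10, 0.00],
        "d3": [0.30, 0.20, 0.70, 0.40]}
query = [0.33, 0.25, 0.66, 0.50]
for score, name in rank(docs, query):
    print(f"{name}: {score:.2f}")
```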



Adapting IR for AV



Adapting IR for AV

Step 0: Mapping a program to a document
Extract the sequence of operations

Virus 1:

l2D2: push ecx
      push 4
      pop ecx
      push ecx
l2D7: rol edx, 8
      mov dl, al
      and dl, 3Fh
      shr eax, 6
      loop l2D7
      pop ecx
      call s319
      xchg eax, edx
      stosd
      xchg eax, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l305

Operation sequence: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz

Virus 2:

l144: push ecx
      push 4
      pop ecx
      push ecx
l149: mov dl, al
      and dl, 3Fh
      rol edx, 8
      shr ebx, 6
      loop l149
      pop ecx
      call s52F
      xchg ebx, edx
      stosd
      xchg ebx, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l18

Operation sequence: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
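Step 0 can be sketched as a small mnemonic extractor. This is an illustrative reconstruction, assuming the simple label/operand syntax of the listings above.

```python
def op_sequence(asm_lines):
    """Step 0: map a program to a 'document' by keeping only the
    operation (mnemonic) of each instruction."""
    ops = []
    for line in asm_lines:
        line = line.split(":")[-1].strip()  # drop "l2D2:"-style labels
        if line:
            ops.append(line.split()[0].lower())  # first token is the mnemonic
    return ops

virus1 = ["l2D2: push ecx", "push 4", "pop ecx", "push ecx",
          "l2D7: rol edx, 8", "mov dl, al", "and dl, 3Fh", "shr eax, 6",
          "loop l2D7", "pop ecx", "call s319", "xchg eax, edx", "stosd",
          "xchg eax, edx", "inc [ebp+v4]", "cmp [ebp+v4], 12h",
          "jnz short l305"]
print(op_sequence(virus1))
# ['push', 'push', 'pop', 'push', 'rol', 'mov', 'and', 'shr', 'loop',
#  'pop', 'call', 'xchg', 'stosd', 'xchg', 'inc', 'cmp', 'jnz']
```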



Each operation is abbreviated to a single letter:

Virus 1: P P O P R M A S L O C X S X I C J
Virus 2: P P O P M A R S L O C X S X I C J


Adapting IR for AV

Step 1a: Defining features: the k-perm

Virus 1: P P O P R M A S L O C X S X I C J
Virus 2: P P O P M A R S L O C X S X I C J

Feature = permutation of k consecutive operations, so the reordered fragments “R M A” (Virus 1) and “M A R” (Virus 2) map to the same feature.
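One simple way to realize the k-perm idea is to canonicalize each window of k consecutive operations by sorting it, so any permutation of the same k operations maps to one feature. This is a sketch under that assumption, not necessarily Vilo's actual encoding.

```python
from collections import Counter

def k_perms(ops, k):
    """k-perm features: each window of k consecutive operations,
    canonicalized by sorting so that any permutation of the same
    k operations becomes the same feature."""
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

v1 = list("PPOPRMASLOCXSXICJ")  # Virus 1 operation letters
v2 = list("PPOPMARSLOCXSXICJ")  # Virus 2 operation letters
f1, f2 = k_perms(v1, 3), k_perms(v2, 3)

# "R M A" in Virus 1 and "M A R" in Virus 2 become the same feature:
print(tuple(sorted("RMA")) in f1 and tuple(sorted("MAR")) in f2)  # True
```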


Adapting IR for AV

Step 1: Example of 3-perms

Virus 1: P P O P R M A S L O C X S X I C J

[Figure: windows of three consecutive operations, e.g. “P O P”, extracted as 3-perm features from the sequences of Virus 1, Virus 2, and Virus 3]


Adapting IR for AV

Step 2: Construct feature vectors (4-perms)

Sequence 1: P O P R M A S L
Sequence 2: P O P M A R S L
Sequence 3: M A R S L P O P

[Figure: feature vectors over 4-perm features such as MARS and PMAR; e.g. sequence 1 contains the fragments R M A S and P R M A, which count as the MARS and PMAR features]
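A sketch of Step 2 under the same sorting-canonicalized k-perm assumption. The three sequences come from the slide; the resulting vectors are computed from those sequences, for the two example features MARS and PMAR.

```python
from collections import Counter

def perm_counts(seq, k=4):
    """Count k-perm features: windows of k consecutive operations,
    order-insensitive (canonicalized by sorting)."""
    return Counter(tuple(sorted(seq[i:i + k])) for i in range(len(seq) - k + 1))

seqs = {1: list("POPRMASL"), 2: list("POPMARSL"), 3: list("MARSLPOP")}
feats = [tuple(sorted("MARS")), tuple(sorted("PMAR"))]  # two example 4-perms
vectors = {n: [perm_counts(s)[f] for f in feats] for n, s in seqs.items()}
print(vectors)  # {1: [1, 1], 2: [1, 1], 3: [1, 0]}
```

Note that sequence 1 never contains the literal substrings MARS or PMAR, yet still scores on both features: its fragments R M A S and P R M A are permutations of them.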



Adapting IR for AV

  • Step 3: Compare vectors

    • Cosine similarity (as before)

  • Step 4: Match new sample



Vilo: System using IR for AV



Vilo Functional View

[Diagram: a new sample is matched by Vilo against the malware collection; matches are ranked by similarity, e.g. 0.90, 0.82, 0.76, 0.30]


Vilo in Action: Query Match



Vilo: Performance

Response time vs. database size

Search on a generic desktop: in seconds

Contrast with:
  • Behavior match: in minutes
  • Graph match: in minutes



Vilo Match Accuracy

ROC Curve: True Positive vs False Positive




Vilo in AV Product



Vilo in AV Product

AV Systems: Composed of classifiers

[Diagram: an AV scanner built as a chain of classifiers, with Vilo inserted as one more classifier in the chain]

Introduce Vilo as a classifier
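The classifier-chain idea can be sketched as follows. Everything here (the names, the 0.9 threshold, the first-verdict-wins policy, the toy featurizer) is an illustrative assumption, not a description of any real product.

```python
from typing import Callable, List, Optional

# A classifier inspects a sample and returns a verdict string, or None.
Classifier = Callable[[bytes], Optional[str]]

def make_vilo_classifier(collection, featurize, similarity,
                         threshold=0.9) -> Classifier:
    """Build a Vilo-style classifier: flag a sample if its feature
    vector is similar enough to a known malware vector."""
    def classify(sample: bytes) -> Optional[str]:
        vec = featurize(sample)
        best_name, best_sim = None, 0.0
        for name, known_vec in collection.items():
            sim = similarity(vec, known_vec)
            if sim > best_sim:
                best_name, best_sim = name, sim
        return f"variant of {best_name}" if best_sim >= threshold else None
    return classify

def scan(sample: bytes, classifiers: List[Classifier]) -> str:
    """Run the chain of classifiers in order; the first verdict wins."""
    for classify in classifiers:
        verdict = classify(sample)
        if verdict is not None:
            return verdict
    return "clean"

# Toy usage with a dot-product similarity and a trivial featurizer.
collection = {"W32.Toy": [1.0, 0.0]}
vilo = make_vilo_classifier(
    collection,
    featurize=lambda s: [1.0, 0.0] if s.startswith(b"MZ") else [0.0, 1.0],
    similarity=lambda u, v: sum(a * b for a, b in zip(u, v)))
print(scan(b"MZ\x90\x00", [vilo]))  # variant of W32.Toy
```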



Self-Learning AV Product

How to get the malware collection?

Solution 1: Collect malware detected by the product.

[Diagram: detections from the other classifiers feed Vilo's malware collection]



Self-Learning AV Product

How to get the malware collection?

Solution 2: Collect and learn in the cloud.

[Diagram: the on-host Vilo classifier synchronizes with a Vilo instance in the Internet cloud, alongside the other classifiers]



Learning in the Cloud

Solution 2: Collect and learn in the cloud.

[Diagram: a Vilo learner in the Internet cloud pushes updates to the on-host Vilo classifier, which runs alongside the other classifiers]



Experience with Vilo-Learning

  • Vilo-in-the-cloud holds promise

    • Can utilize cluster of workstations

      • Like Google

    • Take advantage of increasing bandwidth and compute power

  • Engineering issues to address

    • Control growth of database

      • Forget samples

      • Use “signature” feature vector(s) for family

      • Be “selective” about features to use
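One way to realize the "signature feature vector per family" idea is to collapse a family's member vectors into their centroid, so the database stores one vector per family instead of one per sample. This is a hypothetical sketch of that database-growth control, not necessarily Vilo's actual mechanism.

```python
def family_signature(member_vectors):
    """Collapse a family's member feature vectors into one 'signature'
    centroid vector (component-wise mean)."""
    n = len(member_vectors)
    dims = len(member_vectors[0])
    return [sum(v[i] for v in member_vectors) / n for i in range(dims)]

# Three member vectors of a hypothetical family:
family = [[1.0, 0.0, 2.0],
          [1.0, 2.0, 0.0],
          [1.0, 1.0, 1.0]]
print(family_signature(family))  # [1.0, 1.0, 1.0]
```

New samples would then be compared against the signature vector, and "forgetting" a sample simply means not storing its individual vector.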



Summary

  • Weakness of current AV system

    • Exact match over extract

  • Exploited by creating a large number of variants

  • Information Retrieval research strengths

    • Inexact match over whole

  • VILO demonstrates IR techniques have promise

  • Architecture of Self-Learning AV System

    • Integrate VILO into existing AV systems

    • Create feedback mechanism to drive learning


