Self-Learning Anti-Virus Scanner

Presentation Transcript



Self-Learning Anti-Virus Scanner

Arun Lakhotia, Professor

Andrew Walenstein, Assistant Professor

University of Louisiana at Lafayette

www.cacs.louisiana.edu/labs/SRL

2008 AVAR (New Delhi)


Introduction

  • Alumni in AV Industry

  • Prabhat Singh

  • Nitin Jyoti

  • Aditya Kapoor

  • Rachit Kumar, McAfee AVERT

  • Erik Uday Kumar, Authentium

  • Moinuddin Mohammed, Microsoft

  • Prashant Pathak, Ex-Symantec

  • Funded by: Louisiana Governor’s IT Initiative

  • Director, Software Research Lab

  • Lab’s focus: Malware Analysis

  • Graduate level course on Malware Analysis

  • Six years of AV-related research

  • Issues investigated:

  • Metamorphism

  • Obfuscation


Outline

  • Attack of Variants

    • AV vulnerability: Exact match

  • Information Retrieval Techniques

    • Inexact match

  • Adapting IR to AV

    • Account for code permutation

  • Vilo: System using IR for AV

  • Integrating Vilo into AV Infrastructure

  • Self-Learning AV using Vilo


Attack of Variants


Variants vs Family

Source: Symantec Internet Threat Report, XI


Analysis of attacker strategy

  • Purpose of attack of variants

    • Denial of Service on AV infrastructure

    • Increase odds of passing through

  • Weakness exploited

    • AV systems use: exact match over extract

  • Attack strategy

    • Generate just enough variation to beat exact match

  • Attacker cost

    • Cost of generating and distributing variants


Analyzing attacker cost

  • Payload creation is expensive

    • Must reuse payload

  • Need thousands of variants

    • Must be automated

  • “General” transformers are expensive

    • Specialized, limited transformers

      • Hence packers/unpackers


Attacker vulnerability

  • Automated transformers

    • Limited capability

    • Machine generated, must have regular pattern

  • Exploiting attacker vulnerability

    • Detect patterns of similarities

    • Approach

      • Information Retrieval (this presentation)

      • Markov Analysis (other work)


Information Retrieval


IR Basics

  • Basis of Google, Bioinformatics

  • Organizing very large corpus of data

  • Key idea

    • Inexact match over whole

    • Contrast with AV

      • Exact match over extract


IR Problem

[Diagram] A query (keywords or a whole document) goes into the IR engine, which searches the document collection and returns related documents.


IR Steps

Step 1: Convert documents to vectors

1a. Define a method to identify “features”

Example: k-consecutive words

1b. Extract all features from all documents

Document: “Have you wondered / When is a rose a rose?”

1c. Count features, make feature vector

Feature                 Count
Have you wondered       1
You wondered when       1
Wondered when rose      1
When rose rose          1
How about onions        0
Onion smell stinks      0

Feature vector: [1, 1, 1, 1, 0, 0]
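The three sub-steps above can be sketched in a few lines of Python; the FEATURES list here is a hypothetical corpus vocabulary that fixes the order of positions in the vector:

```python
from collections import Counter

def k_grams(words, k=3):
    """Step 1a/1b: features are k consecutive words."""
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

# Hypothetical vocabulary fixing the feature order of the vector.
FEATURES = [
    "have you wondered", "you wondered when", "wondered when rose",
    "when rose rose", "how about onions", "onion smell stinks",
]

def feature_vector(text):
    """Step 1c: count each known feature in the document."""
    counts = Counter(k_grams(text.lower().split(), 3))
    return [counts[f] for f in FEATURES]

print(feature_vector("Have you wondered when rose rose"))  # [1, 1, 1, 1, 0, 0]
```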


IR Steps

  • Step 2: Compute feature vectors

    • Take into account features in entire corpus

    • Classical method

      • W = TF × IDF

TF = Term Frequency (occurrences of the feature in the document)

DF = # documents containing the feature

IDF = Inverse of DF

Feature                 TF    DF    IDF    w = TF × IDF
You wondered when       1     5     1/5    1/5
Wondered when rose      2     7     1/7    2/7
When rose rose          5     8     1/8    5/8
How about onions        3     6     1/6    3/6
Onion smell stinks      0     3     1/3    0/3
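Step 2 can be sketched as follows; note the slide weights features with IDF = 1/DF, whereas classical IR more often uses a log-scaled IDF:

```python
def tfidf_weights(tf, df):
    """w = TF x IDF, with IDF = 1/DF as on the slide
    (classical IR commonly uses log(N/DF) instead)."""
    return [t / d if d else 0.0 for t, d in zip(tf, df)]

# TF and DF values from the slide's table, feature order:
# you-wondered-when, wondered-when-rose, when-rose-rose,
# how-about-onions, onion-smell-stinks.
tf = [1, 2, 5, 3, 0]
df = [5, 7, 8, 6, 3]
print(tfidf_weights(tf, df))  # [0.2, 2/7, 0.625, 0.5, 0.0]
```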


IR Steps

  • Step 3: Compare vectors

    • Cosine similarity

w1 = [0.33, 0.25, 0.66, 0.50]

w2 = [0.33, 0.25, 0.66, 0.50]

cos(w1, w2) = (w1 · w2) / (‖w1‖ ‖w2‖)
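Cosine similarity takes only a few lines; the vectors are the slide's example values, and identical vectors score ≈ 1.0:

```python
import math

def cosine(u, v):
    """cos(u, v) = (u . v) / (||u|| ||v||); 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

w1 = [0.33, 0.25, 0.66, 0.50]
w2 = [0.33, 0.25, 0.66, 0.50]
print(cosine(w1, w2))  # ~1.0 for identical vectors
```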


IR Steps

  • Step 4: Document Ranking

    • Using similarity measure

[Diagram] A new document goes into the IR engine, which scores it against the document collection (e.g., 0.90, 0.82, 0.76, 0.30) and returns the matching documents ranked by similarity.
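Step 4 reduces to scoring every stored document against the query vector and sorting best-first; the document names and vectors below are made up for illustration:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(collection, query):
    """Score every stored document against the query vector, best first."""
    return sorted(((cosine(vec, query), name) for name, vec in collection.items()),
                  reverse=True)

# Hypothetical collection of pre-computed feature vectors.
docs = {"doc_a": [1, 0, 2, 0], "doc_b": [0, 3, 1, 1], "doc_c": [1, 0, 2, 1]}
for score, name in rank(docs, [1, 0, 2, 0]):
    print(f"{name}: {score:.2f}")
```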


Adapting IR for AV


Adapting IR for AV

Step 0: Mapping program to document — extract sequence of operations

Virus 1:

l2D2: push ecx
      push 4
      pop ecx
      push ecx
l2D7: rol edx, 8
      mov dl, al
      and dl, 3Fh
      shr eax, 6
      loop l2D7
      pop ecx
      call s319
      xchg eax, edx
      stosd
      xchg eax, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l305

Operation sequence: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz

Virus 2:

l144: push ecx
      push 4
      pop ecx
      push ecx
l149: mov dl, al
      and dl, 3Fh
      rol edx, 8
      shr ebx, 6
      loop l149
      pop ecx
      call s52F
      xchg ebx, edx
      stosd
      xchg ebx, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l18

Operation sequence: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz

The two variants differ only in renamed labels, registers, and call targets, and in the reordered rol/mov/and operations.
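Step 0 can be sketched as a small normalizer that keeps only the mnemonic from each line; the label-stripping regex is a simplifying assumption about the disassembly format, and the snippet below abbreviates the slide's listing:

```python
import re

def opcode_sequence(listing):
    """Keep only the operation mnemonic from each line, dropping labels
    and operands, so register/offset renamings disappear."""
    ops = []
    for line in listing.strip().splitlines():
        line = re.sub(r"^\s*\w+:", "", line)  # strip a leading label like "l2D2:"
        parts = line.split()
        if parts:
            ops.append(parts[0].lower())
    return ops

virus1 = """l2D2: push ecx
push 4
pop ecx"""
print(opcode_sequence(virus1))  # ['push', 'push', 'pop']
```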


Adapting IR for AV

Step 1a: Defining features — k-perms

Virus 1: P P O P R M A S L O C X S X I C J

Virus 2: P P O P M A R S L O C X S X I C J

(One letter per operation in the sequences above, e.g., P = push, O = pop, R = rol, M = mov, A = and.)

Feature = Permutation of k operations. The two sequences are identical except that R M A in Virus 1 appears as M A R in Virus 2, so a feature that ignores the order within a k-operation window matches both.
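A k-perm can be implemented by sorting each window of k operations, so any reordering within a window collapses to one feature. In this sketch, counting the features shared by the two letter sequences above shows k-perms recovering the permuted R M A window that plain k-grams miss:

```python
from collections import Counter

def k_grams(seq, k=3):
    """Ordinary k-grams: the window order matters."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def k_perms(seq, k=3):
    """k-perms: each window is sorted, so any permutation of the same
    k operations yields the same feature."""
    return Counter(tuple(sorted(seq[i:i + k])) for i in range(len(seq) - k + 1))

v1 = list("PPOPRMASLOCXSXICJ")  # Virus 1 (contains R M A)
v2 = list("PPOPMARSLOCXSXICJ")  # Virus 2 (contains M A R)

shared_grams = sum((k_grams(v1) & k_grams(v2)).values())
shared_perms = sum((k_perms(v1) & k_perms(v2)).values())
print(shared_grams, shared_perms)  # -> 10 11: k-perms match the permuted window
```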


Adapting IR for AV

Step 1: Example of 3-perm

Full sequence: P P O P R M A S L O C X S X I C J

[Diagram] Viruses 1, 2, and 3, each segmented into windows (P | P O P | M A R S L | O C X S X | I C J); the highlighted 3-perm P O P occurs in every variant.


Adapting IR for AV

Step 2: Construct feature vectors (4-perms)

Sequence 1: P O P R M A S L

Sequence 2: P O P M A R S L

Sequence 3: M A R S L P O P

[Table] 4-perm feature counts (0 or 1) for features such as MARS and PMAR in each of the three sequences.


Adapting IR for AV

  • Step 3: Compare vectors

    • Cosine similarity (as before)

  • Step 4: Match new sample
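A minimal end-to-end sketch tying steps 0–4 together; the opcode sequences, family names, and helper functions below are illustrative stand-ins, not Vilo's actual implementation:

```python
import math
from collections import Counter

def k_perm_vector(ops, k=3):
    """Feature vector: counts of sorted k-operation windows."""
    return Counter(tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))

def cosine(u, v):
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical opcode sequences standing in for a malware collection.
collection = {
    "family_a.v1": ["push", "rol", "mov", "and", "shr", "loop", "pop"],
    "family_b.v1": ["call", "test", "jz", "lea", "ret"],
}
sample = ["push", "mov", "and", "rol", "shr", "loop", "pop"]  # permuted variant

sv = k_perm_vector(sample)
ranked = sorted(collection,
                key=lambda n: cosine(k_perm_vector(collection[n]), sv),
                reverse=True)
print(ranked[0])  # the permuted variant still matches family_a.v1 best
```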


Vilo: System using IR for AV


Vilo Functional View

[Diagram] A new sample goes into Vilo, which scores it against the malware collection (e.g., 0.90, 0.82, 0.76, 0.30) and returns the malware matches ranked by similarity.


Vilo in Action: Query Match


Vilo: Performance

Response time vs database size: search on a generic desktop completes in seconds.

Contrast with behavior match and graph match, which take minutes.


Vilo Match Accuracy

[Chart] ROC curve: true-positive rate vs false-positive rate.


Vilo in AV Product


Vilo in AV Product

AV systems are composed of classifiers.

[Diagram] An AV scanner running a pipeline of classifiers; Vilo is introduced as one more classifier alongside the existing ones.


Self-Learning AV Product

How to get a malware collection?

Solution 1: Collect the malware detected by the product itself.

[Diagram] Samples flagged by the existing classifiers feed Vilo's collection.


Self-Learning AV Product

How to get a malware collection?

Solution 2: Collect and learn in the cloud.

[Diagram] The on-host Vilo classifier exchanges samples with a Vilo instance in the Internet cloud.


Learning in the Cloud

Solution 2: Collect and learn in the cloud.

[Diagram] A Vilo learner in the Internet cloud pushes updates to the on-host Vilo classifier, which runs alongside the other classifiers.


Experience with Vilo-Learning

  • Vilo-in-the-cloud holds promise

    • Can utilize cluster of workstations

      • Like Google

    • Take advantage of increasing bandwidth and compute power

  • Engineering issues to address

    • Control growth of database

      • Forget samples

      • Use “signature” feature vector(s) for family

      • Be “selective” about features to use


Summary

  • Weakness of current AV systems

    • Exact match over extract

  • Exploited by creating a large number of variants

  • Information Retrieval research strengths

    • Inexact match over whole

  • VILO demonstrates IR techniques have promise

  • Architecture of Self-Learning AV System

    • Integrate VILO into existing AV systems

    • Create feedback mechanism to drive learning
