Internet Traffic Classification KISS





Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi


Traffic Classification & Measurement

  • Why?

    • Identify normal and anomalous behavior

    • Characterize the network and its users

    • Quality of service

    • Filtering

  • How?

    • By means of passive measurement

    • Using Tstat


Tstat

http://tstat.tlc.polito.it

[Figure: network diagram with labels External Servers, Internal Clients, Edge Router, Tstat]

  • Traffic classifier

    • Deep packet inspection

    • Statistical methods

  • Persistent and scalable monitoring platform

    • Round Robin Database (RRD)

    • Histograms



Worms and Viruses?

Did someone open a Christmas card? Happy new year to Windows!!


Anomalies (Good!)

Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008


New Applications – P2PTV

Fiorentina 4 - Udinese 2

Inter 1 - Juventus 0


Traffic Classification

Look at the packets…

Tell me what protocol and/or application generated them

[Figure: Internet Service Provider network]

Typical approach: Deep Packet Inspection (DPI)

  • Skype: Port: ? Payload: ?

  • Bittorrent: Port: ? Payload: “bittorrent”

  • Gtalk: Port: ? Payload: RTP protocol

  • eMule: Port: 4662/4672 Payload: E4/E5

It fails more and more:

  • P2P

  • Encryption

  • Proprietary solution

  • Many different flavours


The Failure of DPI

11.05.2008 12:29 eMule 0.49a released

1.08.2008 20:25 eMule 0.49b released


Possible Solution: Behavioral Classifier

  • Phase 1, Feature (Training), on known traffic: statistical characterization of traffic (given source)

  • Phase 2, Decision (Operation): look for the behaviour of unknown traffic and assign the class that best fits it

  • Phase 3, Verify: check for possible classification mistakes


Our Approach

[Diagram: Traffic (Known), Phase 1 Feature, Phase 2 Decision, Phase 3 Verify]

  • Statistical characterization of bits in a flow

  • Do NOT look at the SEMANTIC and TIMING

  • … but rather look at the protocol FORMAT

  • χ² test


Chunking and χ²

  • Skip the UDP header and take the first N payload bytes

  • Split them into C chunks, each of b bits

  • For each chunk, compare the observed distribution with the expected distribution (uniform)

  • Vector of statistics: [χ²_1, … , χ²_C]

The χ² provides an implicit measure of entropy or randomness
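
The chunking step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `chunk_values` and `chi2_vector`, the big-endian bit order, and the choice of 4 chunks of 4 bits in the usage example are hypothetical and do not come from the slides. Each packet of a flow contributes one sample per chunk, and the statistic is Pearson's χ² against a uniform expectation.

```python
from collections import Counter

def chunk_values(payload, n_chunks, bits):
    """Split the first n_chunks * bits bits of a payload into integers.

    Bit order is big-endian (an assumption, not stated in the slides).
    """
    need_bytes = (n_chunks * bits + 7) // 8
    if len(payload) < need_bytes:
        raise ValueError("payload too short for the requested chunks")
    as_int = int.from_bytes(payload[:need_bytes], "big")
    total_bits = need_bytes * 8
    mask = (1 << bits) - 1
    return [(as_int >> (total_bits - (g + 1) * bits)) & mask
            for g in range(n_chunks)]

def chi2_vector(payloads, n_chunks, bits):
    """Return one chi-square statistic per chunk position.

    Observed counts come from all packets of one flow; the expected
    distribution is uniform over the 2**bits possible chunk values.
    """
    counts = [Counter() for _ in range(n_chunks)]
    n_packets = 0
    for payload in payloads:
        for g, value in enumerate(chunk_values(payload, n_chunks, bits)):
            counts[g][value] += 1
        n_packets += 1
    if n_packets == 0:
        raise ValueError("need at least one packet")
    expected = n_packets / 2 ** bits
    return [sum((counts[g][v] - expected) ** 2 / expected
                for v in range(2 ** bits))
            for g in range(n_chunks)]

# Toy flow: first byte is constant, second byte takes every value once.
flow = [bytes([0xAB, i]) for i in range(256)]
vector = chi2_vector(flow, n_chunks=4, bits=4)
```

In this toy flow the two chunks of the constant byte get a large χ², while the two chunks of the varying byte get 0, since every value appears equally often.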


Consider a chunk of 2 bits and its different behaviours:

  • Random values

  • Deterministic value

  • Counter

[Figure: three histograms of the observed counts Oi over the chunk values 0 1 2 3, one per behaviour]
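
For reference, the statistic behind these histograms can be written as Pearson's χ² against a uniform expectation. The symbols n (number of observed packets) and E_i (expected count) are introduced here for illustration; the slides only name the observed counts Oi and the chunk size b.

```latex
\chi^2 = \sum_{i=0}^{2^b - 1} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = \frac{n}{2^b}
```

As a worked case with b = 2 and n = 100, so that E_i = 25: a deterministic chunk puts all 100 observations on one value and gives χ² = (75² + 3 · 25²) / 25 = 300, while a counter that cycles evenly gives O_i = 25 for every value and χ² = 0.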


4 bit long chunks: evolution

[Figure: evolution of χ² for 4 bit long chunks; curves: random (x x x x), deterministic (0 0 0 1), mixed (x 0 0 0, x 0 x 0, 0 x x x)]


Chi Square Classifier

  • Split the payload into groups

  • Apply the test on the groups at the flow end: each message is a sample

  • Some groups will contain

    • Random bits

    • Mixed bits

    • Deterministic bits

0 8 16 24

---------------------

| ID | FUNC |

---------------------



And the counter example?

  • 2 byte long counter, split into groups: MSG, L2, L1, LSG

  • MSG: Most Significant Group

  • LSG: Least Significant Group
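
A small sketch of the counter case, assuming a 16-bit counter that starts at 0, increments by one per packet, and is split into four 4-bit groups. The starting value, the group width, and the helper names `group_chi2` and `counter_chi2` are assumptions made for illustration.

```python
def group_chi2(values, bits):
    """Pearson chi-square of the observed values against a uniform expectation."""
    expected = len(values) / 2 ** bits
    counts = [0] * (2 ** bits)
    for v in values:
        counts[v] += 1
    return sum((o - expected) ** 2 / expected for o in counts)

def counter_chi2(n_packets):
    """Chi-square of each 4-bit group of a 16-bit per-packet counter.

    Assumes the counter starts at 0 and grows by 1 with every packet.
    """
    bits = 4
    mask = (1 << bits) - 1
    shifts = {"MSG": 12, "L2": 8, "L1": 4, "LSG": 0}
    return {
        name: group_chi2([((c & 0xFFFF) >> shift) & mask
                          for c in range(n_packets)], bits)
        for name, shift in shifts.items()
    }

stats = counter_chi2(64)
```

After 64 packets the least significant group has cycled evenly through all 16 values (χ² = 0), L1 has only visited 4 of them (an intermediate χ²), and the two most significant groups have never changed (the largest χ²), so one counter field shows all three behaviours at once.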


Protocol format as seen from the χ²


Our Approach

[Diagram: Traffic (Known), Phase 1 Feature, Phase 2 Decision, Phase 3 Verify]

  • Statistical characterization of bits in a flow

  • χ² test

  • Decision process

  • Minimum distance / maximum likelihood


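C-dimension space

The vector [χ²_1, … , χ²_C] is a point in a C-dimensional hyperspace

[Figure: classes and their classification regions in the hyperspace, with axes χ²_i and χ²_j; which class does “My Point” belong to?]

  • Euclidean Distance

  • Support Vector Machine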



Euclidean Distance Classifier

  • Centroid

    • Center of mass

  • True positives are “nearby”

  • True negatives are “far”

  • Hyper-sphere

    • Radius

    • False positives

    • False negatives

    • min { False Pos. }

    • min { False Neg. }

  • Confidence

    • The distance is a measure of the confidence of the decision

[Figure: one class in the plane of axes χ²_i and χ²_j, with its centroid and hyper-sphere]
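
A minimal sketch of a centroid-plus-radius classifier in the spirit of this slide. The class names, the toy two-dimensional points, and the radius values are made up for illustration; real inputs would be the C-dimensional χ² vectors. Points that fall outside the sphere of the nearest centroid are labelled "other".

```python
import math

def centroid(points):
    """Center of mass of a list of equally sized vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]

def classify(point, centroids, radii):
    """Return (label, distance) for the nearest centroid.

    The label is "other" when the point lies outside that class's sphere.
    The distance doubles as a confidence measure: smaller means more confident.
    """
    best = min(centroids, key=lambda label: math.dist(point, centroids[label]))
    distance = math.dist(point, centroids[best])
    return (best if distance <= radii[best] else "other", distance)

# Hypothetical training data: two classes in a 2-dimensional space.
training = {
    "dns": [[0, 0], [2, 0], [0, 2], [2, 2]],
    "rtp": [[10, 10], [12, 10], [10, 12], [12, 12]],
}
centroids = {label: centroid(points) for label, points in training.items()}
radii = {"dns": 3.0, "rtp": 3.0}  # hypothetical values

label, distance = classify([1.5, 1.0], centroids, radii)
```

A larger radius accepts more points of the class (fewer false negatives) but also more foreign points (more false positives), which is the trade-off the slide lists.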


How to define the sphere radius?

[Figure: True Positive – False Positive versus Radius]
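
One possible reading of this plot is to pick the radius where the gap between the true positive rate and the false positive rate is largest. The sketch below follows that reading; the function name `best_radius` and the sample distances are hypothetical, and the slides do not state the exact criterion used.

```python
def best_radius(known_distances, other_distances):
    """Pick the candidate radius that maximizes TP rate minus FP rate.

    known_distances: distances from the centroid of samples of the class.
    other_distances: distances from the centroid of samples of other traffic.
    Ties are resolved in favour of the smallest radius.
    """
    candidates = sorted(set(known_distances) | set(other_distances))

    def score(radius):
        tp = sum(d <= radius for d in known_distances) / len(known_distances)
        fp = sum(d <= radius for d in other_distances) / len(other_distances)
        return tp - fp

    return max(candidates, key=score)

# Hypothetical distances measured on a training set.
radius = best_radius([1, 2, 3, 9], [4, 8, 10, 12])
```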


Support Vector Machine

  • Kernel functions

    • Move points so that borders are simple

    • Kernel function: from the space of samples (dim. C) to the space of features (dim. ∞)

  • Borders are planes

    • Simple surface!

    • Nice math

    • Support Vectors

    • LibSVM

  • Decision

    • Distance from the border

    • Confidence is a probability: p ( class )
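
To make the "distance from the border" idea concrete, here is a sketch of how an already trained two-class SVM scores a point. It assumes a Gaussian (RBF) kernel and a sigmoid mapping from decision value to probability (Platt scaling); the slides do not say which kernel or calibration was used, and all support vectors, coefficients, and parameters below are hypothetical numbers, not values from the presentation.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian kernel: similarity that decays with squared distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision_value(x, support_vectors, coefficients, bias, gamma):
    """Signed score of x: positive on one side of the border, negative on the other."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefficients)) + bias

def probability(value, a, b):
    """Sigmoid mapping of the decision value to a probability (Platt scaling)."""
    return 1.0 / (1.0 + math.exp(a * value + b))

# Hypothetical trained model with one support vector per class.
support_vectors = [[0.0, 0.0], [4.0, 0.0]]
coefficients = [1.0, -1.0]
bias, gamma = 0.0, 0.5
a, b = -2.0, 0.0  # hypothetical sigmoid parameters

near_first = decision_value([0.0, 0.0], support_vectors, coefficients, bias, gamma)
midpoint = decision_value([2.0, 0.0], support_vectors, coefficients, bias, gamma)
```

A point midway between the two support vectors scores 0 and maps to probability 0.5, while points deeper inside one region get scores further from 0 and probabilities closer to 0 or 1.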


Our Approach

[Diagram: Traffic (Known), Phase 1 Feature, Phase 2 Decision, Phase 3 Verify]

  • Statistical characterization of bits in a flow

  • χ² test

  • Decision process

  • Minimum distance / maximum likelihood

  • Performance evaluation

  • How accurate is all this?


Per flow and per endpoint

  • What are we going to classify?

    • It can be applied to both single flows

    • And to endpoints

  • It is robust to sampling

    • Does not require monitoring all packets, nor the first packets


Real traffic traces

  • Internet trace: Fastweb, 1 day long trace, 20 GByte of UDP traffic

  • Oracle (DPI + Manual): RTP, eMule, DNS, other unknown traffic

  • Training

    • Known + Other

  • False Negatives

    • Known traffic

  • False Positives

    • Unknown traffic


Definition of false positive/negative

  • Oracle (DPI) splits the traffic into “known” (DNS, eMule, RTP) and “other”

  • Classifying “known” with KISS: true positives, false negatives

  • Classifying “other” with KISS: true negatives, false positives
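
These definitions can be turned into a small scoring routine. The sketch below assumes that a known flow counts as a false negative whenever KISS does not return the oracle's class, and that an "other" flow counts as a false positive whenever KISS assigns it to any known class. The function name `error_rates` and the sample labels are hypothetical.

```python
def error_rates(oracle_labels, kiss_labels, other="other"):
    """Return (false negative %, false positive %).

    False negatives are measured on flows the oracle marks as known;
    false positives are measured on flows the oracle marks as other.
    """
    pairs = list(zip(oracle_labels, kiss_labels))
    known = [(o, k) for o, k in pairs if o != other]
    others = [(o, k) for o, k in pairs if o == other]
    if not known or not others:
        raise ValueError("need both known and other flows")
    false_neg = sum(o != k for o, k in known)
    false_pos = sum(k != other for _, k in others)
    return 100 * false_neg / len(known), 100 * false_pos / len(others)

oracle = ["dns", "dns", "rtp", "emule", "other", "other", "other", "other"]
kiss = ["dns", "other", "rtp", "emule", "other", "dns", "other", "other"]
fn_percent, fp_percent = error_rates(oracle, kiss)
```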


Results

[Table: Euclidean Distance and SVM; columns: Known traffic (False Neg.) [%], Other (False Pos.) [%]]


Real traffic trace

  • RTP errors are oracle mistakes (do not identify RTP v1)

  • DNS errors are due to impure training set (for the oracle all port 53 is DNS traffic)

  • EDK errors are (maybe) Xbox Live (proper training for “other”)

  • FN are always below 3%!!!


Tuning trainset size

  • Small training set

    • For “known”: 70-80 Mbyte

    • For “other”: 300 Mbyte

[Figure: true positives and false positives (%) versus samples per class (confidence 5%)]


Tuning num of packets for χ²

  • Protocols with volumes of at least 70-80 pkts per flow

[Figure: true positives and false positives (%) versus packets (confidence 5%)]


P2P-TV applications

  • P2P-TV applications are becoming popular

  • They heavily rely on UDP as the transport protocol

  • They are based on proprietary protocols

  • They are evolving over time very quickly

  • How to identify them?

  • ... After 6 hours, KISS gives you results




Chunking and χ²

  • TCP or UDP: take the first N payload bytes

  • Split them into C chunks, each of b bits

  • For each chunk, compare the observed distribution with the expected distribution (uniform)

  • Vector of statistics: [χ²_1, … , χ²_C]

The χ² provides an implicit measure of entropy or randomness




Pros and Cons

  • KISS is good because…

  • Blind approach

  • Completely automated

  • Works with many protocols

  • Works even with small training

  • Statistics can start at any point

  • Robust w.r.t. packet drops

  • Bypasses some DPI problems

  • but…

  • Learn (other) properly

  • Needs volumes of traffic

  • May require memory (for now)

  • Only UDP (for now)

  • Only offline (for now)


Papers

  • D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli “Revealing skype traffic: when randomness plays with you”, ACM SIGCOMM, Kyoto, JP, August 2007

  • D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008

  • D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15,17 April 2008

  • D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, “Detailed Analysis of Skype Traffic”, IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009

  • A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection”, 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009


