Traffic classification and applications to traffic monitoring


Traffic classification and applications to traffic monitoring. Marco Mellia Electronic and Telecommunication Department Politecnico di Torino Email:mellia@tlc.polito.it. Traffic Classification & Measurement. Why ? Identify normal and anomalous behavior

Traffic classification and applications to traffic monitoring

Marco Mellia

Electronic and Telecommunication Department

Politecnico di Torino

Email: mellia@tlc.polito.it


Traffic Classification & Measurement

  • Why?

    • Identify normal and anomalous behavior

    • Characterize the network and its users

    • Quality of service

    • Filtering

  • How?

    • By means of passive measurement


Scenario

http://tstat.tlc.polito.it

[Figure: internal clients, an edge router, and external servers]

  • Traffic classifier

    • Deep packet inspection

    • Statistical methods

  • Persistent and scalable monitoring platform

    • Round Robin Database (RRD)

    • Histograms


Tstat at a Glance


Worms and Viruses?

Did someone open a Christmas card? Happy new year to Windows!!


Anomalies (Good!)

Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008


New Applications – P2PTV

Fiorentina 4 - Udinese 2

Inter 1 - Juventus 0


Megaupload blocked 19/01/12


How to monitor traffic?

  • All previous examples rely on the availability of a CLASSIFIER

    • A tool that can discriminate classes of traffic

  • Classification: the problem of assigning a class to an observation

    • The set of classes is pre-defined

    • The output may be correct or not


Some terminology

  • Question: Is this a cat, a rabbit, or a dog?



How to compute performance?

  • Confusion matrix

    • On rows we have the actual class

    • On columns we have the predicted class

  • It shows whether some confusion between classes arises


  • True positive

    • It was classified as a cat, and it was a cat

  • False negative

    • It was classified NOT as a cat, but it was a cat

  • True negative

    • It was classified NOT as a cat, and it was NOT a cat

  • False positive

    • It was classified as a cat, but it was NOT a cat


Other metrics

  • Accuracy: the ratio of the sum of True Positives over all classes to the total number of tests.

  • It is biased toward the predominant class in a data set.

    • Consider, for example, a test to identify patients suffering from a disease that affects 10 patients out of 100 tested. A classifier that always returns “healthy” will have an accuracy of 90%.
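The bias can be checked numerically. The following is a minimal sketch that assumes 100 tests with 10 sick patients and a classifier that always answers "healthy"; the function names and labels are illustrative, not part of the original slides.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def detected_fraction(actual, predicted, cls):
    """Fraction of the samples of class `cls` that were labeled as `cls`."""
    members = [p for a, p in zip(actual, predicted) if a == cls]
    return sum(p == cls for p in members) / len(members)


# Assumed data set: 10 sick patients out of 100 tests.
actual = ["sick"] * 10 + ["healthy"] * 90
# A classifier that ignores its input and always answers "healthy".
predicted = ["healthy"] * 100

print(accuracy(actual, predicted))
print(detected_fraction(actual, predicted, "sick"))
```

The accuracy looks good even though no sick patient is ever found, which is why per-class metrics are needed alongside it.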


  • Recall of a class: the ratio of True Positives to the sum of True Positives and False Negatives.

    • Recall(cat)=5/(5+3+0)

    • It is a measure of the ability of a classifier to select instances of the given class from a data set


  • Precision of a class: the ratio of True Positives to the sum of True Positives and False Positives

    • Precision(cat) = 5/(5+2+0)

    • It measures how precise the classifier is in labeling only samples of a given class
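The metrics above can be computed directly from a confusion matrix. In this sketch, only the "cat" row (5, 3, 0) and the "cat" column (5, 2, 0) come from the Recall(cat) and Precision(cat) formulas in the slides; the remaining cells are hypothetical values chosen to complete the example.

```python
CLASSES = ["cat", "dog", "rabbit"]

# Rows: actual class. Columns: predicted class.
# Only the cat row and the cat column match the slides; the rest is assumed.
CONFUSION = [
    [5, 3, 0],   # actual cat
    [2, 3, 1],   # actual dog
    [0, 2, 11],  # actual rabbit
]


def recall(matrix, i):
    """True positives of class i over all samples that actually are class i."""
    return matrix[i][i] / sum(matrix[i])


def precision(matrix, i):
    """True positives of class i over all samples predicted as class i."""
    return matrix[i][i] / sum(row[i] for row in matrix)


def accuracy(matrix):
    """Sum of the diagonal over the total number of samples."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


cat = CLASSES.index("cat")
print(recall(CONFUSION, cat), precision(CONFUSION, cat), accuracy(CONFUSION))
```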


Traffic classification

Look at the packets… and tell me what protocol and/or application generated them

[Figure: packets observed at an Internet Service Provider]


Typical approach: Deep Packet Inspection (DPI)

[Figure: an Internet Service Provider checks the port and the payload of flows generated by Skype, Bittorrent, Gtalk and eMule. Recoverable labels: payload “bittorrent”; port 4662/4672 and payload E4/E5 for eMule; RTP protocol; the other port and payload fields are marked “?”]


The problem of traffic classification

  • Deep Packet Inspection

    • Based on looking for some pre-defined payload patterns, deep in the packet

  • Simple at L2-L4

    • “if ethertype == 0x0800, then there is an IP packet”

    • Usually done with a set of if-then-else or even switch-case

  • Ambiguous at L7

    • TCP port 80 does not mean automatically “protocol HTTP”
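The contrast between the two layers can be sketched as follows. The L2 test is the EtherType check quoted above; the L7 test is a hypothetical payload signature (a short list of HTTP request methods) used only to illustrate that the port alone does not decide the protocol. All function names and the returned labels are assumptions of this sketch.

```python
import struct

ETHERTYPE_IPV4 = 0x0800

# Illustrative, incomplete signature list: not a production rule set.
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


def is_ipv4_frame(frame: bytes) -> bool:
    """L2 check: EtherType is at bytes 12-13 of an untagged Ethernet II frame."""
    if len(frame) < 14:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ethertype == ETHERTYPE_IPV4


def looks_like_http_request(payload: bytes) -> bool:
    """L7 check: match a payload pattern instead of trusting the port."""
    return payload.startswith(HTTP_METHODS)


def classify_tcp_payload(dst_port: int, payload: bytes) -> str:
    """Port 80 alone is not enough: the payload must also look like HTTP."""
    if looks_like_http_request(payload):
        return "HTTP"
    if dst_port == 80:
        return "unknown-on-port-80"
    return "unknown"


frame = b"\x00" * 12 + b"\x08\x00" + b"ip packet bytes"
print(is_ipv4_frame(frame))
print(classify_tcp_payload(80, b"\x16\x03\x01\x02\x00"))
```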


DPI: Rule-set complexity

  • Practical rule-sets:

    • Snort, as of November 2007

      • 8536 rules, 5549 Perl Compatible Regular Expressions

    • OpenDPI as of February 2012

      • 118 protocols

    • Tstat as of February 2012

      • Approx 200 classes/services

Deep packet inspection = regular expression matching at line rate

Finite Automata based techniques


Some notes...

  • Protocol identification…

  • … or application verification?

    • Skype can use the standard HTTP protocol to exchange data

    • Is that traffic “Skype” or “HTTP”?

  • Today everything is going over HTTP

    • Is it Facebook? Twitter? YouTube video? Or HTTP?


The question

Which granularity are you interested in?


Several approaches to traffic classification

[Figure: taxonomy of traffic classification approaches. Recoverable labels: Content-based; Statistical methods; Port-based (stateless); Payload-based (stateful); Host social behaviour (e.g., Faloutsos); Traffic statistics (e.g., Salgarelli, Baiocchi, Moore, Mellia); Packet-based (e.g., Spatscheck); Message-based; Auto-learning methods (e.g., Bayes); Preclassified bins; Protocol behaviour (e.g., BinPac, SML); Pre-computed or auto-learning signatures]


Some references

  • Some critical overviews/tutorials

    • T.T.T. Nguyen, G. Armitage. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials, Vol. 10, No. 4, pp. 56-76.

    • H. Kim, KC Claffy, M. Fomenkov, D. Barman, M. Faloutsos, K. Lee. Internet traffic classification demystified: myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference (CoNEXT '08). ACM, New York, NY, USA, 2008.

  • For this class:

    • D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli. Revealing Skype traffic: when randomness plays with you. ACM SIGCOMM, Kyoto, JP, ISBN: 978-1-59593-71, 27 August 2007.

    • A. Finamore, M. Mellia, M. Meo, D. Rossi. KISS: Stochastic Packet Inspection. 1st TMA Workshop, Aachen, 11 May 2009.

    • A. Finamore, M. Mellia, M. Meo, D. Rossi. KISS: Stochastic Packet Inspection Classifier for UDP Traffic. IEEE/ACM Transactions on Networking, Vol. 18, No. 5, pp. 1505-1515, ISSN: 1063-6692, October 2010.

    • G. La Mantia, D. Rossi, A. Finamore, M. Mellia, M. Meo. Stochastic Packet Inspection for TCP Traffic. IEEE ICC, Cape Town, South Africa, 23 May 2010.


Typical approach: Deep Packet Inspection (DPI)

It fails more and more:

  • P2P

  • Encryption

  • Proprietary solution

  • Many different flavours

[Figure: an Internet Service Provider checks the port and the payload of flows generated by Skype, Bittorrent, Gtalk and eMule. Recoverable labels: payload “bittorrent”; port 4662/4672 and payload E4/E5 for eMule; RTP protocol; the other port and payload fields are marked “?”]


The Failure of DPI

11.05.2008 12:29 eMule 0.49a released

1.08.2008 20:25 eMule 0.49b released


Possible Solution: Behavioral Classifier

  • Phase 1, Feature (training, on known traffic)

    • Statistical characterization of traffic

  • Phase 2, Decision (operation)

    • Look for the behaviour of unknown traffic and assign the class that best fits it

  • Phase 3, Verify

    • Check for possible classification mistakes


Behavioural classifiers

  • Which statistics?

    • Packet size

      • Average, std, max, min

      • Len of first X pkts

    • IPG

      • Average, std, max, min

      • IPG of first X+1 pkts

    • Total size, duration, #data packets

    • From client, from server, from both

    • RTT, #concurrent connection, rtx, dups, …

    • TCP options, flags, signaling, …

  • Feature selection?

  • Which decision process?

    • Ad Hoc

    • Bayesian

    • Neural Networks

    • Decision trees

    • SVM

  • Which training set?

    • Supervised techniques
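As a rough illustration of the "Which statistics?" list, the sketch below extracts a few of the listed features from a single flow. It assumes a flow is given as a list of (timestamp in seconds, size in bytes) pairs; the function names, the dictionary keys, and the default X = 4 are hypothetical choices, not part of the slides.

```python
from statistics import mean, pstdev


def summarize(values):
    """Average, standard deviation, max and min of a list of numbers."""
    return {
        "avg": mean(values),
        "std": pstdev(values),
        "max": max(values),
        "min": min(values),
    }


def flow_features(packets, first_x=4):
    """packets: list of (timestamp_s, size_bytes), in arrival order, len >= 2."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-packet gaps: X+1 packets give X gaps.
    ipgs = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "size": summarize(sizes),
        "ipg": summarize(ipgs),
        "first_sizes": sizes[:first_x],
        "first_ipgs": ipgs[:first_x],
        "total_bytes": sum(sizes),
        "duration": times[-1] - times[0],
        "num_packets": len(packets),
    }


example = [(0.0, 100), (0.5, 200), (1.0, 300)]
print(flow_features(example))
```

The same function could be applied separately to client-to-server and server-to-client packets to obtain the per-direction variants mentioned in the list.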


The case of Skype

Consider a simple example


Our Goal

  • Identify Skype traffic

  • Motivations

    • Operators need to know what is running in their network

      • New business models, provisioning, TE, etc.

    • Understand user behaviour

    • Traffic characterization, security

    • It’s fun


Skype Overview

  • No server, no well-known port, no standard, no RFC

  • State-of-the-art encryption/obfuscation mechanisms

  • Skype offers voice, video, chat and data transfer services over IP

  • Closed design, proprietary solutions

    • P2P technology

    • Proprietary protocols

    • Encrypted communications

  • Easy to use, difficult to reveal

    • It is the perfect example of DPI failure


Our Goal

  • Identify Skype traffic

    • Voice stream first: both E2E and SkypeOut/In streams

    • Possible video/chat/file transfers/signaling

  • Constraints

    • Passive observation of traffic

    • Protocol ignorance


Three Classifiers

[Figure: each traffic flow is tested by the Payload Based Classifier, the Naïve Bayes Classifier and the Chi Square Classifier, each answering “Skype?”; an AND block combines classifier outputs into a final “Skype?” decision]



Skype as VoIP Application

  • Skype selects the voice codec from a list

    • Low bit rate: 10-32 kbps

    • Regular Inter-Packet-Gap (30 ms frames)

  • Redundancy may be added to mitigate packet loss

  • Framing may be modified from the original codec one

  • Multiplexes different sources into the same message (voice, video, chat,…)


Skype Source Model

[Figure: protocol stack, with the Skype message carried over TCP/UDP over IP]


Skype Header Formats (What we guess about it)

Can we design a DPI classifier?


Possible Skype Messages

  • Signaling and data messages

    • Use TCP, with ciphered payload (impossible to exploit: everything is ciphered)

      • Login, lookup, signaling…

      • Data flow

    • Use UDP whenever possible: payload is encrypted… but some header MUST be exposed

  • Question:

    • Some header MUST be exposed…

    • Why?!??


Possible Skype Messages

  • Signaling and data messages

    • Use TCP, with ciphered payload (impossible to exploit: everything is ciphered)

      • Login, lookup, signaling…

      • Data flow

    • Use UDP whenever possible: payload is encrypted… but some header MUST be exposed, because UDP is unreliable

[Figure: source and receiver each apply AES; the channel between them is unreliable]



SoM Format for E2E Messages

 0            8            16           24
 +-------------------------+------------+
 |           ID            |    FUNC    |
 +-------------------------+------------+

Start of Message (SoM) of End2End messages carried by UDP has:

  • ID: 16-bit long random identifier

  • FUNC: 5-bit long function (multiplexing?), obfuscated in a byte
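A parser for this layout could look like the sketch below. The slides say only that the 5 FUNC bits are obfuscated inside one byte; which bits they are is not stated, so the mask `FUNC_MASK = 0x8F` (five bits set) is an assumption of this example, as are the function and variable names.

```python
import struct

# ASSUMPTION: positions of the 5 FUNC bits inside the third byte.
# The slides do not specify them; 0x8F is used here for illustration only.
FUNC_MASK = 0x8F


def parse_som(udp_payload: bytes):
    """Return (msg_id, func) from the first 3 payload bytes, or None if too short."""
    if len(udp_payload) < 3:
        return None
    # "!HB": network byte order, 16-bit ID followed by one byte holding FUNC.
    msg_id, func_byte = struct.unpack("!HB", udp_payload[:3])
    return msg_id, func_byte & FUNC_MASK


print(parse_som(bytes([0x12, 0x34, 0x7D, 0xFF])))
```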


Function Values

[Figure labels: Voice, Video, Chat, File]

  • 0x01 = ??Query message

  • 0x02 = ??Query

  • 0x0d = Data

  • 0x07 = NAK


Classic signature based classifier: PBC

  • SoM can be used to identify Skype flows carried by UDP

    • 5-bit long signature

  • Question:

    • What is the chance of a false positive?

  • We look for 1 string out of 32 possible strings

    • A false detection has probability 1/32

  • Can we improve it by checking multiple packets?

    • Yet we can have a very high false positive rate
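The 1/32 figure, and the effect of checking several packets, can be worked out with a few lines. This sketch assumes that the inspected bits of non-Skype traffic are uniformly random and independent across packets. Real traffic does not satisfy this (a protocol that always sends the same matching byte would match on every packet), which is one reason the false positive rate can stay high in practice. The function name and parameters are illustrative.

```python
def false_match_probability(signature_bits: int,
                            packets_checked: int = 1,
                            accepted_values: int = 1) -> float:
    """Probability that random bits match the signature on every checked packet.

    Assumes uniformly random, independent bits in non-matching traffic.
    """
    per_packet = accepted_values / 2 ** signature_bits
    return per_packet ** packets_checked


print(false_match_probability(5))                      # one packet, one value
print(false_match_probability(5, packets_checked=2))   # two packets
print(false_match_probability(5, accepted_values=4))   # four accepted values
```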


PBC

  • Any other smart way of improving accuracy?

    • Hint: this is UDP


Classic signature based classifier: PBC

  • SoM can be used to identify Skype flows carried by UDP

    • 5-bit long signature

  • IMPROVE: Identify Skype socket address at clients

    • The UDP port is FIXED, whereas TCP source ports are random

    • Then, look for Skype flows with the same UDP port

  • It works

    • with UDP only

    • at edge node only

  • Complex

  • Cannot discriminate VOICE/VIDEO/CHAT/DATA
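The improvement can be sketched as a two-step procedure: match the SoM signature per packet, then keep only client socket addresses whose UDP port carries matching flows towards several distinct remote endpoints. The `FUNC_MASK` value, the `min_flows` threshold and the tuple layout of `packets` are assumptions of this sketch; only the four function values come from the slides.

```python
from collections import defaultdict

KNOWN_FUNCS = {0x01, 0x02, 0x07, 0x0D}  # function values listed in the slides
FUNC_MASK = 0x8F  # ASSUMPTION: positions of the 5 FUNC bits inside the byte


def matches_som(payload: bytes) -> bool:
    """True if the third payload byte carries one of the known FUNC values."""
    return len(payload) >= 3 and (payload[2] & FUNC_MASK) in KNOWN_FUNCS


def skype_endpoints(packets, min_flows=3):
    """packets: iterable of (client_ip, client_port, remote_ip, remote_port, payload).

    Returns the client (ip, port) pairs whose UDP port exchanges matching
    packets with at least `min_flows` distinct remote endpoints.
    """
    remotes = defaultdict(set)
    for c_ip, c_port, r_ip, r_port, payload in packets:
        if matches_som(payload):
            remotes[(c_ip, c_port)].add((r_ip, r_port))
    return {ep for ep, peers in remotes.items() if len(peers) >= min_flows}
```

Requiring several flows on the same client port lowers the chance that a single random match labels a host as Skype, at the cost of observing traffic at the edge, where the client socket address is visible.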


Skype Encrypts Traffic

Can we leverage this?



Randomness Classifier

  • Skype encrypts traffic

    • payload looks random

  • Some headers are constant (FUNC)

  • Apply randomness test to the payload bits

    • Chi-Square test: a statistical test for random sequences


CSC

  • Split the payload into groups

  • Apply the test on the values assumed at each group

    • Each message is an observation

  • Some groups will contain

    • Random bits

    • Mixed bits

    • Deterministic bits

 0            8            16           24
 +-------------------------+------------+
 |           ID            |    FUNC    |
 +-------------------------+------------+


CSC

Set a threshold


Skype is a VoIP Application

Which are the features that make it different from a bulk download?



Which features?

  • Question: Which features would you select to differentiate a VoIP stream from a data download?


Sample Trace

[Figure: sample trace showing regular IPG and small, regular packets]


Naive Bayesian Classifier

  • Simple classifier: based on the a-priori probability, evaluate the a-posteriori probability

    • How similar is this flow to a Skype voice flow?

  • What makes “VoIP” traffic different from other traffic?

    • Packet size, i.e., small packets (packet NBC)

    • Inter-Packet-Gap, i.e., small IPG (IPG NBC)


Design of the behavioral classifier

  • We consider windows of N packets

  • In each window, we check the IPG and the payload size distribution

  • We compare against the expected distribution and compute a belief

    • One belief for each “mode”

  • After K windows, take a decision

    • Consider the most likely mode in each window

    • Average over all K windows to get the average max belief

    • Compare it against a threshold
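The windowed decision can be sketched as follows. The slides do not give the expected distributions, so this example assumes each mode is a Gaussian over one per-packet feature (for instance, payload size) and uses the average per-packet log-likelihood as the belief. The mode names, parameters, N, K and the threshold are all hypothetical.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log of the normal density N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def window_belief(window, mu, sigma):
    """Belief of one window for one mode: average per-packet log-likelihood."""
    return sum(gaussian_log_pdf(v, mu, sigma) for v in window) / len(window)


def decide(values, modes, n=4, k=2, threshold=-5.0):
    """Return (is_match, average max belief) after K windows of N packets.

    Returns None while fewer than N*K packets have been observed.
    """
    if len(values) < n * k:
        return None
    windows = [values[i:i + n] for i in range(0, n * k, n)]
    # Most likely mode in each window.
    max_beliefs = [
        max(window_belief(w, mu, sigma) for mu, sigma in modes.values())
        for w in windows
    ]
    # Average over all K windows, then compare against the threshold.
    avg_max_belief = sum(max_beliefs) / k
    return avg_max_belief >= threshold, avg_max_belief


# Hypothetical modes: (mean, standard deviation) of the payload size in bytes.
MODES = {"mode_a": (100.0, 5.0), "mode_b": (200.0, 5.0)}
print(decide([100.0] * 4 + [200.0] * 4, MODES))
```

Because the belief is a log-likelihood, it is negative and grows more negative as the window departs from every mode, so the threshold acts as a minimum belief.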


Skype Naive Bayes Classifier

[Figure: block diagram. Successive windows W(k), W(k+1), …, W(k+4) feed a Packet NBC (input X), which outputs beliefs Bs(k,j), and an IPG NBC (input Y), which outputs beliefs Bt(k). Max and AVG blocks produce E[Bs(,j)] and E[Bt], and a min block yields the final belief B]


NBC over Time

Set a threshold


Results


Three Classifiers

[Figure: the three-classifier scheme (Payload Based Classifier, Naïve Bayes Classifier, Chi Square Classifier, combined through an AND block) applied to the UDP benchmark dataset]


Scenario

  • Testbed traces: 100% accuracy

  • Campus LAN@polito

    • Simple scenario, no P2P, no VoIP

  • Italian ISP

    • Tough scenario: lots of P2P, tons of VoIP

  • Results consider

    • True positive (OK): Skype, and identified

    • False positive (FP): Not Skype, but identified

    • False negative (FN): Skype, but discarded


Performance Evaluation: UDP

[Table: results for the Payload Based Classifier, the Naïve Bayes Classifier, the Chi Square Classifier, and NBC + CSC; values not recoverable]



NBC Threshold Impact

[Figure: two plots for E2E flows versus Minimum Belief (from -30 to 0): FP [%] on a 0 to 5 scale, and FN [%] on a 0 to 100 scale]


KISS: Chi-Square Stochastic Classifier, or Stochastic Packet Inspection

Generalize it


Our Approach

[Figure: Phase 1, Feature, on known traffic; Phase 2, Decision; Phase 3, Verify]

  • Statistical characterization of bits in a flow

  • Do NOT look at the SEMANTIC and TIMING

  • … but rather look at the protocol FORMAT

  • χ² test


Question: Which protocol is this?

[Figure: a 32-bit-wide header layout (bit offsets 0, 4, 8, 16, 19, 24, 32) with the fields Source Port, Destination Port, Sequence Number, Acknowledgment Number, HLEN, Resv, Control flag, Window, Checksum, Urgent Pointer, Options and Padding. Source Port and Destination Port are marked CONSTANT, Sequence Number and Acknowledgment Number are marked COUNTER, and the remaining fields are marked CONSTANT or RANDOM]



Chunking and χ²

[Figure: the first N payload bytes after the UDP header are split into C chunks of b bits each. For each chunk, the observed distribution is compared against the expected (uniform) distribution, giving the vector of statistics [χ²_1, … , χ²_C]]

The χ² provides an implicit measure of entropy or randomness
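The chunking step can be sketched as below, using the standard Pearson statistic χ² = Σ (O_i − E_i)² / E_i with a uniform expected distribution. The sketch fixes b = 4 bits per chunk and treats each packet of a flow as one observation; the function names and the toy payloads are assumptions for illustration.

```python
from collections import Counter

BITS = 4  # chunk size b, fixed to 4 bits in this sketch


def nibbles(payload: bytes, num_chunks: int):
    """Split the first bytes of a payload into `num_chunks` 4-bit chunks."""
    out = []
    for byte in payload:
        out.extend((byte >> 4, byte & 0x0F))
    return out[:num_chunks]


def chi_square_vector(payloads, num_chunks=4):
    """One chi-square value per chunk position, computed over all payloads.

    Every payload must hold at least num_chunks / 2 bytes.
    """
    expected = len(payloads) / 2 ** BITS  # uniform expected count per value
    counters = [Counter() for _ in range(num_chunks)]
    for payload in payloads:
        for position, value in enumerate(nibbles(payload, num_chunks)):
            counters[position][value] += 1
    return [
        sum((counters[g][v] - expected) ** 2 / expected for v in range(2 ** BITS))
        for g in range(num_chunks)
    ]


# Toy flow: the first chunk is always 3, the second takes every value once.
payloads = [bytes([0x30 | v]) for v in range(16)]
print(chi_square_vector(payloads, num_chunks=2))
```

A constant chunk yields a large value and a uniformly distributed chunk yields a value near zero, so the vector acts as a fingerprint of the protocol format.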


Consider a chunk of 2 bits: χ² and different behaviour

[Figure: expected (Ei) versus observed (Oi) counts over the values 0 1 2 3 in three cases: random values, deterministic value, counter]



4 bit long chunks: χ² evolution

[Figure: evolution of χ² for a random chunk (x x x x), a deterministic chunk (0 0 0 1), and mixed chunks (x 0 0 0, x 0 x 0, 0 x x x), where x marks a random bit]

Question: does it depend on WHICH bits are fixed and WHICH are random???
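One way to explore the question is to compute χ² on idealized chunks, where every random bit (x) takes each combination exactly equally often. Under that assumption, which real traffic only approximates, the statistic depends on how many bits are fixed and not on where they sit. The helper names and the sample size n = 160 are choices of this sketch.

```python
from collections import Counter
from itertools import product


def chi_square(values, bits=4):
    """Pearson chi-square of observed chunk values against a uniform distribution."""
    expected = len(values) / 2 ** bits
    counts = Counter(values)
    return sum((counts[v] - expected) ** 2 / expected for v in range(2 ** bits))


def idealized_chunk(pattern, n=160):
    """Build n observations of a chunk described by a pattern such as 'x0x0'.

    'x' bits take every combination equally often; '0' and '1' bits are fixed.
    n should be a multiple of 2 ** len(pattern) so that counts are exact.
    """
    free = pattern.count("x")
    repeats = n // 2 ** free
    values = []
    for combo in product("01", repeat=free):
        bits = iter(combo)
        word = "".join(next(bits) if c == "x" else c for c in pattern)
        values.extend([int(word, 2)] * repeats)
    return values


for pattern in ("xxxx", "0xxx", "x0x0", "00xx", "x000", "0001"):
    print(pattern, chi_square(idealized_chunk(pattern)))
```

With n observations and k fixed bits out of 4, the idealized value is n·(2^k − 1), so "x0x0" and "00xx" give the same result.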


KISS


2 byte long counter

[Figure: χ² of the groups of a 2-byte counter, from the Most Significant Group (MSG), through L2 and L1, to the Less Significant Group (LSG)]

Question: What is this?!?!


Protocol format as seen from the χ²


Our Approach

[Figure: Phase 1, Feature, on known traffic; Phase 2, Decision; Phase 3, Verify]

  • Statistical characterization of bits in a flow

    • χ² test

  • Decision process

    • Minimum distance / maximum likelihood


C-dimension space

[Figure: each flow is a point [χ²_1, … , χ²_C] in a C-dimensional hyperspace, drawn on the (χ²_i, χ²_j) plane. The hyperspace is split into classification regions, one per class; a new point (“My Point”, marked “?”) is assigned to a class using either the Euclidean distance or a Support Vector Machine]



Euclidean Distance Classifier

  • Centroid

  • Center of mass

  • Hyper-sphere

  • Radius

  • max { True Pos. }

  • min { False Neg. }

  • Confidence

  • The distance is a measure of the confidence of the decision

[Figure: points on the (χ²_i, χ²_j) plane around the class centroid. True positives are “nearby” and true negatives are “far”; the hyper-sphere around the centroid also marks where false positives and false negatives appear]
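A minimum-distance classifier with a rejection radius can be sketched in a few lines. The training points, the class names and the radius below are made up for illustration; in the setting of the slides each point would be a vector of χ² values.

```python
import math


def centroid(points):
    """Center of mass of a list of equally weighted points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def classify(point, centroids, radius):
    """Return (label, distance to the nearest centroid).

    The label is None when the nearest centroid lies outside the hyper-sphere,
    i.e. farther than `radius` from the point.
    """
    label, dist = min(
        ((name, math.dist(point, c)) for name, c in centroids.items()),
        key=lambda item: item[1],
    )
    return (label if dist <= radius else None), dist


# Hypothetical training data: two classes in a 2-dimensional space.
training = {"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[10.0, 10.0], [10.0, 12.0]]}
centroids = {name: centroid(pts) for name, pts in training.items()}

print(classify([1.0, 1.0], centroids, radius=2.0))
print(classify([5.0, 5.0], centroids, radius=2.0))
```

A larger radius accepts more points of the class (fewer false negatives) but also more foreign points (more false positives); the returned distance can be read as an inverse confidence.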


How to define the sphere radius?

[Figure: true positives and false positives as a function of the radius]


Support Vector Machine

  • Kernel functions

    • Move points so that borders are simple

  • Borders are planes

    • Simple surface!

    • Nice math

    • Support Vectors

    • LibSVM

  • Decision

    • Distance from the border

    • Confidence is a probability, p(class)

[Figure: a kernel function maps the space of samples (dim. C) into the space of features (dim. ∞), where support vectors define the borders]


Our Approach

[Figure: Phase 1, Feature, on known traffic; Phase 2, Decision; Phase 3, Verify]

  • Statistical characterization of bits in a flow

    • χ² test

  • Decision process

    • Minimum distance / maximum likelihood

  • Performance evaluation

    • How accurate is all this?


Per flow and per endpoint

  • What are we going to classify?

    • It can be applied to both single flows

    • And to endpoints

  • Question:

    • Do we need to monitor ALL packets?

    • Do we need to monitor from the FIRST packet?

  • NO!

    • It is robust to sampling

    • It can start from any point



Real traffic traces

[Figure: a 1-day-long Internet trace from Fastweb (20 GByte of UDP traffic) is labeled by an oracle (DPI + manual) as RTP, eMule, DNS, other, and other unknown traffic]

  • Training

    • Known + Other

  • False Negatives

    • Known traffic

  • False Positives

    • Unknown traffic


Definition of false positive/negative

[Figure: the oracle (DPI) labels traffic as DNS, eMule, RTP, or Other. When KISS classifies “known” traffic, the outcomes are true positives or false negatives; when it classifies “other” traffic, the outcomes are true negatives or false positives]


Results

[Table: Euclidean Distance versus SVM. Rows: known traffic (false negatives, %) and other (false positives, %); values not recoverable]


Tuning trainset size

[Figure: true positives and false positives (%) versus samples per class]

Small training set: 70-80 Mbyte for “known”, 300 Mbyte for “other”


Tuning Num. of Packets for χ²

[Figure: true positives and false positives (%, confidence 5%) versus number of packets]

Protocols with volumes of at least 70-80 pkts per flow


Real traffic trace

RTP errors are oracle mistakes (the oracle does not identify RTP v1)

DNS errors are due to an impure training set (for the oracle, all port 53 traffic is DNS)

EDK errors are (maybe) Xbox Live (proper training for “other”)

FN are always below 3%!!!


P2P-TV applications

  • P2P-TV applications are becoming popular

  • They heavily rely on UDP as the transport protocol

  • They are based on proprietary protocols

  • They are evolving over time very quickly

  • How to identify them?

  • ... After 6 hours, KISS gives you results


Putting all together

  • Now with

    • 9 classes

    • 3 different networks



Abacus: Rationale

  • Applications are like people in a party room

    • Some prefer brief exchanges with many other people

    • Some like long talks with few other people

  • “Attitudes” are different across P2P applications...

    • Some prefer to download small pieces of data from many peers

    • Some prefer to download all data from almost the same peers

  • ... enough to classify them

    • Observe a host for a given time

    • Count the number of peers contacted and the number of packets exchanged which represent the attitude


Abacus signature definition

  • Consider a host X which in a fixed time-window ΔT = 5s is contacted by N=5 peers Yi

  • for each peer Yi count the number of packets sent to X in ΔT

  • Consider a set of bins of exponential width

  • Divide the peers in bins according to the number of exchanged packets

  • Normalize the bins, i.e. divide by the total number of peers N

  • The final signature is an empirical probability distribution function

  • In the example

    • N=5, bins = (1, 0, 2, 2)‏

    • Abacus signature (0.2, 0, 0.4, 0.4)‏

[Figure: peers Y1, Y2, Y3, Y4, Y5 send packets to host X; bins of exponential width: 1, 2, 3-4, 5-8, 9-16, ...]
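The signature computation can be sketched as follows. The bin edges (1, 2, 3-4, 5-8, 9-16, ...) follow the slide; the per-peer packet counts are hypothetical values chosen so that the result reproduces the example bins (1, 0, 2, 2). Clamping larger counts into the last bin is an assumption of this sketch.

```python
def bin_index(packets: int) -> int:
    """Exponential bins: 1 -> 0, 2 -> 1, 3-4 -> 2, 5-8 -> 3, 9-16 -> 4, ..."""
    return (packets - 1).bit_length()


def abacus_signature(packets_per_peer, num_bins=4):
    """Return (bin counts, normalized signature) for one host and one time window.

    packets_per_peer: packets received from each peer during the window.
    Counts beyond the last bin are clamped into it (assumption of this sketch).
    """
    counts = [0] * num_bins
    for packets in packets_per_peer:
        counts[min(bin_index(packets), num_bins - 1)] += 1
    n = len(packets_per_peer)
    return counts, [c / n for c in counts]


# Hypothetical packet counts for the N = 5 peers Y1..Y5 contacting host X.
print(abacus_signature([1, 3, 4, 6, 8]))
```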


Signature comparison

[Figure: Abacus signatures of PPLive, TVAnts, Joost and SopCast]


Our Approach

[Figure: Phase 1, Feature, on known traffic; Phase 2, Decision; Phase 3, Verify]

  • Statistical characterization of bits in a flow

    • ABACUS signatures

  • Decision process

    • Supervised machine learning based on SVM

  • Performance evaluation

    • How accurate is all this?


Rejection criterion

  • Hyper-space is partitioned

    • every point is given a label

    • even “unknown” apps

  • Need a way to recognize them

    • Define a center for each class

    • Define a threshold R

    • Calculate the distance d between the point and the center of the assigned class

    • If d > R mark the new point as unknown

  • Bhattacharyya distance BD

    • Distance between p.d.f.

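[Figure: training points surround the center of each class, with a radius R drawn around it. A new point within R of the center is labeled with the class (e.g. “green”); a new point farther than R is labeled “unknown”]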
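The rejection step can be sketched as below. The slides name the Bhattacharyya distance but do not give its formula; this example assumes the bounded form sqrt(1 − BC), where BC = Σ sqrt(p_i · q_i) is the Bhattacharyya coefficient, because the thresholds R discussed in the slides lie between 0 and 1. The more common definition, −ln(BC), is unbounded. Function names and the example center are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """BC = sum over bins of sqrt(p_i * q_i); 1 for identical p.d.f., 0 for disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bounded_distance(p, q):
    """ASSUMED variant: sqrt(1 - BC), which always lies in [0, 1]."""
    bc = min(1.0, bhattacharyya_coefficient(p, q))  # guard against rounding above 1
    return math.sqrt(1.0 - bc)


def label_or_reject(signature, assigned_class, centers, r=0.5):
    """Keep the assigned class only if the signature is within r of its center."""
    d = bounded_distance(signature, centers[assigned_class])
    return ("unknown" if d > r else assigned_class), d


# Hypothetical class center, expressed as an empirical p.d.f. over four bins.
centers = {"green": [0.2, 0.0, 0.4, 0.4]}

print(label_or_reject([0.2, 0.0, 0.4, 0.4], "green", centers))
print(label_or_reject([0.0, 1.0, 0.0, 0.0], "green", centers))
```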



Experimental results

[Figure: TPR and FPR as a function of the rejection threshold R]

  • For R~0: low TPR, low FPR

  • For R=0.5: high TPR, low FPR

  • For R~1: high TPR, high FPR

For “unknown traffic” the selection of the rejection threshold R is fundamental!


The Failure of DPI


Questions you should ask yourself

  • Which feature to use?

    • Are those “portable”?

    • How often should the classifier be retrained?

    • Can it take a decision after some packets?

  • Open issues

    • Which is a good training set?

    • How to get a valid benchmarking set?

      • Never trust other classifiers!

    • How to make it work at 100Gb/s?

    • How much traffic can it actually classify (coverage)?



Interesting research topics

  • And for TCP?

  • And for HTTP?

    • How to get Facebook or Twitter

    • Over HTTPS?

    • When served by the same CDN?

  • And for the application?

    • Is it a video seen on Facebook?

  • Can I get rid of the training set?

    • Use unsupervised classifiers?


Oops, wrong key


And for TCP?


Chunking and χ²

[Figure: the first N payload bytes, of either TCP or UDP, are split into C chunks of b bits each. For each chunk, the observed distribution is compared against the expected (uniform) distribution, giving the vector of statistics [χ²_1, … , χ²_C]]

The χ² provides an implicit measure of entropy or randomness


Results