Traffic classification and applications to traffic monitoring

Traffic classification and applications to traffic monitoring. Marco Mellia Electronic and Telecommunication Department Politecnico di Torino Email:mellia@tlc.polito.it. Traffic Classification & Measurement. Why ? Identify normal and anomalous behavior


Presentation Transcript


  1. Traffic classification and applications to traffic monitoring. Marco Mellia, Electronic and Telecommunication Department, Politecnico di Torino. Email: mellia@tlc.polito.it

  2. Traffic Classification & Measurement • Why? • Identify normal and anomalous behavior • Characterize the network and its users • Quality of service • Filtering • … • How? • By means of passive measurement

  3. Scenario (http://tstat.tlc.polito.it) [Figure labels: Internal Clients, Edge Router, External Servers] • Traffic classifier • Deep packet inspection • Statistical methods • Persistent and scalable monitoring platform • Round Robin Database (RRD) • Histograms

  4. Tstat at a Glance

  5. Worms and Viruses? Did someone open a Christmas card? Happy new year to Windows!!

  6. Anomalies (Good!) • Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008

  7. New Applications – P2PTV Fiorentina 4 - Udinese 2 Inter 1 - Juventus 0

  8. Megaupload blocked 19/01/12

  9. How to monitor traffic? • All previous examples rely on the availability of a CLASSIFIER • A tool that can discriminate classes of traffic • Classification: the problem of assigning a class to an observation • The set of classes is pre-defined • The output may be correct or not

  10. Some terminology • Question: Is this a cat, a rabbit, or a dog?

  11. Some terminology • Question: Is this a cat, a rabbit, or a dog?

  12. How to compute performance? • Confusion matrix • Rows hold the actual class • Columns hold the predicted class • It shows which classes the classifier confuses with one another

  13. How to compute performance? • Confusion matrix • True positive • It was classified as a cat, and it was a cat

  14. How to compute performance? • Confusion matrix • False negative • It was classified NOT as a cat, but it was a cat

  15. How to compute performance? • Confusion matrix • True negative • It was classified NOT as a cat, and it was NOT a cat

  16. How to compute performance? • Confusion matrix • False positive • It was classified as a cat, but it was NOT a cat
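The four outcomes defined on the confusion-matrix slides can be counted mechanically. The following is a minimal sketch, not taken from the presentation: the function names and the six labelled samples are made up for illustration. Rows index the actual class and columns the predicted class, as on the slides.

```python
from collections import Counter

CLASSES = ["cat", "dog", "rabbit"]

def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are the actual class, columns are the predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def one_vs_rest(matrix, classes, target):
    """Collapse the matrix into TP/FN/FP/TN counts for a single class."""
    k = classes.index(target)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # classified as target, was target
    fn = sum(matrix[k]) - tp                   # was target, classified as something else
    fp = sum(row[k] for row in matrix) - tp    # classified as target, was something else
    tn = total - tp - fn - fp                  # neither actual nor predicted target
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Hypothetical labelled samples, for illustration only.
actual = ["cat", "cat", "cat", "dog", "dog", "rabbit"]
predicted = ["cat", "cat", "dog", "cat", "dog", "rabbit"]

matrix = confusion_matrix(actual, predicted)
cat_counts = one_vs_rest(matrix, CLASSES, "cat")
print(matrix)
print(cat_counts)
```

The off-diagonal cells are where confusion shows up: here one cat was labelled as a dog and one dog as a cat.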

  17. Other metrics • Accuracy: the ratio of correctly classified samples (the sum of True Positives over all classes) to the total number of tests. • It is biased toward the predominant class in a data set. • Consider, for example, a test for a disease that affects 10 patients out of 100 tested. A classifier that always returns “healthy” will have an accuracy of 90%.
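The accuracy bias in the disease example can be checked with a few lines. This is a sketch under the slide's numbers (10 sick patients out of 100 tests); the label strings and function name are illustrative assumptions.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# 10 sick patients out of 100 tests, as in the slide's example.
actual = ["sick"] * 10 + ["healthy"] * 90

# A useless classifier that always answers with the majority class.
always_healthy = ["healthy"] * len(actual)

acc = accuracy(actual, always_healthy)
sick_found = sum(a == "sick" and p == "sick" for a, p in zip(actual, always_healthy))
print(acc, sick_found)
```

The accuracy comes out at 90% even though not a single sick patient is detected, which is why per-class metrics are needed alongside it.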

  18. Other metrics • Recall of a class: the ratio of True Positives to the sum of True Positives and False Negatives. • Recall(cat) = 5/(5+3+0) • It measures the ability of a classifier to select the instances of the given class from a data set

  19. Other metrics • Precision of a class: the ratio of True Positives to the sum of True Positives and False Positives. • Precision(cat) = 5/(5+2+0) • It measures how precise the classifier is in assigning the label only to samples that truly belong to the given class
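Recall and precision can both be read off a confusion matrix: recall divides the diagonal cell by its row total, precision by its column total. The sketch below assumes a 3x3 matrix whose "cat" row (5, 3, 0) and "cat" column (5, 2, 0) match the numbers on the two slides above; the class order and all remaining cells are assumptions, since the transcript does not include the matrix itself.

```python
CLASSES = ["cat", "dog", "rabbit"]

# Rows: actual class, columns: predicted class.
# Only the cat row and the cat column come from the slides; the rest is assumed.
MATRIX = [
    [5, 3, 0],   # actual cat
    [2, 3, 1],   # actual dog (assumed)
    [0, 2, 11],  # actual rabbit (assumed)
]

def recall(matrix, k):
    """TP / (TP + FN): diagonal cell over the sum of its row."""
    row_total = sum(matrix[k])
    return matrix[k][k] / row_total if row_total else 0.0

def precision(matrix, k):
    """TP / (TP + FP): diagonal cell over the sum of its column."""
    col_total = sum(row[k] for row in matrix)
    return matrix[k][k] / col_total if col_total else 0.0

cat = CLASSES.index("cat")
print(recall(MATRIX, cat))     # 5 / (5 + 3 + 0)
print(precision(MATRIX, cat))  # 5 / (5 + 2 + 0)
```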

  20. Traffic classification Look at the packets… Internet Service Provider Tell me what protocol and/or application generated them

  21. Typical approach: Deep Packet Inspection (DPI) [Figure: an Internet Service Provider inspects the port and payload of each flow. Skype: port ?, payload ?. Bittorrent: port ?, payload “bittorrent”. Gtalk: port ?, payload RTP protocol. eMule: port 4662/4672, payload E4/E5.]

  22. The problem of traffic classification • Deep Packet Inspection • Based on looking for some pre-defined payload patterns, deep in the packet • Simple at L2-L4 • “if ethertype == 0x0800, then there is an IP packet” • Usually done with a set of if-then-else or even switch-case • Ambiguous at L7 • TCP port 80 does not automatically mean “protocol HTTP”
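The L2 rule quoted on the slide above ("if ethertype == 0x0800, then there is an IP packet") translates almost literally into code. The following is a minimal sketch that assumes an untagged Ethernet II frame (no 802.1Q VLAN tag); the function name and the example frame are illustrative.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV6 = 0x86DD

def l3_protocol(frame: bytes) -> str:
    """Name the network-layer protocol carried by an untagged Ethernet II frame."""
    if len(frame) < 14:  # 6 B destination MAC + 6 B source MAC + 2 B EtherType
        return "truncated"
    (ethertype,) = struct.unpack("!H", frame[12:14])  # big-endian 16-bit field
    if ethertype == ETHERTYPE_IPV4:
        return "IPv4"
    elif ethertype == ETHERTYPE_IPV6:
        return "IPv6"
    elif ethertype == ETHERTYPE_ARP:
        return "ARP"
    return "other"

# Dummy frame: zeroed MAC addresses, EtherType 0x0800, 20 zero bytes of payload.
example_frame = bytes(12) + b"\x08\x00" + bytes(20)
print(l3_protocol(example_frame))
```

A fixed field at a fixed offset makes this layer unambiguous. At L7 there is no equivalent field, so a port number alone cannot settle which application generated the traffic.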

  23. DPI: Rule-set complexity • Practical rule-sets: • Snort, as of November 2007 • 8536 rules, 5549 Perl Compatible Regular Expressions • OpenDPI as of February 2012 • 118 protocols • Tstat as of February 2012 • Approx 200 classes/services • Deep packet inspection = regular expression matching at line rate, using finite-automata-based techniques

  24. Some notes... • Protocol identification… • … or application verification? • Skype can use the standard HTTP protocol to exchange data • Is that traffic “Skype” or “HTTP”? • Today everything is going over HTTP • Is it Facebook? Twitter? YouTube video? Or HTTP?

  25. The question • Which granularity are you interested in?

  26. Several approaches to traffic classification Traffic classification Content-based Statistical methods Port-based (stateless) Payload-based (stateful) Host social behaviour (e.g., Faloutsos) Traffic statistics (e.g., Salgarelli, Baiocchi, Moore, Mellia) Packet-based (e.g., Spatscheck) Message-based Auto-learning methods (e.g. Bayes) Preclassified bins Protocol behaviour (e.g., BinPac, SML) Pre-computed or auto-learning signatures

  27. Some references

  28. Some references • Some critical overviews/tutorials • T.T.T. Nguyen, G. Armitage. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials, Vol. 10, No. 4, pp. 56-76. • H. Kim, KC Claffy, M. Fomenkov, D. Barman, M. Faloutsos, K. Lee. Internet traffic classification demystified: myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference (CoNEXT '08). ACM, New York, NY, USA, 2008. • For this class: • D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli. Revealing Skype traffic: when randomness plays with you. ACM SIGCOMM, Kyoto, JP, ISBN: 978-1-59593-71, 27 August 2007. • A. Finamore, M. Mellia, M. Meo, D. Rossi. KISS: Stochastic Packet Inspection. 1st TMA Workshop, Aachen, 11 May 2009. • A. Finamore, M. Mellia, M. Meo, D. Rossi. KISS: Stochastic Packet Inspection Classifier for UDP Traffic. IEEE/ACM Transactions on Networking, Vol. 18, No. 5, pp. 1505-1515, ISSN: 1063-6692, October 2010. • G. La Mantia, D. Rossi, A. Finamore, M. Mellia, M. Meo. Stochastic Packet Inspection for TCP Traffic. IEEE ICC, Cape Town, South Africa, 23 May 2010.

  29. Typical approach: Deep Packet Inspection (DPI) • It fails more and more: • P2P • Encryption • Proprietary solution • Many different flavours [Figure: an Internet Service Provider inspects the port and payload of each flow. Skype: port ?, payload ?. Bittorrent: port ?, payload “bittorrent”. Gtalk: port ?, payload RTP protocol. eMule: port 4662/4672, payload E4/E5.]

  30. The Failure of DPI • 11.05.2008 12:29: eMule 0.49a released • 1.08.2008 20:25: eMule 0.49b released

  31. Possible Solution: Behavioral Classifier • Phase 1, Feature (Training): statistical characterization of known traffic • Phase 2, Decision (Operation): look at the behaviour of unknown traffic and assign the class that best fits it • Phase 3, Verify: check for possible classification mistakes

  32. Behavioural classifiers • Which statistics? • Packet size • Average, std, max, min • Length of first X pkts • IPG (inter-packet gap) • Average, std, max, min • IPG of first X+1 pkts • Total size, duration, #data packets • From client, from server, from both • RTT, #concurrent connections, rtx, dups, … • TCP options, flags, signaling, … • Feature selection? • Which decision process? • Ad Hoc • Bayesian • Neural Networks • Decision trees • SVM • … • Which training set? • Supervised techniques
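Several of the per-flow statistics listed above (packet size and inter-packet gap summaries, the first few packets, total size, duration) can be computed from a list of timestamped packet sizes. The sketch below is illustrative only: the input format, the function name, and the feature names are assumptions, and it covers a single direction of a single flow.

```python
from statistics import mean, pstdev

def flow_features(packets, first_x=4):
    """Summarize a flow given as (timestamp_seconds, size_bytes) tuples in arrival order."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-packet gaps: differences between consecutive timestamps.
    ipgs = [later - earlier for earlier, later in zip(times, times[1:])]

    def summary(values):
        if not values:
            return None
        return {"avg": mean(values), "std": pstdev(values),
                "max": max(values), "min": min(values)}

    return {
        "size": summary(sizes),
        "ipg": summary(ipgs),
        "first_sizes": sizes[:first_x],   # length of the first X packets
        "first_ipgs": ipgs[:first_x],     # X gaps, which need the first X+1 packets
        "total_bytes": sum(sizes),
        "duration": times[-1] - times[0] if times else 0.0,
        "n_packets": len(packets),
    }

# Hypothetical three-packet flow.
features = flow_features([(0.0, 100), (0.5, 200), (1.5, 300)])
print(features)
```

A vector like this would be the input to whichever decision process is chosen (Bayesian, decision tree, SVM, and so on); which features to keep is the feature-selection question raised on the slide.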

  33. The case of Skype • Consider a simple example

  34. Our Goal • Identify Skype traffic • Motivations • Operators need to know what is running in their network • New business models, provisioning, TE, etc. • Understand user behaviour • Traffic characterization, security • … • It’s fun

  35. Skype Overview No server No well-known port … No standard No RFC State-of-the-Art Encryption/Obfuscation Mechanisms • Skype offers voice, video, chat and data transfer services over IP • Closed design, proprietary solutions • P2P technology • Proprietary protocols • Encrypted communications • Easy to use, difficult to reveal • It is the perfect example of DPI failure

  36. Our Goal • Identify Skype traffic • Voice stream first: both E2E and SkypeOut/In streams • Possible video/chat/file transfers/signaling • Constraints • Passive observation of traffic • Protocol ignorance

  37. Three Classifiers [Figure: a traffic flow is fed to a Payload Based Classifier, a Naïve Bayes Classifier and a Chi Square Classifier; each answers “Skype?” and the three answers are combined with AND]

  38. Phase 1 – try to understand it

  39. Skype as VoIP Application • Skype selects the voice codec from a list • Low bit rate: 10-32 kbps • Regular Inter-Packet-Gap (30 ms frames) • Redundancy may be added to mitigate packet loss • Framing may be modified from the original codec one • Multiplexes different sources into the same message (voice, video, chat,…)

  40. Skype Source Model Skype Message TCP/UDP IP

  41. Skype Header Formats (What we guess about it) • Can we design a DPI classifier?

  42. Possible Skype Messages • Signaling and data messages • Use TCP, with ciphered payload • Login, lookup, signaling… • Impossible to exploit: everything is ciphered • Data flow • Use UDP whenever possible: payload is encrypted… but some header MUST be exposed • Question: • Some header MUST be exposed… • Why?

  43. Possible Skype Messages • Signaling and data messages • Use TCP, with ciphered payload • Login, lookup, signaling… • Impossible to exploit: everything is ciphered • Data flow • Use UDP whenever possible: payload is encrypted… but some header MUST be exposed [Figure labels: Source, AES, Receiver, AES, Unreliable]

  44. Skype Source Model Skype Message TCP/UDP IP

  45. SoM Format for E2E Messages • Layout: bits 0-15 hold ID, bits 16-23 hold FUNC • The Start of Message (SoM) of End2End messages carried by UDP has: • ID: 16-bit random identifier • FUNC: 5-bit function (multiplexing?), obfuscated in a byte
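Extracting the two SoM fields from a UDP payload could look like the sketch below. The layout (a 16-bit ID followed by one byte carrying FUNC) follows the slide. The slide does not say which 5 bits of that byte hold FUNC, so the mask used here is a hypothetical placeholder, and the function name is illustrative.

```python
import struct

# ASSUMPTION: the slide only says FUNC is 5 bits "obfuscated in a byte".
# Taking the 5 low-order bits is a placeholder, not the documented mask.
FUNC_MASK = 0x1F

def parse_som(payload: bytes, func_mask: int = FUNC_MASK):
    """Return (id, func) from the first three bytes of a UDP payload, or None if too short."""
    if len(payload) < 3:
        return None
    # "!HB": big-endian 16-bit ID followed by the single byte that carries FUNC.
    msg_id, func_byte = struct.unpack("!HB", payload[:3])
    return msg_id, func_byte & func_mask

# Hypothetical payload: ID 0x1234, third byte 0xED.
som = parse_som(bytes([0x12, 0x34, 0xED, 0x00, 0x00]))
print(som)
```

Because the ID is random, only the FUNC bits are usable as a signature, which is why the signature is just 5 bits long.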

  46. Voice Video Chat File Function Values • 0x01 = ??Query message • 0x02 = ??Query • 0x0d = Data • 0x07 = NAK

  47. Classic signature based classifier PBC • SoM can be used to identify Skype flows carried by UDP • 5-bit signature • Question: • What is the chance of a false positive?

  48. Classic signature based classifier PBC • SoM can be used to identify Skype flows carried by UDP • 5-bit signature • Question: • What is the chance of a false positive? • We look for 1 string out of 32 possible strings • A 1/32 chance of false detection • Can we improve it by checking multiple packets? • Yet we can have a very high false positive rate
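The 1/32 figure, and the effect of checking several packets, can be worked out under a simplifying assumption that is not stated on the slide: the signature bits of non-Skype packets are uniformly random and independent from packet to packet. The function names and the flow count below are illustrative.

```python
def false_positive_probability(signature_bits: int, packets_checked: int) -> float:
    """Chance that a non-Skype flow matches the signature on every checked packet.

    ASSUMPTION: signature bits of non-Skype traffic are uniform and independent.
    """
    per_packet = 1 / (2 ** signature_bits)   # 1 string out of 2^bits possible strings
    return per_packet ** packets_checked

def expected_false_positives(n_flows: int, signature_bits: int, packets_checked: int) -> float:
    """Expected number of non-Skype flows wrongly labelled as Skype."""
    return n_flows * false_positive_probability(signature_bits, packets_checked)

p_one = false_positive_probability(5, 1)               # 1/32
p_two = false_positive_probability(5, 2)               # 1/1024
wrong = expected_false_positives(1_000_000, 5, 1)      # out of a hypothetical 1M flows
print(p_one, p_two, wrong)
```

Real traffic need not satisfy the independence assumption. A protocol that sends the same byte at that offset in every packet would match on all of them if it matches once, so checking more packets does not necessarily drive the false positive rate down.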

  49. PBC • Any other smart way of improving accuracy? • Hint: this is UDP

  50. Classic signature based classifier PBC • SoM can be used to identify Skype flows carried by UDP • 5-bit signature
