
Towards Understanding Network Traffic through Whole Packet Analysis

Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji


Presentation Transcript


  1. Towards Understanding Network Traffic through Whole Packet Analysis
     Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji

  2. Agenda
     • Introduction
     • Project in a nutshell
     • ADHIC
     • NetADHICT
        • Overview
        • In progress
     • Results
        • Performance
        • Multimedia & encrypted traffic
        • P2P
        • No-headers
     • Limitations
     • Applications

  3. Introduction
     • Complexity of modern computer networks
     • Common network analysis strategies:
        • Predetermined classifiers (port, address, …)
        • Protocol dissectors (Wireshark, …)
     • High-level view of network structure through packet clustering, using:
        • Header information
        • Payload
           • Better distinguishes P2P, worms, …
           • Performance issue

  4. Introduction
     • We developed a packet clustering technique that:
        • finds semantically interesting clusters
        • adapts to the changing nature of traffic patterns
        • does not require explicit a priori information
        • does not rely on any specific fields in the packets
        • can run in time sub-linear in packet length
     • Two innovations:
        • (p,n)-grams: n-byte substrings at byte offset p
        • ADHIC (Approximate Divisive HIerarchical Clustering)
     • Two key features:
        • Network traffic redundancy
        • Optimal clustering is not required
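Because a (p,n)-gram pins its n bytes to a fixed offset p, testing one against a packet compares only n bytes and never scans the rest of the packet; this is one way to read the sub-linear-time claim. The sketch below assumes packets are raw `bytes` starting at the Ethernet header; `PNGram` and `matches` are hypothetical names, not part of NetADHICT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PNGram:
    """A (p,n)-gram: an n-byte substring expected at byte offset p."""
    offset: int     # p: zero-based byte offset into the packet
    pattern: bytes  # the n bytes to match; n == len(pattern)


def matches(packet: bytes, gram: PNGram) -> bool:
    """Return True if `packet` holds `gram.pattern` at `gram.offset`.

    Only n bytes are compared, whatever the packet length.
    Packets too short to contain the gram do not match.
    """
    end = gram.offset + len(gram.pattern)
    return end <= len(packet) and packet[gram.offset:end] == gram.pattern


# Usage: the ARP trailer gram (51, 0x00 0x00) from slide 15.
arp_trailer = PNGram(51, b"\x00\x00")
print(matches(bytes(60), arp_trailer))  # True: 60 zero bytes
print(matches(bytes(40), arp_trailer))  # False: packet too short
```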

  5. Project in a Nutshell
     • NetADHICT: our implementation of ADHIC
     • It can analyze data live, as it is received by a network interface, or offline from libpcap capture files.
     • Observed data is used to generate and update a (p,n)-gram decision tree.
     • This tree serves as a classifier that reflects the high-level structure of network traffic at a given time.
     • The deduced structure corresponds to the typical division of network traffic (TCP vs. UDP; web vs. non-web), even though it is arrived at using automatically generated, context-related (p,n)-grams.
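For the offline mode, packets come from libpcap capture files. A minimal sketch of reading the classic pcap format with only the Python standard library is shown below; it is not NetADHICT's reader, and it assumes classic pcap (not pcapng). `read_pcap` is a hypothetical helper that yields each packet's raw bytes, ready for (p,n)-gram matching.

```python
import struct
from typing import Iterator, Tuple

MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
MAGIC_NSEC = 0xA1B23C4D  # classic pcap, nanosecond timestamps


def read_pcap(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_seconds, packet_bytes) for each record of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order the file was written in.
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", data[:4])
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    frac_per_sec = 1_000_000_000 if magic == MAGIC_NSEC else 1_000_000
    record = struct.Struct(order + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
    offset = 24  # skip the 24-byte global header
    while offset + record.size <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = record.unpack_from(data, offset)
        offset += record.size
        yield ts_sec + ts_frac / frac_per_sec, data[offset:offset + incl_len]
        offset += incl_len


# Usage: a one-packet capture built in memory (little-endian, link type 1 = Ethernet).
header = struct.pack("<IHHiIII", MAGIC_USEC, 2, 4, 0, 0, 65535, 1)
frame = bytes(60)
capture = header + struct.pack("<IIII", 10, 500_000, len(frame), len(frame)) + frame
for ts, pkt in read_pcap(capture):
    print(ts, len(pkt))  # 10.5 60
```

For a trace on disk, the file's contents can be passed directly, e.g. `read_pcap(open(path, "rb").read())`.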

  6. ADHIC
     • Using a sampled measure of similarity, ADHIC recursively subdivides traffic into two classes until the resulting traffic is:
        • below a certain threshold, or
        • too similar or too dissimilar
     • The resulting binary tree consists of:
        • internal decision nodes, each holding one (p,n)-gram
        • leaf nodes, which constitute the final clusters
     • The classification rule is based on matching (p,n)-grams.
     • The traffic in each leaf cluster is described by a Boolean expression constructed by following the path from the root to that leaf.
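The classification rule and the root-to-leaf Boolean expression can be sketched as a small binary tree in Python. The two-level tree below is hypothetical (an IPv4 EtherType gram at offset 12, then a TCP protocol-ID gram at offset 23, assuming Ethernet II framing); ADHIC learns its own grams, and `Leaf`, `Decision`, `classify`, and `rule_for` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class Leaf:
    name: str  # a final cluster


@dataclass
class Decision:
    offset: int     # p: byte offset of the gram
    pattern: bytes  # the n bytes of the gram
    match: Tree     # subtree for packets containing the gram
    no_match: Tree  # subtree for packets that do not


Tree = Union[Leaf, Decision]


def _has_gram(packet: bytes, node: Decision) -> bool:
    end = node.offset + len(node.pattern)
    return end <= len(packet) and packet[node.offset:end] == node.pattern


def classify(tree: Tree, packet: bytes) -> str:
    """Walk from the root to a leaf, testing one (p,n)-gram per decision node."""
    while isinstance(tree, Decision):
        tree = tree.match if _has_gram(packet, tree) else tree.no_match
    return tree.name


def rule_for(tree: Tree, leaf: str, path: Tuple[str, ...] = ()) -> Optional[str]:
    """Boolean expression, read off the root-to-leaf path, that selects `leaf`."""
    if isinstance(tree, Leaf):
        return (" AND ".join(path) or "TRUE") if tree.name == leaf else None
    gram = f"({tree.offset}, {tree.pattern.hex(' ')})"
    return (rule_for(tree.match, leaf, path + (gram,))
            or rule_for(tree.no_match, leaf, path + (f"NOT {gram}",)))


# Hypothetical tree: IPv4 vs. non-IP at the root, then TCP vs. other IPv4.
tree = Decision(12, b"\x08\x00",
                match=Decision(23, b"\x06", match=Leaf("IPv4/TCP"), no_match=Leaf("IPv4/other")),
                no_match=Leaf("non-IP"))

frame = bytearray(60)
frame[12:14] = b"\x08\x00"  # EtherType: IPv4
frame[23] = 0x06            # IPv4 protocol: TCP
print(classify(tree, bytes(frame)))  # IPv4/TCP
print(rule_for(tree, "IPv4/other"))  # (12, 08 00) AND NOT (23, 06)
```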

  7. ADHIC

  8. ADHIC
     • ADHIC adapts to changing traffic through two tree operations:
        • Splitting, when:
           • a leaf receives more than a preset threshold of traffic, and
           • there is a (p,n)-gram that matches a fraction of that traffic within a set range (e.g. 40%-60%)
        • Deletion, when:
           • a subtree has not matched a minimum threshold of traffic
     • Both statistics are measured over a preset period of time called the maturation window.
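The split and deletion conditions can be sketched as two checks evaluated at the end of a maturation window. The thresholds below are hypothetical values (the slide gives only the 40%-60% band as an example), the exhaustive count over candidate grams stands in for ADHIC's sampled measure, and preferring the split closest to 50% is our assumption, not something the talk states.

```python
from typing import Iterable, Optional, Tuple

Gram = Tuple[int, bytes]  # (p, n-byte pattern)

# Hypothetical parameters; only the 40%-60% band comes from the slide.
SPLIT_MIN_PACKETS = 1000  # a leaf must exceed this to be split
MATCH_BAND = (0.40, 0.60)
DELETE_MIN_PACKETS = 10   # a subtree below this is pruned


def choose_split_gram(window: Iterable[bytes], candidates: Iterable[Gram]) -> Optional[Gram]:
    """Pick a gram that splits one leaf's traffic seen during a maturation window.

    Returns None if the leaf saw too little traffic or no candidate's match
    fraction falls inside MATCH_BAND.
    """
    packets = list(window)
    if len(packets) <= SPLIT_MIN_PACKETS:
        return None
    lo, hi = MATCH_BAND
    best, best_gap = None, None
    for offset, pattern in candidates:
        end = offset + len(pattern)
        frac = sum(1 for pkt in packets if pkt[offset:end] == pattern) / len(packets)
        if lo <= frac <= hi:
            gap = abs(frac - 0.5)  # assumption: prefer the most even split
            if best_gap is None or gap < best_gap:
                best, best_gap = (offset, pattern), gap
    return best


def should_delete(packets_matched_in_window: int) -> bool:
    """A subtree that matched too little traffic during the window is deleted."""
    return packets_matched_in_window < DELETE_MIN_PACKETS


# Usage: 1200 packets, half starting with 0x01 and half with 0x02.
packets = [bytes([1, 0]) if i % 2 else bytes([2, 0]) for i in range(1200)]
print(choose_split_gram(packets, [(0, b"\x01"), (1, b"\x00")]))  # (0, b'\x01')
```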

  9. NetADHICT: Overview
     • Licensed under the GNU GPL
     • It usually starts by separating IP from non-IP traffic; lower nodes then sequester specific protocols.
     • NetADHICT segregates packets by protocol and by other characteristics (e.g. length).
     • (p,n)-grams corresponding to particular header or payload fields allow unconventional classification criteria.
     • NetADHICT was tested on four week-long traces from our CCSL lab.

  10. NetADHICT: Overview

  11. NetADHICT: Overview

  12. NetADHICT: In progress

  13. NetADHICT: In progress

  14. NetADHICT: In progress

  15. NetADHICT: In progress
     • Examples of interesting segregation through (p,n)-grams:
        • (51, 0x00 0x00): part of ARP’s Ethernet frame trailer
        • (64, 0x00 0x0f): part of EIGRP’s non-IP header
        • (22, 0x2c 0x06) and (54, 0x01 0x01): IMAPS’s TTL and protocol ID fields, and its “NOP, NOP” TCP options, respectively
        • (37, 0xc1 0x0c): HSRP’s 2nd byte of destination port and 1st byte of UDP length
        • (174, 0x00 0x00): part of NetBIOS-DGM’s payload
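The offsets in these grams are byte positions from the start of the Ethernet frame. The sketch below shows how the IMAPS and ARP examples line up with standard header layouts, assuming Ethernet II without a VLAN tag, an IPv4 header without options, and frames captured without the FCS; real packets with options or tags would shift these offsets. The constants and `gram_at` are illustrative, not taken from NetADHICT.

```python
# Assumed layout: Ethernet II header (14 bytes, no VLAN tag), then an IPv4
# header without options (20 bytes), then a TCP header whose options start
# right after its fixed 20-byte part.
ETH_HEADER = 14
IPV4_TTL = ETH_HEADER + 8            # TTL is byte 8 of the IPv4 header
IPV4_PROTOCOL = ETH_HEADER + 9       # protocol ID immediately follows the TTL
TCP_OPTIONS = ETH_HEADER + 20 + 20   # first TCP option byte
# ARP padding: after the 28-byte ARP body, up to the 60-byte minimum frame (FCS excluded).
ARP_TRAILER = (ETH_HEADER + 28, 60)


def gram_at(frame: bytes, offset: int, n: int = 2) -> str:
    """Format the n bytes at `offset` the way the slide writes (p,n)-grams."""
    return " ".join(f"0x{b:02x}" for b in frame[offset:offset + n])


frame = bytearray(66)
frame[12:14] = b"\x08\x00"                        # EtherType: IPv4
frame[IPV4_TTL] = 0x2C                            # TTL = 44
frame[IPV4_PROTOCOL] = 0x06                       # protocol 6 = TCP
frame[TCP_OPTIONS:TCP_OPTIONS + 2] = b"\x01\x01"  # two NOP options

print(IPV4_TTL, gram_at(frame, IPV4_TTL))        # 22 0x2c 0x06
print(TCP_OPTIONS, gram_at(frame, TCP_OPTIONS))  # 54 0x01 0x01
print(ARP_TRAILER[0] <= 51 < ARP_TRAILER[1])     # True
```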

  16. Results: Performance
     • Single-protocol cluster: a cluster that the traditional classifier reports as containing packets of only one protocol.
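Under this definition, a cluster counts as single-protocol when every packet in it carries the same reference label. A minimal sketch of the computation follows; the labels are made up, and both the function name and the per-packet share are illustrative additions, not figures or metrics from the talk.

```python
from typing import Dict, List, Tuple


def single_protocol_stats(labels: Dict[str, List[str]]) -> Tuple[List[str], float, float]:
    """Return (single-protocol clusters, share of clusters, share of packets).

    `labels` maps each leaf cluster to the protocol label that a reference
    classifier assigned to every packet in it.
    """
    single = [c for c, protos in labels.items() if protos and len(set(protos)) == 1]
    total_packets = sum(len(p) for p in labels.values())
    single_packets = sum(len(labels[c]) for c in single)
    return single, len(single) / len(labels), single_packets / total_packets


# Made-up labels for three leaf clusters.
labels = {
    "leaf-1": ["HTTP", "HTTP"],
    "leaf-2": ["DNS", "NTP"],
    "leaf-3": ["ARP"],
}
print(single_protocol_stats(labels))  # (['leaf-1', 'leaf-3'], 0.6666666666666666, 0.6)
```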

  17. Results: Performance
     • NetADHICT does well with most traffic types.
     • Structured packets (e.g. non-IP, UDP, …) are segregated through header and/or payload (p,n)-grams.
     • Unstructured packets (e.g. TCP) are segregated mostly through header (p,n)-grams covering fields such as the five-tuple and others (e.g. packet length, QoS field, TTL, options, padding, …).
     • NetADHICT also clusters packets of the same protocol together even when they run on different port numbers (e.g. HTTP on ports 80 and 8080).

  18. Results: Multimedia & Encrypted Traffic
     • In addition, multimedia (e.g. MS-Streaming) and encrypted (e.g. SSH, HTTPS, IMAPS) traffic are both:
        • Segregated from unencrypted traffic: NetADHICT either separates them through header (p,n)-grams or shunts them to default clusters.
        • Distinguished from each other: NetADHICT finds suitable header (p,n)-grams to separate different kinds of encrypted traffic from one another.

  19. Results: P2P
     • Many P2P applications use constantly changing, non-standard port numbers within the same network session.
     • In all of our experiments, NetADHICT was able to:
        • cluster the P2P UDP tracker packets together through a non-IP-header (p,n)-gram
        • cluster all other related TCP packets (data and control) into the tree’s global default cluster and its adjacent cluster
     • Even when the port number of all the P2P packets was maliciously changed to the standard HTTP port (80), the packets were clustered exactly as before.

  20. Results: P2P

  21. Results: P2P
     • Two observations:
        • NetADHICT rarely uses ports to cluster traffic.
        • NetADHICT segregated P2P traffic by characterizing the other network traffic through patterns that were absent from the P2P traffic.
     • Conclusion:
        • As long as most well-behaved traffic can be appropriately clustered, evasive protocols can be identified.
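This reasoning can be illustrated with a sketch. Assuming the global default cluster is the leaf a packet reaches after failing every gram test on its path, a packet lands there exactly when it matches none of those grams. The two grams below are hypothetical stand-ins for patterns learned from well-behaved traffic (the UDP protocol ID at offset 23, and an HTTP "GET " at offset 54, assuming a 20-byte TCP header); both sample packets use port 80, and the port plays no role in the decision.

```python
from typing import Sequence, Tuple

Gram = Tuple[int, bytes]  # (p, n-byte pattern)


def reaches_default(packet: bytes, default_path: Sequence[Gram]) -> bool:
    """True if the packet matches none of the grams on the all-'no' path."""
    return not any(packet[p:p + len(pat)] == pat for p, pat in default_path)


def tcp_frame(dst_port: int, payload: bytes) -> bytes:
    """Minimal Ethernet/IPv4/TCP frame; only the fields used here are set."""
    frame = bytearray(54)                       # 14 + 20 + 20 header bytes
    frame[12:14] = b"\x08\x00"                  # EtherType: IPv4
    frame[23] = 0x06                            # IPv4 protocol: TCP
    frame[36:38] = dst_port.to_bytes(2, "big")  # TCP destination port
    return bytes(frame) + payload


# Hypothetical grams standing in for patterns of well-behaved traffic.
default_path = [(23, b"\x11"), (54, b"GET ")]

http = tcp_frame(80, b"GET / HTTP/1.1\r\n")
p2p = tcp_frame(80, b"\x13BitTorrent protocol")  # BitTorrent handshake on port 80
print(reaches_default(http, default_path))  # False
print(reaches_default(p2p, default_path))   # True
```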

  22. Results: No-Headers
     • NetADHICT can also produce semantically meaningful clusters without looking at the first 38 bytes of each packet, which include the IP header.
     • Although performance is occasionally degraded, decision trees built without header information are qualitatively similar to those built from whole packets.
     • The main difference is that, when headers are excluded, NetADHICT cannot separate different kinds of encrypted traffic from one another.
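Restricting NetADHICT to payload bytes amounts to considering only (p,n)-grams with p ≥ 38. The sketch below enumerates and counts candidate grams under that restriction; it counts every offset exhaustively for clarity, whereas ADHIC itself works from a sampled measure, and `candidate_grams` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

HEADER_BYTES = 38  # leading bytes excluded in the no-header experiments


def candidate_grams(packets: Iterable[bytes], n: int = 2, skip: int = HEADER_BYTES,
                    top: int = 5) -> List[Tuple[Tuple[int, bytes], int]]:
    """Count (p,n)-grams at offsets p >= skip and return the most common ones."""
    counts: Counter = Counter()
    for pkt in packets:
        for p in range(skip, len(pkt) - n + 1):
            counts[(p, pkt[p:p + n])] += 1
    return counts.most_common(top)


# Usage: header bytes differ freely, but only payload grams are ever counted.
packets = [bytes(38) + b"ab", bytes(38) + b"ab", b"\xff" * 38 + b"cd"]
print(candidate_grams(packets, top=1))  # [((38, b'ab'), 2)]
```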

  23. Results: No-Headers

  24. Limitations
     • Analysis challenge:
        • Difficulty (work and time) in analyzing clusters, both manually and automatically
     • Privacy issues:
        • Our algorithm looks at both headers and payloads
     • Sophisticated design:
        • Large configuration space, making it difficult to choose an optimal set of parameters

  25. Applications
     • Network administration:
        • understand the overall structure of network traffic and help monitor how it changes
     • Network security:
        • isolate malicious traffic from normal traffic (with no outdated signatures, long training, or false alarms)
     • Quality of Service:
        • actively manage bandwidth by giving each leaf cluster an equal share of it
     • Other applications:
        • ADHIC has no built-in knowledge of networking!
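The talk does not say how equal per-cluster bandwidth shares would be enforced. One standard mechanism, swapped in here as an assumption, is a token bucket per leaf cluster, each refilled at link capacity divided by the number of leaves. The cluster names, link rate, and burst size below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenBucket:
    rate: float    # refill rate, bytes per second
    burst: float   # bucket capacity, bytes
    tokens: float  # currently available bytes
    last: float    # time of the last update, seconds

    def allow(self, size: int, now: float) -> bool:
        """Refill for the elapsed time, then admit the packet if enough tokens remain."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


def equal_share_buckets(leaves: List[str], link_bps: float, burst: float,
                        start: float = 0.0) -> Dict[str, TokenBucket]:
    """Give every leaf cluster the same refill rate: link capacity / number of leaves."""
    rate = link_bps / 8 / len(leaves)  # bits/s -> bytes/s, split evenly
    return {leaf: TokenBucket(rate, burst, burst, start) for leaf in leaves}


buckets = equal_share_buckets(["web", "dns", "default"], link_bps=3_000_000, burst=3000)
web = buckets["web"]
print(web.rate)                                      # 125000.0
print([web.allow(1500, now=0.0) for _ in range(3)])  # [True, True, False]
print(web.allow(1500, now=0.1))                      # True: 0.1 s refills the bucket
```

Because each leaf has its own bucket, a cluster that floods the link exhausts only its own tokens; the other clusters keep their share.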

  26. Thank you
