Towards Understanding Network Traffic through Whole Packet Analysis. Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji.
Agenda • Introduction • Project in a nutshell • ADHIC • NetADHICT • Overview • In progress • Results • Performance • Multimedia & encrypted traffic • P2P • No-headers • Limitations • Applications
Introduction • Complexity of modern computer networks • Common network analysis strategies: • Predetermined classifiers (port, address, …) • Protocol dissectors (Wireshark, …) • High-level view of network structure through packet clustering, based on: • Header information • Payload, which better distinguishes P2P, worms, … but raises performance issues
Introduction • We developed a packet clustering technique that: • finds semantically interesting clusters • adapts to the changing nature of traffic patterns • does not require explicit a priori information • does not rely on any specific fields in the packets • can run in time sub-linear in packet length • Two innovations: • (p,n)-grams: n-byte substrings at byte offset p • ADHIC (Approximate Divisive HIerarchical Clustering) • Two key features: • Network traffic redundancy • Optimal clustering is not required
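The (p,n)-gram definition can be made concrete with a short Python sketch. The names `PNGram`, `matches`, and `pn_grams` are hypothetical, and offsets are assumed to count from the first byte of the captured frame; this illustrates the idea and is not NetADHICT's code. Note that a single test reads only n bytes, independent of packet length.

```python
from typing import NamedTuple


class PNGram(NamedTuple):
    """A (p,n)-gram: an n-byte substring expected at byte offset p of a packet."""
    offset: int   # p, counted from the first byte of the captured frame
    value: bytes  # the n bytes; n == len(value)

    @property
    def n(self) -> int:
        return len(self.value)

    def matches(self, packet: bytes) -> bool:
        # Only the n bytes at offset p are read, so one test costs O(n)
        # no matter how long the packet is.
        end = self.offset + len(self.value)
        return end <= len(packet) and packet[self.offset:end] == self.value


def pn_grams(packet: bytes, n: int) -> list[PNGram]:
    """Enumerate every (p,n)-gram present in a packet."""
    return [PNGram(p, packet[p:p + n]) for p in range(len(packet) - n + 1)]
```

For example, `PNGram(2, b"\x00\x0f").matches(b"\xab\xcd\x00\x0f")` is true, while a packet too short to contain offsets 2 and 3 never matches.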
Project in a Nutshell • NetADHICT: our implementation of ADHIC • It can analyze data as it is received by a network interface, or offline from libpcap files. • Observed data is used to generate and update a (p,n)-gram decision tree. • This tree serves as a classifier that reflects the high-level structure of network traffic at a given time. • The deduced structure corresponds to • the typical division of network traffic (TCP vs. UDP; web vs. non-web), which is • arrived at using automatically generated, context-related (p,n)-grams.
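Offline analysis only needs the raw bytes of each captured frame. The sketch below reads the classic libpcap file format (24-byte global header, then a 16-byte header per record) using only the standard `struct` module; the function name `read_pcap` is hypothetical, pcapng files are not handled, and this is not how NetADHICT itself reads traces.

```python
import struct
from typing import BinaryIO, Iterator


def read_pcap(f: BinaryIO) -> Iterator[bytes]:
    """Yield the raw bytes of each captured packet in a classic libpcap file."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    # The magic number is stored in the writer's byte order; 0xa1b23c4d
    # marks files with nanosecond timestamps.
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return  # end of file
        _ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < incl_len:
            raise ValueError("truncated packet record")
        yield data  # whole captured frame: headers and payload alike
```

Typical use: `with open("trace.pcap", "rb") as f:` then iterate over `read_pcap(f)` (the file name is a placeholder).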
ADHIC • Using a sampled measure of similarity, ADHIC recursively subdivides traffic into binary classes until the resulting traffic is: • below a certain threshold, or • too similar or too dissimilar • The resulting binary tree consists of: • internal decision nodes, each holding one (p,n)-gram • leaf nodes, which constitute the final clusters • The classification rule is based on matching (p,n)-grams. • The traffic in each terminal cluster is described by a Boolean expression constructed by following the path from the root to that leaf.
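A minimal Python sketch of such a tree: each internal node tests one (p,n)-gram, a packet follows the "match" or "no match" branch, and the path taken is also returned as a Boolean rule. The names `Leaf`, `Node`, and `classify` are hypothetical, and the example tests used below (EtherType 0x0800 at offset 12, IP protocol byte at offset 23) were chosen for illustration, not taken from a NetADHICT tree.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str          # identifier of a final cluster


@dataclass
class Node:
    offset: int        # p
    value: bytes       # the n-byte pattern
    match: Tree        # subtree for packets containing the (p,n)-gram
    no_match: Tree     # subtree for packets that do not


Tree = Union[Leaf, Node]


def classify(tree: Tree, packet: bytes) -> tuple[str, str]:
    """Return the leaf reached by a packet and the Boolean rule along its path."""
    terms = []
    while isinstance(tree, Node):
        hit = packet[tree.offset:tree.offset + len(tree.value)] == tree.value
        term = f"({tree.offset}, {tree.value.hex(' ')})"
        terms.append(term if hit else f"NOT {term}")
        tree = tree.match if hit else tree.no_match
    return tree.name, " AND ".join(terms) or "TRUE"
```

With `Node(12, b"\x08\x00", Node(23, b"\x06", Leaf("ipv4-tcp"), Leaf("ipv4-other")), Leaf("non-ipv4"))`, an IPv4/TCP frame lands in `ipv4-tcp` with the rule `(12, 08 00) AND (23, 06)`, while an ARP frame lands in `non-ipv4` with `NOT (12, 08 00)`.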
ADHIC • ADHIC adapts to changing traffic through two tree operations: • Splitting, when: • a leaf contains more than a preset threshold of traffic, and • there is a (p,n)-gram that matches a percentage of the leaf's traffic within a certain range (e.g. 40%-60%). • Deletion, when: • a subtree has not matched a minimum threshold of traffic • Both statistics are measured over a preset period of time called the maturation window.
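The two tree operations can be sketched as follows, assuming the packets seen by a leaf during one maturation window are available as a list. The names and defaults (`choose_split`, `should_delete`, `min_packets`, `min_matches`, n = 2) are hypothetical. Unlike ADHIC, which uses a sampled measure, this sketch counts every (p,n)-gram exhaustively, and preferring the candidate closest to a 50/50 split is our own tie-breaking assumption.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def choose_split(packets: list[bytes], n: int = 2, min_packets: int = 100,
                 low: float = 0.40, high: float = 0.60) -> Optional[tuple[int, bytes]]:
    """Return a (p,n)-gram that splits a leaf's packets, or None if no split applies."""
    if len(packets) <= min_packets:
        return None  # the leaf has not received more than the traffic threshold
    counts: Counter = Counter()
    for pkt in packets:
        # A set, so each (p,n)-gram is counted at most once per packet.
        counts.update({(p, pkt[p:p + n]) for p in range(len(pkt) - n + 1)})
    best, best_gap = None, 1.0
    for gram, c in counts.items():
        frac = c / len(packets)
        # Only grams matching a share of packets inside [low, high] qualify;
        # among those, prefer the one closest to an even split (our assumption).
        if low <= frac <= high and abs(frac - 0.5) < best_gap:
            best, best_gap = gram, abs(frac - 0.5)
    return best


def should_delete(matches_in_window: int, min_matches: int = 10) -> bool:
    """Prune a subtree that matched too little traffic during the maturation window."""
    return matches_in_window < min_matches
```

A leaf whose packets all look alike yields no split, because every (p,n)-gram either matches all of them or none, which is the "too similar" stopping condition in action.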
NetADHICT: Overview • Licensed under the GNU GPL • It usually starts by separating IP from non-IP traffic; lower nodes then sequester specific protocols. • NetADHICT segregates packets by protocol and other characteristics (e.g. length). • (p,n)-grams corresponding to special header or payload fields allow unconventional classification measures. • NetADHICT was tested on four week-long traces from our CCSL lab.
NetADHICT: In progress • Examples of interesting segregation through (p,n)-grams: • (51, 0x00 0x00): part of ARP's Ethernet frame trailer • (64, 0x00 0x0f): part of EIGRP's non-IP header • (22, 0x2c 0x06) and (54, 0x01 0x01): part of IMAPS's TTL and protocol ID fields, and of its "NOP, NOP" options field, respectively • (37, 0xc1 0x0c): HSRP's 2nd byte of the destination port and 1st byte of the UDP length • (174, 0x00 0x00): part of NetBIOS-DGM's payload
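The IMAPS offsets line up with the standard header layouts if offsets count from the start of an untagged Ethernet II frame carrying IPv4 with a 20-byte header (no IP options): 14 + 8 = 22 is the TTL, 23 is the protocol byte (0x06 = TCP), and 14 + 20 + 20 = 54 is the first TCP option byte, where 0x01 is NOP. The sketch below builds a synthetic frame (zeroed addresses, hypothetical ports, TTL 44 = 0x2c) and checks both (p,n)-grams; it only verifies this reading of the offsets and uses none of the trace data.

```python
import struct

ETH_HDR = 14   # untagged Ethernet II: destination MAC, source MAC, EtherType
IPV4_HDR = 20  # IPv4 header with no options (IHL = 5)
TCP_HDR = 20   # fixed part of the TCP header

IP_TTL = ETH_HDR + 8                        # 22
IP_PROTO = ETH_HDR + 9                      # 23
TCP_OPTIONS = ETH_HDR + IPV4_HDR + TCP_HDR  # 54


def build_frame(ttl: int, options: bytes) -> bytes:
    """Build a synthetic Ethernet/IPv4/TCP frame (zeroed addresses, no payload)."""
    if len(options) % 4:
        raise ValueError("TCP options must be padded to 32-bit words")
    tcp_len = TCP_HDR + len(options)
    eth = bytes(12) + b"\x08\x00"  # EtherType 0x0800 = IPv4
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, IPV4_HDR + tcp_len, 0, 0,
                     ttl, 6, 0, bytes(4), bytes(4))  # protocol 6 = TCP
    tcp = struct.pack("!HHIIBBHHH", 50000, 993, 0, 0,  # 993 = IMAPS
                      (tcp_len // 4) << 4, 0x10, 0, 0, 0)  # data offset, ACK flag
    return eth + ip + tcp + options


def has_gram(frame: bytes, offset: int, value: bytes) -> bool:
    return frame[offset:offset + len(value)] == value
```

With options `b"\x01\x01\x08\x0a" + bytes(8)` (NOP, NOP, then a 10-byte timestamp option), the resulting 66-byte frame contains both `(22, 0x2c 0x06)` and `(54, 0x01 0x01)`.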
Results: Performance • Single-protocol cluster: a cluster that the traditional classifier reports as containing packets of only one protocol.
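The single-protocol criterion can be computed directly once each packet carries a protocol label from the traditional classifier. In this sketch the function names and labels are hypothetical. Reporting the share of packets that fall in such clusters is shown only as one possible summary, not as the metric behind the reported results.

```python
def single_protocol_clusters(labels_by_cluster: dict[str, list[str]]) -> list[str]:
    """Clusters whose packets the traditional classifier labels with one protocol."""
    return [c for c, labels in labels_by_cluster.items() if len(set(labels)) == 1]


def single_protocol_share(labels_by_cluster: dict[str, list[str]]) -> float:
    """Fraction of all packets that fall into single-protocol clusters."""
    total = sum(len(labels) for labels in labels_by_cluster.values())
    pure = sum(len(labels_by_cluster[c]) for c in single_protocol_clusters(labels_by_cluster))
    return pure / total if total else 0.0
```

For `{"leaf1": ["ARP", "ARP"], "leaf2": ["HTTP", "DNS"], "leaf3": ["EIGRP"]}`, clusters `leaf1` and `leaf3` qualify and hold 3 of the 5 packets.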
Results: Performance • NetADHICT does well with most traffic types. • Structured packets (e.g. non-IP, UDP, …) are segregated through header and/or payload (p,n)-grams. • Unstructured packets (e.g. TCP) are segregated mostly through header (p,n)-grams covering fields such as the five-tuple and others (e.g. packet length, QoS field, TTL, options, padding, …). • NetADHICT also clusters packets of the same protocol running on different port numbers together (e.g. HTTP on 80 and 8080).
Results: Multimedia & Encrypted Traffic • In addition, multimedia (e.g. MS-Streaming) and encrypted (e.g. SSH, HTTPS, IMAPS) traffic are both: • Segregated from unencrypted traffic: NetADHICT either segregates them through header (p,n)-grams or shunts them to default clusters • Distinguished from each other: NetADHICT finds suitable header (p,n)-grams to separate different kinds of encrypted traffic from each other.
Results: P2P • Many P2P applications use constantly changing, non-standard port numbers within the same network session. • In all experiments, NetADHICT was able to: • cluster the P2P UDP tracker packets together through a non-IP-header (p,n)-gram. • cluster all other related TCP packets (data and control) into the tree's global default cluster and its adjacent cluster. • Even when the port of all the P2P packets was maliciously changed to the standard HTTP port (80), the packets were clustered exactly as before.
Results: P2P • Two observations: • NetADHICT rarely uses ports to cluster traffic. • NetADHICT managed to segregate P2P traffic by characterizing other network traffic as having patterns that were absent in the P2P traffic. • Conclusion: • So long as most well-behaved traffic can be appropriately clustered, evasive protocols can be identified.
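The port observation has a simple consequence that can be checked mechanically: if none of a tree's (p,n)-grams reads the bytes holding the transport ports, rewriting those ports (for example to 80) cannot change any test outcome, so every packet follows the same root-to-leaf path. The sketch assumes an untagged Ethernet II frame with a 20-byte IPv4 header, where the TCP or UDP ports occupy bytes 34-37; the names `PORT_BYTES`, `port_invariant`, and `match_vector` are hypothetical.

```python
# Byte offsets of the TCP/UDP source and destination ports, assuming an untagged
# Ethernet II frame (14 bytes) carrying IPv4 with a 20-byte header.
PORT_BYTES = set(range(34, 38))


def inspected_offsets(grams: list[tuple[int, bytes]]) -> set[int]:
    """All byte offsets read by a set of (p,n)-gram tests."""
    return {p + i for p, value in grams for i in range(len(value))}


def port_invariant(grams: list[tuple[int, bytes]]) -> bool:
    """True if no test reads a port byte, so rewriting ports cannot change a match."""
    return inspected_offsets(grams).isdisjoint(PORT_BYTES)


def match_vector(grams: list[tuple[int, bytes]], packet: bytes) -> tuple[bool, ...]:
    """Outcome of every test on a packet; equal vectors imply equal tree paths."""
    return tuple(packet[p:p + len(v)] == v for p, v in grams)
```

The IMAPS grams `(22, 0x2c 0x06)` and `(54, 0x01 0x01)` pass `port_invariant`, whereas the HSRP gram `(37, 0xc1 0x0c)` does not, since byte 37 is part of the destination port.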
Results: No-Headers • NetADHICT can also produce semantically meaningful clusters even without looking at the headers (the first 38 bytes of each packet). • Although performance is occasionally degraded, decision trees built with no header information are qualitatively similar to those built using all packet information. • The main difference is that NetADHICT cannot separate different kinds of encrypted traffic when headers are excluded.
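Excluding the headers amounts to restricting which offsets may serve as candidate (p,n)-grams. A minimal sketch, where `candidate_grams` and its `skip` parameter are hypothetical names: with `skip=0` every offset is available, and with `skip=38` no candidate can overlap the excluded bytes.

```python
def candidate_grams(packet: bytes, n: int = 2, skip: int = 38) -> list[tuple[int, bytes]]:
    """List the (p,n)-grams of a packet, ignoring its first `skip` bytes."""
    # Starting at p = skip guarantees that every candidate lies wholly
    # outside the excluded header region.
    return [(p, packet[p:p + n]) for p in range(skip, len(packet) - n + 1)]
```

A packet no longer than the excluded region yields no candidates at all, so in this setting such packets can only be told apart by the tests that did not match them.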
Limitations • Analysis challenge: • Difficulty (work and time) in analyzing clusters both manually and automatically • Privacy issues: • Our algorithm looks at both headers and payloads • Sophisticated design: • Large configuration space, making it difficult to choose an optimal set of parameters
Applications • Network administration: • understand the overall structure of network traffic and help monitor its changes. • Network security: • isolate malicious traffic from normal traffic (with no outdated signatures, long training, or false alarms). • Quality of Service: • actively manage bandwidth by giving each leaf cluster an equal share of the bandwidth. • Other applications: • ADHIC has no built-in knowledge of networking!