Automatically Inferring Patterns of Resource Consumption in Network Traffic

Presentation Transcript


  1. Automatically Inferring Patterns of Resource Consumption in Network Traffic Cristian Estan, Stefan Savage, George Varghese University of California, San Diego

  2. Who is using my link? Traffic Clusters - 2003

  3. Looking at the traffic • Too much data for a human • Do something smarter!

  4. Looking at traffic aggregates • Aggregating on individual packet header fields gives useful results, but • Traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.) • Cannot show aggregates defined over multiple fields (e.g. which network uses which application) • The traffic analysis tool should automatically find aggregates over the right fields at the right granularity [Figure: reports aggregated on single header fields (Src. IP, Dest. IP, Src. port, Dest. port, Protocol, Src. net, Dest. net), annotated with "Where does the traffic come from?", "What apps are used?", "Which network uses web and which one kazaa?" and "Most traffic goes to the dorms …"]

  5. Ideal traffic report: "Web is the dominant application", "This is a Denial of Service attack!!", "The library is a heavy user of web", "That’s a big flash crowd!" • This paper is about giving the network administrator insightful traffic reports

  6. Contributions of this paper • Approach • Definitions • Algorithms • System • Experience

  7. Approach • Characterize traffic mix by describing all important traffic aggregates • Multidimensional aggregates (e.g. flash crowd described by protocol, port number and IP address) • Aggregates at the right level of granularity (e.g. computer, subnet, ISP) • Traffic analysis is automated: it finds insightful data without human guidance

  8. Definition: traffic clusters • Traffic clusters are the multidimensional traffic aggregates identified by our reports • A cluster is defined by a range for each field • The ranges are taken from natural hierarchies (e.g. IP prefix hierarchy), which yields meaningful aggregates • Example • Traffic aggregate: incoming web traffic for CS Dept. • Traffic cluster: ( SrcIP=*, DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535] )
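The cluster definition on slide 8 can be sketched as a per-field membership test. This is a minimal illustration, not code from AutoFocus: the `Packet` and `Cluster` classes, their field names, and the sample addresses are assumptions made for the example. Only the cluster values themselves (DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535]) come from the slide.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet header record (field names are assumptions)."""
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class Cluster:
    """One range per field; the defaults act as the wildcard '*'."""
    src_net: IPv4Network = ip_network("0.0.0.0/0")
    dst_net: IPv4Network = ip_network("0.0.0.0/0")
    proto: Optional[str] = None          # None means any protocol
    src_ports: range = range(0, 65536)
    dst_ports: range = range(0, 65536)

    def matches(self, p: Packet) -> bool:
        # A packet belongs to the cluster only if every field is in range.
        return (
            ip_address(p.src_ip) in self.src_net
            and ip_address(p.dst_ip) in self.dst_net
            and (self.proto is None or p.proto == self.proto)
            and p.src_port in self.src_ports
            and p.dst_port in self.dst_ports
        )


# The slide's example: incoming web traffic for the CS Dept.
cs_web_in = Cluster(
    dst_net=ip_network("132.239.64.0/21"),
    proto="TCP",
    src_ports=range(80, 81),
    dst_ports=range(1024, 65536),
)
```

With these definitions, `cs_web_in.matches(Packet("198.51.100.7", "132.239.65.10", "TCP", 80, 40000))` is true, while the same packet sent to 132.239.72.1 (outside the /21) does not match.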

  9. Definition: traffic report • Traffic reports give the volume of chosen traffic clusters • To keep report size manageable, describe only clusters above a threshold (e.g. H = total traffic/20) • To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping more specific clusters in the report • To highlight non-obvious aggregates, prioritize by using an unexpectedness label • Example • 50% of all traffic is web • Prefix B receives 20% of all traffic • The web traffic received by prefix B is 15% of all traffic instead of the expected 50%*20%=10%, so the unexpectedness label is 15%/10%=150%
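The unexpectedness arithmetic in the example on slide 9 can be written out as a short sketch. The function name and signature are assumptions for illustration; the calculation follows the slide: the expected share is the product of the single-field shares, and the label is the ratio of the actual share to that expectation.

```python
def unexpectedness(actual_share, single_field_shares):
    """Ratio of a cluster's actual traffic share to the share expected
    if its fields were independent (product of the single-field shares)."""
    expected = 1.0
    for share in single_field_shares:
        expected *= share
    return actual_share / expected


# Slide example: web is 50% of traffic, prefix B receives 20%,
# and web traffic to prefix B is 15% of all traffic.
label = unexpectedness(0.15, [0.50, 0.20])
print(f"{label:.0%}")  # formatted as a percentage, as on the slide
```

A label well above 100% marks a cluster carrying more traffic than its parents would suggest; a label well below 100% marks one carrying less.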

  10. Contributions of this paper • Approach • Definitions • Algorithms • System • Experience

  11. Algorithms and theory • Algorithms and theoretical bounds are in the paper • Unidimensional reports are easy to compute • Multidimensional reports are exponentially harder as we add more fields • Next few slides • Example of unidimensional compression • Example of the structure of the multidimensional cluster space

  12. Unidimensional report example • Hierarchy • Threshold=100 [Figure: IP prefix hierarchy rooted at 10.0.0.0/28 (500), with a traffic count at each node from the /29 prefixes down to individual addresses; the nodes at or above the threshold are 10.0.0.0/28 (500), 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.8/30 (305), 10.0.0.8/31 (270), 10.0.0.8 (160) and 10.0.0.9 (110)]

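  13. Unidimensional report example • Compression [Figure: the above-threshold nodes 10.0.0.0/28 (500), 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.8/30 (305), 10.0.0.8/31 (270), 10.0.0.8 (160) and 10.0.0.9 (110); 10.0.0.8/29 stays in the report because 380-270≥100, while 10.0.0.8/30 is dropped because 305-270<100; the compressed report is 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.8 (160) and 10.0.0.9 (110)]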
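The thresholding and compression steps of the unidimensional example (slides 12 and 13) can be reproduced with a short sketch. It is written to agree with the arithmetic shown on the slides and is not necessarily the algorithm from the paper. The function names are assumptions, and the per-address counts for 10.0.0.2 through 10.0.0.5 are an assumed split that is consistent with the prefix totals in the figure (50 and 70); the result does not depend on that split.

```python
from collections import defaultdict
from ipaddress import ip_network


def prefix_totals(leaf_counts, root):
    """Add each address's traffic to every prefix from the address up to root.
    Assumes every address lies inside root."""
    totals = defaultdict(int)
    for addr, count in leaf_counts.items():
        net = ip_network(addr)  # a bare address parses as a /32
        while True:
            totals[net] += count
            if net.prefixlen <= root.prefixlen:
                break
            net = net.supernet()
    return dict(totals)


def threshold_report(totals, threshold):
    """Keep only the prefixes whose traffic reaches the threshold."""
    return {net: t for net, t in totals.items() if t >= threshold}


def compress(report, threshold):
    """Work from the most specific prefixes upward; keep a prefix only if the
    traffic not explained by already kept, more specific prefixes is >= threshold."""
    kept = {}
    for net in sorted(report, key=lambda n: n.prefixlen, reverse=True):
        inside = [k for k in kept if k.subnet_of(net)]
        # Count only the outermost kept prefixes so no traffic is counted twice.
        outermost = [k for k in inside
                     if not any(k != o and k.subnet_of(o) for o in inside)]
        explained = sum(kept[k] for k in outermost)
        if report[net] - explained >= threshold:
            kept[net] = report[net]
    return kept


leaves = {
    "10.0.0.2": 15, "10.0.0.3": 35,   # assumed split of the 50 under 10.0.0.2/31
    "10.0.0.4": 30, "10.0.0.5": 40,   # assumed split of the 70 under 10.0.0.4/31
    "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75,
}
totals = prefix_totals(leaves, ip_network("10.0.0.0/28"))
report = threshold_report(totals, 100)
compressed = compress(report, 100)
```

On this input, `report` holds the seven above-threshold prefixes from slide 12, and `compressed` holds the four clusters from slide 13: 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.8 (160) and 10.0.0.9 (110).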

  14. Multidimensional structure ex. • Nodes (clusters) have multiple parents • Nodes (clusters) overlap [Figure: two hierarchies, Source net (All traffic; US, EU; CA, NY, GB, DE) and Application (All traffic; Web, Mail), and combined cluster nodes labeled "US Web" and "US CA Web"]
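The two properties named on slide 14 (multiple parents, overlapping clusters) can be illustrated with a small sketch over two hierarchies. The parent maps below are assumptions: the slide lists the labels, and placing CA and NY under US and GB and DE under EU is a guess at the figure's layout. The helper names are also hypothetical.

```python
ALL = "All traffic"

# Assumed parent links for the two hierarchies in the figure.
SRC_PARENT = {"US": ALL, "EU": ALL, "CA": "US", "NY": "US", "GB": "EU", "DE": "EU"}
APP_PARENT = {"Web": ALL, "Mail": ALL}


def parents(cluster):
    """A cluster (source net, application) has one parent per field
    that can still be generalized."""
    src, app = cluster
    result = []
    if src in SRC_PARENT:
        result.append((SRC_PARENT[src], app))
    if app in APP_PARENT:
        result.append((src, APP_PARENT[app]))
    return result


def ancestors_or_self(value, parent_map):
    chain = [value]
    while value in parent_map:
        value = parent_map[value]
        chain.append(value)
    return chain


def contains(outer, inner):
    """outer contains inner if it is equal or more general in every field."""
    return (outer[0] in ancestors_or_self(inner[0], SRC_PARENT)
            and outer[1] in ancestors_or_self(inner[1], APP_PARENT))


def overlaps(a, b):
    """Two clusters share traffic if, in every field, one value is an
    ancestor of (or equal to) the other."""
    def related(x, y, parent_map):
        return (x in ancestors_or_self(y, parent_map)
                or y in ancestors_or_self(x, parent_map))
    return related(a[0], b[0], SRC_PARENT) and related(a[1], b[1], APP_PARENT)
```

Here `parents(("CA", "Web"))` returns both `("US", "Web")` and `("CA", "All traffic")`, so the cluster space is a lattice rather than a tree. The clusters `("US", "All traffic")` and `("All traffic", "Web")` overlap in US web traffic although neither contains the other, which is why the unidimensional bottom-up compression does not carry over directly.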

  15. Contributions of this paper • Approach • Definitions • Algorithms • System • Experience

  16. System: AutoFocus [Figure: system diagram with the components Traffic parser, Cluster miner, Grapher and Web based GUI; input labeled "Packet header trace"; further labels "names" and "categories"]

  20. Contributions of this paper • Approach • Definitions • Algorithms • System • Experience

  21. Structure of regular traffic mix • Backups from CAIDA to tape server • Semi-regular time pattern • FTP from SLAC Stanford • Scripps web traffic • Web & Squid servers • Large ssh traffic • Steady ICMP probing from CAIDA [Figure label: SD-NAP]

  22. Analysis of unusual events • UCSD to UCLA route change • Sapphire/SQL Slammer worm [Figure label: Site 2]

  23. Conclusions

  24. Conclusions • Multidimensional traffic clusters using natural hierarchies describe traffic aggregates • Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity • Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates • Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events

  25. Thank you! Alpha version of AutoFocus downloadable from http://ial.ucsd.edu/AutoFocus/ Any questions? Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker

  26. Bounds and running times

  27. Open questions • Are there tighter bounds for the size of the reports? • Are there algorithms that produce smaller results? • Are there algorithms that compute traffic reports more efficiently? In streaming fashion?

  28. Delta reports • Why repeat the same traffic report if the traffic doesn’t change from one day to the next? • Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the next • On related traffic mixes, delta reports are much smaller than traffic reports • Multidimensional compression is very hard for delta reports • We have only an exponential algorithm for the cluster delta
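The thresholding part of a delta report, as described on slide 28, can be sketched as a comparison of per-cluster volumes from two intervals. This covers only the uncompressed step; it does not attempt the multidimensional compression that the slide calls very hard. The function name, cluster names, and traffic volumes are made up for illustration.

```python
def delta_report(before, after, threshold):
    """Clusters whose traffic changed by more than threshold between two
    intervals, mapped to the signed change. Missing clusters count as 0."""
    deltas = {}
    for cluster in before.keys() | after.keys():
        change = after.get(cluster, 0) - before.get(cluster, 0)
        if abs(change) > threshold:
            deltas[cluster] = change
    return deltas


# Hypothetical per-cluster volumes for two consecutive intervals.
monday = {"web": 500, "mail": 100, "ssh": 50}
tuesday = {"web": 510, "mail": 300, "p2p": 120}
changes = delta_report(monday, tuesday, threshold=100)
```

In this example only `mail` (+200) and `p2p` (+120) are reported; `web` and `ssh` changed by less than the threshold, so a mostly stable traffic mix yields a short report.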

  29. Greedy compression algorithm

  30. Multidimensional report example • Thresholding • Compression

  31. System details
