
Guofei Gu 1,2 , Roberto Perdisci 3 , Junjie Zhang 1 , and Wenke Lee 1

BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Guofei Gu (1, 2), Roberto Perdisci (3), Junjie Zhang (1), and Wenke Lee (1). Affiliations: (1) Georgia Tech, (2) Texas A&M University, (3) Damballa, Inc.





Presentation Transcript


  1. BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Guofei Gu (1, 2), Roberto Perdisci (3), Junjie Zhang (1), and Wenke Lee (1). Affiliations: (1) Georgia Tech, (2) Texas A&M University, (3) Damballa, Inc.

  2. Roadmap • Introduction: botnet problem, challenges for botnet detection, related work • BotMiner: motivation, design, evaluation • Conclusion

  3. What Is a Bot/Botnet? • Bot: a malware instance that runs autonomously and automatically on a compromised computer (zombie) without the owner’s consent; profit-driven, professionally written, widely propagated • Botnet (bot army): a network of bots controlled by criminals • Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • Architecture: centralized (e.g., IRC, HTTP), distributed (e.g., P2P) • “25% of Internet PCs are part of a botnet!” (Vint Cerf) [Figure labels: Botmaster, C&C, bot]

  4. Botnets are used for … • All DDoS attacks • Spam • Click fraud • Information theft • Phishing attacks • Distributing other malware, e.g., spyware

  5. Challenges for Botnet Detection • Bots are stealthy on the infected machines: we focus on a network-based solution • Bot infection is usually a multi-faceted and multi-phased process: looking at only one specific aspect is likely to fail • Bots are dynamically evolving: static and signature-based approaches may not be effective • Botnets can have a very flexible design of C&C channels: a solution very specific to one botnet instance is not desirable

  6. Why Are Existing Techniques Not Enough? • Traditional AV tools: bots use packers, rootkits, and frequent updates to easily defeat AV tools • Traditional IDS/IPS: look at only one specific aspect; do not have the big picture • Honeypots: not a good botnet detection tool

  7. Existing Botnet Detection Work • [Binkley, Singh 2006]: IRC-based bot detection that combines IRC statistics and TCP work weight • Rishi [Goebel, Holz 2007]: signature-based IRC bot nickname detection • [Livadas et al. 2006, Karasaridis et al. 2007]: (BBN, AT&T) network flow level detection of IRC botnets • BotHunter [Gu et al., Security’07]: dialog correlation to detect bots based on an infection dialog model • BotSniffer [Gu et al., NDSS’08]: spatial-temporal correlation to detect centralized botnet C&C • TAMD [Yen, Reiter 2008]: traffic aggregation to detect botnets that use a centralized C&C structure

  8. Why BotMiner? • Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models … • Examples: Nugache, Storm, …

  9. BotMiner: Protocol- and Structure-Independent Detection • Horizontal correlation • Bots are for long-term use • Botnet: communication and activities are coordinated/similar [Figure labels: Enterprise-like Network, Internet]

  10. Revisit the Definition of a Botnet • “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • We need to monitor two planes • C-plane (C&C communication plane): “who is talking to whom” • A-plane (malicious activity plane): “who is doing what”

  11. BotMiner Architecture

  12. BotMiner C-plane Clustering • What characterizes a communication flow (C-flow) between a local host and a remote service? • <protocol, srcIP, dstIP, dstPort>
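
As a rough illustration of the C-flow key on this slide, the following Python sketch groups flow records that share <protocol, srcIP, dstIP, dstPort>. The `Flow` record, its extra fields (`start`, `duration`, `packets`, `nbytes`), and the function name `group_into_cflows` are assumptions made for this example; they are not taken from the BotMiner implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; only the first four fields form the C-flow key."""
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float      # seconds since the start of the trace
    duration: float   # seconds
    packets: int
    nbytes: int


def group_into_cflows(flows):
    """Group flow records by <protocol, srcIP, dstIP, dstPort>."""
    cflows = defaultdict(list)
    for f in flows:
        cflows[(f.protocol, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(cflows)


# Made-up example data: two flows to the same remote service, one to another.
flows = [
    Flow("tcp", "10.0.0.5", "198.51.100.7", 6667, 0.0, 2.0, 10, 1000),
    Flow("tcp", "10.0.0.5", "198.51.100.7", 6667, 100.0, 4.0, 20, 4000),
    Flow("tcp", "10.0.0.5", "203.0.113.9", 80, 50.0, 1.0, 5, 600),
]
cflows = group_into_cflows(flows)
for key, members in cflows.items():
    print(key, len(members))
```

Note that the source port is deliberately left out of the key, so repeated connections from one local host to the same remote service fall into a single C-flow.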

  13. How to Capture “Talking in What Kind of Patterns”? • Temporal statistical distribution information: BPS (bytes per second), FPH (flows per hour) • Spatial statistical distribution information: BPP (bytes per packet), PPF (packets per flow)

  14. Two-step Clustering of C-flows • Why multi-step? • How? • Coarse-grained clustering: uses a reduced feature space, the mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8), with an efficient clustering algorithm (X-means) • Fine-grained clustering: uses the full feature space (13*4=52) • What’s left?
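
The reduced feature space (mean and variance of FPH, PPF, BPP, and BPS, giving 2*4=8 values per C-flow) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the input tuple layout, the function name `reduced_features`, the choice to count hours with zero flows in FPH, and the use of population variance are all illustrative and not confirmed by the slides. The X-means clustering step is not shown.

```python
from collections import Counter
from statistics import mean, pvariance


def _mean_var(sample):
    """Mean and population variance; (0.0, 0.0) for an empty sample."""
    if not sample:
        return [0.0, 0.0]
    return [mean(sample), pvariance(sample)]


def reduced_features(flows, hours):
    """Build the 8-value vector for one C-flow.

    flows: list of (start_s, duration_s, packets, nbytes) tuples (assumed layout).
    hours: length of the observation window in whole hours.
    """
    per_hour = Counter(int(start // 3600) for start, _, _, _ in flows)
    fph = [per_hour.get(h, 0) for h in range(hours)]      # flows per hour
    ppf = [p for _, _, p, _ in flows]                     # packets per flow
    bpp = [b / p for _, _, p, b in flows if p > 0]        # bytes per packet
    bps = [b / d for _, d, _, b in flows if d > 0]        # bytes per second
    vec = []
    for sample in (fph, ppf, bpp, bps):
        vec += _mean_var(sample)
    return vec


# Made-up C-flow with three flows observed over a two-hour window.
example = [(0, 2.0, 10, 1000), (100, 4.0, 20, 4000), (3700, 1.0, 10, 500)]
vec = reduced_features(example, hours=2)
print(len(vec), vec)
```

Each C-flow then becomes one point in an 8-dimensional space, which is what makes the coarse-grained pass cheap compared with clustering on the full 52 features.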

  15. A-plane Clustering • Capture “activities in what kind of patterns”

  16. Cross-plane Correlation • Botnet score s(h) for every host h • Similarity score between hosts hi and hj • Hierarchical clustering • Two hosts in the same A-cluster and in at least one common C-cluster are clustered together [Figure labels: Ai, Aj]
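
The grouping rule stated on this slide (same A-cluster and at least one common C-cluster) can be illustrated with a small Python sketch. It is a simplification: it does not compute the botnet score s(h) or the similarity score, whose formulas are not given in the transcript, and the function name `correlated_pairs` and the host names are hypothetical.

```python
from itertools import combinations


def correlated_pairs(a_clusters, c_clusters):
    """Return host pairs that share an A-cluster and at least one C-cluster.

    a_clusters, c_clusters: lists of sets of host identifiers.
    """
    pairs = set()
    for a in a_clusters:
        for h1, h2 in combinations(sorted(a), 2):
            if any(h1 in c and h2 in c for c in c_clusters):
                pairs.add((h1, h2))
    return pairs


# Made-up clusters: h1, h2, h3 show similar activity; only h1 and h2
# also share a communication cluster.
a_clusters = [{"h1", "h2", "h3"}]
c_clusters = [{"h1", "h2"}, {"h3", "h4"}]
pairs = correlated_pairs(a_clusters, c_clusters)
print(pairs)
```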

  17. Evaluation Traces

  18. Evaluation Results: False Positives

  19. Evaluation Results: Detection Rate

  20. Summary and Future Work • BotMiner: a new botnet detection system based on horizontal correlation; independent of botnet C&C protocol and structure; real-world evaluation shows promising results • Future work • More efficient clustering, more robust features • New, faster detection system using active techniques • BotMiner: offline correlation that requires a relatively long time for detection • BotProbe: fast detection by observing at most one round of C&C • New real-time solution for very high speed and very large networks

  21. Correlation-based Botnet Detection Framework • Vertical correlation: BotHunter (Security’07) • Horizontal correlation: BotSniffer (NDSS’08), BotMiner (Security’08) • Cause-effect correlation: BotProbe [Figure labels: Enterprise-like Network, Internet, Time]

  22. Appendix: Limitations and Discussion • Evading C-plane monitoring and clustering: misuse the whitelist; manipulate communication patterns • Evading A-plane monitoring and clustering: very stealthy activity; individualize bots’ communication/activity • Evading cross-plane analysis: extremely delayed task

  23. High-Speed Packet Sampling • Traffic arrives at high rates: high volume; some analysis scales with the size of the input • Possible approaches: random packet sampling; targeted packet sampling

  24. Approach • Idea: Bias sampling of traffic towards subpopulations based on conditions of traffic • Two modules • Counting: count statistics of each traffic flow • Sampling: sample packets based on (1) overall target sampling rate and (2) input conditions [Figure labels: traffic stream, traffic subpopulations, input conditions, overall sampling rate, Counting, Sampling, instantaneous sampling probability]

  25. Challenges • How to specify subpopulations? • Solution: multi-dimensional array specification • How to maintain counts for each subpopulation? • Solution: rotating array of counting Bloom filters • How to derive instantaneous sampling probabilities from overall constraints? • Solution: multi-dimensional counter array, and scaling based on target rates

  26. Specifying Subpopulations • Idea: Use concatenation of header fields (“tuples”) as a “key” for a subpopulation • These keys specify a group of packets that will be counted together • tuple_1: count groups of packets with the same source and destination IP address • tuple_2: count groups of packets with the same source IP, source port, and destination port
      # base sampling rate
      sampling_rate = 0.01
      # number of tuples
      tuples = 2
      # number of conditions
      conditions = 1
      # tuple definitions
      tuple_1 := srcip.dstip
      tuple_2 := srcip.srcport.dstport
      # condition : sampling budget
      tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5

  27. Sampling Rates for Subpopulations • Operator specifies • Overall sampling rate • Conditional rate within each class • FlexSample computes instantaneous sampling probabilities based on this • sampling_rate = 0.01: sample one in 100 packets on average • Budget 0.5: within the 1/100 “budget”, half of the sampled packets should come from groups satisfying this condition
      # base sampling rate
      sampling_rate = 0.01
      # number of tuples
      tuples = 2
      # number of conditions
      conditions = 1
      # tuple definitions
      tuple_1 := srcip.dstip
      tuple_2 := srcip.srcport.dstport
      # condition : sampling budget
      tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5

  28. Examining the Condition • Biases sampling towards packets from (source IP, destination IP) pairs which • Have sent at least 30 packets • Have sent packets to at least 5 distinct ports • Application: Portscan
      # base sampling rate
      sampling_rate = 0.01
      # number of tuples
      tuples = 2
      # number of conditions
      conditions = 1
      # tuple definitions
      tuple_1 := srcip.dstip
      tuple_2 := srcip.srcport.dstport
      # condition : sampling budget
      tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5
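
To make the range notation concrete, here is a small Python sketch that checks whether a packet's per-tuple counts fall inside the condition `tuple_1 in (30, inf] AND tuple_2 in (0, 5]`. It assumes each range is open on the left and closed on the right, as the bracket notation suggests; the `CONDITION` dictionary and the `satisfies` helper are hypothetical names for this example and do not reflect FlexSample's actual configuration parser.

```python
import math

# Assumed in-memory form of the single condition from the configuration:
# tuple name -> (low, high], i.e. low < count <= high.
CONDITION = {"tuple_1": (30, math.inf), "tuple_2": (0, 5)}
BUDGET = 0.5  # share of the sampling budget assigned to this condition


def in_range(count, low, high):
    """True if count lies in the half-open interval (low, high]."""
    return low < count <= high


def satisfies(counts, condition=CONDITION):
    """True if every tuple's current count falls inside its range."""
    return all(
        in_range(counts[name], low, high)
        for name, (low, high) in condition.items()
    )


# Many packets between one source/destination pair, but few per port pair.
print(satisfies({"tuple_1": 45, "tuple_2": 1}))
# Exactly 30 packets is outside (30, inf].
print(satisfies({"tuple_1": 30, "tuple_2": 1}))
```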

  29. Sampling Lookup Table • Problem: Conditions may not be completely specified • Solution: Sampling budget lookup table • Lookup table for allocating sampling “budget” to each class [Figure: lookup table with deduced values] • Next problem: Determining which condition each packet satisfies
      # tuple definitions
      tuple_1 := srcip.dstip
      tuple_2 := srcip.srcport.dstport
      # condition : sampling budget
      tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5

  30. Counting Subpopulations • Each packet belongs to a particular range in n-dimensional space • Counts for each condition • Maintain a counter (counting Bloom filter) for each tuple in every subcondition • Rotate counters to expunge “stale” values • Details: (1) number of counters, (2) how often to rotate
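
The counting scheme on this slide can be sketched in Python as a counting Bloom filter plus a rotating window of such filters. This is an illustration under assumptions: the filter size, the number of hash functions, the SHA-256 based hashing, the window of four filters, and the class names are all choices made for the example, not FlexSample's parameters.

```python
import hashlib
from collections import deque


class CountingBloomFilter:
    """Approximate per-key counter; can overestimate, never underestimates."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _indexes(self, key):
        # Derive `hashes` counter positions from salted SHA-256 digests.
        # A set avoids incrementing the same counter twice for one key.
        positions = set()
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.size)
        return positions

    def add(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def count(self, key):
        return min(self.counters[idx] for idx in self._indexes(key))


class RotatingCounters:
    """Window of filters; rotating drops the oldest to expunge stale counts."""

    def __init__(self, n_filters=4, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.filters = deque(
            CountingBloomFilter(size, hashes) for _ in range(n_filters)
        )

    def add(self, key):
        self.filters[-1].add(key)          # newest filter takes all updates

    def count(self, key):
        return sum(f.count(key) for f in self.filters)

    def rotate(self):
        self.filters.popleft()             # forget the oldest interval
        self.filters.append(CountingBloomFilter(self.size, self.hashes))


counters = RotatingCounters()
key = ("10.0.0.5", "10.0.0.9")             # e.g. a srcip.dstip tuple value
for _ in range(3):
    counters.add(key)
counters.rotate()
counters.add(key)
print(counters.count(key))
```

The two details the slide lists map directly onto `n_filters`/`size` (number of counters) and how often the caller invokes `rotate()`.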

  31. Deriving Instantaneous Sampling Rates • Problem: Traffic rates are dynamic • Relative fractions of packets in each class may change • Solution: Count packets in each sampling class, and adjust probabilities to rebalance according to the lookup table • Instantaneous rate = overall rate * (target rate) / (actual rate) • Keep track of actual rate using Bloom filter array and EWMA
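
The rebalancing formula on this slide, instantaneous rate = overall rate * (target rate) / (actual rate), can be worked through in a short Python sketch. The function names, the smoothing factor `alpha`, and the clamping of the result to 1.0 are assumptions added for the example; the slides state only the formula and that an EWMA tracks the actual rate.

```python
def ewma(previous, observation, alpha=0.1):
    """Exponentially weighted moving average of the observed class share."""
    return alpha * observation + (1 - alpha) * previous


def instantaneous_rate(overall_rate, target_fraction, actual_fraction):
    """Per-class sampling probability from the slide's formula.

    overall_rate:    overall sampling rate, e.g. 0.01
    target_fraction: share of sampled packets wanted from this class
    actual_fraction: share of arriving packets currently in this class
    """
    if actual_fraction <= 0:
        return 1.0  # assumption: sample everything from a class not yet seen
    # Assumption: a probability cannot exceed 1, so clamp the result.
    return min(1.0, overall_rate * target_fraction / actual_fraction)


# The class should supply half of the samples but is only 5% of the traffic,
# so its packets must be sampled well above the base rate.
rate = instantaneous_rate(0.01, 0.5, 0.05)
print(rate)

# Update the tracked share after observing 15% in the latest interval.
tracked = ewma(0.05, 0.15)
print(tracked)
```

With these numbers the class is sampled at roughly ten times the base rate, which keeps the overall rate near 1 in 100 while meeting the 0.5 budget.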

  32. Example Evaluation: Portscan • Setup: parameters as above; Nmap scan injected into a full one-hour trace from a department network • Results: FlexSample can capture 10x more of the portscan packets if all of the sampling budget is allocated to the portscan class; bias can be configured

  33. Other Applications • Recovering unique “conversations” in sampled traffic • Identifying DDoS attacks • Identifying heavy hitters, high-degree nodes, etc.

  34. Open Challenges • Specifying ranges and classes for specific applications • Scaling the counter array as the number of tuples and ranges increases • Simultaneously satisfying multiple objectives

  35. Next Steps: BotMiner Integration • Determine • The traffic rates that BotMiner can support for online analysis • The subpopulations that will yield the highest detection rates • Evaluation on traffic traces that contain botnets of interest
