
Botnet and Spam Detection in High-Speed Networks

Wenke Lee and Nick Feamster, Georgia Tech



  1. Botnet and Spam Detection in High-Speed Networks • Wenke Lee and Nick Feamster, Georgia Tech

  2. Overview • Problem: Botnet and spam detection in high-speed networks • Common theme: Examine network-level properties and build a classifier • Two systems: BotMiner and SNARE • Overview • Integration with SMITE architecture • Current integration status and plan

  3. BotMiner: Structure and Protocol Independent • Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …

  4. Definition of a Botnet • “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • Hosts that have similar C&C-like traffic and similar malicious activities • We need to monitor two planes • C-plane (C&C communication plane): “who is talking to whom” • A-plane (malicious activity plane): “who is doing what”

  5. BotMiner Architecture • [Figure: architecture diagram organized into Sensors, Algorithms, and Correlation]

  6. BotMiner C-plane Clustering • What characterizes a communication flow (C-flow) between a local host and a remote service? • <protocol, srcIP, dstIP, dstPort> • Temporal statistical distribution information • E.g., BPS (bytes per second), FPH (flows per hour) • Spatial statistical distribution information • E.g., BPP (bytes per packet), PPF (packets per flow)
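
The four C-flow quantities named on this slide can be computed from plain flow records. The sketch below is only an illustration under stated assumptions: the `Flow` record, its field names, and the one-second floor on flow duration are hypothetical and are not taken from BotMiner itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are assumptions for this sketch.
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float   # flow start time in seconds
    end: float     # flow end time in seconds
    packets: int
    bytes: int


def cflow_features(flows):
    """Group flows by <protocol, srcIP, dstIP, dstPort> and collect samples
    of FPH, PPF, BPP and BPS for each group (one C-flow per group)."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f.protocol, f.src_ip, f.dst_ip, f.dst_port)].append(f)

    features = {}
    for key, fs in groups.items():
        # Count flows per one-hour bucket (only hours that saw traffic).
        per_hour = defaultdict(int)
        for f in fs:
            per_hour[int(f.start // 3600)] += 1
        features[key] = {
            "fph": [per_hour[h] for h in sorted(per_hour)],
            "ppf": [f.packets for f in fs],
            "bpp": [f.bytes / f.packets for f in fs if f.packets],
            # Assumed floor of 1 second avoids dividing by a zero duration.
            "bps": [f.bytes / max(f.end - f.start, 1.0) for f in fs],
        }
    return features
```

Each value is a list of samples rather than a single number, because the slide describes statistical distributions; a clustering step would then summarize or bin these samples.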

  7. A-plane Clustering • Capture “similar activity patterns”

  8. Cross-plane Correlation • Botnet score s(h) for every host h • A host has a higher score if it is in more activity clusters and in both activity and communication clusters • A host with a high score is flagged as a bot • Similarity score between bot hosts h_i and h_j • Two hosts in the same A-clusters and in at least one common C-cluster are clustered together • Each cluster is a botnet
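
One way to read the scoring idea on this slide is sketched below. It is a simplified illustration, not the scoring formula used by BotMiner: the use of Jaccard overlap, the omission of per-activity weights, and the names `jaccard` and `botnet_score` are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two non-empty host sets: |a ∩ b| / |a ∪ b|."""
    return len(a & b) / len(a | b)


def botnet_score(host, a_clusters, c_clusters):
    """Simplified score s(h).

    a_clusters: list of (activity_type, set_of_hosts) pairs (A-plane).
    c_clusters: list of sets of hosts (C-plane).
    The score grows when the host appears in A-clusters of different
    activity types, and when its A-clusters overlap with its C-clusters.
    """
    mine_a = [(t, hosts) for t, hosts in a_clusters if host in hosts]
    mine_c = [hosts for hosts in c_clusters if host in hosts]

    score = 0.0
    # Contribution from being in several activity clusters of different types.
    for (t1, h1), (t2, h2) in combinations(mine_a, 2):
        if t1 != t2:
            score += jaccard(h1, h2)
    # Contribution from being in both an activity and a communication cluster.
    for _, a_hosts in mine_a:
        for c_hosts in mine_c:
            score += jaccard(a_hosts, c_hosts)
    return score
```

A host that appears in no activity cluster scores zero under this sketch, which matches the slide's requirement that both planes contribute evidence.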

  9. SMITE Integration: BotMiner

  10. Integrating BotMiner and SMITE • Sensors • Feature extraction for C-Plane and A-Plane clustering • C-Flow temporal and statistical features • Counting packets and connections between each pair of endpoints: bytes per second, flows per hour, bytes per packet, packets per flow • A-Plane header and payload features • Destination IP addresses and ports, payload bytes/strings • These sensors are not specific to BotMiner

  11. Integrating BotMiner and SMITE • Algorithms • C-plane clustering • Multi-step clustering based on statistical and temporal C-flow features • A-plane clustering • Based on activity-specific similarity measures: e.g., spread of destination IP addresses and ports, Dice’s coefficient of string similarity, and byte frequency or entropy of payload • Bot scoring and botnet clustering methods • Scoring based on participation in C-plane and A-plane clusters • Clustering based on common memberships in the C-plane and A-plane clusters
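
Two of the similarity measures named here, Dice's coefficient and payload byte entropy, are standard and short enough to sketch. The version below assumes character bigrams for Dice's coefficient and Shannon entropy in bits per byte; how BotMiner tokenizes payloads is not stated in the slides, so those choices are assumptions.

```python
import math
from collections import Counter


def bigrams(s):
    """Multiset of adjacent character pairs in s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_coefficient(a, b):
    """Dice's coefficient over character bigrams: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2 * overlap / total


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

High entropy (close to 8 bits per byte) suggests encrypted or compressed payloads, while similar strings across hosts yield a Dice coefficient close to 1.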

  12. Integrating BotMiner and SMITE • Correlation • Botnet detection involves both vertical and horizontal analysis/clustering: • Vertical: what activities a host has been involved in • Bot detection • Horizontal: what other hosts have similar (vertical) behavior patterns • Botnet detection • Similar analysis can be applied to other alerts • Improve botnet detection • Understand malicious activities and plans of attacks • Measure the scale of attacks

  13. Network-Based Spam Detection • Filter email based on how it is sent, in addition to simply what is sent. • Network-level properties are less malleable • Hosting or upstream ISP (AS number) • Membership in a botnet (spammer, hosting infrastructure) • Network location of sender and receiver • Set of target recipients

  14. Finding the Right Features • Goal: Sender reputation from a single packet header? • Low overhead • Fast classification • In-network • Perhaps more evasion resistant • Key challenge • What features satisfy these properties and can distinguish spammers from legitimate senders?

  15. Network-Level Features • Single-Packet • AS of sender’s IP • Distance to k nearest senders • Status of email service ports • Geodesic distance • Time of day • Single-Message • Number of recipients • Length of message • Aggregate (Multiple Message/Recipient)

  16. Sender-Receiver Geodesic Distance 90% of legitimate messages travel 2,200 miles or less
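
The geodesic distance feature can be approximated with the haversine great-circle formula once sender and receiver are mapped to coordinates. The sketch below assumes that an IP geolocation step has already produced latitude and longitude in degrees; that lookup, the function name, and the mean Earth radius constant are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees,
    using the haversine formula on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

For example, `geodesic_miles(33.749, -84.388, 51.507, -0.128)` gives the approximate distance between Atlanta and London, which a classifier could then use as one numeric feature.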

  17. Density of Senders in IP Space For spammers, k nearest senders are much closer in IP space
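
A sender-density feature of this kind can be sketched by treating IPv4 addresses as integers. The version below takes the average numeric distance to the k nearest other senders; that exact definition, and the function name `knn_ip_distance`, are assumptions for illustration rather than SNARE's documented feature.

```python
import ipaddress


def knn_ip_distance(sender, known_senders, k):
    """Average numeric distance in IPv4 space from `sender` to its k nearest
    neighbours among `known_senders` (the sender itself is excluded).
    Returns infinity when there are no other senders to compare against."""
    target = int(ipaddress.IPv4Address(sender))
    distances = sorted(
        abs(target - int(ipaddress.IPv4Address(ip)))
        for ip in known_senders
        if ip != sender
    )[:k]
    if not distances:
        return float("inf")
    return sum(distances) / len(distances)
```

A small value means the sender sits in a densely populated region of IP space, which the slide associates with spammers.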

  18. Local Time of Day at Sender Spammers “peak” at different local times of day

  19. Other Network-Level Features • Time-of-day at sender • Upstream AS of sender • Message size (and variance) • Number of recipients (and variance)

  20. Combining Features: RuleFit • Put features into the RuleFit classifier • 10-fold cross-validation on one day of query logs from a large spam filtering appliance provider • Comparable performance to Spamhaus • Incorporating into the system can further reduce FPs • Using only network-level features • Completely automated

  21. Benefits of Whitelisting • Whitelisting top 50 ASes: False positives reduced to 0.14%

  22. Integrating SNARE and SMITE • [Figure: SNARE components mapped to SMITE stages: Sensors, Algorithms/Correlation]

  23. Integration with SMITE • Sensors • Extract network features from traffic • IP addresses • Combine with auxiliary data (routing, time, etc.) • Algorithms • Clustering algorithm to identify behavioral fingerprints • Learning algorithm to classify based on multiple features • Correlation • Clusters formed by aggregating sending behavior observed across multiple sensors • Various features also require input from data collected across collections of IP addresses

  24. SMITE Integration Challenges • Sources of labeled data • SNARE requires clean sources of labeled data for training • Data collection • SNARE’s performance improves when behavior can be observed across multiple domains

  25. Overall SMITE Integration

  26. SMITE Integration: Current Work • Study pipeline architecture and code • Modify flow-analyzer to dump 5-tuple flow information

  27. SMITE Integration: Phase I • Modify flow-analyzer with SMITE team to generate 5-tuple flow information (mid-March) • Spam/scan detection, flow aggregation in BotMiner; Spam feature extraction in SNARE (end of March) • Clustering and correlation in BotMiner; Classifier in SNARE (end of April)

  28. SMITE Integration: Phase II • Evaluate performance of BotMiner and SNARE • How many hours does it take to process one day of traffic, or what is the “lag” time between event and detection? • Design real-time detection algorithms • A two-tier system: the off-line module outputs lists of suspicious hosts, and the real-time module inspects all packets of these hosts; or, the off-line module outputs clusters • Design algorithms to handle asymmetric traffic • Cluster on each direction of traffic and cross-correlate
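
The two-tier idea proposed on this slide can be sketched as a watchlist handed from an off-line tier to a real-time tier. Everything below is hypothetical: the class name, the method names, and the representation of a packet as a `(src_ip, dst_ip)` tuple are assumptions, since the slides describe this design only as planned work.

```python
class TwoTierDetector:
    """The off-line tier supplies suspicious hosts; the real-time tier
    inspects only packets that involve one of those hosts."""

    def __init__(self):
        self.suspicious = set()

    def load_offline_results(self, hosts):
        # Replace the watchlist with the latest off-line output.
        self.suspicious = set(hosts)

    def should_inspect(self, src_ip, dst_ip):
        # Inspect a packet if either endpoint is on the watchlist.
        return src_ip in self.suspicious or dst_ip in self.suspicious


def filter_packets(detector, packets):
    """Keep only the (src_ip, dst_ip) packets the real-time tier should see."""
    return [p for p in packets if detector.should_inspect(p[0], p[1])]
```

Keeping the watchlist in a set makes the per-packet check a constant-time lookup, which matters if the real-time tier must keep up with high-speed links.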

  29. Thank You!
