
Spamming Botnets: Signatures and Characteristics

Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008. Presented by: Arnold Perez.


Presentation Transcript


  1. Spamming Botnets: Signatures and Characteristics Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008. Presented by: Arnold Perez

  2. Outline • Introduction • Goals • AutoRE • Challenges • Design • Results • Botnet characteristics • Contributions • Weaknesses

  3. Introduction • Botnets are commonly used for profit • Botnets are rented out to spammers • Botnets can send spam emails at a large scale • Can transmit thousands of emails in a short time • Individual bots are difficult to detect and blacklist

  4. Goals • Understand the behaviors of botnets from the perspective of large email servers that are popular targets • Identify botnet characteristics and trends • Track sending behavior and content patterns • Develop a framework (AutoRE) that identifies botnet hosts by generating botnet spam signatures from emails

  5. AutoRE • Motivated by the recent success of signature-based worm and virus detection systems • Botnet spam emails are often sent in an aggregate fashion, resulting in content prevalence similar to worm propagation • Focuses primarily on URLs embedded in emails

  6. AutoRE Challenges • Spammers often add random, legitimate URLs to content in order to increase the perceived legitimacy of emails

  7. AutoRE Challenges • Spammers use URL obfuscation techniques to evade detection

  8. AutoRE Design

  9. AutoRE Design • Input • Set of unlabeled email messages • Output • Set of spam URL signatures • Complete URL string • URL regular expression • List of botnet host IP addresses

  10. AutoRE Design • Composed of three modules • URL preprocessor • Extracts URLs and other relevant fields and groups them by web domain • Group selector • Selects the URL groups with the highest degree of burstiness in sending times • RegEx generator • Extracts signatures by processing one group at a time

  11. URL Pre-Processing • Extracts • URL string • Source server IP address • Email sending time • Partitions URLs into groups by web domain • Emails from the same spam campaign always advertise the same product or service from the same domain
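The pre-processing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AutoRE's implementation: the `Email` record, the `preprocess` function, and the URL-matching pattern are hypothetical. Grouping uses the full host name returned by `urllib.parse.urlsplit`; a real deployment might normalize hosts to their registered domain instead.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit


@dataclass
class Email:
    # Hypothetical record for one received email (field names are illustrative).
    body: str
    sender_ip: str
    sent_at: datetime


# Simplified URL pattern (an assumption); real spam needs more careful parsing.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def preprocess(emails):
    """Group (url, sender_ip, sent_at) tuples by the URL's host name."""
    groups = defaultdict(list)
    for email in emails:
        for url in URL_RE.findall(email.body):
            host = urlsplit(url).hostname  # lowercased, port removed
            if host:
                groups[host].append((url, email.sender_ip, email.sent_at))
    return groups
```

An email that contains URLs from two domains contributes a tuple to both groups, which is why a single email can belong to more than one group.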

  12. URL Group Selection • Each email may belong to more than one group • Uses the bursty property of botnet email traffic • Selects the group that exhibits the strongest temporal correlation across a large set of distributed senders
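One way to make "burstiness" concrete is sketched below. The score is a hypothetical stand-in, not the paper's metric: it counts distinct sender IPs per day of the group's sending time span, so a group sent by many hosts in a short window scores highest. It takes the `(url, sender_ip, sent_at)` tuples that a domain-grouping step would produce; `burstiness`, `select_group`, and `min_span` are illustrative names.

```python
from datetime import datetime, timedelta


def burstiness(records, min_span=timedelta(hours=1)):
    """Hypothetical score: distinct sender IPs per day of sending time span.

    records: list of (url, sender_ip, sent_at) tuples for one domain group.
    min_span floors the span so a group sent in one instant does not divide by zero.
    """
    senders = {ip for _, ip, _ in records}
    times = [sent_at for _, _, sent_at in records]
    span = max(max(times) - min(times), min_span)
    return len(senders) / (span / timedelta(days=1))


def select_group(groups):
    """Return the domain whose URL group has the highest burstiness score."""
    return max(groups, key=lambda domain: burstiness(groups[domain]))
```

Fifty senders within an hour outscore fifty senders spread over seven weeks, which matches the intuition that botnet campaigns are both distributed and temporally concentrated.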

  13. Signature Generation and Botnet Identification • Two types of signatures • Complete URL based signature • Regular expression signatures • Signature criteria • Distributed • Bursty • Specific

  14. Signature Generation and Botnet Identification • Distributed • The source IP addresses must span at least 20 Autonomous Systems (ASes) • Bursty • All matching URLs must be sent within 5 days • Specific • Complete URLs are specific by definition • Regex signatures are tested using entropy reduction: the probability of a random string matching the signature must be at most 1/2^90
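The three acceptance criteria can be expressed as simple predicates, as in the sketch below. Assumptions: the caller already has the AS number of each sender IP (the IP-to-AS lookup is not shown), and the regex's entropy reduction is supplied in bits, so a reduction of d bits corresponds to a random-match probability of 2^-d and the 1/2^90 bound becomes a 90-bit threshold. All names are hypothetical.

```python
from datetime import datetime, timedelta

MIN_AS_COUNT = 20                 # "distributed": senders span at least 20 ASes
MAX_DURATION = timedelta(days=5)  # "bursty": all matches sent within 5 days
MIN_ENTROPY_REDUCTION_BITS = 90   # "specific": random-match probability <= 2**-90 (assumed bits)


def is_distributed(sender_asns):
    return len(set(sender_asns)) >= MIN_AS_COUNT


def is_bursty(send_times):
    return max(send_times) - min(send_times) <= MAX_DURATION


def is_specific(entropy_reduction_bits, complete_url=False):
    # Complete-URL signatures are specific by definition.
    return complete_url or entropy_reduction_bits >= MIN_ENTROPY_REDUCTION_BITS


def accept_signature(sender_asns, send_times, entropy_reduction_bits, complete_url=False):
    """A candidate signature is kept only if all three criteria hold."""
    return (is_distributed(sender_asns)
            and is_bursty(send_times)
            and is_specific(entropy_reduction_bits, complete_url))
```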

  15. Automatic URL Regular Expression Generation

  16. Signature Tree Construction • Constructs a keyword-based signature tree where each node corresponds to a substring, with the root of the tree set to the domain name • Keywords are the most frequent substrings that are both bursty and distributed
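A simplified signature-tree builder is sketched below. Several assumptions apply: candidate keywords are alphanumeric tokens from the URL path and query (the paper considers substrings more generally), a keyword must occur in at least `min_support` URLs, and the burstiness and distribution checks on keywords are omitted. `tokens` and `build_tree` are hypothetical names. The root's keyword list holds only the domain, and each node's keyword path from the root forms a keyword-based signature.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def tokens(url):
    """Alphanumeric tokens from the URL path and query (a simplification)."""
    parts = urlsplit(url)
    return {t for t in re.split(r"[^A-Za-z0-9]+", parts.path + "?" + parts.query) if t}


def build_tree(urls, keywords, min_support=2):
    """Greedily grow a keyword tree; call with keywords=(domain,) for the root."""
    node = {"keywords": list(keywords), "children": []}
    remaining = list(urls)
    while remaining:
        # Each token is counted once per URL that contains it.
        counts = Counter(t for u in remaining for t in tokens(u) - set(keywords))
        if not counts:
            break
        # Most frequent token; ties broken alphabetically for determinism.
        token, support = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if support < min_support:
            break
        matched = [u for u in remaining if token in tokens(u)]
        remaining = [u for u in remaining if token not in tokens(u)]
        node["children"].append(build_tree(matched, (*keywords, token), min_support))
    node["urls"] = remaining  # URLs not covered by any child keyword
    return node
```

The greedy choice of the most frequent token at each node is a design simplification: it keeps the tree small, but it may not find the keyword set a more careful search would.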

  17. Signature Tree Construction

  18. Regular Expression Generation • Detailing • Returns a domain-specific regular expression from the keyword-based signature • Generalization • Returns a more general, domain-agnostic regular expression by merging very similar domain-specific expressions
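Detailing and generalization can be illustrated with the simplified sketch below. Assumptions: the keywords for a URL group are given in order (for example, a path in a signature tree). Each gap between keywords becomes a literal if it is identical across URLs, and otherwise a character class with the observed minimum and maximum length. Generalization here merges only expressions that become identical once the domain is replaced by a generic host pattern; the paper merges "very similar" expressions, which this sketch does not attempt. All function names are hypothetical.

```python
import re

DIGITS = set("0123456789")
LOWER = set("abcdefghijklmnopqrstuvwxyz")


def gap_regex(chunks):
    """Regex for the text between two keywords, generalized across URLs."""
    if len(set(chunks)) == 1:
        return re.escape(chunks[0])  # identical in every URL: keep it literal
    lo, hi = min(map(len, chunks)), max(map(len, chunks))
    chars = set("".join(chunks))
    if chars <= DIGITS:
        cls = "[0-9]"
    elif chars <= LOWER:
        cls = "[a-z]"
    elif chars <= LOWER | DIGITS:
        cls = "[a-z0-9]"
    else:
        cls = "."  # fallback: any character
    return f"{cls}{{{lo},{hi}}}"


def split_on_keywords(url, keywords):
    """Return the len(keywords) + 1 gaps around the keywords, in order."""
    gaps, pos = [], 0
    for kw in keywords:
        idx = url.find(kw, pos)
        if idx < 0:
            raise ValueError(f"{kw!r} not found in {url!r}")
        gaps.append(url[pos:idx])
        pos = idx + len(kw)
    gaps.append(url[pos:])
    return gaps


def detail(urls, keywords):
    """Domain-specific regex: literal keywords joined by generalized gaps."""
    per_url = [split_on_keywords(u, keywords) for u in urls]
    parts = []
    for i, kw in enumerate(keywords):
        parts.append(gap_regex([gaps[i] for gaps in per_url]))
        parts.append(re.escape(kw))
    parts.append(gap_regex([gaps[-1] for gaps in per_url]))
    return "".join(parts)


def generalize(regex_by_domain):
    """Swap each escaped domain for a generic host pattern; duplicates merge."""
    return {rx.replace(re.escape(d), "[a-z0-9.-]+") for d, rx in regex_by_domain.items()}
```

For two URLs such as `.../buy12/item?id=abc` and `.../buy345/item?id=xyz` with keywords `buy` and `/item?id=`, the detailed expression accepts 2 to 3 digits after `buy` and exactly 3 lowercase letters after `id=`. Two campaigns on different domains with the same URL structure then generalize to a single domain-agnostic expression.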

  19. Regular Expression Generation

  20. Datasets and Results • Based on randomly sampled Hotmail email messages from three months • November 2006 • June 2007 • July 2007 • Total of 5,382,460 sampled emails • Each email pre-classified as spam or non-spam by human users (labels are not used by AutoRE, only for validation)

  21. AutoRE Results • Identified 7,721 botnet spam campaigns • 580,466 spam messages • 340,050 distinct botnet host IP addresses • Spanning 5,916 ASes

  22. AutoRE Results

  23. AutoRE Results • The majority of campaigns belong to the CU (complete URL) signature category • 100% increase in July 2007 compared to Nov 2006 • Spam volume increased 50% over the same period • The total number of botnet IPs does not increase proportionally, suggesting that each botnet is being used more aggressively

  24. False Positive Rate • Rate = (number of non-spam emails matching a signature) / (total number of non-spam emails)
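The false positive rate formula translates directly into code. A minimal sketch with hypothetical names: `matched[i]` says whether email i matched any signature, and `is_spam[i]` is its ground-truth label (for example, the human spam/non-spam labels used for validation).

```python
def false_positive_rate(matched, is_spam):
    """(non-spam emails matching a signature) / (total non-spam emails)."""
    non_spam_matches = [m for m, spam in zip(matched, is_spam) if not spam]
    if not non_spam_matches:
        raise ValueError("no non-spam emails to evaluate")
    return sum(non_spam_matches) / len(non_spam_matches)
```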

  25. Ability to Detect Future Spam • Experiment • Apply signatures derived from Nov 2006 and June 2007 to the emails collected in July 2007 • Nov 2006 signatures are not useful • Indicates that spam URL patterns evolve over time • June 2007 signatures are highly effective • RE (regular expression) signatures are more robust over time than CU (complete URL) signatures

  26. Regular Expressions vs Keyword Conjunctions • Identical spam detection rates • Difference is in false positive rate
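The contrast between the two signature types can be illustrated with hypothetical URLs. A keyword conjunction matches any string that contains every keyword, in any order and position. A regular expression also fixes the order of the keywords and constrains the text between them. Both forms below flag the same spam URL, consistent with identical detection rates, but the conjunction also accepts an unrelated URL that happens to contain the keywords. That is one way the false positive rates can differ. The patterns and URLs are illustrative, not taken from the paper's experiment.

```python
import re


def conjunction_match(keywords, url):
    """Keyword-conjunction signature: every keyword appears, in any order."""
    return all(k in url for k in keywords)


def regex_match(pattern, url):
    """Regex signature: keywords in a fixed order with constrained gaps."""
    return re.fullmatch(pattern, url) is not None


KEYWORDS = ["/buy", "/item?id="]
PATTERN = r"http://pills\.example\.com/buy[0-9]{2,3}/item\?id=[a-z]{3}"

spam = "http://pills.example.com/buy12/item?id=abc"
benign = "http://shop.example.org/item?id=42&ref=/buyers"
```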

  27. Domain-specific vs Domain-Agnostic Signatures • Generalization effectively preserves the stable structures of polymorphic URLs while removing the volatile domain substrings

  28. Botnet Characteristics • The distribution of IP addresses indicates that the botnet menace is a global phenomenon, with China, Korea, France, and the USA each hosting a significant number of botnet IP addresses

  29. Botnet Characteristics • When viewed individually, botnet hosts do not exhibit distinct sending patterns • Email content varies considerably even when the target web pages are the same • 50% of botnet spam campaigns have a standard deviation in sending time of less than 1.81 hours, while 90% have a standard deviation of less than 24 hours
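A statistic like "50% of campaigns have a standard deviation under 1.81 hours" can be computed as shown below. This is a sketch under assumptions: each campaign is a list of send timestamps, and the population standard deviation (`statistics.pstdev`) is used because the slide does not say whether the population or sample form was used. The function names are hypothetical.

```python
from datetime import datetime
from statistics import pstdev


def sending_time_stddev_hours(send_times):
    """Population standard deviation of a campaign's send times, in hours."""
    t0 = min(send_times)
    hours = [(t - t0).total_seconds() / 3600 for t in send_times]
    return pstdev(hours)


def fraction_below(campaigns, threshold_hours):
    """Fraction of campaigns whose send-time std dev is under threshold_hours."""
    devs = [sending_time_stddev_hours(ts) for ts in campaigns.values()]
    return sum(d < threshold_hours for d in devs) / len(devs)
```

Shifting every timestamp by the earliest one does not change the standard deviation; it only keeps the numbers small before converting to hours.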

  30. Botnet Characteristics • Similar number of recipients per email • Share a constant connection rate • Most likely due to rate control seen in botnet software • Large number of campaigns share the same domain-agnostic regular expression signatures • Same botnets participating in multiple spam campaigns

  31. Contributions • AutoRE, a framework that automatically generates URL signatures for spamming botnet detection • Several important findings about botnet spam • Botnet hosts are spread across the Internet • Hosts show no distinctive pattern when viewed individually • Botnet host sending patterns

  32. Weaknesses • The AutoRE system analyzes batches of emails after they have all been received • It would be better to do this in real time, so that email could be stopped once a campaign has been identified and a signature created • AutoRE needs a large volume of email to work effectively • It cannot be used on individual inboxes; it must be deployed at the ISP, where it sees all incoming email

  33. Weaknesses • I was hoping to use the botnet characteristics found in the paper in my own project • The paper shows that botnet spam cannot be identified by examining hosts individually; AutoRE relies on group behavior

  34. References • "Spamming Botnets: Signatures and Characteristics". Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008.
