
BOTNET JUDO Fighting Spam with Itself

BOTNET JUDO Fighting Spam with Itself. By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage Presentation by: Heath Carroll. The Origins of Spam. Presentation Overview. Abstract - What was the intent of the paper?


Presentation Transcript


  1. BOTNET JUDO: Fighting Spam with Itself By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage Presentation by: Heath Carroll

  2. The Origins of Spam

  3. Presentation Overview • Abstract - What was the intent of the paper? • Introduction - current problems faced and methods used to combat them • Background - Def: Botnet, Regular Expression, Template-based Spam • Approach - How the authors dealt with this problem

  4. Abstract • Botnet Judo: Fighting Spam with Itself or ‘Botnet Host Quarantine: What’d we learn?’ • Examination of a controlled, isolated, Botnet host. • Quick generation of precise and accurate spam filters with ~ 0 false positives

  5. Introduction : Botnets • Definition: Botnet - a collection of software agents, or robots, that run autonomously and automatically. The term is most commonly associated with malicious software, but it can also refer to a network of computers using distributed computing software. (en.wikipedia.org/wiki/Botnet) • Example: DDoS attack against Blue Security, May 2, 2006

  6. Botnets (cont’d) • Common uses of botnets: • Denial-of-service attacks • Adware • Spyware • Email spam (template, image, etc.) • Click fraud • Internet access number replacement • Fast flux (DNS URL/IP address switching)

  7. SPAM!! • Template Based Spam • Botnet uses an RE to produce massive amounts of highly varied spam • Harder to [content] filter initially due to varied message makeup • Requires defenders to collect ‘suspect’ spam in order to build an effective content-based filter • Harder to [sender] filter due to massive host lists • Requires defenders to rely on alternative methods to combat the botnet
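A minimal sketch of how a template with variable slots can yield highly varied messages. The template text, macro names, and word lists below are invented for illustration and are not taken from any real botnet.

```python
import random

# Hypothetical template: fixed text plus macro slots. Everything here is
# illustrative; real botnet templates are more elaborate.
TEMPLATE = "Subject: {greeting} {name}, {offer} today\n\nVisit http://{domain}/{token}"

DICTIONARIES = {
    "greeting": ["Hi", "Hello", "Dear"],
    "name": ["friend", "customer", "member"],
    "offer": ["save 70%", "get a free trial", "claim your prize"],
    "domain": ["example.com", "example.net"],
}

def random_token(rng, length=8):
    """Random lowercase string, standing in for a 'noise' macro."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))

def instantiate(template, rng):
    """Fill every macro slot with a random choice to produce one message."""
    values = {key: rng.choice(words) for key, words in DICTIONARIES.items()}
    values["token"] = random_token(rng)
    return template.format(**values)

rng = random.Random(0)  # fixed seed so repeated runs give the same messages
messages = [instantiate(TEMPLATE, rng) for _ in range(5)]
for m in messages:
    print(m.splitlines()[0])
```

Even this tiny template has 3 × 3 × 3 × 2 dictionary combinations before the random token is counted, which is why exact-match content filters struggle against it.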

  8. SPAM!! • Preventative measures: • Anti-virus software • Passive OS fingerprinting • Network based approaches (nullrouting) • Spam filtering • Directed study • The last two are covered by this paper

  9. Anti-spam!! • Basically 2 different approaches: • Content-based : • Filtering based on established heuristics and learning algorithms focused against specific message features • Can be highly effective (esp against targeted botnets) • Labor intensive to maintain since the basic technique can be countered by chaff and poisoning attacks • Hard to maintain low false positives from the filter • Blacklisting URLs can also be effective, but needs large up-to-date white-lists to avoid poisoning • Doesn’t do anything if spam doesn’t utilize URLs

  10. Anti-Spam!! (cont’d) • Sender-based • Focuses on spam delivery system • Assumes sender of spam is likely to repeat sending spam, and not likely to send legitimate messages • Basically works by Blacklisting offending senders after the fact • Doesn’t work against newest spam • Botnets are an effective work-around since the controller distributes his spam over a large number of hosts

  11. Anti-Spam!! (cont’d) • Template-based spam filtering: • Suspected Botnet generated spam is examined and deconstructed into a Regular Expression (RE) • Works very well against static botnets, but requires a lot of instances of suspected spam to deconstruct • Useless if controller changes the RE used by the bots

  12. Regular Expressions

  13. Regular Expressions (cont’d) • Review:
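As a quick refresher, here is a small hand-written regular expression of the kind a spam signature might use: fixed text joined by alternations and character classes for the parts that vary. The pattern and both sample messages are made up for this illustration.

```python
import re

# Fixed text is matched literally; the varying parts use alternation (a|b),
# character classes ([a-z], \w) and bounded repetition ({5,20}).
signature = re.compile(
    r"Subject: (Hi|Hello|Dear) \w+, .{5,20} today\n\n"
    r"Visit http://example\.(com|net)/[a-z]{8}"
)

spam = "Subject: Hello friend, save 70% today\n\nVisit http://example.com/qwertyui"
ham = "Subject: Lunch today?\n\nVisit me at noon."

spam_hit = bool(signature.fullmatch(spam))
ham_hit = bool(signature.fullmatch(ham))
print(spam_hit, ham_hit)
```

Using `fullmatch` requires the whole message to fit the pattern, which keeps the signature strict and helps avoid matching legitimate mail.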

  14. JUDO!! • Generates regular expression signatures to thwart spam • Operates by examining the output from quarantined botnet • Uses template inference algorithm to generate a set of signatures matching all previous messages

  15. JUDO!! (cont’d) • Header Filtering • Anchor identification • Macro classification • Dictionary • Micro-anchor • Noise • Special Tokens • Signature Update • Second Chance • Pre-clustering
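The anchor identification step can be illustrated with a heavily simplified sketch: find text common to every sample, treat it as an anchor, and turn whatever lies between anchors into a length-bounded wildcard. This is not the paper's algorithm (it skips header filtering and the dictionary, micro-anchor, and noise classification entirely); the function names, the `min_len` parameter, and the sample messages are assumptions made for this example.

```python
import re

def longest_common_substring(strings, min_len):
    """Longest substring present in every string, or '' if none reaches min_len."""
    base = min(strings, key=len)
    for length in range(len(base), min_len - 1, -1):
        for start in range(len(base) - length + 1):
            candidate = base[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""

def infer_parts(strings, min_len=4):
    """Split messages into anchors (str) and variable gaps (list of str), in order."""
    anchor = longest_common_substring(strings, min_len)
    if not anchor:
        return [list(strings)]
    lefts = [s.partition(anchor)[0] for s in strings]
    rights = [s.partition(anchor)[2] for s in strings]
    return infer_parts(lefts, min_len) + [anchor] + infer_parts(rights, min_len)

def to_regex(parts):
    """Anchors become literals; each gap becomes a length-bounded wildcard."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(re.escape(part))
        elif any(part):  # skip gaps that are empty in every message
            lengths = [len(p) for p in part]
            pieces.append(".{%d,%d}" % (min(lengths), max(lengths)))
    return re.compile("".join(pieces), re.DOTALL)

samples = [
    "Hi friend, save 70% today at http://example.com/abcdefgh",
    "Hello member, get a free trial today at http://example.net/zzzzyyyy",
    "Dear customer, claim your prize today at http://example.com/qqqqwwww",
]
parts = infer_parts(samples)
signature = to_regex(parts)
print(signature.pattern)
```

A real implementation would then classify each gap, for example replacing a wildcard with an alternation when the observed values come from a small dictionary, which is what makes the resulting signatures precise.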

  16. Judo - Second Chance Mechanism • Used to mitigate the effects of a small training buffer • If a message fails to match an existing signature • It is re-checked using only anchors • If matched, signature is updated
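The second chance idea can be sketched with a toy signature made of ordered anchors and length bounds on the gaps between them. The `Signature` class, its fields, and the update rule (widening the gap bounds) are assumptions for illustration; the slides only say that the signature is updated.

```python
class Signature:
    """Toy signature: ordered anchors with length-bounded gaps between them."""

    def __init__(self, anchors, gap_bounds):
        self.anchors = anchors        # fixed substrings, in order
        self.gap_bounds = gap_bounds  # (lo, hi) for each of len(anchors) + 1 gaps

    def gaps(self, message):
        """Text between anchors, or None if an anchor is missing or out of order."""
        found, pos = [], 0
        for anchor in self.anchors:
            idx = message.find(anchor, pos)
            if idx < 0:
                return None
            found.append(message[pos:idx])
            pos = idx + len(anchor)
        found.append(message[pos:])
        return found

    def matches(self, message):
        """Full check: anchors present and every gap within its length bounds."""
        gaps = self.gaps(message)
        return gaps is not None and all(
            lo <= len(g) <= hi for g, (lo, hi) in zip(gaps, self.gap_bounds))

    def second_chance(self, message):
        """Anchors-only recheck; on success, widen the bounds to cover the message."""
        gaps = self.gaps(message)
        if gaps is None:
            return False
        self.gap_bounds = [(min(lo, len(g)), max(hi, len(g)))
                           for g, (lo, hi) in zip(gaps, self.gap_bounds)]
        return True

sig = Signature(["Buy ", " now at http://example.com/"], [(0, 0), (4, 6), (8, 8)])
msg = "Buy vitamins now at http://example.com/abcdefgh"

first_try = sig.matches(msg)       # product name is longer than the bound allows
rescued = sig.second_chance(msg)   # anchors still line up, so the bounds widen
second_try = sig.matches(msg)
print(first_try, rescued, second_try)
```

The point of the anchors-only pass is that a signature trained on too few samples may have gaps that are too tight, while the anchors themselves are still reliable evidence of the same template.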

  17. Judo - Pre-clustering • Used to mitigate the effects of overly large training buffers (potentially mixed RE’s) • Skeleton signatures used to sort incoming messages prior to running Judo on them • Similar to second chance mechanism, but with a larger allowable anchor size
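A rough sketch of pre-clustering: incoming messages are sorted into per-template buffers by checking which skeleton (an ordered list of anchors) they fit, so that each buffer is handed to inference separately. The skeleton names, anchors, and messages are invented, and how skeletons are produced in the first place is not shown.

```python
def matches_skeleton(message, anchors):
    """True if every anchor occurs in the message, in the given order."""
    pos = 0
    for anchor in anchors:
        idx = message.find(anchor, pos)
        if idx < 0:
            return False
        pos = idx + len(anchor)
    return True

def pre_cluster(messages, skeletons):
    """Assign each message to the first skeleton it fits; keep the rest aside."""
    clusters = {name: [] for name in skeletons}
    unclassified = []
    for message in messages:
        for name, anchors in skeletons.items():
            if matches_skeleton(message, anchors):
                clusters[name].append(message)
                break
        else:
            unclassified.append(message)
    return clusters, unclassified

skeletons = {
    "pharma": ["Buy ", " now at http://example.com/"],
    "lottery": ["You have won ", " Claim at http://example.net/"],
}
incoming = [
    "Buy vitamins now at http://example.com/abcdefgh",
    "You have won $500! Claim at http://example.net/xyz",
    "Buy pills now at http://example.com/qwertyui",
    "Meeting moved to 3pm",
]
clusters, unclassified = pre_cluster(incoming, skeletons)
print({name: len(msgs) for name, msgs in clusters.items()}, len(unclassified))
```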

  18. Experimental Results • Requirements of a good spam filter: • Safe: does not classify legitimate mail as spam • Low false positive rate • Effective: correctly identifies the targeted class of spam • Low false negative rate
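The two requirements map directly onto two error rates. A small sketch, assuming each message has a filter decision and a ground-truth label, and that both classes are non-empty; the sample data is made up.

```python
def error_rates(flagged, is_spam):
    """Return (false positive rate, false negative rate).

    flagged[i] is the filter's decision, is_spam[i] the true label.
    Assumes at least one legitimate and one spam message.
    """
    false_pos = sum(f and not s for f, s in zip(flagged, is_spam))
    false_neg = sum(s and not f for f, s in zip(flagged, is_spam))
    n_legit = sum(not s for s in is_spam)
    n_spam = sum(is_spam)
    return false_pos / n_legit, false_neg / n_spam

flagged = [True, True, False, False, True, False]
is_spam = [True, True, True, False, False, False]
fp_rate, fn_rate = error_rates(flagged, is_spam)
print(fp_rate, fn_rate)
```

A safe filter keeps the first number near zero; an effective filter keeps the second number near zero.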

  19. Experimental Results (cont’d) • Testing: 4 tiers • Signature safety • Signatures from the 3 other tiers run against legitimate mail ‘corpora’ to assess false positive rate • To prevent age bias, they tested the signatures only on the subject and body of the corpora

  20. Experimental Results (cont’d) • Controlled single template inference • Generated 5000 instances of spam from a ‘Storm’ bot from templates gained through reverse engineering • 1000 for signature generation • 4000 for testing false negative rate • Done for each of 10,676 templates (53,380,000 messages) • Results: • Also, at k = 1000 false positive rate = 0% for all sigs

  21. Experimental Results (cont’d) • Controlled multi-template inference • Spam used for testing was generated during the Botlab project at the University of Washington • 4 bots used: 1 each from the Mega-D, Pushdo, Rustock, and Srizbi botnets • First million messages from each split into training and testing sets, then Judo run chronologically on each test message • A test message counts as a true match if it matches a signature generated from previous test messages • Otherwise it is counted as a false negative
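The chronological procedure can be sketched as a loop: each message is first checked against the signatures learned so far, and misses are buffered until there are `k` of them to learn from. The inference step here is a deliberately crude stand-in (the common prefix of the buffer), not Judo's template inference, and the message stream is invented.

```python
import os

def toy_infer(buffer):
    """Stand-in for template inference: the common prefix of the buffered messages."""
    return os.path.commonprefix(buffer)

def evaluate(stream, k):
    """Process messages in arrival order; return (matched, missed) counts."""
    signatures, buffer = [], []
    matched = missed = 0
    for message in stream:
        if any(message.startswith(sig) for sig in signatures):
            matched += 1          # caught by a signature built from earlier messages
        else:
            missed += 1           # counted as a false negative
            buffer.append(message)
            if len(buffer) == k:
                prefix = toy_infer(buffer)
                if prefix:        # an empty prefix would match everything
                    signatures.append(prefix)
                buffer = []
    return matched, missed

stream = [
    "Buy pills A", "Buy pills B", "Buy pills C", "Buy pills D",
    "Win cash 1", "Win cash 2", "Win cash 3",
]
result = evaluate(stream, k=2)
print(result)
```

Because every template costs at least `k` misses before a signature exists, the false negative rate in this setup depends on how often the bot switches templates as well as on signature quality.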

  22. Experimental Results (cont’d) • Results: • Only false positives from Rustock bot tests

  23. Experimental Results (cont’d) • Real world deployment: • 2 × Xarvester + 2 × Mega-D + 4 × Rustock + 6 × Gheg = 14 bots • Messages generated: • Ran the test as in multi-template runs

  24. Experimental Results (cont’d) • Results: • Worst Case: Rustock again only source of false positives: 1 in 12,500 messages. All others 0 total false positives in corpora

  25. Experimental Results (cont’d) • Efficiency: Since the goal of the project was an accurate RE generator, efficiency wasn’t a priority • Initial RE generation using buffer size 50 with 6000 character length messages takes about 2 sec using an average desktop circa 2009 • Signature updates at ~ 50-100 ms

  26. Response Time • Based on the message output rate of the bot(s) generating the spam • May be complicated by the existence of multiple bots or templates • Bots used in this experiment generated > 100 spam messages per minute • Since results are acceptable from k >= 500, it should only take a few minutes to generate a working signature
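The "few minutes" estimate is simple arithmetic: the time to collect `k` training messages at a given output rate. A small sketch using the figures from this slide, where the rate is a lower bound and the result therefore an upper bound:

```python
def minutes_to_fill(k, messages_per_minute):
    """Minutes needed to collect k messages from a bot sending at a steady rate."""
    return k / messages_per_minute

# At 100 messages per minute (the slide says bots sent more than this):
for k in (500, 1000):
    print(k, minutes_to_fill(k, 100))
```

This ignores the complication noted above: with several templates interleaved, each template's buffer fills more slowly than the overall message rate suggests.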

  27. Overview • ‘Judo’ is basically a learning spam filter • Content based • Requires training to produce effective signatures • Safe and Effective (both greater than 99.75%) • Controlled tests show exceptional results • Simulated real world tests show promise, but could be worked around by bots that can randomly generate new templates

  28. Any Questions?
