How to beat an Adaptive Spam Filter

John Graham-Cumming

Creator and Maintainer of POPFile

Research Director, Sophos’s Anti-Spam Task Force

“Red Coat” Spams
  • Obfuscated spam is trivial to spot and filter
  • No need to even read the text, the obfuscations are enough
  • No real email contains the word Viagra written like this:

&#86;<font size=0>&nbsp;</font>&#105;<font size=0>&nbsp;</font>&#97;<font size=0>&nbsp;</font>&#103;<font size=0>&nbsp;</font>&#114;<font size=0>&nbsp;</font>&#97;

  • “Field Guide to Spam” highlights spammer obfuscations: Invisible Ink, Camouflage, Hypertextus Interruptus...
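To see why the obfuscation itself gives the game away, here is a minimal Python sketch (not from the talk) that decodes the sample above and counts the tricks it uses. The function names `visible_text` and `obfuscation_score` are made up for illustration, and the regular expressions only cover the two tricks in this one sample (numeric character entities and zero-size fonts).

```python
import html
import re

# The obfuscated sample from the slide, split across lines for readability.
OBFUSCATED = (
    '&#86;<font size=0>&nbsp;</font>&#105;<font size=0>&nbsp;</font>'
    '&#97;<font size=0>&nbsp;</font>&#103;<font size=0>&nbsp;</font>'
    '&#114;<font size=0>&nbsp;</font>&#97;'
)

ZERO_SIZE_FONT = re.compile(r'<font size=0>.*?</font>', re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r'<[^>]+>')
NUMERIC_ENTITY = re.compile(r'&#\d+;')


def visible_text(markup):
    """Return roughly what a human would see once the markup is rendered."""
    without_invisible = ZERO_SIZE_FONT.sub('', markup)  # zero-size text is invisible
    without_tags = ANY_TAG.sub('', without_invisible)   # drop remaining tags
    return html.unescape(without_tags)                  # decode &#86; and friends


def obfuscation_score(markup):
    """Count obfuscation tricks; the text itself is never inspected."""
    entities = len(NUMERIC_ENTITY.findall(markup))
    hidden_runs = len(re.findall(r'<font size=0>', markup, flags=re.IGNORECASE))
    return entities + hidden_runs


print(visible_text(OBFUSCATED))       # Viagra
print(obfuscation_score(OBFUSCATED))  # 11
print(obfuscation_score('Viagra'))    # 0
```

The score alone separates the two cases: a plain "Viagra" scores 0, the obfuscated one scores 11, without looking at what the word says.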
POPFile's working great for me... but not 100%
  • November 3, 2003 through December 22, 2003
    • Total mails received: 52,931
    • Total spams: 35,928 (68%!)
    • Total spams missed: 125
    • So POPFile ~99.7% accurate
  • 1 in 254 spams gets through... why?
Taxonomy of filter busting spams
  • 52%: “picospams”
  • 13%: RTF
  • 9%: Challenge/Response
  • 9%: NDR
  • 4%: Totally blank
  • 13%: Other
    • Multiple copies of an offer for an “Incredible Spam Filter”
    • A message in Hebrew
RTF
  • Microsoft email clients sniff Rich Text Format
    • (actually they sniff a lot of different formats)

Content-Type: text/plain


\deff0\deflang1046\deflangfe1046{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;} {\f16\froman\fcharset129\fprq2{\*\panose 02030600000101010101}Batang{\*\falt??};}{\f28\froman\fcharset0\fprq2{\*\panose 02040602050305030304}Book Antiqua;}{\f29\froman\fcharset129\fprq2{\*\panose 00000000000000000000}@Batang;}{\f40\froman\fcharset238\fprq2 Times New Roman CE;}

Challenge/Response
  • Received a number of “fake challenges”
  • Challenges directed me to a spammer's web site
  • This is how spammers can kill C/R
  • Personal note: I don't “do” C/R. If I mail you and you challenge me I hit delete, because, as Dan Quinlan put it: “C/R is the ultimate email diss. By using it you are saying, 'my time is more important than yours.'”
Non-deliverable Response
  • As well as faking C/R messages, spammers fake NDRs
  • The NDR has the “original email” (actually a spam) as an attachment
  • Spammers can even get NDRs generated for them by badly configured mail servers
    • Send spam to known wrong address on a mail server with a forged from address
    • Mail server sends NDR to the forged from attaching the spam
Picospams
  • Spam containing either:
    • As few tokens as possible
      robin:
    • Only HTML tokens
      <a href=><img src=></a>
  • Picospams got through because
    • Hadn't been seen before
    • Contained “good” headers
    • Had “word salad”

Thanks to Robin Keir for the tiny robin: mail

“Good Headers”
  • The combination of two things leads to the ham tokens outweighing the spam
    • picospam text
    • Relaying the message through a good server
  • Suitable good servers are:
    • Mail relays like,
    • Mailing lists
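The arithmetic behind "ham tokens outweighing the spam" can be sketched with a toy naive Bayes log-odds score. This is an illustration only, not POPFile's code: the token counts are invented, the add-one smoothing is one common choice among several, and `header:list.example.org` is a hypothetical token standing in for whatever a trusted relay or mailing list adds to the headers.

```python
import math


def log_odds(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Toy naive Bayes score: positive leans spam, negative leans ham.

    Tokens never seen in training are skipped, which is why a message
    with very few tokens gives the filter very little to work with.
    """
    score = math.log(n_spam / n_ham)  # prior odds
    for tok in tokens:
        s = spam_counts.get(tok, 0)
        h = ham_counts.get(tok, 0)
        if s == 0 and h == 0:
            continue  # unknown token, no evidence either way
        p_tok_spam = (s + 1) / (n_spam + 2)  # add-one smoothing
        p_tok_ham = (h + 1) / (n_ham + 2)
        score += math.log(p_tok_spam / p_tok_ham)
    return score


# Invented training data: 100 spams and 100 hams.
SPAM_COUNTS = {"cialis": 40, "meds": 30, "save": 20, "approved": 10}
HAM_COUNTS = {"header:list.example.org": 80, "save": 5, "approved": 8}

long_spam = ["cialis", "meds", "save", "approved"]
picospam = ["approved"]
picospam_via_list = ["approved", "header:list.example.org"]

for name, toks in [("long spam", long_spam),
                   ("picospam", picospam),
                   ("picospam via list", picospam_via_list)]:
    print(name, round(log_odds(toks, SPAM_COUNTS, HAM_COUNTS, 100, 100), 2))
```

In this toy setup the long spam scores strongly positive and the bare picospam only barely positive. Once the single strongly hammy header token is added, the picospam's score goes negative: both ingredients from the slide are needed.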
“Word Salad”
  • Spam stuffed with randomly selected words:

<a href=""><img border="0" src=""></a>deliverance banister haploid sin beachcomb case stub doublet bread confucius buckaroo questionnaire tech issuance diagnose anglican finance pirouette u.s.a agree faculty nomenclature sheik insinuate pack dutchmen inhibition dubious patriotic aluminate

  • Sometimes words are hidden using Invisible Ink, Camouflage, MIME is Money or other tricks

The term “word salad” was coined by Cindy Harris in a POPFile forum.

“Word Salad” Experiment
  • Took a real picospam (HTML style) that had previously been caught by POPFile

Subject: cialis is now ready

<DIV align=center><FONT face="arial black" size=2>Save over 70% on</FONT></DIV><CENTER><FONT face="arial black" size=2>USA approved meds</B></FONT><BR></CENTER><center><a href="">Come visit us</a>

  • Added 100s of words from /usr/share/dict/words
  • Scored for spam vs. ham against my POPFile installation
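The "add words" step of the experiment can be sketched as below. This is a reconstruction under assumptions, not the script used for the talk: the helper names `load_dictionary` and `add_word_salad` are hypothetical, and the scoring step against a POPFile installation is not reproduced here.

```python
import random


def load_dictionary(path="/usr/share/dict/words"):
    """Read one word per line, keeping purely alphabetic entries in lower case."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return [line.strip().lower() for line in fh if line.strip().isalpha()]


def add_word_salad(body, words, n, seed=None):
    """Append n randomly chosen words (with repetition) on a new line."""
    rng = random.Random(seed)  # a seed makes a run repeatable
    salad = " ".join(rng.choice(words) for _ in range(n))
    return body + "\n" + salad


if __name__ == "__main__":
    sample_words = ["deliverance", "banister", "haploid", "doublet", "pirouette"]
    print(add_word_salad('<a href="">Come visit us</a>', sample_words, 10, seed=1))
```

Each variant would then be scored by the filter under test, repeating for increasing values of `n`.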
“Word Salad” Results

[Chart: number of spams (per 10,000) that got through, plotted against number of words added]

“Word Salad” Ineffective
  • Best result: 0.04% of spams get through, but only if the spammer
    • Sends each person 10,000 copies of each spam
    • AND makes each spam 3x bigger than before
  • Ineffective because
    • Randomly chosen words are likely to be:
      • First in neither,
      • then in spammy,
      • finally in hammy!

Because spammers send so much spam!




Word Salad Variants
  • Got similar results using words pulled from
    • News stories via
    • Articles from
  • Back to basics…
    • A filter busting spam needs to:
      • HAVE FEW tokens that look like spam
      • HAVE MORE tokens that look like my ham
    • How do you find my hammy tokens?
Bayes vs. Bayes
  • If adaptive filters are so smart, perhaps they can beat adaptive filters?
  • Experiment:
    • Take a trained spam filter (“Good” POPFile)
    • And an untrained spam filter (“Evil” POPFile)
    • Take a spam that got through “Good”
    • Send copies of the spam with 5 random words appended
    • Train “Evil” depending on if it gets through “Good” or not
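The feedback loop in this experiment can be simulated in a few lines. This is a toy sketch under loud assumptions: "Good" is replaced by a stand-in function that lets a message through when it contains one of a few secret hammy words, and "Evil" is replaced by two plain counters instead of a second POPFile installation. All names (`train_evil`, `never_blocked`, `SECRET_HAMMY`) are hypothetical.

```python
import random
from collections import Counter


def train_evil(good_filter_passes, dictionary, n_messages, words_per_message=5, seed=0):
    """Send n_messages variants and record which appended words passed or were blocked."""
    rng = random.Random(seed)
    passed, blocked = Counter(), Counter()
    for _ in range(n_messages):
        extra = rng.sample(dictionary, words_per_message)  # 5 distinct random words
        bucket = passed if good_filter_passes(extra) else blocked
        bucket.update(extra)
    return passed, blocked


def never_blocked(passed, blocked, min_seen=3):
    """Words seen at least min_seen times that were never in a blocked message."""
    return sorted(w for w, n in passed.items()
                  if n >= min_seen and blocked.get(w, 0) == 0)


# Stand-in for "Good": an assumption for this toy, not how POPFile decides.
SECRET_HAMMY = {"w7", "w42"}


def toy_good_filter(extra_words):
    return any(w in SECRET_HAMMY for w in extra_words)


dictionary = [f"w{i}" for i in range(100)]
passed, blocked = train_evil(toy_good_filter, dictionary, n_messages=2000)
print(never_blocked(passed, blocked))
```

Ordinary words ride along in passing messages now and then, but they also show up in blocked ones, so only the words that are never blocked stand out. That is the information the sender learns purely from pass/fail feedback.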




How to get feedback
  • When sending each message include a unique web bug
  • Creates an effective feedback loop
  • Spammer can use web bug to train their POPFile installation
  • Bad news... this works:
    • Tested against my POPFile installation
    • Sent 10,000 emails containing 5 random words from /usr/share/dict/words
    • Found my kryptonite
Kryptonite Words
  • accommodations, arrangements, berkshire, category, channel, checking, comment, currency, endless, entitled, flying, hills, independent, invoice, logging, marriott, occupancy, officer, operated, quantity, redeeming, rent, shared, silicon, touch, wireless
  • Adding just one of these words turns the spam into a ham!
Is B vs B practical?
  • Took 10,000 messages to one email address to train evil POPFile
  • But what about 10 messages to 1,000 mail addresses?
    • Say, send 10 copies of a spam to everyone at a given domain: might find specific kryptonite
Defense against the dark arts
  • Absolutely NO feedback to spammers
    • No rendering HTML
    • No bouncing
    • No SMTP server errors
    • No selective challenge/response
    • No NDRs
  • Mailing List/Mail Forwards
    • Do spam filtering on inbound messages
  • Integrate header analysis with adaptive filtering
  • Current spam is “easy” for adaptive filters to detect
  • As spammers react to adaptive filtering, spam will get harder to recognize
  • Feedback mechanisms present a risk to the effectiveness of adaptive filtering
  • Adaptive filters will need merging with “traditional” anti-spam techniques like DNSBL
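One concrete form of "no rendering HTML" is to make sure remote images are never fetched, since a unique image URL per message is all a web bug needs. The sketch below is an illustration under assumptions, not a hardened sanitizer: `strip_remote_images` is a hypothetical helper, the regular expression only handles `http://` and `https://` sources inside `<img>` tags, and `tracker.example` is a placeholder host. A real mail client is better off not loading any remote content at all.

```python
import re

# Matches <img ...> tags whose src points at an http(s) URL.
REMOTE_IMG = re.compile(
    r'<img\b[^>]*\bsrc\s*=\s*["\']?https?://[^>]*>',
    re.IGNORECASE,
)


def strip_remote_images(html_body):
    """Remove remote <img> tags so displaying the message fetches nothing."""
    return REMOTE_IMG.sub('', html_body)


message = ('<p>Save over 70%</p>'
           '<img src="http://tracker.example/bug?id=abc123" width=1 height=1>')
print(strip_remote_images(message))  # <p>Save over 70%</p>

# Images embedded in the message itself (cid: references) are left alone.
print(strip_remote_images('<img src="cid:logo">'))  # <img src="cid:logo">
```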

Thank you.

All questions will now be answered via telepathy :-)