Network-Level Spam Defenses Nick Feamster, Georgia Tech, with Anirudh Ramachandran, Shuang Hao, Alex Gray, Santosh Vempala
Spam: More than Just a Nuisance • 95% of all email traffic • Image and PDF Spam (PDF spam ~12%) • As of August 2007, one in every 87 emails constituted a phishing attack • Targeted attacks on the rise • 20k-30k unique phishing attacks per month Source: CNET (January 2008), APWG
Approach: Filter • Prevent unwanted traffic from reaching a user’s inbox by distinguishing spam from ham • Question: What features best differentiate spam from legitimate mail? • Content-based filtering: What is in the mail? • IP address of sender: Who is the sender? • Behavioral features: How is the mail sent?
Conventional: Content Filters • Trying to hit a moving target... Images PDFs Excel sheets ...and even mp3s!
Problems with Content Filtering • Customized emails are easy to generate: Content-based filters need fuzzy hashes over content, etc. • Low cost of evasion: Spammers can easily alter features of an email’s content • High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated
Another Approach: IP Addresses • Problem: IP addresses are ephemeral • Every day, 10% of senders are from previously unseen IP addresses • Possible causes • Dynamic addressing • New infections
Our Idea: Network-Based Filtering • Filter email based on how it is sent, in addition to simply what is sent. • Network-level properties are less malleable • Network/geographic location of sender and receiver • Set of target recipients • Hosting or upstream ISP (AS number) • Membership in a botnet (spammer, hosting infrastructure)
Why Network-Level Features? • Lightweight: Don’t require inspecting details of packet streams • Can be done at high speeds • Can be done in the middle of the network • Robust: Perhaps more difficult to change some network-level features than message contents
Challenges (Talk Outline) • Understanding network-level behavior • What network-level behaviors do spammers have? • How well do existing techniques work? • Building classifiers using network-level features • Key challenge: Which features to use? • Two Algorithms: SNARE and SpamTracker • Building the system • Dynamism: Behavior itself can change • Scale: Lots of email messages (and spam!) out there
Some Questions of Study • Where (in IP space, in geography) does spam originate from? • What OSes are used to send spam? • What techniques are used to send spam?
Data: Spam and BGP • Spam Traps: Domains that receive only spam • BGP Monitors: Watch network-level reachability • 17-Month Study: August 2004 to December 2005
Data Collection: MailAvenger • Configurable SMTP server • Collects many useful statistics
Finding: BGP “Spectrum Agility” • Hijack IP address space using BGP • Send spam • Withdraw IP address • Duration: ~10 minutes • A small club of persistent players appears to be using this technique. • Common short-lived prefixes and ASes: 126.96.36.199/8 (AS 4678), 188.8.131.52/8 (AS 21562), 184.108.40.206/8 (AS 8717) • Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
Spectrum Agility: Big Prefixes? • Flexibility: Client IPs can be scattered throughout dark space within a large /8 • Same sender usually returns with different IP addresses • Visibility: Route typically won’t be filtered (nice and short)
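The hijack, send, withdraw pattern suggests a simple check: look for route announcements that live only a few minutes and that overlap with spam arrivals from inside the announced prefix. The sketch below is illustrative only and is not the authors' analysis code. The function name, the 15-minute threshold, and the sample data (documentation prefixes) are assumptions.

```python
from datetime import datetime, timedelta
import ipaddress

def short_lived_spam_prefixes(announcements, spam_events,
                              max_lifetime=timedelta(minutes=15)):
    """Return prefixes that were announced briefly and sent spam meanwhile.

    announcements: list of (prefix, announce_time, withdraw_time)
    spam_events:   list of (sender_ip, arrival_time)
    max_lifetime:  hypothetical cutoff for a "short-lived" announcement
    """
    flagged = []
    for prefix, announced, withdrawn in announcements:
        if withdrawn - announced > max_lifetime:
            continue  # long-lived route, not spectrum agility
        network = ipaddress.ip_network(prefix)
        for sender_ip, arrival in spam_events:
            inside = ipaddress.ip_address(sender_ip) in network
            if inside and announced <= arrival <= withdrawn:
                flagged.append(prefix)
                break
    return flagged

# Made-up sample data using documentation address ranges.
announcements = [
    ("192.0.2.0/24", datetime(2005, 1, 1, 10, 0), datetime(2005, 1, 1, 10, 10)),
    ("198.51.100.0/24", datetime(2005, 1, 1, 0, 0), datetime(2005, 1, 2, 0, 0)),
]
spam_events = [
    ("192.0.2.55", datetime(2005, 1, 1, 10, 4)),
    ("198.51.100.9", datetime(2005, 1, 1, 12, 0)),
]
flagged = short_lived_spam_prefixes(announcements, spam_events)
```

Only the first prefix is flagged here, because the second route stays up for a full day.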
Other Findings • Top senders: Korea, China, Japan • Still about 40% of spam coming from U.S. • More than half of sender IP addresses appear less than twice • ~90% of spam sent to traps from Windows
How Well do IP Blacklists Work? • Completeness: The fraction of spamming IP addresses that are listed in the blacklist • Responsiveness: The time for the blacklist to list the IP address after the first occurrence of spam
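The two metrics can be made concrete with a small sketch. It assumes two hypothetical inputs: a map from each spamming IP to the time its first spam was seen, and a map from each listed IP to the time it entered the blacklist. The function names and sample data are illustrative, not taken from the study.

```python
from datetime import datetime, timedelta

def completeness(first_spam, listed_at):
    """Fraction of spamming IP addresses that appear in the blacklist."""
    if not first_spam:
        return 0.0
    listed = sum(1 for ip in first_spam if ip in listed_at)
    return listed / len(first_spam)

def listing_delays(first_spam, listed_at):
    """Per-IP time from first spam to listing (zero if listed earlier)."""
    delays = {}
    for ip, seen in first_spam.items():
        if ip in listed_at:
            delays[ip] = max(listed_at[ip] - seen, timedelta(0))
    return delays

# Made-up sample data using documentation addresses.
first_spam = {
    "192.0.2.1": datetime(2007, 3, 1, 12, 0),
    "192.0.2.2": datetime(2007, 3, 2, 9, 0),
    "192.0.2.3": datetime(2007, 3, 3, 18, 0),
}
listed_at = {
    "192.0.2.1": datetime(2007, 2, 20, 0, 0),  # listed before first spam
    "192.0.2.2": datetime(2007, 3, 5, 9, 0),   # listed three days later
}
delays = listing_delays(first_spam, listed_at)
```

IP addresses that are never listed lower completeness and have no responsiveness value at all, which is why the two metrics are reported separately.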
Completeness and Responsiveness • 10-35% of spam is unlisted at the time of receipt • 8.5-20% of these IP addresses remain unlisted even after one month Data: Trap data from March 2007, Spamhaus from March and April 2007
What’s Wrong with IP Blacklists? • Based on ephemeral identifier (IP address) • More than 10% of all spam comes from IP addresses not seen within the past two months • Dynamic renumbering of IP addresses • Stealing of IP addresses and IP address space • Compromised machines • IP addresses of senders have considerable churn • Often require a human to notice/validate the behavior • Spamming is compartmentalized by domain and not analyzed across domains
Are There Other Approaches? • Option 1: Stronger sender identity [AIP, Pedigree] • Stronger sender identity/authentication may make reputation systems more effective • May require changes to hosts, routers, etc. • Option 2: Behavior-based filtering [SNARE, SpamTracker] • Can be done on today’s network • Identifying features may be tricky, and some may require network-wide monitoring capabilities
Outline • Understanding the network-level behavior • What behaviors do spammers have? • How well do existing techniques work? • Classifiers using network-level features • Key challenge: Which features to use? • Two algorithms: SNARE and SpamTracker • The System: SpamSpotter • Dynamism: Behavior itself can change • Scale: Lots of email messages (and spam!) out there
Finding the Right Features • Goal: Sender reputation from a single packet? • Low overhead • Fast classification • In-network • Perhaps more evasion resistant • Key challenge • What features satisfy these properties and can distinguish spammers from legitimate senders?
Set of Network-Level Features • Single-Packet • AS of sender’s IP • Distance to k nearest senders • Status of email service ports • Geodesic distance • Time of day • Single-Message • Number of recipients • Length of message • Aggregate (Multiple Message/Recipient)
Sender-Receiver Geodesic Distance 90% of legitimate messages travel 2,200 miles or less
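As a rough sketch of how this feature could be computed, the haversine formula gives the great-circle distance between two coordinates. The sketch assumes sender and receiver have already been geolocated to latitude and longitude (for example from an IP geolocation database, which is not shown), and the function name is illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Example: approximate coordinates for Atlanta and San Francisco.
atlanta_to_sf = geodesic_miles(33.749, -84.388, 37.7749, -122.4194)
```

A classifier would treat the resulting distance as one input among several rather than as a hard threshold at 2,200 miles.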
Density of Senders in IP Space For spammers, k nearest senders are much closer in IP space
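One way to express this feature is the mean numerical distance from a sender's address to its k nearest neighbors among other recently seen senders. The sketch below treats IPv4 addresses as 32-bit integers. That distance measure, the function name, and the sample addresses are assumptions made for illustration.

```python
import ipaddress

def mean_knn_distance(sender_ip, other_senders, k=3):
    """Mean integer distance from sender_ip to its k nearest senders.

    Returns None when there are no other senders to compare against.
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    distances = sorted(
        abs(target - int(ipaddress.IPv4Address(ip)))
        for ip in other_senders
        if ip != sender_ip
    )
    nearest = distances[:k]
    if not nearest:
        return None
    return sum(nearest) / len(nearest)

# Made-up sample data using documentation addresses.
others = ["192.0.2.11", "192.0.2.14", "198.51.100.1", "203.0.113.5"]
density = mean_knn_distance("192.0.2.10", others, k=2)
```

A small value means the sender sits in a crowded region of IP space, which the slide associates with spammers.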
Local Time of Day at Sender Spammers “peak” at different local times of day
Combining Features: RuleFit • Put features into the RuleFit classifier • 10-fold cross validation on one day of query logs from a large spam filtering appliance provider • Comparable performance to Spamhaus • Incorporating into the system can further reduce false positives • Using only network-level features • Completely automated
SNARE: Putting it Together • Email arrival • Whitelisting • Top 10 ASes responsible for 43% of misclassified IP addresses • Greylisting • Retraining
Benefits of Whitelisting Whitelisting top 50 ASes: False positives reduced to 0.14%
Outline • Understanding the network-level behavior • What behaviors do spammers have? • How well do existing techniques work? • Classifiers using network-level features • Key challenge: Which features to use? • Algorithms: SNARE and SpamTracker • Building the system (SpamSpotter) • Dynamism: Behavior itself can change • Scale: Lots of email messages (and spam!) out there
SpamTracker • Idea: Blacklist sending behavior (“Behavioral Blacklisting”) • Identify sending patterns commonly used by spammers • Intuition: Much more difficult for a spammer to change the technique by which mail is sent than it is to change the content
SpamTracker: Clustering Approach • Construct a behavioral fingerprint for each sender • Cluster senders with similar fingerprints • Filter new senders that map to existing clusters
SpamTracker: Identify Invariant [Figure: a known spammer (IP address 76.17.114.xxx) and an unknown sender (IP address 24.99.146.xxx) each send spam to domain1.com, domain2.com, and domain3.com; the address changes through DHCP reassignment or a new infection. Clustering each on sending behavior yields a similar behavioral fingerprint.]
Building the Classifier: Clustering • Feature: Distribution of email sending volumes across recipient domains • Clustering Approach • Build initial seed list of bad IP addresses • For each IP address, compute feature vector: volume per domain per time interval • Collapse into a single IP x domain matrix: • Compute clusters
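The feature-construction steps above can be sketched as follows. The input is assumed to be a list of (time interval, sender IP, recipient domain) records, and collapsing is done here by summing over time intervals. That choice is an assumption, since the slide does not specify how the collapse is performed, and the function names are illustrative.

```python
from collections import defaultdict

def volume_per_interval(records):
    """Count messages per (ip, domain, interval)."""
    volumes = defaultdict(int)
    for interval, ip, domain in records:
        volumes[(ip, domain, interval)] += 1
    return volumes

def collapse_to_matrix(volumes):
    """Sum over intervals to get one IP x domain matrix (assumed collapse)."""
    totals = defaultdict(int)
    for (ip, domain, _interval), count in volumes.items():
        totals[(ip, domain)] += count
    ips = sorted({ip for ip, _ in totals})
    domains = sorted({domain for _, domain in totals})
    matrix = [[totals.get((ip, d), 0) for d in domains] for ip in ips]
    return ips, domains, matrix

# Made-up sample records: (interval, sender IP, recipient domain).
records = [
    (0, "192.0.2.1", "domain1.com"),
    (0, "192.0.2.1", "domain2.com"),
    (1, "192.0.2.1", "domain1.com"),
    (1, "192.0.2.2", "domain3.com"),
]
ips, domains, matrix = collapse_to_matrix(volume_per_interval(records))
```

Each row of the matrix is one sender's volume distribution across recipient domains, which is the input to the clustering step.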
Clustering: Output and Fingerprint • For each cluster, compute a fingerprint vector • New IPs will be compared to this “fingerprint” [Figure: IP x IP matrix; intensity indicates pairwise similarity]
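The slide's formula for the fingerprint vector did not survive extraction. A minimal sketch, assuming the fingerprint is simply the column-wise average of the rows (sender feature vectors) assigned to a cluster, looks like this. The averaging rule and the function name are assumptions.

```python
def cluster_fingerprint(rows):
    """Column-wise mean of the feature vectors in one cluster (assumed rule)."""
    if not rows:
        raise ValueError("cluster must contain at least one sender")
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]

# Two senders in the same cluster, three recipient domains.
fingerprint = cluster_fingerprint([[2, 0, 4], [4, 2, 0]])
```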
Evaluation • Emulate the performance of a system that could observe sending patterns across many domains • Build clusters/train on given time interval • Evaluate classification • Relative to labeled logs • Relative to IP addresses that were eventually listed
Data • 30 days of Postfix logs from email hosting service • Time, remote IP, receiving domain, accept/reject • Allows us to observe sending behavior over a large number of domains • Problem: About 15% of accepted mail is also spam • Creates problems with validating SpamTracker • 30 days of Spamhaus database in the month following the Postfix logs • Allows us to determine whether SpamTracker detects some sending IPs earlier than Spamhaus
Clustering Results [Figure: distribution of SpamTracker score for ham and spam] • Separation may not be sufficient alone, but could be a useful feature
Outline • Understanding the network-level behavior • What behaviors do spammers have? • How well do existing techniques work? • Building classifiers using network-level features • Key challenge: Which features to use? • Algorithms: SpamTracker and SNARE • Building the system (SpamSpotter) • Dynamism: Behavior itself can change • Scale: Lots of email messages (and spam!) out there
Deployment: Real-Time Blacklist • Approach • As mail arrives, lookups are received at the blacklist (BL) • Queries provide proxy for sending behavior • Train based on received data • Return score
Challenges • Scalability: How to collect and aggregate data, and form the signatures without imposing too much overhead? • Dynamism: When to retrain the classifier, given that sender behavior changes? • Reliability: How should the system be replicated to better defend against attack or failure? • Evasion resistance: Can the system still detect spammers when they are actively trying to evade?
Design Choice: Augment DNSBL • Expressive queries • Spamhaus: $ dig 220.127.116.11.zen.spamhaus.org • Ans: 127.0.0.3 (=> listed in exploits block list) • SpamSpotter: $ dig receiver_ip.receiver_domain.sender_ip.rbl.gtnoise.net • e.g., dig 18.104.22.168.gmail.com.-.22.214.171.124.rbl.gtnoise.net • Ans: 127.1.3.97 (SpamSpotter score = -3.97) • Also a source of data • Unsupervised algorithms work with unlabeled data
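The two query formats can be illustrated by building the DNS names a client would look up. The first function follows the standard DNSBL convention of reversing the IPv4 octets and appending the list's zone. The second follows the template shown on the slide (receiver IP, receiver domain, sender IP, zone) and is an assumption about the exact encoding, since the slide does not say whether the IPs are reversed or how the fields are separated. Neither function performs a network lookup, and the function names are illustrative.

```python
import ipaddress

def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Standard DNSBL name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def spamspotter_query_name(receiver_ip, receiver_domain, sender_ip,
                           zone="rbl.gtnoise.net"):
    """Name built from the slide's template; the field encoding is assumed."""
    return ".".join([receiver_ip, receiver_domain, sender_ip, zone])

# Made-up sample values using documentation addresses.
spamhaus_name = dnsbl_query_name("192.0.2.1")
spamspotter_name = spamspotter_query_name(
    "198.51.100.7", "example.com", "192.0.2.1")
```

Because the extended name carries the receiver as well as the sender, each lookup also tells the blacklist who is receiving mail from whom, which is what makes the queries a source of training data.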
Latency Performance overhead is negligible.
Design Choice: Sampling Relatively small samples can achieve low false positive rates
Possible Improvements • Accuracy • Synthesizing multiple classifiers • Incorporating user feedback • Learning algorithms with bounded false positives • Performance • Caching/Sharing • Streaming • Security • Learning in adversarial environments
Summary: Network-Based Behavioral Reputation • Spam increasing, spammers becoming agile • Content filters are falling behind • IP-based blacklists are evadable • Up to 30% of spam not listed in common blacklists at receipt. ~20% remains unlisted after a month • Complementary approach: behavioral blacklisting based on network-level features • Key idea: Blacklist based on how messages are sent • SNARE: Automated sender reputation • ~90% accuracy of existing blacklists with lightweight features • SpamTracker: Spectral clustering • Catches significant amounts of spam faster than existing blacklists • SpamSpotter: Putting it together in an RBL system
Next Steps: Phishing and Scams • Scammers host Web sites on dynamic scam hosting infrastructure • Use DNS to redirect users to different sites when the location of the sites moves • State of the art: Blacklist URL • Our approach: Blacklist based on network-level fingerprints Konte et al., “Dynamics of Online Scam Hosting Infrastructure”, PAM 2009
References • Anirudh Ramachandran and Nick Feamster, “Understanding the Network-Level Behavior of Spammers”, ACM SIGCOMM, 2006 • Anirudh Ramachandran, Nick Feamster, and Santosh Vempala, “Filtering Spam with Behavioral Blacklisting”, ACM CCS, 2007 • Nadeem Syed, Shuang Hao, Nick Feamster, Alex Gray and Sven Krasser, “SNARE: Spatio-temporal Network-level Automatic Reputation Engine”, GT-CSE-08-02 • Anirudh Ramachandran, Shuang Hao, Hitesh Khandelwal, Nick Feamster, Santosh Vempala, “A Dynamic Reputation Service for Spotting Spammers”, GT-CS-08-09 (In submission)
Time Between Record Changes Fast-flux Domains tend to change much more frequently than legitimately hosted sites
Classifying IP Addresses • Given “new” IP address, build a feature vector based on its sending pattern across domains • Compute the similarity of this sending pattern to that of each known spam cluster • Normalized dot product of the two feature vectors • Spam score is maximum similarity to any cluster
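The scoring rule above can be sketched directly: the normalized dot product is the cosine similarity between the new sender's feature vector and each cluster fingerprint, and the spam score is the maximum over clusters. The function names and sample vectors are illustrative, and returning 0.0 for an all-zero vector is an assumption about how to handle senders with no observed mail.

```python
import math

def cosine_similarity(u, v):
    """Normalized dot product of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for empty sending patterns
    return dot / (norm_u * norm_v)

def spam_score(pattern, cluster_fingerprints):
    """Maximum similarity between a sending pattern and any spam cluster."""
    return max(
        (cosine_similarity(pattern, fp) for fp in cluster_fingerprints),
        default=0.0,
    )

# Two made-up cluster fingerprints over three recipient domains.
fingerprints = [[2, 0, 0], [0, 1, 0]]
score = spam_score([1, 0, 0], fingerprints)
```

Because the vectors are normalized, a new sender with low volume can still score highly if its distribution across domains matches a known cluster.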