Email Spam Filtering Computer Security Seminar



  1. Email Spam Filtering: Computer Security Seminar. N. Muthiyalu Jothir – 271120, Media Informatics. Email Spam Filtering - Muthiyalu Jothir

  2. Agenda
     • What is Spam?
     • Statistics
     • Who Benefits from It?
     • Spam Filtering Techniques
     • Combining Filters
     • Conclusion

  3. What is Spam?
     • Spam: unsolicited email.
     • Identical or nearly identical messages sent in bulk to thousands (or millions) of recipients.
     • Caution! “SPAM” (“Spiced Ham”) is also a popular American canned meat brand.

  4. Problem
     • With a tiny investment, a spammer can send over 100,000 bulk emails per hour.
     • Junk mail wastes storage and transmission bandwidth.
     • ISPs have to invest to handle it, a cost we absorb as the ISPs' customers.
     • Spam is a problem because the cost is forced onto us, the recipients.

  5. Statistics

  6. Who benefits from Spam? (diagram)
     • Spammers send mail to the recipient and share the profit with the lead generators.
     • The recipient replies to the lead generators.
     • Lead generators pass information about interested customers to financial firms (e.g. mortgage) and gain 2% of the loan value per customer record.

  7. Spam Control Techniques
     Fight-back techniques
     • Reporting spam to ISPs
     • Fight-back filters
     • Slow senders
     • Law ???
     • etc.
     Filtering techniques
     • Challenge-response filtering
     • Blacklists and whitelists
     • Content-based filters
       • Rule-based
       • Bayesian filters

  8. Reporting Spam to ISPs
     • The original spam solution.
     • Legitimate ISPs respond to such complaints.
     • Spammers get kicked off.
     Disadvantage
     • Spammers disguise themselves.
     • Naïve users cannot interpret email headers.

  9. Filters that Fight Back (FFB)
     • The majority of spam contains links to web pages.
     • Spam filters could automatically retrieve the URLs and crawl those pages, which would increase the load on the server hosting them.
     • If all spam recipients do this at the same time, the server might crash, so the cost of spamming increases.
     Caution!
     • FFB usually works with blacklists (of malicious servers) to avoid attacking innocent servers.

  10. Filtering Techniques

  11. Spam vs. Ham
     • Care must be taken in any spam filtering technique.
     • “All the spam could be allowed to pass through, but not even a single legitimate mail should be filtered.”
     • False positive: legitimate mail classified as spam.
     • The lowest possible false positive rate is desired.
     • Caution: check your junk folder before deleting.
     • Don't trust your spam filter blindly.

  12. Challenge-Response Filtering
     • Emails from unknown senders receive an auto-reply asking the senders to verify themselves.
     • Senders are “challenged” to type in a word hidden within a graphic or a sound file.
     • Mail is forwarded to the receiver's inbox only after a successful “response”.
     • This technique filters almost all spam; spammers are unlikely to take the extra effort to prove themselves.
     • Commercial product: “spamarrest”.
     Disadvantage
     • This technique is rude.
     • Senders sometimes ignore or forget to reply to the challenge.

  13. Blacklists and Whitelists
     • Blacklists of misbehaving servers or known spammers are collected by several sites.
     • The sender ID in the email is compared with the blacklist.
     • Whitelists complement blacklists and contain the addresses of trusted contacts.
     • Use blacklists and whitelists for first-level filtering (before applying content checks), not as the only tool for making a decision.
     Disadvantage
     • Prone to misconfiguration: legitimate servers may be unable to get off a list on which they were incorrectly placed.
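The first-level check described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function name `first_level_verdict`, the verdict strings, and the example addresses are all assumptions. Senders on neither list are reported as "unknown" so that a later content check can decide.

```python
def first_level_verdict(sender, whitelist, blacklist):
    """Return 'trusted', 'blocked', or 'unknown' for a sender address.

    Hypothetical helper: the lists are plain sets of lowercase addresses.
    'unknown' means the lists cannot decide and content checks should run.
    """
    address = sender.strip().lower()
    if address in whitelist:
        return "trusted"
    if address in blacklist:
        return "blocked"
    return "unknown"


# Example data (made up for illustration).
WHITELIST = {"alice@example.com"}
BLACKLIST = {"bulk@spam.example"}

if __name__ == "__main__":
    for sender in ("Alice@Example.com", "bulk@spam.example", "bob@example.org"):
        print(sender, "->", first_level_verdict(sender, WHITELIST, BLACKLIST))
```

Checking the whitelist first is a design choice that favours avoiding false positives: a trusted contact is never blocked, even if the address was also wrongly blacklisted.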

  14. Content-Based Filters
     • It is not a good idea to filter mail based on blacklists alone.
     • Wiser decision: consider the actual content of the email.
     • Almost all successful spam filters use this technique.
     • Major types: rule-based and Bayesian.

  15. Rule-Based Filters
     • Rule-based filters apply static rules to decide whether a mail is spam or not.
     • Rules could look for:
       • words and phrases
       • lots of uppercase characters
       • exclamation points
       • special characters
       • web links
       • HTML messages
       • background colors
       • crazy subject lines, etc.

  16. Rule-Based Filters
     • Rules are given scores based on their importance.
     • Incoming mails are parsed and checked for known malicious patterns.
     • A total score is calculated from the triggered rules.
     • If final score > threshold, classify as spam; otherwise, classify as legitimate mail.
     • The threshold is decided by the user.
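The scoring scheme above can be sketched in a few lines. This is an illustrative sketch only: the rule names, patterns, scores, and the default threshold of 2.0 are made-up assumptions, not rules from any real filter.

```python
import re

# Hypothetical rules: (name, compiled pattern, score). All values are
# invented for illustration.
RULES = [
    ("FREE_OFFER", re.compile(r"\bfree\b", re.IGNORECASE), 1.5),
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.0),
    ("SHOUTING", re.compile(r"\b[A-Z]{5,}\b"), 1.0),
    ("WEB_LINK", re.compile(r"https?://", re.IGNORECASE), 0.5),
]


def score_message(text):
    """Return (total score, names of the triggered rules) for a message."""
    triggered = [(name, score) for name, pattern, score in RULES
                 if pattern.search(text)]
    total = sum(score for _, score in triggered)
    return total, [name for name, _ in triggered]


def classify(text, threshold=2.0):
    """Classify as 'spam' if the total score exceeds the user's threshold."""
    total, _ = score_message(text)
    return "spam" if total > threshold else "ham"


if __name__ == "__main__":
    print(classify("WINNER!!! Claim your free prize at http://example.test"))
    print(classify("Lunch at noon tomorrow?"))
```

Raising the threshold trades missed spam for fewer false positives, which is why the slide leaves the choice to the user.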

  17. Rule-Based Filters
     • “SpamAssassin”, a popular spam filtering product, uses rule-based filtering.
     • Perl regular expressions (regex) are used for pattern checking.
     • Example rules:
       header __LOCAL_FROM_NEWS From =~ /news@example\.com/i
       body __LOCAL_SALES_FIGURES /\bMonthly Sales Figures\b/
       meta LOCAL_NEWS_SALES_FIGURES (__LOCAL_FROM_NEWS && __LOCAL_SALES_FIGURES)
       score LOCAL_NEWS_SALES_FIGURES 0.8

  18. Rule-Based Filters
     Advantage
     • Easy to implement
     • No training required
     Disadvantage
     • Static rules are too general
     • Spammers find new ways to deceive the rules

  19. Bayesian Filters
     • Bayesian filters are the latest in spam filtering technology and the most successful.
     • Bayes classifiers have been used extensively in the field of pattern recognition.
     • Given an unlabeled example, the classifier calculates the most likely classification together with its probability.

  20. Bayesian Filters
     • Steps in Bayes filtering:
       • Training
       • Validation
       • Implementation
     • Training starts with two collections of mail: one of spam and one of legitimate mail.
     • For every word in these emails, the filter calculates a spam probability based on the proportion of spam occurrences.
     • Bayesian filters are quite accurate and adapt automatically as spam evolves.
     • False positives are minimized because the filter considers evidence of innocence as well as evidence of spam.
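The training step can be sketched as below. The slide only says the per-word probability is "based on the proportion of spam occurrences"; the exact ratio used here, and the clamping to [0.01, 0.99], are assumptions chosen for illustration, as are the names `tokenize` and `train`. Both collections are assumed to be non-empty.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; each word counts once per mail."""
    return set(text.lower().split())


def train(spam_mails, ham_mails):
    """Return {word: estimated probability that a mail containing it is spam}.

    Assumption: the estimate is spam_rate / (spam_rate + ham_rate), where
    each rate is the fraction of mails in that collection containing the word.
    """
    spam_counts = Counter(w for mail in spam_mails for w in tokenize(mail))
    ham_counts = Counter(w for mail in ham_mails for w in tokenize(mail))
    n_spam, n_ham = len(spam_mails), len(ham_mails)

    probabilities = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[word] / n_spam
        ham_rate = ham_counts[word] / n_ham
        p = spam_rate / (spam_rate + ham_rate)
        # Clamp so that no single word is treated as absolute proof.
        probabilities[word] = min(0.99, max(0.01, p))
    return probabilities


if __name__ == "__main__":
    table = train(["cheap pills now", "cheap loans"],
                  ["meeting now", "lunch plans"])
    for word in ("cheap", "now", "meeting"):
        print(word, table[word])
```

Words seen mostly in legitimate mail end up with probabilities near 0, which is the "evidence of innocence" the slide mentions.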

  21. Bayesian Filtering
     • Bayes probability: Pr(spam | words) = Pr(spam) * Pr(words | spam) / Pr(words)
     • A probability closer to 1 is classified as spam; closer to 0 is classified as ham.
     • 0.5 is set as the threshold.
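A sketch of how the formula on this slide could be evaluated for a whole message. It assumes the usual "naive" simplification that words are independent given the class, so Pr(words | spam) is a product of per-word likelihoods; that assumption, the function names, and the example likelihood tables are not from the slides. All likelihoods must lie strictly between 0 and 1.

```python
import math


def spam_probability(words, p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """Pr(spam | words) via Bayes' rule under the naive independence assumption.

    Pr(words) is expanded as
    Pr(spam) * Pr(words | spam) + Pr(ham) * Pr(words | ham).
    Words missing from either table are ignored.
    """
    log_spam = math.log(p_spam)
    log_ham = math.log(1.0 - p_spam)
    for word in words:
        if word in p_word_given_spam and word in p_word_given_ham:
            log_spam += math.log(p_word_given_spam[word])
            log_ham += math.log(p_word_given_ham[word])
    # Work in log space and rescale to avoid underflow on long messages.
    largest = max(log_spam, log_ham)
    numerator = math.exp(log_spam - largest)
    return numerator / (numerator + math.exp(log_ham - largest))


def classify(words, p_word_given_spam, p_word_given_ham, threshold=0.5):
    """Apply the 0.5 threshold from the slide."""
    p = spam_probability(words, p_word_given_spam, p_word_given_ham)
    return "spam" if p > threshold else "ham"


# Hypothetical likelihood tables, for illustration only.
GIVEN_SPAM = {"cheap": 0.8, "meeting": 0.1}
GIVEN_HAM = {"cheap": 0.1, "meeting": 0.6}

if __name__ == "__main__":
    print(classify(["cheap"], GIVEN_SPAM, GIVEN_HAM))
    print(classify(["meeting"], GIVEN_SPAM, GIVEN_HAM))
```

With these tables, a message containing only "cheap" scores 0.4 / 0.45, roughly 0.89, and is classified as spam, while one containing only "meeting" scores about 0.14 and is classified as ham.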

  22. Neural Network for Training
     • Neural network structure (figure)

  23. Neural Networks for Training
     • Neural networks are used to train the spam filter (rule-based or Bayesian); the network itself is not a filter.
     • Input: words, rules, etc.
     • Trained over multiple samples of the user's mails (both spam and ham).
     • The weights of the links are altered until the desired output is obtained.

  24. Supervised Learning
     • Supervised learning: training with a “teacher” signal.
     • Train the system until the edge weights settle at optimized values and no longer change.
     Caution!
     • Take care not to overtrain the network.
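As a much-simplified illustration of supervised weight adjustment, the sketch below trains a single neuron (a perceptron) rather than the multi-layer network shown in the slides. The feature encoding, learning rate, and epoch cap are assumptions; the labels play the role of the "teacher" signal, and training stops as soon as the weights no longer change.

```python
def predict(weights, bias, features):
    """Output 1 (spam) if the weighted sum is positive, else 0 (ham)."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100):
    """Adjust the weights until every sample yields the desired output.

    max_epochs caps the training time; it is only a crude guard and does
    not replace checking against held-out mail to detect overtraining.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for features, target in zip(samples, labels):
            error = target - predict(weights, bias, features)
            if error != 0:
                errors += 1
                bias += learning_rate * error
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
        if errors == 0:  # weights unchanged for a full pass: stop training
            break
    return weights, bias


# Hypothetical features: [contains "free", contains "meeting"]; 1 = spam.
SAMPLES = [[1, 0], [0, 1], [1, 1], [0, 0]]
LABELS = [1, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = train_perceptron(SAMPLES, LABELS)
    print([predict(weights, bias, s) for s in SAMPLES])
```

A single neuron can only learn linearly separable patterns, which is why this toy data set was chosen to be separable.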

  25. Combining Spam Filters
     • Goal: the combined filter aims to improve on the individual filters' performance.
     • Combined Filter = Original Filter (OF) + Received Filter (RF)
     • Maximum gain: the received filter contains some features not found in the original filter.
     • E.g. Original Filter = {“Share Market”, “Higher Studies”}, Received Filter = {“Share Market”, “Job Alerts”}

  26. Challenges
     • Decisions (spam / ham) are made by both filters individually.
     • Decisions agree: no problem.
     • Decisions disagree: due to the difference between the feature sets.
     • Challenges:
       • “How do we select the correct decision or filter?”
       • “Who selects it?”

  27. Filter Selector (FS)
     • Training phase: the FS predicts the unique features (e.g. words) of the RF.
     • Parse the emails of the training set and extract the features.
     • Build a ‘bag’ of (predicted) features for the RF.
     • Compare text similarity between the current email's features and the feature sets of the filters.

  28. Algorithm Flowchart (figure; labels: Training Phase, Final Verdict)

  29. TF-IDF Similarity Measure
     • Commonly used in information retrieval applications.
     • More frequent words would be key to accurate classification of emails.
     • The feature set predicted by the FS is unique.
     • “Query – Document” retrieval procedure:
       • 2 documents: the feature sets
       • Query: the current email
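The query/document comparison can be sketched as a TF-IDF weighting followed by cosine similarity. The slides do not give the exact weighting, so the smoothed IDF formula below, the use of cosine similarity, and all function names are assumptions. The two feature sets reuse the example from the "Combining Spam Filters" slide, with each phrase treated as one token.

```python
import math
from collections import Counter


def idf_table(documents):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as {term: weight}; terms unknown to idf are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    if total == 0:
        return {}
    return {t: (count / total) * idf[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_filter(email_tokens, feature_sets):
    """Return (name of the most similar feature set, all similarity scores)."""
    idf = idf_table(list(feature_sets.values()))
    query = tfidf(email_tokens, idf)
    scores = {name: cosine(query, tfidf(features, idf))
              for name, features in feature_sets.items()}
    return max(scores, key=scores.get), scores


FEATURE_SETS = {
    "original": ["share market", "higher studies"],
    "received": ["share market", "job alerts"],
}

if __name__ == "__main__":
    chosen, scores = select_filter(["job alerts", "share market"], FEATURE_SETS)
    print(chosen, scores)
```

Because "share market" appears in both feature sets, it receives the lowest IDF weight, so the features unique to one filter dominate the choice. This matches the slides' emphasis on the received filter's unique features.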

  30. Experiments & Results

  31. Conclusion
     • We discussed techniques to “kill” spam.
     • We compared the various techniques.
     • So far, Bayesian filtering seems to be reliable.
     • We discussed a new approach to combining filters.
     • Future work:
       • Learning techniques for the Filter Selector
       • Better similarity measures

  32. Thank You