Spam Email Detection

Ethan Grefe, December 13, 2013. Motivation: spam email constantly clutters inboxes and is commonly removed using rule-based filters. Spam often has very similar characteristics, which allows it to be detected using machine learning, for example with Naïve Bayes classifiers or Support Vector Machines.


Presentation Transcript


  1. Spam Email Detection
     Ethan Grefe, December 13, 2013

  2. Motivation
     • Spam email constantly clutters inboxes
     • It is commonly removed using rule-based filters
     • Spam messages often have very similar characteristics
     • This allows them to be detected using machine learning:
       • Naïve Bayes classifiers
       • Support Vector Machines

  3. SVM Solution
     • Used training data from the CSDMC2010 SPAM corpus:
       • 4327 labeled emails
       • 2949 non-spam messages (HAM)
       • 1378 spam messages (SPAM)
     • Extracted features from the subject and body of emails
     • Used the resulting feature vectors to train an SVM classifier in MATLAB
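The class counts on this slide imply an unbalanced corpus, which matters when reading any accuracy figure. The short Python sketch below (Python is used here for illustration only; the original work used Matlab) derives the class proportions and the accuracy of a trivial "always ham" classifier from the numbers on the slide. The variable names are hypothetical.

```python
# Class counts as reported on the slide for the CSDMC2010 SPAM corpus.
HAM_COUNT = 2949
SPAM_COUNT = 1378
TOTAL = HAM_COUNT + SPAM_COUNT  # 4327, matching the slide

ham_fraction = HAM_COUNT / TOTAL
spam_fraction = SPAM_COUNT / TOTAL

# Accuracy of a trivial classifier that labels every email with the
# majority class; a useful floor when judging a trained classifier.
majority_baseline_accuracy = max(ham_fraction, spam_fraction)

print(f"ham: {ham_fraction:.1%}, spam: {spam_fraction:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Roughly two thirds of the corpus is ham, so a classifier has to score well above that level on accuracy before it is doing anything useful.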

  4. Email Features
     • Features were determined by research and observation
     • Best results were obtained with the following features:
       • Percentage of letters that are capitalized
       • Types of punctuation used
       • Average length of a word
       • Amount of HTML in the email
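The slide names the four features but not how they were computed. As a rough illustration, the Python sketch below computes one plausible version of each (the original work used Matlab). The function name `extract_features`, the regular expressions, and the exact definitions, such as measuring HTML as the share of characters inside tags, are assumptions and not the author's implementation.

```python
import re
import string


def extract_features(subject: str, body: str) -> list[float]:
    """Return [capital_ratio, punctuation_types, avg_word_length, html_ratio].

    Hypothetical feature definitions; the slide does not specify them.
    """
    text = f"{subject}\n{body}"

    # Amount of HTML: share of all characters that sit inside <...> tags.
    tags = re.findall(r"<[^>]+>", text)
    html_ratio = sum(len(tag) for tag in tags) / len(text)

    # Measure the remaining features on the text with tags removed.
    plain = re.sub(r"<[^>]+>", " ", text)

    # Percentage of letters that are capitalized (as a 0..1 ratio).
    letters = [ch for ch in plain if ch.isalpha()]
    capital_ratio = (
        sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    )

    # Types of punctuation used: number of distinct punctuation marks.
    punctuation_types = len({ch for ch in plain if ch in string.punctuation})

    # Average length of a word, counting runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", plain)
    avg_word_length = sum(len(w) for w in words) / len(words) if words else 0.0

    return [capital_ratio, float(punctuation_types), avg_word_length, html_ratio]


print(extract_features("FREE offer!!!", "<b>Buy now</b>"))
```

Each email becomes a fixed-length numeric vector, which is the form an SVM trainer expects as input.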

  5. Classifier Results
     • Trained on a random 35% of emails
     • Tested the SVM classifier on the remaining 65%
     • Trained the SVM using three different kernel functions
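The evaluation setup on this slide can be sketched as follows, in Python rather than the Matlab used originally. The helper `split_train_test` and its seed are hypothetical. The slide does not say which three kernel functions were tried, so the linear, polynomial, and RBF kernels shown here are only an assumption based on common SVM practice, and their default parameters are arbitrary.

```python
import math
import random


def split_train_test(items, train_fraction=0.35, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Three commonly used SVM kernels (assumed; the slide does not name them).
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Split indices for a corpus of 4327 emails into 35% train and 65% test.
train, test = split_train_test(range(4327))
print(len(train), len(test))
```

Training on the smaller share leaves most of the labeled data for testing, so the reported results rest on a large held-out set.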

  6. Possible Improvements
     • Use Naïve Bayes to classify emails using word frequency
     • Obtain a wider variety of input features
     • Test other types of learning algorithms
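The first suggested improvement, Naïve Bayes over word frequencies, could look something like the minimal Python sketch below. It is a generic multinomial Naïve Bayes with add-one (Laplace) smoothing, not something taken from the presentation. The class name `NaiveBayes`, the tokenizer, and the toy training messages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha          # smoothing strength (assumed value)
        self.word_counts = {}       # label -> Counter of word frequencies
        self.doc_counts = Counter() # label -> number of training emails
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented purely for illustration.
model = NaiveBayes().fit(
    ["win free money now", "free prize click now",
     "meeting agenda for monday", "lunch on monday with the team"],
    ["spam", "spam", "ham", "ham"],
)
print(model.predict("free money"), model.predict("monday meeting"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.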
