

Spam? Not any more!! Detecting spam emails using neural networks

ECE/CS/ME 539

Project presentation

Submitted by

Sivanadyan, Thiagarajan


Importance of the topic

  • Spam is unsolicited, unwanted email

  • It wastes bandwidth, storage space and, most of all, the recipient’s time

Goals of the Anti-spam Network

  • Reliably block spam mails

  • Should not block any non-spam mail, but may let a few spam mails slip through

  • Adapt to the specific types of messages
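
The second goal treats the two kinds of error differently: blocking a legitimate mail is worse than letting a spam mail through. A minimal sketch of how that could be measured, assuming labels of 1 for spam and 0 for non-spam; the function names are illustrative and do not come from the presentation:

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels where 1 = spam and 0 = non-spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam blocked
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good mail delivered
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good mail blocked
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through
    return tp, tn, fp, fn


def classification_rate(y_true, y_pred):
    """Fraction of mails that were labelled correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def false_positive_rate(y_true, y_pred):
    """Fraction of non-spam mails that were wrongly blocked."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy example: one spam mail slips through, no good mail is blocked.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
print(classification_rate(y_true, y_pred), false_positive_rate(y_true, y_pred))
```

Under this reading, a filter that meets the stated goals keeps the false positive rate at zero while accepting a small number of missed spam mails.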


Input Features – Data Set

  • Original data set: 57 input attributes

  • Output attribute: 1 (for spam), 0 (for non-spam)

  • Inputs derived from email content

  • Attributes indicate the frequency of specific words and characters

  • Examples: ‘credit’, ‘free’ (in spam); ‘meeting’, ‘project’ (in non-spam)


Preprocess the data

  • Keep only the inputs whose values differ between spam and non-spam mails

  • Two reduced data sets are obtained (21 Inputs and 9 Inputs)

  • The data is normalized to zero mean and unit variance (4025 input vectors)

  • The data is split into independent training and testing sets
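
The three preprocessing steps can be sketched as below. The selection score (difference of class means divided by the overall standard deviation) is an assumption, since the slides do not say how the 21 and 9 inputs were chosen; the function names and the toy data are also illustrative.

```python
import random
import statistics


def select_discriminative(X, y, keep):
    """Return indices of the `keep` columns whose class means differ the most."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        spam = [row[j] for row, label in zip(X, y) if label == 1]
        ham = [row[j] for row, label in zip(X, y) if label == 0]
        spread = statistics.pstdev(column) or 1.0  # guard against constant columns
        scores.append(abs(statistics.fmean(spam) - statistics.fmean(ham)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def standardize(X):
    """Scale every column to zero mean and unit variance."""
    columns = list(zip(*X))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the samples and split them into disjoint training and testing sets."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(X) * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


# Toy data: only column 0 separates spam (1) from non-spam (0).
X = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.5], [0.1, 0.1, 0.5], [0.2, 0.2, 0.5]]
y = [1, 1, 0, 0]
selected = select_discriminative(X, y, keep=1)
Z = standardize(X)
X_train, y_train, X_test, y_test = train_test_split(Z, y, test_fraction=0.5)
```

The sketch normalizes before splitting, following the order on the slide. Computing the mean and variance on the training set only is a common alternative that keeps the test set fully independent.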


MLP Implementation

  • Learning by back propagation algorithm

  • Using complete data set

    • Poor performance (Classification rate: 63.2%)

    • Classified most of the mails as non-spam

  • Using reduced data set (Inputs – 21)

    • Good performance (Classification rate: 93.8%)

    • All non-spam mails are correctly classified

    • Optimal MLP Configuration: 20-10-10-10-7
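
For readers unfamiliar with back propagation, a minimal pure-Python sketch of an MLP trained this way follows. It assumes sigmoid units, a squared-error loss and per-sample updates, none of which the slides specify. The layer sizes and toy data are illustrative and are not the configuration reported above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Fully connected network with sigmoid units and a single output."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # weights[l][j] holds the incoming weights of unit j; the last entry is the bias.
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, x):
        """Return the activations of every layer, input included."""
        activations = [list(x)]
        for layer in self.weights:
            prev = activations[-1] + [1.0]  # append the constant bias input
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(unit, prev))) for unit in layer]
            )
        return activations

    def predict(self, x):
        return self.forward(x)[-1][0]

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update for a single training sample."""
        acts = self.forward(x)
        out = acts[-1][0]
        deltas = [(out - target) * out * (1.0 - out)]  # error signal at the output
        for l in reversed(range(len(self.weights))):
            prev = acts[l] + [1.0]
            if l > 0:
                # Propagate the error signal back before the weights change.
                below = [
                    sum(self.weights[l][j][i] * deltas[j] for j in range(len(deltas)))
                    * acts[l][i] * (1.0 - acts[l][i])
                    for i in range(len(acts[l]))
                ]
            for j, unit in enumerate(self.weights[l]):
                for i in range(len(unit)):
                    unit[i] -= lr * deltas[j] * prev[i]
            if l > 0:
                deltas = below


def train(net, data, epochs):
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target)


# Toy, linearly separable data: (features, label) with 1 = spam.
data = [([1.0, -1.0], 1), ([0.8, -0.6], 1), ([-1.0, 1.0], 0), ([-0.7, 0.9], 0)]
net = MLP([2, 3, 1], seed=0)
train(net, data, epochs=1000)
```

A mail would then be flagged as spam when `net.predict(x)` exceeds a threshold such as 0.5. Raising the threshold trades missed spam for fewer blocked legitimate mails.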


Cross Validation

  • Using reduced data set (Inputs – 9)

    • Good performance (Classification rate: 92.1%)

    • Nearly all non-spam mails are correctly classified

    • Optimal MLP Configuration: 20-10-10-8

  • Using cross-validation

    • Negligible improvement in performance

    • Since all the data is derived from the same source, cross validation offers no advantage
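
The slides do not say which cross-validation scheme was used. Assuming plain k-fold cross-validation, the procedure could be sketched as follows, with `evaluate` standing in for a hypothetical routine that trains on one index set and scores on the other:

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def cross_validate(n_samples, k, evaluate, seed=0):
    """Average `evaluate(training, validation)` over all k folds."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Placeholder evaluation: reports the validation share instead of training a model.
splits = list(k_fold_indices(10, 5))
average = cross_validate(10, 5, lambda tr, va: len(va) / (len(tr) + len(va)))
```

Each sample is used for validation exactly once, so the averaged score draws on the whole data set rather than a single train/test split.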


Inference of the results

  • A larger number of inputs does not necessarily improve performance

  • It is important to remove redundant and irrelevant features

  • No single MLP configuration is optimal for all inputs; it needs to be adapted to the email content

  • A combination of other types of spam filters along with neural networks can be used


Conclusion

  • Neural networks are a viable option in spam filtering

  • Heuristic methods are increasingly being applied in this field

  • Filters need to exploit the differences between spam and ‘good’ emails

  • Further opportunities

    • Data sets from different sources need to be used for training

    • Fuzzy logic and combinational algorithms can be used in this application

