Spam? Not any more!! Detecting spam emails using neural networks

ECE/CS/ME 539

Project presentation

Submitted by

Sivanadyan, Thiagarajan


Importance of the topic

  • Spam is unsolicited and unwanted email

  • It wastes bandwidth, storage space and, most of all, the recipient’s time

Goals of the Anti-spam Network

  • Reliably block spam mails

  • Should not block any non-spam mail, but may allow a few spam mails to slip through

  • Adapt to the specific types of messages
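
The second goal (never block legitimate mail, tolerate a few missed spam mails) can be read as a choice of decision threshold on the classifier's output. The following is a minimal sketch, assuming the classifier produces a score in [0, 1] where higher means more spam-like. The function name `pick_threshold` and the sample scores are hypothetical and do not come from the project.

```python
def pick_threshold(scores, labels):
    """Return the lowest threshold that blocks no non-spam mail.

    scores: classifier outputs in [0, 1] (higher = more spam-like)
    labels: 1 for spam, 0 for non-spam
    A mail is blocked when its score is strictly greater than the threshold.
    """
    ham_scores = [s for s, y in zip(scores, labels) if y == 0]
    # Blocking only above the highest non-spam score gives zero false
    # positives on this data; spam scoring at or below it slips through.
    return max(ham_scores) if ham_scores else 0.0


scores = [0.95, 0.80, 0.40, 0.60, 0.10, 0.05]   # made-up validation scores
labels = [1, 1, 1, 0, 0, 0]
threshold = pick_threshold(scores, labels)
blocked = [s > threshold for s in scores]
false_positives = sum(b and y == 0 for b, y in zip(blocked, labels))
missed_spam = sum((not b) and y == 1 for b, y in zip(blocked, labels))
```

On this toy data the threshold is the highest non-spam score, so no legitimate mail is blocked and one spam mail gets through, which is the trade-off the goal describes.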


Input Features – Data Set

  • Original data set: 57 input attributes

  • Output attribute: 1 (for spam), 0 (for non-spam)

  • Inputs derived from email content

  • Attributes indicate the frequency of specific words and characters

  • Examples: ‘credit’, ‘free’ (in spam); ‘meeting’, ‘project’ (in non-spam)
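
Word-frequency attributes of this kind can be sketched as follows. This is an illustration only: it assumes each attribute is the percentage of tokens in the message that match a given word, and it uses a simple lowercase alphanumeric tokenizer. The slides do not state the exact definition, and `WORDS` holds only the four example words above rather than the 57 real attributes.

```python
import re

# The four example words from the slide; the real data set has 57 attributes.
WORDS = ["credit", "free", "meeting", "project"]


def word_frequencies(text, words=WORDS):
    """Return, for each word, the percentage of tokens in `text` equal to it."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return [0.0] * len(words)
    return [100.0 * tokens.count(w) / len(tokens) for w in words]


features = word_frequencies("Free credit report! Claim your FREE credit check")
```

Here the message has eight tokens, two of which are "credit" and two "free", so the first two features are 25.0 and the last two are 0.0.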


Preprocess the data

  • Choose only the inputs which differ for spam and non-spam mails

  • Two reduced data sets are obtained (21 Inputs and 9 Inputs)

  • The data is normalized to zero mean and unit variance (4025 input vectors)

  • Split the data into two independent training and testing data sets
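
The preprocessing steps above can be sketched in plain Python. The slides do not say how the inputs that "differ" were chosen, so ranking columns by the absolute difference of the class means is an assumption, as are the 50/50 split and all function names.

```python
import random
from statistics import mean, pstdev


def select_features(X, y, keep):
    """Indices of the `keep` columns whose class means differ the most."""
    def gap(j):
        spam = [row[j] for row, label in zip(X, y) if label == 1]
        ham = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(spam) - mean(ham))
    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return sorted(ranked[:keep])


def standardize(X):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]   # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in X]


def train_test_split(X, y, test_fraction=0.5, seed=0):
    """Shuffle once, then cut into disjoint training and testing sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


# Toy data: column 0 separates the classes, column 1 does not.
X = [[5.0, 1.0], [4.0, 2.0], [0.0, 1.0], [1.0, 2.0]]
y = [1, 1, 0, 0]
kept = select_features(X, y, keep=1)
X_reduced = standardize([[row[j] for j in kept] for row in X])
X_train, y_train, X_test, y_test = train_test_split(X_reduced, y)
```

In practice the scaling statistics would usually be computed on the training set alone and then applied to the test set; the sketch scales before splitting only because the slide lists the steps in that order.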


MLP Implementation

  • Learning by back propagation algorithm

  • Using complete data set

    • Poor performance (Classification rate: 63.2%)

    • Classified most of the mails as non-spam

  • Using reduced data set (Inputs – 21)

    • Good performance (Classification rate: 93.8%)

    • All the non-spam is detected

    • Optimal MLP Configuration: 20-10-10-10-7
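
Learning by back propagation can be illustrated with a very small network. This is a sketch under stated assumptions, not the project's implementation: it uses a single tanh hidden layer, a sigmoid output and a cross-entropy loss, whereas the configurations reported on the slides are deeper. The class `TinyMLP`, the toy data and the hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One tanh hidden layer and a sigmoid output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the spam probability for one input."""
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * v for w, v in zip(self.W2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr):
        """One stochastic gradient step on the cross-entropy loss."""
        h, p = self.forward(x)
        d_out = p - y   # gradient of the loss w.r.t. the output pre-activation
        # Propagate the error back through the (old) output weights and tanh.
        d_hidden = [d_out * w * (1.0 - hv * hv) for w, hv in zip(self.W2, h)]
        self.W2 = [w - lr * d_out * hv for w, hv in zip(self.W2, h)]
        self.b2 -= lr * d_out
        for j, dj in enumerate(d_hidden):
            self.W1[j] = [w - lr * dj * v for w, v in zip(self.W1[j], x)]
            self.b1[j] -= lr * dj

    def loss(self, X, Y):
        """Mean cross-entropy over a data set."""
        eps = 1e-12
        total = 0.0
        for x, y in zip(X, Y):
            p = self.forward(x)[1]
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(X)


# Toy, already-standardized data: the first feature separates the classes.
X = [[1.2, 0.3], [0.8, -0.5], [1.5, 0.1], [-1.0, 0.4], [-0.7, -0.2], [-1.4, 0.0]]
Y = [1, 1, 1, 0, 0, 0]
net = TinyMLP(n_in=2, n_hidden=3)
loss_before = net.loss(X, Y)
for epoch in range(200):
    for x, y in zip(X, Y):
        net.train_step(x, y, lr=0.1)
loss_after = net.loss(X, Y)
predictions = [1 if net.forward(x)[1] > 0.5 else 0 for x in X]
```

The classification rate quoted on the slides corresponds to the fraction of entries in `predictions` that match `Y`, measured on held-out test data rather than on the training set used here.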


Cross Validation

  • Using reduced data set (Inputs – 9)

    • Good performance (Classification rate: 92.1%)

    • Nearly all the non-spam is detected

    • Optimal MLP Configuration: 20-10-10-8

  • Using cross-validation

    • Negligible improvement in performance

    • Since all the data is derived from the same source, cross validation offers no advantage
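
The slides do not say which cross-validation scheme was used. As an illustration, k-fold cross-validation is one common choice, and its index splitting can be sketched as below; the function name, the fold count and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k disjoint, nearly equal folds
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


splits = list(k_fold_indices(n_samples=10, k=5))
```

Each sample appears in exactly one validation fold, so every mail is used for validation once and for training in the remaining folds.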


Inference of the results

  • A larger number of inputs does not necessarily improve performance

  • It is important to remove redundant and irrelevant features

  • No single MLP configuration is optimal for all inputs; the configuration needs to be adapted to the email content

  • A combination of other types of spam filters along with neural networks can be used


Conclusion

  • Neural networks are a viable option in spam filtering

  • A number of heuristic methods are being increasingly applied in this field

  • Need to exploit the differences between spam and ‘good’ emails

  • Further opportunities

    • Data sets from different sources need to be used for training

    • Fuzzy logic and combinational algorithms can be used in this application

