
A LVQ-based neural network anti-spam email approach

A LVQ-based neural network anti-spam email approach. Prof. 楊婉秀; 詹元順, first-year master's student in Information Management (94722001); 2005/12/07.


Presentation Transcript


  1. A LVQ-based neural network anti-spam email approach. Prof. 楊婉秀; 詹元順, first-year master's student in Information Management (student ID 94722001); 2005/12/07

  2. Outline • 1. Introduction • 2. Email sample and data preprocessing • 2.1 Email representation • 2.2 Feature extraction • 3. Anti-spam email LVQ model • 3.1 Spam email category • 3.2 Learning vector quantization neural network model • 3.3 Anti-spam email LVQ algorithm • 3.4 Parameter setting • 4. Experiments and results • 5. Conclusion

  3. 1. Introduction(1/2) • Spam email wastes users' time, money, and network bandwidth; it also clutters users' mailboxes and can even be harmful, e.g., by carrying pornographic content. • In the United States, spam causes enterprises losses of up to 9 billion per year. • Without appropriate countermeasures, the situation will keep worsening, and spam will eventually undermine the usability of email.

  4. 1. Introduction(2/2) • Duhong Chen et al. compared four algorithms (Bayes, decision tree, neural network, and boosting) and concluded that the neural network algorithm achieves the highest performance. • Experiments have shown that the LVQ-based anti-spam email filter performs better than Bayes-based and BP (back-propagation) neural network-based approaches.

  5. 2. Email sample and data preprocessing(1/2) 2.1 Email representation • $\mathrm{TFIDF}_i = \mathrm{TF}_i \times \log(N / \mathrm{DF}_i)$ (1) • $\mathrm{TF}_i$: the frequency with which word $t_i$ appears in document $d$ • $N$: the total number of training documents • $\mathrm{DF}_i$: the number of documents that contain word $t_i$ 2.2 Feature extraction
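Equation (1) can be computed directly from tokenized training emails. The following is a minimal Python sketch, assuming each email is already a list of word tokens, that $\mathrm{TF}_i$ is the raw count of $t_i$ in the email, and that log is the natural logarithm (the slide does not state the base); `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: TFIDF} dict per document, following equation (1).

    Assumptions: each document is a list of tokens, TF_i is the raw count of
    word t_i in the document, and log is the natural logarithm.
    """
    n = len(docs)  # N: total number of training documents
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # DF_i: count each word at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # TF_i: occurrences of each word in this document
        vectors.append({w: count * math.log(n / df[w]) for w, count in tf.items()})
    return vectors


emails = [["free", "money", "free"], ["meeting", "today"], ["free", "meeting"]]
for vec in tfidf_vectors(emails):
    print(vec)
```

A word that occurs in every training email gets a weight of 0, since $\log(N/N) = 0$, so such words carry no discriminating information under this scheme.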

  6. 2. Email sample and data preprocessing(2/2) 2.2 Feature extraction • $A$: the number of emails that contain word $t$ and belong to class $s$ • $B$: the number of emails that contain word $t$ but do not belong to class $s$ • $C$: the number of emails that belong to class $s$ but do not contain word $t$ • $N$: the total number of emails in the training corpus
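The scoring formula that uses these counts is not preserved in the transcript. As an illustration only, the sketch below uses the mutual-information estimate $\log\big(A \cdot N / ((A + C)(A + B))\big)$, a common text-categorization feature score built from exactly these four counts; it may differ from the formula in the paper. `count_abc`, `mutual_information`, and `select_features` are hypothetical helpers, and emails are assumed to be sets (or lists) of tokens.

```python
import math


def count_abc(docs, labels, word, cls):
    """Return (A, B, C, N) for word t = `word` and class s = `cls`."""
    a = b = c = 0
    for tokens, label in zip(docs, labels):
        has_word = word in tokens
        in_class = label == cls
        if has_word and in_class:
            a += 1  # contains t and belongs to s
        elif has_word:
            b += 1  # contains t but does not belong to s
        elif in_class:
            c += 1  # belongs to s but does not contain t
    return a, b, c, len(docs)


def mutual_information(a, b, c, n):
    """Mutual-information estimate log(A*N / ((A+C)*(A+B))); -inf when A == 0."""
    if a == 0:
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))


def select_features(docs, labels, cls, k):
    """Return the k highest-scoring words for class `cls` (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scores = {w: mutual_information(*count_abc(docs, labels, w, cls)) for w in vocab}
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]


docs = [{"free", "money"}, {"free"}, {"meeting"}, {"free", "meeting"}]
labels = ["spam", "spam", "legit", "legit"]
print(select_features(docs, labels, "spam", 2))  # → ['money', 'free']
```

Here "money" scores higher than "free" because every email containing it is spam, while "free" also appears in a legitimate email.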

  7. 3. Anti-spam email LVQ model(1/5) 3.1 Spam email category.

  8. 3. Anti-spam email LVQ model(2/5) 3.2 Learning vector quantization neural network model • The model has two layers. The first is the competitive layer, in which each neuron represents a subclass. • The second is the output layer, in which each neuron represents a class.
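A two-layer LVQ network classifies an input by letting the closest competitive-layer neuron win and then mapping that neuron's subclass to its class. The sketch below assumes Euclidean distance and represents the output layer as a lookup from each competitive neuron to its class, which is equivalent to fixed 0/1 weights between subclass neurons and class neurons. `LVQNetwork` and the labels in the example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LVQNetwork:
    # Competitive layer: one weight vector (prototype) per neuron; each neuron is a subclass.
    prototypes: list
    # Output layer: the class each competitive neuron feeds into
    # (equivalent to fixed 0/1 weights from subclass neurons to class neurons).
    subclass_to_class: list

    def winner(self, x):
        """Index of the competitive neuron closest to x (Euclidean distance)."""
        return min(range(len(self.prototypes)),
                   key=lambda j: math.dist(x, self.prototypes[j]))

    def classify(self, x):
        """Class of the winning neuron's subclass."""
        return self.subclass_to_class[self.winner(x)]


# Hypothetical example: two spam subclasses and one legitimate subclass.
net = LVQNetwork(
    prototypes=[[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    subclass_to_class=["spam", "spam", "legit"],
)
print(net.winner([0.2, 0.9]), net.classify([0.2, 0.9]))  # → 1 spam
```

Because several competitive neurons can feed the same output neuron, different kinds of spam can occupy separate regions of the feature space while still producing a single "spam" decision.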

  9. 3. Anti-spam email LVQ model(3/5) 3.3 Anti-spam email LVQ algorithm(1/2)

  10. 3. Anti-spam email LVQ model(4/5) 3.3 Anti-spam email LVQ algorithm(2/2)
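The algorithm slides are not preserved in the transcript. For orientation, the following is a minimal sketch of Kohonen's standard LVQ1 update, which may differ in detail from the paper's algorithm: for each training sample, the winning prototype moves toward the sample if their labels match and away from it otherwise. The linearly decaying learning rate, Euclidean distance, default parameter values, and the names `nearest` and `train_lvq1` are assumptions; each prototype here simply carries the label it is compared against.

```python
import math
import random


def nearest(x, protos):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(protos)), key=lambda j: math.dist(x, protos[j]))


def train_lvq1(samples, labels, prototypes, proto_labels, alpha=0.1, epochs=20, seed=0):
    """Standard LVQ1 training; returns updated copies of the prototypes."""
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]  # copy so the caller's prototypes are untouched
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = alpha * (1 - epoch / epochs)  # learning rate decays linearly toward 0
        rng.shuffle(order)  # present samples in a random order each epoch
        for i in order:
            x = samples[i]
            c = nearest(x, protos)
            # Attract the winner if its label matches the sample, otherwise repel it.
            sign = 1 if proto_labels[c] == labels[i] else -1
            protos[c] = [w + sign * rate * (xi - w) for w, xi in zip(protos[c], x)]
    return protos


samples = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [4.8, 5.1], [5.2, 4.9]]
labels = ["spam", "spam", "spam", "legit", "legit", "legit"]
init = [[1.0, 1.0], [4.0, 4.0]]
proto_labels = ["spam", "legit"]
trained = train_lvq1(samples, labels, init, proto_labels)
print(trained)
print(proto_labels[nearest([0.1, 0.2], trained)])  # → spam
```

The repelling step is what distinguishes LVQ from plain vector quantization: labeled errors push prototypes away from regions belonging to another label.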

  11. 3. Anti-spam email LVQ model(5/5) 3.4 Parameter setting

  12. 4. Experiments and results(1/4) • This project uses the publicly available email corpus at http://www.spamassassin.org/publiccorpus. • 1000 emails were selected at random from the corpus: 580 spam and 420 legitimate.

  13. 4. Experiments and results(2/4) • Anti-spam email filter performance is often measured in terms of spam precision (SP) and spam recall (SR).

  14. 4. Experiments and results(3/4) • The F1 criterion combines spam precision and spam recall into a single measure.
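The formulas on these slides are not preserved, so the usual definitions are assumed here. With $n_{S \to S}$ spam classified as spam, $n_{L \to S}$ legitimate emails classified as spam, and $n_{S \to L}$ spam classified as legitimate: $SP = n_{S \to S} / (n_{S \to S} + n_{L \to S})$, $SR = n_{S \to S} / (n_{S \to S} + n_{S \to L})$, and $F1 = 2 \cdot SP \cdot SR / (SP + SR)$. A minimal Python sketch, where `spam_metrics` is a hypothetical helper:

```python
def spam_metrics(y_true, y_pred, spam="spam"):
    """Return (SP, SR, F1) for the given true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    n_ss = sum(1 for t, p in pairs if t == spam and p == spam)  # spam classified as spam
    n_ls = sum(1 for t, p in pairs if t != spam and p == spam)  # legitimate classified as spam
    n_sl = sum(1 for t, p in pairs if t == spam and p != spam)  # spam classified as legitimate
    sp = n_ss / (n_ss + n_ls) if n_ss + n_ls else 0.0
    sr = n_ss / (n_ss + n_sl) if n_ss + n_sl else 0.0
    f1 = 2 * sp * sr / (sp + sr) if sp + sr else 0.0
    return sp, sr, f1


y_true = ["spam", "spam", "spam", "legit", "legit"]
y_pred = ["spam", "spam", "legit", "spam", "legit"]
print(spam_metrics(y_true, y_pred))
```

Returning 0.0 when a ratio is undefined (no emails predicted as spam, or no spam in the data) is a design choice of this sketch. F1 is the harmonic mean of SP and SR, so it stays low unless both are high.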

  15. 4. Experiments and results(4/4)

  16. 5. Conclusion • Both neural network-based algorithms (LVQ and BP) generally perform better than the Bayes-based one. • The LVQ-based method divides spam emails into several content-based subclasses, so the feature words within each subclass are more closely related and the characteristics of each subclass are easier to identify.
