Learning to Detect Phishing Emails

Presentation Transcript

  1. Learning to Detect Phishing Emails • Presenter: 鄭志欣 • Advisor: Hsing-Kuo Pao • Paper: I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007.

  2. Outline • Introduction • Method • Empirical evaluation • Conclusion

  3. Introduction • Phishing (spoofed websites) • Stealing account information • Logon credentials • Identity information • Detecting phishing is a hard problem

  4. Method • PILFER – a machine-learning-based approach to classification • Two classes: phishing emails / ham (good) emails • Feature set • Features as used in email classification

  5. Features as used in email classification • IP-based URLs: • http://192.168.0.1/paypal.cgi?fix_account • Some phishing attacks are hosted on compromised PCs, which may have no DNS entry and so are referenced by IP address. • This is a binary feature.
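
The IP-based URL check can be sketched as follows. This is a minimal illustration, not the PILFER scripts themselves; the function name `has_ip_based_url` is hypothetical, and the URLs are assumed to have been extracted from the email already.

```python
import ipaddress
from urllib.parse import urlparse


def has_ip_based_url(urls):
    """Return True if any URL's host is a literal IP address (binary feature)."""
    for url in urls:
        host = urlparse(url).hostname
        if host is None:
            continue  # no host part, e.g. a relative link
        try:
            ipaddress.ip_address(host)  # raises ValueError for ordinary names
            return True
        except ValueError:
            continue
    return False
```

Calling it with the slide's example URL, `has_ip_based_url(["http://192.168.0.1/paypal.cgi?fix_account"])`, flags the email.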

  6. Age of linked-to domain names • Phishers register legitimate-sounding domain names • playpal.com • paypal-update.com • These domains often have a limited life • The registration date is obtained with a WHOIS query • If that date is within 60 days of the date the email was sent, the domain is flagged as “fresh”. • This is a binary feature.
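
The 60-day freshness test might look like the sketch below. The WHOIS lookup itself is left out: the registration date is assumed to be available already as a `date`, and the name `is_fresh_domain` is hypothetical. Treating a missing WHOIS record as "not fresh" is also an assumption, not something the slide states.

```python
from datetime import date, timedelta

FRESH_WINDOW = timedelta(days=60)  # threshold taken from the slide


def is_fresh_domain(registered_on, email_sent_on):
    """Return True if the domain was registered within 60 days of the send date."""
    if registered_on is None:
        return False  # assumption: no WHOIS record means the feature is not set
    return abs(email_sent_on - registered_on) <= FRESH_WINDOW
```

For an email sent on 1 November 2006, a domain registered on 1 October 2006 is flagged, while one registered in January 2005 is not.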

  7. Nonmatching URLs • This is a case of a link that says paypal.com but actually links to badsite.com. • Such a link looks like <a href="badsite.com"> paypal.com</a>. • This is a binary feature.
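
A rough sketch of the nonmatching-URL check, using only the Python standard library. The class and function names are hypothetical, and the rule used to decide whether link text "looks like" a domain (no spaces, contains a dot) is a simplifying assumption; hosts are compared exactly, so `www.paypal.com` and `paypal.com` would count as different.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(value):
    """Extract a lower-cased host from a URL or a bare domain name."""
    value = value.strip()
    if "//" not in value:
        value = "//" + value  # lets urlparse treat a bare domain as a host
    return (urlparse(value).hostname or "").lower()


def has_nonmatching_url(html):
    """Return True if some link text names one host but the href targets another."""
    parser = LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        if not href or " " in text or "." not in text:
            continue  # no target, or the text is not a URL or domain name
        if host_of(text) != host_of(href):
            return True
    return False
```

The slide's example, `<a href="badsite.com"> paypal.com</a>`, sets the feature, while a link whose text and target share a host does not.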

  8. “Here” links to non-modal domain • Example: “Click here to restore your account access” • The “modal domain” is the domain most frequently linked to in the email. • The feature flags a link with the text “link”, “click”, or “here” that points to a domain other than the modal domain. • This is a binary feature.
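
One way to sketch this feature, assuming the modal domain is the host that appears most often among the email's links and that the links are already available as `(href, text)` pairs. The function name is hypothetical, and matching on whole words in the link text is a choice made here for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

TRIGGER_WORDS = {"link", "click", "here"}


def here_link_to_non_modal_domain(links):
    """links: list of (href, text) pairs. Returns the binary feature value."""
    hosts = [(urlparse(href).hostname or "").lower() for href, _ in links]
    known = [h for h in hosts if h]
    if not known:
        return False
    modal = Counter(known).most_common(1)[0][0]  # most frequently linked host
    for host, (_, text) in zip(hosts, links):
        words = set(re.findall(r"[a-z]+", text.lower()))
        if host and host != modal and words & TRIGGER_WORDS:
            return True
    return False
```

An email with two links to `www.bank.com` and a "Click here to restore your account access" link to `badsite.com` is flagged.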

  9. HTML emails • Emails are sent as plain text, as HTML, or as a combination of the two (the multipart/alternative format). • Launching an attack without HTML is difficult. • This is a binary feature.
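
A minimal sketch of the HTML-email check using Python's standard `email` package. It assumes the feature is set whenever any MIME part has content type `text/html`; the function name `is_html_email` is hypothetical.

```python
from email import message_from_string


def is_html_email(raw_message):
    """Return True if the message, or any of its MIME parts, is text/html."""
    msg = message_from_string(raw_message)
    # walk() yields the message itself and then every nested part
    return any(part.get_content_type() == "text/html" for part in msg.walk())
```

This covers both a single-part HTML message and a multipart/alternative message that carries an HTML part next to a plain-text one.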

  10. Number of links • The number of links present in an email. • A link is defined as an <a> tag in the HTML. • This is a continuous feature.
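
Counting links can be sketched with the standard HTML parser. The names `AnchorCounter` and `number_of_links` are hypothetical, and every `<a>` start tag is counted whether or not it carries an `href`.

```python
from html.parser import HTMLParser


class AnchorCounter(HTMLParser):
    """Count <a> start tags; HTMLParser lower-cases tag names for us."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.count += 1


def number_of_links(html):
    """Return the number of <a> tags in an HTML body."""
    counter = AnchorCounter()
    counter.feed(html)
    return counter.count
```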

  11. Number of domains • Take the domain names previously extracted from all of the links and count the number of distinct domains. • Only the “main” part of a domain is considered • https://www.cs.university.edu/ counts as university.edu • http://www.company.co.jp/ counts as company.co.jp • This is a continuous feature.
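
A simplified sketch of extracting the "main" part of a domain and counting distinct domains. The suffix set below is a hypothetical, deliberately tiny stand-in; a real extractor would need a complete list of multi-label suffixes, and this sketch does not special-case IP-address hosts.

```python
from urllib.parse import urlparse

# Hypothetical, incomplete list of two-label suffixes, for illustration only.
TWO_LEVEL_SUFFIXES = {"co.jp", "co.uk", "com.au"}


def main_domain(url):
    """Return the registrable-looking part of the URL's host name."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def number_of_domains(urls):
    """Count distinct main domains among the URLs that have a host."""
    return len({main_domain(u) for u in urls if urlparse(u).hostname})
```

With the slide's two examples plus a third link to `http://mail.university.edu/x`, the count is 2, because both university links collapse to `university.edu`.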

  12. Number of dots • Attackers use subdomains, such as • http://www.my-bank.update.data.com • or a redirection script, such as • http://www.google.com/url?q=http://www.badsite.com • The feature is the maximum number of dots ('.') contained in any of the links present in the email. • This is a continuous feature.
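
The dot count reduces to a short function. This sketch counts dots over the whole link, including any query string, which is one reading of the slide's definition; the name `max_dots` is hypothetical.

```python
def max_dots(urls):
    """Return the largest number of '.' characters found in any single link."""
    return max((url.count(".") for url in urls), default=0)
```

Both example links on the slide contain four dots, compared with one dot in a plain link such as `http://paypal.com`.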

  13. Contains javascript • Attackers can use JavaScript to hide information from the user, and potentially launch sophisticated attacks. • An email is flagged with the “contains javascript” feature if the string “javascript” appears in the email, regardless of whether it is actually in a <script> or <a> tag • This is a binary feature.
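
As described, this feature is a plain substring test over the raw email. The sketch below makes the match case-insensitive, which is an assumption rather than something the slide specifies; the function name is hypothetical.

```python
def contains_javascript(raw_email):
    """Return True if the string "javascript" appears anywhere in the email."""
    return "javascript" in raw_email.lower()
```

Because no HTML parsing is involved, a `<script type="text/javascript">` block, a `javascript:` URL in a link, and the word appearing in ordinary body text all set the feature.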

  14. Spam-filter output • The class assigned to the email by the trained version of SpamAssassin, with the default rule weights and threshold. • The output is either “ham” or “spam”. • This is a binary feature.

  15. Empirical Evaluation • Machine-Learning Implementation • Testing Spam Assassin • Datasets • Additional Challenges • False Positives vs. False Negatives

  16. Machine-Learning Implementation: PILFER • First, we run a set of scripts to extract all the features listed. • Second, we train and test a classifier using 10-fold cross validation. • Random forest (classifier) • A random forest creates a number of decision trees; each tree is built by randomly choosing an attribute to split on at each level and then pruning the tree.
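
The 10-fold cross-validation step can be sketched in plain Python as below. This only shows how the data is partitioned; the random forest itself is not implemented here, and the function name, the shuffling, and the fixed seed are all assumptions made for illustration.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Each email appears in exactly one test fold. In every round a classifier would be trained on the feature vectors at `train_idx` and evaluated on those at `test_idx`, and the ten results would then be combined.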

  17. We use a random forest as the classifier.

  18. Testing SpamAssassin • SpamAssassin is a widely deployed, freely available spam filter that is highly accurate in classifying spam emails. • We classify the exact same dataset using SpamAssassin version 3.1.0, with the default thresholds and rules. • Tested with “untrained” SpamAssassin • Tested with “trained” SpamAssassin, trained under 10-fold cross validation

  19. Datasets • Two publicly available datasets. • ham corpora from the SpamAssassin project • 6950 non-phishing non-spam emails • Phishingcorpus • approximately 860 email messages

  20. Additional Challenges • The age of the dataset. • Phishing websites are short-lived. • Some of our features can therefore not be extracted from older emails, which makes testing difficult. Example: the age of the linked-to domain.

  21. Result

  22. Conclusion • It is possible to detect phishing emails with high accuracy using a specialized filter whose features are more directly applicable to phishing emails than those employed by general-purpose spam filters.

  23. Reference • I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007. • www.ics.uci.edu/.../Learning%20to%20Detect%20Phishing%20Emails.pptx • http://armorize-cht.blogspot.com/2010/01/phishing-mail.html