PhishDef: URL Names Say It All

PhishDef: URL Names Say It All. Michalis Faloutsos, University of California, Riverside, USA. Anh Le and Athina Markopoulou, University of California, Irvine, USA. What is phishing? Social engineering and technical means to steal consumers’ personal identity, data, etc.

Presentation Transcript


  1. PhishDef: URL Names Say It All • Michalis Faloutsos, University of California, Riverside, USA • Anh Le, Athina Markopoulou, University of California, Irvine, USA

  2. What is Phishing? • Social engineering and technical means to steal consumers’ personal identity, data, etc. • Causes billions of dollars in losses annually Anh Le - UC Irvine - PhishDef

  3. Antiphishing.org

  4. Example of a Phishing Site

  5. Current Protection • Google Safe Browsing • Microsoft SmartScreen • Third-party

  6. Current Protection Model: Google Safe Browsing • Motivation: blacklist-based protection is reactive, so it cannot protect against zero-day phishing

  7. Outline • Phishing Background • Motivation • Our Proposal • New Protection Model • Learning Algorithms • Dataset • Feature Selection • Evaluation Results • Concluding Remarks

  8. Our Proposed Protection Model • Main challenges: Accuracy and Classification Latency • Which classification algorithm works best? • Which set of features works best?

  9. Prior Work • Whittaker et al. [NDSS ’10]: Google Safe Browsing • Ma et al. [SIGKDD ’09]: Batch-based Classification • Ma et al. [ICML ’09]: Batch-based vs. Online Learning • Server-Side Classification

  10. Main Contributions • New Protection Model: client-side classification • Propose using Adaptive Regularization of Weights (AROW): high accuracy, resilient to noise • Set of Lexical Features: fast to extract at client side, obfuscation resistant

  11. Machine Learning Algorithms • Batch-based Support Vector Machine • Online Perceptron • Confidence-Weighted (CW) [Dredze et al., ICML 2008] • Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]

  12. Online Classification • Maintain a weight vector and use it for classification • Online Perceptron • Client side: weights trained beforehand, features extracted in real time • Server side: [equation not preserved in transcript]
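
The weight-vector idea on this slide can be sketched as a standard online Perceptron. This is a minimal illustration, not the authors' implementation: the sparse-dictionary feature encoding, the function names `predict` and `perceptron_update`, and the label convention (+1 phishing, -1 benign) are all assumptions made for the example.

```python
def predict(weights, features):
    """Classify with the sign of the dot product between weights and features.

    `weights` and `features` are sparse dicts mapping feature name -> value
    (a hypothetical encoding chosen for this sketch).
    """
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1 if score >= 0 else -1


def perceptron_update(weights, features, label):
    """Standard Perceptron rule: change the weights only after a mistake.

    Returns True if the weights were modified.
    """
    if predict(weights, features) == label:
        return False
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + label * value
    return True


# Usage: an all-zero model predicts +1, so a benign example triggers one update.
weights = {}
benign = {"tok:news": 1.0}
perceptron_update(weights, benign, -1)
print(predict(weights, benign))  # -1
```

In the split described on the slide, only `predict` would need to run on the client; updates like `perceptron_update` can happen wherever labeled URLs are available.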

  13. Online Classification • Confidence-Weighted (CW): minimum change, enough to correct last mistake • Adaptive Regularization of Weights (AROW): minimum change, increasing confidence, penalty for mistake
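
As a rough illustration of the AROW bullet, the sketch below follows the closed-form update from Crammer et al. (NIPS 2009), which keeps a mean weight vector and a per-feature variance. The diagonal-covariance simplification, the dense-list representation, the function name `arow_update`, and the default `r=1.0` are assumptions for this example; the slides do not say which variant PhishDef uses.

```python
def arow_update(mu, sigma, x, y, r=1.0):
    """One AROW step with a diagonal covariance (assumed simplification).

    mu    : list of mean weights
    sigma : list of per-feature variances (diagonal of the covariance)
    x     : list of feature values, same length as mu
    y     : true label, +1 or -1
    r     : regularization parameter; larger r means smaller updates
    """
    margin = y * sum(m * xi for m, xi in zip(mu, x))
    if margin >= 1.0:
        # No hinge loss on this example, so the model is left unchanged.
        return mu, sigma
    # Variance of the margin under the current weight distribution.
    variance = sum(s * xi * xi for s, xi in zip(sigma, x))
    beta = 1.0 / (variance + r)
    alpha = (1.0 - margin) * beta
    # Move the mean toward correcting the loss, scaled by per-feature variance.
    new_mu = [m + alpha * y * s * xi for m, s, xi in zip(mu, sigma, x)]
    # Shrink the variance of features that were just observed.
    new_sigma = [s - beta * (s * xi) ** 2 for s, xi in zip(sigma, x)]
    return new_mu, new_sigma


mu, sigma = arow_update([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], 1)
print(mu, sigma)  # [0.5, 0.0] [0.5, 1.0]
```

Because the step size is damped by `r` instead of being forced to fix the last example outright, a single mislabeled URL moves the model less than it would under a hard constraint, which is the usual explanation for AROW's tolerance of label noise.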

  14. Dataset • Phishing URLs: PhishTank (4,082), MalwarePatrol (2,001) • Benign URLs: Open Directory (4,012), Yahoo Directory (4,143) • Time period: June 2010

  15. Feature Selection • Lexical Features • External Features: country, AS number, registration date, registrant, registrar, etc.
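
To make "lexical features" concrete, here is a small sketch of features that can be computed from the URL string alone, with no network lookups. The specific features (lengths, dot count, bag of host and path tokens) and the name `lexical_features` are illustrative assumptions; the slides do not list the exact feature set.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Extract string-only features from a URL as a sparse dict.

    The chosen features are hypothetical examples of lexical features,
    not the feature set used in the talk.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    features = {
        "len:url": float(len(url)),
        "len:host": float(len(host)),
        "count:dots_in_host": float(host.count(".")),
    }
    # Bag-of-words over hostname and path, split on non-alphanumeric characters.
    for prefix, text in (("host_tok:", host), ("path_tok:", parts.path)):
        for token in re.split(r"[^A-Za-z0-9]+", text):
            if token:
                features[prefix + token.lower()] = 1.0
    return features


feats = lexical_features("http://www.hmrc.gov.uk/intro-income-tax.htm")
print(feats["len:host"], feats["count:dots_in_host"])  # 15.0 3.0
```

External features such as registration date or AS number would instead require remote queries, which is where the extra latency comes from.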

  16. Outline • Phishing Background • Motivation • Our Proposal • New Protection Model • Learning Algorithms • Dataset • Feature Selection • Evaluation Results • Concluding Remarks

  17. Evaluation Results: Lexical vs. Full Features • Using full features: (+) ~1% • (-) Dependency on remote server • (-) Avg. latency: 1.64 s • Lexical features alone are better suited than full features for client-side phishing classification

  18. Evaluation Results: CW vs. AROW • AROW is more resilient to noise than CW

  19. Conclusion: PhishDef • Client-side phishing classification system • Proactive, on-the-fly classification of zero-day phishing URLs • Low delay on the client side (ms), high accuracy (97%) • Resilient to noisy data • Future Work: develop an add-on for Firefox

  20. Questions

  22. Example of a Phishing Site • Phishing: http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm • Legitimate: http://www.hmrc.gov.uk/intro-income-tax.htm

  23. Evaluation Results: Batch-Based vs. Online Learning • Online learning outperforms batch-based learning for phishing classification

  24. Chrome 11 > Firefox 4
