
CANTINA : A Content-Based Approach to Detecting Phishing Web Sites

Yue Zhang, Jason Hong, and Lorrie Cranor. WWW 2007. 2008.09.09.

Presentation Transcript


  1. CANTINA: A Content-Based Approach to Detecting Phishing Web Sites • Yue Zhang, Jason Hong, and Lorrie Cranor • WWW 2007 • 2008.09.09

  2. Agenda • Phishing Attacks • Motivation & Goal • Related Work • CANTINA • Evaluation • Conclusion

  3. Phishing Attacks (1/2) • Phishing: the act of stealing personal information via the Internet in order to commit financial fraud • Attackers create a fake site that imitates an original site, such as a bank's • They lure users to it by various methods • Spam e-mail, XSS vulnerabilities, malware … • Technical issues • URL obfuscation • Similar domain names, URL encoding … • DNS hijacking • Modifying the hosts file, DNS server settings … • Malware • BHO (Browser Helper Object), browser toolbar, key logger …

  4. Phishing Attacks (2/2) • Criminals often create phishing sites by copying and then modifying a legitimate site’s web pages • The result looks similar to the original web site • It often contains brand names and other terms that are common on the original page • Owner’s brands

  5. Motivation & Goal • Phishing is a rapidly growing problem with 9,255 unique phishing sites reported in 2006 • 84 anti-phishing toolbars • Low accuracy • There is a strong need for better automated detection algorithms • Goal: a novel content-based approach for detecting phishing web sites • Achieve higher accuracy than existing approaches

  6. Related Work (1/3) • Anti-phishing research falls into four categories • Why people fall for phishing attacks • Studies have examined the reasons that people fall for phishing attacks • Educating people about phishing attacks • Focused on online training materials, testing, and situated learning • Anti-phishing user interfaces • Focused on developing better user interfaces for anti-phishing tools • Automated detection of phishing

  7. Related Work (2/3) • Anti-phishing user interfaces • Toolbar-based approaches • Browser extensions • Dynamic Security Skins • Web Wallet

  8. Related Work (3/3) • Automated detection of phishing • Use heuristics to judge whether a page has phishing characteristics • Host name, domain name, URLs, … • Use a blacklist of reported phishing URLs
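The two families of automated detection on this slide can be illustrated with a minimal Python sketch. The function names and the two specific URL checks (an "@" in the authority part, and a raw IP address used as the host) are illustrative assumptions chosen as typical URL-based heuristics; the slides do not list the exact rules any tool uses.

```python
import ipaddress
from urllib.parse import urlparse

def is_blacklisted(url, blacklist):
    """Blacklist approach: exact lookup in a set of reported phishing URLs."""
    return url in blacklist

def url_heuristic_flags(url):
    """Heuristic approach: return a list of suspicious traits found in the URL.
    Both checks below are illustrative assumptions, not rules from the slides."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    flags = []
    if "@" in parsed.netloc:
        # Text before '@' is user info; the real host comes after it.
        flags.append("at-sign in authority")
    try:
        ipaddress.ip_address(host)
        flags.append("IP address as host")
    except ValueError:
        pass  # host is a name, not an IP address
    return flags
```

A blacklist can only catch URLs that were already reported, while heuristics can flag unseen pages at the cost of possible false positives.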

  9. CANTINA | Basic Concept • Criminals often create phishing sites by copying and then modifying a legitimate site’s web pages • Phishing pages therefore contain the brand names and terms of the legitimate pages • Robust Hyperlinks • A technique for recovering from broken links • Add a lexical signature to URLs • If the link doesn’t work, feed the signature to a search engine • Ex. http://aaa.com/a.html?lexical-signature=“word1+word2+...+word5” • TF-IDF (term frequency / inverse document frequency) • A frequency-based algorithm • A basic algorithm for search engines • Used for comparing and classifying documents • A term has a high TF-IDF weight when it has a high term frequency in a given document and a low document frequency in the whole collection of documents
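A minimal Python sketch of TF-IDF weighting and of taking the top-weighted terms as a lexical signature follows. The tokenizer, the smoothed IDF formula, the alphabetical tie-breaking, and the names `tokenize`, `tf_idf`, and `lexical_signature` are assumptions made for illustration; the slides do not specify which TF-IDF variant or reference corpus the authors used.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(doc, corpus):
    """Return {term: weight} for `doc`; `corpus` is a list of strings used for IDF."""
    terms = tokenize(doc)
    counts = Counter(terms)
    corpus_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        tf = count / len(terms)                       # term frequency in this document
        df = sum(1 for s in corpus_sets if term in s) # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df))       # smoothed (assumed variant)
        weights[term] = tf * idf
    return weights

def lexical_signature(doc, corpus, k=5):
    """Return the k terms with the highest TF-IDF weight (ties broken alphabetically)."""
    weights = tf_idf(doc, corpus)
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

With this weighting, a word that appears in every corpus document gets a weight of zero, so common words drop out of the signature while brand names that are frequent on the page but rare elsewhere rise to the top.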

  10. CANTINA | Basic Concept • Calculate the TF-IDF weight of each term on the web page • Take the five terms with the highest TF-IDF weights • Search for the top five terms (term1+term2..) using Google • Compare the page's domain name with the Google search results • Phishing site: the domain name of the current page does not match the domain name of any of the top N search results (N = 30)
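The final comparison step can be sketched in Python as below. The `search` argument is a hypothetical callable that takes a query string and returns a list of result URLs; it stands in for the Google query and is not a real API. Comparing full host names (minus a leading "www.") is also a simplifying assumption, since the slides do not say how domain names are normalized.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the lowercase host name without a leading 'www.' (simplified)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def looks_like_phishing(page_url, signature, search, top_n=30):
    """Flag the page if its domain is absent from the top N search results.
    `search` is a hypothetical callable: query string -> list of result URLs."""
    query = " ".join(signature)
    results = search(query)[:top_n]
    result_domains = {domain_of(u) for u in results}
    return domain_of(page_url) not in result_domains
```

For example, with a stub `search` that returns `["https://www.ebay.com/", "https://signin.ebay.com/"]` for the signature `["ebay", "user", "sign", "help", "forgot"]`, a page hosted at `www.ebay.com` is not flagged, while a copy hosted on an unrelated domain is.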

  11. CANTINA | Basic Concept • Fake page • TF-IDF top 5: eBay, user, sign, help, forgot

  12. CANTINA | Basic Concept • Real page • TF-IDF top 5: eBay, user, sign, help, forgot

  13. CANTINA | Basic Concept

  14. CANTINA | Additional Solutions • Basic CANTINA produces a number of false positives • Solutions • Add the current domain name to the lexical signature • ZMP (Zero results Means Phishing) • Google returns zero search results • Meaningless domain (e.g., “u-s-j.be”) • A larger set of heuristics based on related work • Drawn from existing approaches (e.g., SpoofGuard, PILFER) • Age of Domain, Known Images, Suspicious URL, …
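The first two refinements on this slide can be sketched in Python as follows. As before, `search` is a hypothetical callable (query string to list of result URLs) standing in for the search engine. Appending the full host name to the query is an assumption, because the slides do not say exactly which form of the domain name is added to the signature.

```python
from urllib.parse import urlparse

def host_of(url):
    """Return the lowercase host name without a leading 'www.' (simplified)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def classify(page_url, signature, search, top_n=30):
    """Return 'phishing' or 'legitimate' using signature + domain and the ZMP rule.
    `search` is a hypothetical callable: query string -> list of result URLs."""
    host = host_of(page_url)
    # Refinement 1: add the current domain name to the lexical signature.
    query = " ".join(list(signature) + [host])
    results = search(query)[:top_n]
    # Refinement 2 (ZMP): zero search results means phishing.
    if not results:
        return "phishing"
    if host in {host_of(u) for u in results}:
        return "legitimate"
    return "phishing"
```

Adding the domain helps a legitimate page find itself in the results, while a meaningless domain such as the slide's “u-s-j.be” is likely to return nothing and is then caught by the ZMP rule.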

  15. Evaluation | Effectiveness #1 (1/2) • Four conditions • Basic TF-IDF • Basic TF-IDF + domain name • Basic TF-IDF + ZMP • Basic TF-IDF + domain + ZMP • 100 phishing URLs and 100 legitimate URLs • Phishing URLs: from PhishTank.com • Legitimate URLs: from a previous study

  16. Evaluation | Effectiveness #1 (2/2) • Basic TF-IDF + ZMP + domain • False positives are a little high • Final TF-IDF

  17. Evaluation | Effectiveness #2 (1/2) • Goal: reduce false positives • Combine several heuristic methods

  18. Evaluation | Effectiveness #2 (2/2) • Determining the best weights for these heuristics is a typical classification problem • Use a simple forward linear model • Used 100 phishing URLs and 100 legitimate URLs to find the weights
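One way to read "a simple forward linear model" is a weighted vote over the individual heuristics, sketched below in Python. The sign convention (+1 for legitimate, -1 for phishing), the zero threshold, the heuristic names used as dictionary keys, and every weight value are hypothetical; the slides do not give the learned weights.

```python
def combine(votes, weights, threshold=0.0):
    """Combine heuristic votes with a weighted sum.
    votes:   {heuristic name: +1 (looks legitimate) or -1 (looks like phishing)}
    weights: {heuristic name: non-negative weight}
    The sign convention and threshold are assumptions for this sketch."""
    score = sum(weights[name] * vote for name, vote in votes.items())
    return "legitimate" if score > threshold else "phishing"

# Hypothetical weights, for illustration only (not the values found by the authors).
example_weights = {"tf_idf": 0.5, "age_of_domain": 0.3, "suspicious_url": 0.2}
```

With these example weights, a page that the TF-IDF check and the domain-age check both vote against (-1) is labeled phishing even if its URL looks clean, because the weighted sum is negative.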

  19. Evaluation | Effectiveness #3 (1/2) • Compare the effectiveness of Final-TF-IDF, Final-TF-IDF + heuristics, SpoofGuard, and Netcraft • SpoofGuard: the highest true positive rate • Relies entirely on heuristics • Netcraft: one of the best toolbars overall • Uses a combination of heuristics and an extensive blacklist • 100 phishing URLs from PhishTank.com • 100 legitimate URLs • 35 sites that are often attacked (Citibank, PayPal) • 35 top pages from Alexa (most popular sites) • 30 random web pages from random.yahoo.com

  20. Evaluation | Effectiveness #3 (2/2) • Reduced false positives from 6% to 1% by combining Final-TF-IDF with simple heuristics • However, the true positive rate also decreased
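For reference, the two rates being traded off here can be computed as in the following Python sketch. The function name `rates` and the ten-page toy data in the usage note are made up for illustration and are not results from the evaluation.

```python
def rates(labels, predictions):
    """Return (true positive rate, false positive rate).
    labels and predictions are equal-length lists of booleans, True = phishing."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)      # phishing caught
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)  # legitimate flagged
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives
```

On a toy set of 5 phishing and 5 legitimate pages where the detector flags 4 of the phishing pages and 1 legitimate page, `rates` returns a true positive rate of 0.8 and a false positive rate of 0.2.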

  21. Discussion • Limitations • Does not apply to non-English web sites • System performance • Depends on the performance of the Google search engine • Attacks by criminals • Use images instead of words • Add invisible text • Circumvent TF-IDF and PageRank • Use “Google Bombs” • Attempt a DoS attack on Google

  22. Conclusion • CANTINA uses TF-IDF + search engines + heuristics to find phishing web sites • 97% true positives with 6% false positives • 89% true positives with 1% false positives • Shifts problem of identifying phishing sites to a search engine problem

  23. Q&A
