PhishNet: Predictive Blacklisting to detect Phishing Attacks

Reporter: Gia-Nan Gao. Advisor: Chin-Laung Lei. Date: 2010/4/26. Reference: Pawan Prakash, Manish Kumar, Ramana Rao Kompella, and Minaxi Gupta, “PhishNet: Predictive Blacklisting to Detect Phishing Attacks,” in IEEE INFOCOM 2010.

Presentation Transcript


  1. PhishNet: Predictive Blacklisting to detect Phishing Attacks • Reporter: Gia-Nan Gao • Advisor: Chin-Laung Lei • 2010/4/26

  2. Reference • Pawan Prakash, Manish Kumar, Ramana Rao Kompella and Minaxi Gupta, “PhishNet: Predictive Blacklisting to Detect Phishing Attacks,” in IEEE INFOCOM 2010.

  3. Outline • Introduction • Two Major Components of PhishNet • URL prediction component • Approximate URL matching component • Evaluation • Conclusion

  4. Introduction • Phishing attacks • Set up fake web sites that mimic real businesses in order to lure unsuspecting users into revealing sensitive information • Blacklisting • Match a given URL against a list of known malicious URLs (the blacklist) • Problem of blacklisting • A malicious URL cannot be blacklisted until it has become sufficiently prevalent in the wild

  5. Two Major Components of PhishNet • URL prediction component • Generate new URLs (child) from known phishing URLs (parent) by employing various heuristics • Test whether the new URLs generated are indeed malicious • Approximate URL matching component • Perform an approximate match of a new URL with the existing blacklist

  6. Component 1: Heuristics for Generating New URLs • Typical blacklist URL structure • http://domain.TLD/directory/filename?query string • H1: Replacing TLDs • H2: IP address equivalence • H3: Directory structure similarity • H4: Query string substitution • H5: Brand name equivalence
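
The URL structure above can be taken apart with the Python standard library. This is only a sketch: the function name `split_url` and the query string `id=1` are made up for illustration, and separating the domain from its effective TLD would additionally need a list of effective TLDs, which is not attempted here.

```python
from urllib.parse import urlsplit
import posixpath


def split_url(url):
    """Split a URL into the parts named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    # posixpath.split separates the last path segment from the rest.
    directory, filename = posixpath.split(parts.path)
    return {
        "hostname": parts.hostname,   # domain plus TLD, not separated here
        "directory": directory,
        "filename": filename,
        "query": parts.query,
    }


example = split_url("http://www.abc.com/online/signin/paypal.htm?id=1")
print(example)
```

The heuristics H1 to H5 each vary one of these parts while holding the others fixed.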

  7. Heuristics for Generating New URLs • H1: Replacing TLDs • 3,210 effective top-level domains (TLDs) • Replace the effective TLD of the parent URL with each of the 3,209 other effective TLDs • H2: IP address equivalence • Phishing URLs that share an IP address are grouped together into clusters • Create new URLs by considering all combinations of hostnames and pathnames within a cluster
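
A minimal sketch of H1 and H2, under stated assumptions: the three-entry TLD list stands in for the 3,210 effective TLDs, the caller already knows which TLD a hostname ends with, and each cluster is given as (hostname, pathname) pairs that resolve to one IP address. The function names are hypothetical.

```python
from itertools import product


def replace_tld(hostname, tld, all_tlds):
    """H1: swap the effective TLD of a hostname for every other TLD."""
    if not hostname.endswith("." + tld):
        raise ValueError("hostname does not end with the given TLD")
    stem = hostname[: -len(tld)]          # keeps the trailing dot
    return [stem + other for other in all_tlds if other != tld]


def ip_cluster_children(cluster):
    """H2: combine every hostname with every pathname seen on one IP address.

    cluster is a list of (hostname, pathname) pairs sharing an IP address.
    Pairs that are already known (the parents) are left out.
    """
    hosts = {host for host, _ in cluster}
    paths = {path for _, path in cluster}
    parents = set(cluster)
    return sorted(pair for pair in product(hosts, paths) if pair not in parents)


print(replace_tld("www.abc.com", "com", ["com", "net", "org"]))
print(ip_cluster_children([("www.abc.com", "/online/a.htm"),
                           ("www.xyz.com", "/login/b.htm")]))
```

Both heuristics only produce candidates; whether a candidate is actually malicious is decided later by the verification step.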

  8. Heuristics for Generating New URLs (cont’d) • H3: Directory structure similarity • URLs with similar directory structure are grouped together • Build new URLs by exchanging the filenames among URLs belonging to the same group • Parent • www.abc.com/online/signin/paypal.htm www.xyz.com/online/signin/ebay.htm • Child • www.abc.com/online/signin/ebay.htm www.xyz.com/online/signin/paypal.htm
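
The parent/child example above can be reproduced with a short sketch. It assumes that "similar directory structure" means an identical directory path and that URLs are written without a scheme, as on the slide; the function name is hypothetical.

```python
from collections import defaultdict
import posixpath


def directory_children(urls):
    """H3: exchange filenames among URLs that share a directory path."""
    groups = defaultdict(list)            # directory -> [(host, filename)]
    for url in urls:
        host, _, path = url.partition("/")
        directory, filename = posixpath.split("/" + path)
        groups[directory].append((host, filename))

    known = set(urls)
    children = set()
    for directory, members in groups.items():
        hosts = {host for host, _ in members}
        files = {name for _, name in members}
        for host in hosts:
            for name in files:
                child = host + posixpath.join(directory, name)
                if child not in known:    # skip the parents themselves
                    children.add(child)
    return sorted(children)


parents = ["www.abc.com/online/signin/paypal.htm",
           "www.xyz.com/online/signin/ebay.htm"]
print(directory_children(parents))
# ['www.abc.com/online/signin/ebay.htm', 'www.xyz.com/online/signin/paypal.htm']
```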

  9. Heuristics for Generating New URLs (cont’d) • H4: Query string substitution • Build new URLs by exchanging the query strings among URLs • Parent • www.abc.com/online/signin/ebay?XYZ • www.xyz.com/online/signin/paypal?ABC • Child • www.abc.com/online/signin/ebay?ABC • www.xyz.com/online/signin/paypal?XYZ
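
The same exchange can be sketched for query strings. The sketch below swaps queries among all URLs it is given, which matches the slide's example; any further grouping condition would be an extra assumption. The function name is hypothetical.

```python
def query_children(urls):
    """H4: exchange query strings among the given URLs."""
    bases, queries = set(), set()
    for url in urls:
        base, sep, query = url.partition("?")
        if sep:                           # only URLs that carry a query string
            bases.add(base)
            queries.add(query)

    known = set(urls)
    candidates = {f"{base}?{query}" for base in bases for query in queries}
    return sorted(candidates - known)


parents = ["www.abc.com/online/signin/ebay?XYZ",
           "www.xyz.com/online/signin/paypal?ABC"]
print(query_children(parents))
# ['www.abc.com/online/signin/ebay?ABC', 'www.xyz.com/online/signin/paypal?XYZ']
```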

  10. Heuristics for Generating New URLs (cont’d) • H5: Brand name equivalence • Build new URLs by substituting brand names occurring in phishing URLs with other brand names
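
A minimal sketch of brand-name substitution, assuming a small hand-written brand list (the slides do not say how the list is obtained) and a case-insensitive substring match. The function name is hypothetical.

```python
import re


def brand_children(url, brands):
    """H5: replace each brand name found in the URL with every other brand."""
    children = set()
    lowered = url.lower()
    for brand in brands:
        if brand.lower() not in lowered:
            continue
        pattern = re.compile(re.escape(brand), flags=re.IGNORECASE)
        for other in brands:
            if other != brand:
                # A function replacement avoids backslash interpretation.
                children.add(pattern.sub(lambda _m, o=other: o, url))
    return sorted(children)


print(brand_children("www.abc.com/online/signin/paypal.htm", ["paypal", "ebay"]))
# ['www.abc.com/online/signin/ebay.htm']
```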

  11. Component 1: Verification • Conduct a DNS lookup to filter out sites that cannot be resolved • For each of the resolved URLs • Try to establish a connection to the corresponding server • For each successful connection • Issue an HTTP GET request to obtain content from the server • If the HTTP response from the server has status code 200/202 (successful request) • Compute the content similarity between the parent and the child URLs • If the child URL’s content closely resembles that of the parent URL (say, above 90%) • Conclude that the child URL is a bad site
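
The decision logic of the verification steps can be sketched as follows. The network part (DNS lookup, connection, HTTP GET) is left to a caller-supplied function so the sketch stays self-contained. The similarity measure here is `difflib.SequenceMatcher` from the standard library, chosen for illustration only; the slides do not say which similarity tool is used. All names are hypothetical.

```python
from difflib import SequenceMatcher


def content_similarity(a, b):
    """Return a similarity ratio in [0, 1] between two page bodies."""
    return SequenceMatcher(None, a, b).ratio()


def verify_child(parent_body, fetch_child, threshold=0.9):
    """Decide whether a generated child URL should be treated as phishing.

    fetch_child() must return (status_code, body), or None when the DNS
    lookup or the connection failed.
    """
    result = fetch_child()
    if result is None:                    # unresolved or unreachable
        return False
    status, body = result
    if status not in (200, 202):          # status codes named on the slide
        return False
    return content_similarity(parent_body, body) > threshold


page = "<html><body>Sign in to your account</body></html>"
print(verify_child(page, lambda: (200, page)))      # identical content
print(verify_child(page, lambda: None))             # lookup failed
```

Passing the fetch step in as a function also makes it easy to test the decision rule without touching live phishing sites.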

  12. Component 2: Approximate Matching • Determine whether a given URL is a phishing site or not

  13. M1: Matching IP Address • Perform a direct match of the IP address of the URL with the IP addresses of the blacklist entries • Assign a normalized score based on the number of blacklist entries that map to a given IP address • ni: the number of blacklisted URLs that share IP address IPi • min{ni} (max{ni}): the minimum (maximum) number of phishing URLs hosted on any blacklisted IP address
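
The scoring formula itself did not survive in this transcript. Given that the slide defines ni, min{ni} and max{ni} and calls the score normalized, one plausible reading is min-max normalization, (ni - min{ni}) / (max{ni} - min{ni}). The sketch below implements that reading as an assumption; the handling of the case where all counts are equal is also an assumption, and the IP addresses are documentation placeholders.

```python
from collections import Counter


def ip_scores(blacklist_ips):
    """Map each blacklisted IP to a min-max normalized score in [0, 1].

    blacklist_ips holds one IP address per blacklisted URL, so an IP that
    hosts several phishing URLs appears several times.
    """
    counts = Counter(blacklist_ips)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:                          # assumption: treat all IPs alike
        return {ip: 1.0 for ip in counts}
    return {ip: (n - lo) / (hi - lo) for ip, n in counts.items()}


ips = ["192.0.2.1"] * 5 + ["192.0.2.2"] * 3 + ["192.0.2.3"]
scores = ip_scores(ips)
print(scores)
# An incoming URL whose IP is not in the blacklist gets no M1 score:
print(scores.get("192.0.2.99", 0.0))
```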

  14. M2: Matching Hostname • Perform hostname match with those in the blacklist • Domains of phishing URLs • Specifically registered for hosting phishing sites • Hosted on free/paid-for web-hosting services (WHS) • Identify whether an incoming URL consists of a WHS or not • Matching WHSes • Matching non-WHSes

  15. M2: Matching Hostname (cont’d)

  16. M3: Matching Directory Structure • Perform directory structure match with those in the blacklist • Design rationale: mirrors two of the generation heuristics • H3 (directory structure similarity) • H4 (query string substitution) • ni: the number of URLs corresponding to a directory structure
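
A rough sketch of the lookup behind M3, assuming an exact match on the directory path and scheme-less URLs as in the earlier slide examples. It returns the raw count ni; how that count is turned into a score is not stated on this slide, so no normalization is applied. The function names and the host `www.new.com` are hypothetical.

```python
from collections import Counter
import posixpath


def directory_of(url):
    """Return the directory path of a scheme-less URL, ignoring the query."""
    without_query = url.partition("?")[0]
    _, _, path = without_query.partition("/")
    return posixpath.dirname("/" + path)


def build_directory_index(blacklist):
    """Count how many blacklisted URLs use each directory path (n_i)."""
    return Counter(directory_of(url) for url in blacklist)


def directory_match_count(url, index):
    """Return n_i for the directory of an incoming URL, or 0 if unseen."""
    return index.get(directory_of(url), 0)


index = build_directory_index(["www.abc.com/online/signin/paypal.htm",
                               "www.xyz.com/online/signin/ebay.htm"])
print(directory_match_count("www.new.com/online/signin/login.htm", index))  # 2
print(directory_match_count("www.new.com/about/team.htm", index))           # 0
```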

  17. M4: Matching Brand Names • Check whether brand names appear in the pathname and query string of a URL • ni: the number of occurrences of the brand name • Compute a final cumulative score • Assign different weights to different modules

  18. Evaluation: Component 1 • Collect 6,000 URLs from PhishTank (2009/7/2 ~ 2009/7/25)

  19. Evaluation: Component 2 • How many benign (malicious) sites are (not) flagged as malicious • Data source • Phishing URLs • PhishTank (consists of about 18,000 URLs) • SpamScatter (14,000 URLs) • Benign URLs • DMOZ (100,000 benign URLs) • 20,000 benign URLs from Yahoo Random URL generator (YRUG)
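
The two quantities in the first bullet are the false positive rate (benign sites flagged as malicious) and the false negative rate (malicious sites not flagged). A small sketch of how they could be computed from labeled test URLs follows; the function name and the toy data are illustrative and are not results from the evaluation.

```python
def error_rates(is_malicious, flagged):
    """Return (false_positive_rate, false_negative_rate).

    is_malicious[i] is the ground-truth label of URL i and flagged[i] is
    the matcher's decision for it.
    """
    pairs = list(zip(is_malicious, flagged))
    benign = sum(1 for truth, _ in pairs if not truth)
    malicious = sum(1 for truth, _ in pairs if truth)
    false_pos = sum(1 for truth, flag in pairs if not truth and flag)
    false_neg = sum(1 for truth, flag in pairs if truth and not flag)
    fp_rate = false_pos / benign if benign else 0.0
    fn_rate = false_neg / malicious if malicious else 0.0
    return fp_rate, fn_rate


truth = [True, True, False, False, False, False]
decisions = [True, False, True, False, False, False]
print(error_rates(truth, decisions))  # (0.25, 0.5)
```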

  20. Evaluation: Component 2 (cont’d) • Training phase • Create various data structures using the phishing URLs • Testing phase • An input URL is flagged as a phishing or a benign site • Weight of individual modules • W(M1, M2, M3, M4) = (1.0, 1.0, 1.5, 1.5)
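
The weights above feed the final cumulative score of the matching component. The slides do not give the combining formula, so the sketch below assumes a plain weighted sum of the per-module scores followed by a threshold test; the threshold value and the example module scores are made up for illustration.

```python
# Module weights as listed on the slide.
WEIGHTS = {"M1": 1.0, "M2": 1.0, "M3": 1.5, "M4": 1.5}


def cumulative_score(module_scores, weights=WEIGHTS):
    """Weighted sum of per-module scores (assumed combining rule).

    Modules missing from module_scores contribute nothing.
    """
    return sum(w * module_scores.get(name, 0.0) for name, w in weights.items())


def is_phishing(module_scores, threshold, weights=WEIGHTS):
    """Flag the URL when the cumulative score reaches the threshold."""
    return cumulative_score(module_scores, weights) >= threshold


scores = {"M1": 0.5, "M2": 0.0, "M3": 1.0, "M4": 0.0}
print(cumulative_score(scores))            # 2.0
print(is_phishing(scores, threshold=1.0))  # True
```

With these weights, a directory-structure or brand-name match counts one and a half times as much as an IP or hostname match of the same strength.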

  21. Evaluation: Component 2 (cont’d)

  22. Conclusion • Address major problems associated with blacklists • Two major components of PhishNet • URL prediction component • Approximate URL matching component • Flag new URLs effectively
