Harvesting SSL Certificate Data to Identify Web-Fraud

Reporter: 鄭志欣. Advisor: Hsing-Kuo Pao. 2010/10/04.


Presentation Transcript


  1. Harvesting SSL Certificate Data to Identify Web-Fraud. Reporter: 鄭志欣. Advisor: Hsing-Kuo Pao. 2010/10/04.

  2. Conference: Mishari Al Mishari, Emiliano De Cristofaro, Karim El Defrawy, and Gene Tsudik. "Harvesting SSL Certificate Data to Identify Web-Fraud." Submitted to ICDCS’10. http://arxiv.org/abs/0909.3688

  3. Outline
     • Introduction
     • X.509 certificates
     • Measurements and Analysis of SSL Certificates
     • Certificate-Based Classifier
     • Conclusion

  4. Introduction
     • Web-fraud is one of the most unpleasant features of today’s Internet.
     • Phishing, typosquatting
     • Can we use the information in SSL certificates to identify web-fraud activities such as phishing and typosquatting without compromising user privacy?
     • This paper presents a novel technique to detect web-fraud domains that use HTTPS.

  5. Typosquatting

  6. Contributions
     • The classifier achieves a detection accuracy of over 80% and, in some cases, as high as 95%.
     • Our classifier is orthogonal to prior mitigation techniques and can be integrated with other methods.
     • Note that the classifier relies only on data in the SSL certificate and not on any private user information.

  7. X.509 certificates

  8. Measurements and Analysis of SSL Certificates
     • A. HTTPS Usage and Certificate Harvest
       • Legitimate
       • Phishing and Typosquatting
     • B. Certificate Analysis
       • Analysis of Certificate Boolean Features
       • Analysis of Certificate Non-Boolean Features

  9. A. HTTPS Usage and Certificate Harvest

  10. A. HTTPS Usage and Certificate Harvest
      • Legitimate and Popular Domain Data Sets
        • Alexa: the 100,000 most popular domains according to Alexa.
        • .com: 100,000 random samples from the .com domain zone file, collected from VeriSign.
        • .net: 100,000 random samples from the .net domain zone file, collected from VeriSign.
      • We find that 34% of Alexa domains use HTTPS; 21% in .com and 16% in .net. (Commercial)
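
The harvest step can be pictured with a short Python sketch. The slides do not describe the authors' tooling, so everything here is an assumption: the helper names `fetch_certificate_der` and `https_usage_share` are hypothetical, port 443 is assumed, and a domain counts as "using HTTPS" if a TLS handshake succeeds at all.

```python
import socket
import ssl
from typing import Callable, Iterable, Optional


def fetch_certificate_der(host: str, port: int = 443, timeout: float = 5.0) -> Optional[bytes]:
    """Return the server certificate in DER form, or None if no TLS handshake succeeds."""
    # Verification is disabled on purpose (assumption): expired and self-signed
    # certificates are part of what is being measured, so they must not be rejected.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return tls.getpeercert(binary_form=True)
    except (OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, and TLS errors.
        return None


def https_usage_share(
    domains: Iterable[str],
    fetch: Callable[[str], Optional[bytes]] = fetch_certificate_der,
) -> float:
    """Fraction of domains for which a certificate could be harvested."""
    domains = list(domains)
    if not domains:
        return 0.0
    with_https = sum(1 for domain in domains if fetch(domain) is not None)
    return with_https / len(domains)
```

Passing a different `fetch` callable makes it possible to compute the share from certificates that were already harvested, without touching the network again.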

  11. A. HTTPS Usage and Certificate Harvest
      • Phishing Data Set
        • We collected 2,811 domains considered to be hosting phishing scams from the PhishTank web site.
        • 30% of these phishing web sites employ HTTPS.
      • Typosquatting Data Set
        • We first identified the typo domains in our .com and .net data sets by using Google’s typo correction service.
        • We discovered that 9,830 out of 38,617 are parked domains.

  12. B. Certificate Analysis

  13. B. Certificate Analysis: Analysis of Certificate Boolean Features

  14. B. Certificate Analysis. F14: Serial Number Length. [Figure: CDF of serial number length for Alexa, .com, .net, (c) phishing, and (d) typosquatting certificates]
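
A minimal sketch of how a serial-number-length feature and its empirical CDF could be computed. The slide does not state the unit of length, so counting hexadecimal characters is an assumption, and the function names `serial_number_length` and `empirical_cdf` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def serial_number_length(serial_hex: str) -> int:
    """Number of hex characters in a serial number, ignoring ':' separators and a '0x' prefix."""
    cleaned = serial_hex.replace(":", "").strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return len(cleaned)


def empirical_cdf(values: Iterable[int]) -> Dict[int, float]:
    """Map each observed value v to the fraction of observations that are <= v."""
    counts = Counter(values)
    total = sum(counts.values())
    cdf: Dict[int, float] = {}
    running = 0
    for value in sorted(counts):
        running += counts[value]
        cdf[value] = running / total
    return cdf
```

Computing one CDF per data set (Alexa, .com, .net, phishing, typosquatting) and plotting them side by side would give the kind of comparison the figure describes.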

  15. Certificate Analysis. F15: Jaccard Distance
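
The Jaccard distance between two sets A and B is 1 - |A ∩ B| / |A ∪ B|. The slide does not say which two strings are compared or how they are turned into sets; comparing a host name with a certificate name via character bigrams is an assumption made for this sketch, and `char_bigrams` is a hypothetical helper.

```python
from typing import Set


def char_bigrams(text: str) -> Set[str]:
    """Set of overlapping two-character substrings of the lowercased text."""
    lowered = text.lower()
    return {lowered[i:i + 2] for i in range(len(lowered) - 1)}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Example call with hypothetical names: a distance of 0.0 means identical
# bigram sets, 1.0 means the two strings share no bigram at all.
distance = jaccard_distance(char_bigrams("www.example.com"), char_bigrams("example.com"))
```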

  16. Summary of Certificate Feature Analysis
      • Around 20% of legitimate popular domains are still using the signature algorithm "md5WithRSAEncryption" despite its clear insecurity.
      • A significant percentage (> 30%) of legitimate domain certificates are expired and/or self-signed.
      • Duplicate certificate percentages are very high in phishing domains.
      • For most features, the difference in distributions between Alexa and malicious sets is larger than that between .com/.net and malicious sets.
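
The properties named in this summary can be expressed as simple checks over already-parsed certificate fields. This is a sketch under stated assumptions, not the authors' feature extractor: the dictionary keys (`signature_algorithm`, `not_after`, `issuer`, `subject`) are hypothetical, issuer equal to subject is only a heuristic for "self-signed", and duplicates are detected by hashing the raw certificate bytes.

```python
import hashlib
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional


def boolean_features(cert: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, bool]:
    """Derive boolean features from a dict of parsed certificate fields (hypothetical keys)."""
    now = now or datetime.now(timezone.utc)
    return {
        "md5_signature": cert["signature_algorithm"] == "md5WithRSAEncryption",
        "expired": cert["not_after"] < now,
        # Heuristic (assumption): treat issuer == subject as self-signed.
        "self_signed": cert["issuer"] == cert["subject"],
    }


def duplicate_share(der_certs: Iterable[bytes]) -> float:
    """Fraction of certificates whose exact bytes occur more than once in the data set."""
    fingerprints = [hashlib.sha256(der).hexdigest() for der in der_certs]
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(fingerprints)
```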

  17. Certificate-Based Classifier
      • A. Phishing Classifier
      • B. Typosquatting Classifier

  18. Phishing Classifier
      • Table IV: Performance of classifiers. Data set consists of (A) 420 phishing certificates and (B) 420 non-phishing certificates (Alexa, .COM and .NET).
      • Table V: Performance of classifiers. Data set consists of (A) 420 phishing certificates and (B) 420 non-phishing certificates (Alexa).

  19. Typosquatting Classifier
      • Table VI: Performance of classifiers. Data set consists of (A) 486 typosquatting certificates and (B) 486 non-typosquatting certificates (Top Alexa, .COM and .NET).
      • Table VII: Performance of classifiers. Data set consists of (A) 486 typosquatting certificates and (B) 486 popular domain certificates.
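
Both classifiers are evaluated on balanced data sets (420 + 420 and 486 + 486 certificates). The slides do not say how the non-fraud certificates were drawn or which accuracy definition was used, so the following is only a sketch that assumes random undersampling and plain accuracy (fraction of correct predictions); `balanced_sample` and `accuracy` are hypothetical helpers.

```python
import random
from typing import Any, List, Sequence, Tuple


def balanced_sample(
    positives: Sequence[Any], negatives: Sequence[Any], seed: int = 0
) -> List[Tuple[Any, int]]:
    """Draw equally many items from each class and label them 1 (fraud) or 0 (non-fraud)."""
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = [(item, 1) for item in rng.sample(list(positives), n)]
    sample += [(item, 0) for item in rng.sample(list(negatives), n)]
    rng.shuffle(sample)
    return sample


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

With 420 fraud certificates and a larger pool of non-fraud certificates, `balanced_sample` returns 840 labeled items, matching the data set sizes quoted for the phishing tables.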

  20. Conclusion
      • We design and build a machine-learning-based classifier that identifies fraudulent domains using HTTPS based solely on their SSL certificates, thus also preserving user privacy.
      • We believe that our results may serve as a motivating factor to increase the use of HTTPS on the Web.
      • Use of HTTPS can help identify web-fraud domains.
