Typo-Squatting: a Nuisance or a Threat to Your Traffic?


Presentation Transcript


  1. Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari

  2. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data Sets • Results • Future Work • Related Work • Conclusion

  3. Introduction - Motivation • Traffic is important to domains! • No point in launching a domain without incoming traffic • Losing/gaining traffic => losing/gaining money • One way to price ads is pay-per-click (PPC), which shows how important traffic is • Traffic diversion could be a serious threat to a domain

  4. Introduction - Motivation • Typos may divert the traffic • Users are prone to making typos • Users may forget about visiting the target domain • Threat to the target domain! • Intentionally registering such typo domains is called typo-squatting

  5. Introduction - Goal • To study how much traffic typo-squatters can get from target domains • Are those domains attracting much traffic? • Search engine typo corrections! • Browser auto-completions! • How much traffic are target domains losing? • Is it a negligible ratio or a serious threat? • Do users go back to target domains or get distracted?

  6. Introduction - Challenges • How to identify typo-squatting domains? • Does Typo mean Typo-squatting? • Short Domains • www.abc.com and www.abd.com • Longer Domains • www.walmart.com and www.walkmart.com • If not, how can we? • Hijacking indicator

  7. Introduction - Contribution • Automatic and accurate identification of typo-squatting domains • Show how much traffic target domains are losing to typo-squatting domains

  8. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data • Results • Related Work • Future Work • Conclusion

  9. Background – Domain Parking • Domain parking: showing a temporary page for an unused domain before launching it

  10. Background - Domain Parking

  11. Background – Domain Parking

  12. Background – Domain Parking

  13. Background – Domain Parking • Domain Parking Service • Parks and hosts unused domains • Monetizes the traffic by showing ads • Many typo-squatting domains are parked domains (Wang et al., 06), (Keats, 07)

  14. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data • Results • Future Work • Related Work • Conclusion

  15. Methodology • Data Collection • Identifying Typo-Squatting Domains

  16. Methodology - Data Collection • DNS traces from UCI resolvers • Internal requests to domain names • A DNS query precedes the HTTP request • Caching limitation • Our study represents a lower bound

  17. Methodology – Identify Typo-squatting Domain • Identify Similar Domains • Single-Error Typo • Single errors account for 90-95% of spelling errors • www.walmart.com and www.walkmart.com • gTLD substitution • www.amazon.com and www.amazon.org
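
The similarity test on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the `GTLDS` list, and the choice to count an adjacent transposition as a single error are all assumptions, and the domain splitting only handles simple `name.tld` domains with an optional `www.` prefix.

```python
# Illustrative list only; the slides do not enumerate the gTLDs considered.
GTLDS = {"com", "org", "net", "info", "biz"}


def split_domain(domain: str) -> tuple[str, str]:
    """Split 'www.walmart.com' into ('walmart', 'com') (simple domains only)."""
    d = domain.lower().rstrip(".")
    if d.startswith("www."):
        d = d[4:]
    name, _, tld = d.rpartition(".")
    return name, tld


def is_single_error(a: str, b: str) -> bool:
    """True if b differs from a by exactly one insertion, deletion,
    substitution, or adjacent transposition (transposition is an assumption)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    # Find the first position where the two strings differ.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one substituted character
        return (
            i + 1 < len(a)
            and a[i] == b[i + 1]
            and a[i + 1] == b[i]
            and a[i + 2:] == b[i + 2:]
        )  # two adjacent characters swapped
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]  # one inserted or deleted character


def is_similar(target: str, candidate: str) -> bool:
    """Single-error typo in the name, or same name under a different gTLD."""
    t_name, t_tld = split_domain(target)
    c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    return t_name == c_name and t_tld in GTLDS and c_tld in GTLDS


print(is_similar("www.walmart.com", "www.walkmart.com"))
print(is_similar("www.amazon.com", "www.amazon.org"))
```

Note that `www.abc.com` and `www.abd.com` also pass this test, which is why similarity alone is not treated as proof of typo-squatting.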

  18. Methodology – Identify Typo-squatting Domains • But a similar domain is not enough! • www.walmart.com and www.walkmart.com • Random Sample • More than 54% are not typo-squatting

  19. Methodology – Identify Typo-squatting Domain • Identify Hijacking Indicator • Inappropriate Content • Domain For Sale • Forwarding to other domains • Ads – listing (Parked Domain) • More than 80%

  20. Methodology – Identify Typo-squatting Domain • Similar Domain AND Parked Domain => Typo-Squatting Domain

  21. Methodology – Identify Typo-squatting Domain • How to identify Parked Domain? • Parked Domain Classifier • Presence of Parking signatures • Well-known parking signatures (domain names/urls)
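
A signature check of the kind mentioned on this slide might look like the sketch below. It is only an assumption about how such a check could work: the signature strings are hypothetical placeholders (the slides do not list the real ones), and `has_parking_signature` is an illustrative name, not part of the presented classifier.

```python
import re

# Hypothetical placeholder signatures; real parking-service domains/URLs
# are not given in the slides.
PARKING_SIGNATURES = (
    "parking-service.example",
    "ads.parked.example/landing",
)

# Matches the value of href="..." or src="..." attributes.
LINK_PATTERN = re.compile(
    r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def has_parking_signature(html: str, signatures=PARKING_SIGNATURES) -> bool:
    """True if any link in the page contains a known parking signature."""
    urls = LINK_PATTERN.findall(html)
    return any(sig in url.lower() for url in urls for sig in signatures)


parked_html = '<a href="http://parking-service.example/click?d=walkmart">Deals</a>'
normal_html = '<a href="http://www.walmart.com/cart">Cart</a>'
print(has_parking_signature(parked_html), has_parking_signature(normal_html))
```

A substring match over extracted links is deliberately simple; it would serve as one input to a classifier rather than a complete decision rule.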

  22. Methodology - Summary • Identify Similar Domains => Identify Parked Domains => List of Typo-squatting Domains

  23. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data • Results • Future Work • Related Work • Conclusion

  24. Parked Domain Classifier • Build Data Set => Extract Core Features => Combine Into Classifier

  25. Data Set • Data set consists of 2,800 domains • 700 are parked domains • Collected from the MS Strider website • 2,100 are non-parked domains • Collected from the fourteen Yahoo Directory top categories

  26. Feature Selection • Heuristically identify common features of parked domains • Compute the distribution of those features for verification • Common Link Ratio Max

  27. Combining Features Into Classifier • Tried Different Classifier Algorithms • Decision Tree • SVM • K-Nearest Neighbor • Random Forest • The best performance

  28. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data Sets • Results • Future Work • Related Work • Conclusion

  29. DATA Sets • DNS Traces • Four Months • Anonymous • CNAME and A • ~ 30 million domains (~ 2 billion hits) (~ 30,000 users) • Target Domain Set • Alexa’s Top 500 popular domains

  30. Typo-Squatting Domains & Hits • 1,332 typo-squatting • 13,431 hits • Is it large or small? • 500 target domains • 4-month period • ~ 30,000 users • Given a similar ratio, this may translate to a large number • 30,000 => 13,000 • 300,000 => 130,000 • 3,000,000 => 1,300,000
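
The extrapolation on this slide is a simple proportional scaling, sketched below. It assumes the hits-per-user ratio stays constant as the user population grows, which is the slide's own premise ("given a similar ratio"); the function name is illustrative.

```python
OBSERVED_HITS = 13_431   # typo-squatting hits in the trace
OBSERVED_USERS = 30_000  # approximate number of users in the trace


def projected_hits(users: int) -> int:
    """Scale the observed hits linearly to a different user population."""
    return round(OBSERVED_HITS * users / OBSERVED_USERS)


for users in (30_000, 300_000, 3_000_000):
    print(users, "=>", projected_hits(users))
```

The unrounded projections are 13,431, 134,310, and 1,343,100 hits, which the slide rounds down to 13,000, 130,000, and 1,300,000.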

  31. Typo-squatting Ratio • 0.025% of total number of queries • 89% LE 1% (70% LE 0.1%) ( 57% LE 0.01%)
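
One way to read the figures on this slide is as a per-target ratio plus a cumulative distribution ("89% LE 1%" meaning 89% of target domains have a ratio of at most 1%). The sketch below follows that reading. Both the ratio definition (typo-squatting hits divided by target hits) and the sample counts are assumptions for illustration; they are not data from the study.

```python
def squatting_ratio(typo_hits: int, target_hits: int) -> float:
    """Typo-squatting hits relative to hits on the target (assumed definition)."""
    return typo_hits / target_hits if target_hits else 0.0


def share_at_most(ratios: list[float], threshold: float) -> float:
    """Fraction of targets whose ratio is less than or equal to threshold."""
    return sum(r <= threshold for r in ratios) / len(ratios)


# Made-up (typo_hits, target_hits) pairs, purely illustrative.
sample_counts = {
    "a.example": (5, 1_000),
    "b.example": (0, 500),
    "c.example": (30, 1_000),
    "d.example": (1, 100_000),
}
ratios = [squatting_ratio(t, h) for t, h in sample_counts.values()]

print(share_at_most(ratios, 0.01))   # share of targets with ratio <= 1%
print(share_at_most(ratios, 0.001))  # share of targets with ratio <= 0.1%
```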

  32. User Correction Ratio – Alexa-500 • On average, 54% of typo-squatting queries are corrected

  33. Potential Hit Loss • 0.012% • 92% LE 1% (78% LE 0.1%) (64% LE 0.01%)

  34. Potential Money Loss • 0.008% • 96% LE % (91% LE 0.1%) ( 81% LE 0.01%)

  35. Non-existing Similar Domains • 463 potential typo-squatting • 8,285 potential hits • 0.015% of total number of queries • 96% LE 1% (83% LE 0.1%) (66% LE 0.01%)

  36. Typo-squatting Domains – TP60 • 629 typo-squatting • 15,499 hits • 0.045% of total number of queries • 76% LE 1% (60% LE 0.5%)

  37. Top Ten Typo-squatting Domains • 19 % of all Typo-squatting hits

  38. Top Ten Target Domains • Responsible for 55% of all typo-squatting queries of Alexa-500 • 50 million hits of “www.facebook.com”

  39. Typo Characterization • Most typos are single errors (95% vs. 5%) • Most gTLD substitutions are “com” to “org” (50%) • Add – 63% are adjacent keys • Sub – 23% are adjacent keys • Sub – 13% of substitutions are “a” and “o” • Spelling error
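
The "adjacent keys" category can be illustrated with a rough QWERTY adjacency test. The slides do not say how adjacency was defined, so the layout model below (letter rows only, with approximate horizontal offsets for the staggered rows) and the function name are assumptions.

```python
# Letter rows of a QWERTY keyboard with approximate horizontal stagger.
ROWS = (("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75))

# Map each key to (row index, horizontal position).
KEY_POS = {
    ch: (row_idx, col + offset)
    for row_idx, (keys, offset) in enumerate(ROWS)
    for col, ch in enumerate(keys)
}


def are_adjacent(a: str, b: str) -> bool:
    """True if two distinct letter keys sit next to each other."""
    pa, pb = KEY_POS.get(a.lower()), KEY_POS.get(b.lower())
    if pa is None or pb is None or a.lower() == b.lower():
        return False
    return abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1


# The extra "k" in walkmart sits next to "l"; "a" and "o" are far apart,
# consistent with the slide treating that substitution as a spelling error.
print(are_adjacent("k", "l"), are_adjacent("a", "o"))
```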

  40. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data Sets • Results • Future Work • Related Work • Conclusion

  41. Future Work • How much are target domains paying squatters? • Enhance our identification technique • Typo modeling for getting traffic back • Why do people go to parked domains? • How can you increase the traffic?

  42. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data Sets • Results • Future Work • Related Work • Conclusion

  43. Related Work • MS Strider Project [Wang et al. Sruti06] • McAfee Study [Keats McAfee White Paper 07] • JAAL project [Banerjee et al. Infocom 08]

  44. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data Sets • Results • Future Work • Related Work • Conclusion

  45. Conclusion • Accurately and automatically identify typo-squatting domains • How much traffic goes to typo-squatters • Bound on how much traffic the target domain is losing to typo-squatting • Inconsequential
