
Defending against large-scale crawls in online social networks



Presentation Transcript


  1. Defending against large-scale crawls in online social networks
     Mainack Mondal†, Bimal Viswanath†, Allen Clement†, Peter Druschel†, Krishna Gummadi†, Alan Mislove‡, Ansley Post†*
     †MPI-SWS, ‡Northeastern University, *Now at Google
     CoNEXT, December 2012

  2. Lots of personal data on Online Social Networks (OSNs)

  3. What is the concern with aggregating this much data?
     • Aggregators can mine the data to infer attributes missing from it, e.g. sexual orientation
     • Aggregators can republish the data in an easily accessible form
     • Neither the user nor the OSN has control over how crawled data is used
     • In 2010, the data of 171 M Facebook users was published on BitTorrent
     • Problem for OSN operators
     • User data is a valuable asset to OSN operators
     • OSN operators are blamed for misuse of user data [NYTimes ’10]
     • OSNs need to limit large-scale aggregation of user data

  4. Challenge
     • We are defending against a crawler who
     • Wants to crawl as many accounts as possible
     • Wants to crawl as fast as possible
     • Our goal is to
     • Limit the rate of crawling
     • Make the crawlers as slow as possible

  5. Existing solution: Simple rate-limiting
     • OSNs rate-limit on a per-account or per-IP-address basis
     • Crawlers can defeat the rate limit by using multiple accounts
     • The crawlers can create multiple fake accounts (Sybils)
     • Or the crawlers can use compromised accounts

  6. Our solution: Genie
     • Assumption: Social links to good users are harder to get than accounts
     • Replace user-account-based rate-limiting with link-based rate-limiting

  7. Outline
     • Background and key idea
     • Genie design
     • Credit networks
     • How to use credit networks to defend against crawlers
     • Using the difference between user and crawler activity
     • Genie evaluation

  8. Credit Networks [EC ’11]
     • Nodes trust each other by providing pair-wise credit
     • Credit is used to pay for the services received
     [Figure: nodes A and B with credit values on the links]

  9. Credit Networks [EC ’11]
     • Nodes trust each other by providing pair-wise credit
     • Credit is used to pay for the services received
     • To obtain a service, find path(s) with sufficient credits
     [Figure: credit network with nodes A, C and B and credit values on each link]
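The path-based payment described on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `CreditNetwork` class, its method names, and the credit values are all hypothetical. It also simplifies the slide's "path(s)" to a single path found by breadth-first search, and assumes that credit spent on a link is added to the reverse link.

```python
from collections import deque


class CreditNetwork:
    """Toy credit network: credit[u][v] is the credit u can still spend toward v."""

    def __init__(self):
        self.credit = {}

    def add_link(self, u, v, credit_uv, credit_vu):
        # A link carries separate credit in each direction.
        self.credit.setdefault(u, {})[v] = credit_uv
        self.credit.setdefault(v, {})[u] = credit_vu

    def find_path(self, src, dst, amount):
        """Breadth-first search using only links with at least `amount` credit."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt, available in self.credit.get(node, {}).items():
                if nxt not in parent and available >= amount:
                    parent[nxt] = node
                    queue.append(nxt)
        return None

    def pay(self, src, dst, amount):
        """Move `amount` credit along one path; return False if no path can pay."""
        path = self.find_path(src, dst, amount)
        if path is None:
            return False
        for u, v in zip(path, path[1:]):
            self.credit[u][v] -= amount  # spent in the forward direction
            self.credit[v][u] += amount  # assumed: gained in the reverse direction
        return True


# Hypothetical credit values, chosen only for illustration.
net = CreditNetwork()
net.add_link("A", "C", 3, 2)
net.add_link("C", "B", 4, 3)
paid = net.pay("A", "B", 3)
```

After the payment, A has no credit left toward C, so a further request from A is refused until credit returns to that link.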

  10. How can we map an OSN to a credit network?
      • The OSN operator forms a credit network from the social network
      • The operator replenishes credit on each link at a fixed rate
      • Credit is deducted from links to view another user’s profile
      [Figure: credit network with nodes A, C, D and B and credit values on each link]
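The replenish-and-deduct cycle on a single link can be sketched as follows. This is a hedged illustration only: the function names, the numeric values, and in particular the `cap` on accumulated credit are assumptions that do not appear in the slides.

```python
def replenish(link_credit, rate_per_week, elapsed_weeks, cap):
    """Add credit at a fixed rate; the upper bound `cap` is an assumption."""
    return min(cap, link_credit + rate_per_week * elapsed_weeks)


def charge(link_credit, cost):
    """Deduct the cost of a view; return (allowed, new_credit)."""
    if link_credit < cost:
        return False, link_credit
    return True, link_credit - cost


# Hypothetical numbers, chosen only for illustration.
credit = 2.0
allowed_first, credit = charge(credit, 2.0)   # spends all credit on the link
allowed_second, credit = charge(credit, 1.0)  # refused: the link is empty
credit = replenish(credit, rate_per_week=3.0, elapsed_weeks=1.0, cap=5.0)
allowed_third, credit = charge(credit, 1.0)   # allowed again after replenishment
```

The sketch shows why the replenishment rate acts as the rate limit: once a link is drained, further views over it wait for the operator to add credit.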

  11. How do credit networks defend against crawlers?
      • The amount of crawling is proportional to the attack cut
      • Sybil accounts: the attack cut is small
      • Compromised accounts: the attack cut may be larger
      [Figure: Sybil accounts and compromised accounts connected to the rest of the network (normal users); SybilRank, NSDI 2012]
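The proportionality can be illustrated with a back-of-the-envelope bound. The reasoning assumed here is that every view of a normal user's profile must spend credit on at least one link in the attack cut, so in steady state the crawler cannot spend more than the credit replenished on those links. The function name and all numbers are hypothetical, and initial credit on the links is ignored.

```python
def max_views_per_week(attack_cut_links, credits_per_link_per_week, credits_per_view):
    """Steady-state upper bound on profile views that can cross the attack cut."""
    if credits_per_view <= 0:
        raise ValueError("credits_per_view must be positive")
    return attack_cut_links * credits_per_link_per_week / credits_per_view


# Hypothetical numbers: the same replenishment rate and view cost, different cut sizes.
sybil_bound = max_views_per_week(10, 4, 2)
compromised_bound = max_views_per_week(1000, 4, 2)
```

With these made-up inputs, a hundred times more links in the attack cut allow a hundred times more crawling, which is the sense in which the bound scales with the cut.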

  12. Difference between normal users and crawlers
      • Reciprocity in profile views
      • Normal users are more reciprocal than crawlers
      • Repeated profile views
      • Normal users repeatedly visit the same set of profiles
      • Locality of views

  13. Difference in locality between normal users and crawlers
      • Renren graph and user browsing trace [IMC ’10]
      • 33 K users, 96 K activities (2 weeks)
      • Most of the normal views are local
      • Flickr: Mislove et al. [WOSN ’08]; Orkut: Cha et al. [IMC ’09]
      [Figure: % of views, with crawler activity marked]

  14. Genie design principles
      • Use a credit network to rate-limit links
      • Exploit the difference between normal and crawler activity to discriminate crawlers
      • Charge more for views further away

  15. Genie design
      • New charging model: Pay more to view profiles far away
      • Credit charged per link = (shortest path distance between the two nodes) - 1
      • Rate of crawling decreases with increased path length
      [Figure: credit network with nodes A, C, D and B; link credits are adjusted by -2 and +2]
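The charging rule on this slide can be sketched as below. This is an illustrative sketch under stated assumptions, not the Genie implementation: `shortest_path` and `charge_view` are hypothetical helpers, the credit values are made up, only the single shortest path found by breadth-first search is tried, and credit deducted from a link is assumed to be added to the reverse link.

```python
from collections import deque


def shortest_path(adjacency, src, dst):
    """Breadth-first search; returns the node list of one shortest path, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def charge_view(adjacency, credit, viewer, target):
    """Charge (distance - 1) credits on every link of the path; False if refused."""
    path = shortest_path(adjacency, viewer, target)
    if path is None:
        return False
    distance = len(path) - 1
    per_link = max(0, distance - 1)  # charging rule from the slide
    links = list(zip(path, path[1:]))
    if any(credit[(u, v)] < per_link for u, v in links):
        return False
    for u, v in links:
        credit[(u, v)] -= per_link
        credit[(v, u)] += per_link  # assumed: reverse link gains the credit
    return True


# Hypothetical chain A - C - D - B with made-up credit values.
adjacency = {"A": ["C"], "C": ["A", "D"], "D": ["C", "B"], "B": ["D"]}
credit = {("A", "C"): 4, ("C", "A"): 2, ("C", "D"): 3,
          ("D", "C"): 2, ("D", "B"): 2, ("B", "D"): 4}

first_view = charge_view(adjacency, credit, "A", "B")   # distance 3, 2 credits per link
second_view = charge_view(adjacency, credit, "A", "B")  # refused: C -> D has 1 credit left
friend_view = charge_view(adjacency, credit, "A", "C")  # distance 1, charged 0
```

In this sketch a view of a direct friend is free, while a view three hops away costs 2 credits on each of the three links, so distant crawling drains credit much faster than local browsing.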

  16. Outline
      • Background and key idea
      • Genie design
      • Credit networks
      • How to use credit networks to defend against crawlers
      • Using the difference between user and crawler activity
      • Genie evaluation

  17. Genie evaluation
      • Does Genie limit attackers while allowing normal users?
      • The parameter to tweak: Credit replenishment rate per link
      • Replenishment rate too high: Crawlers will be allowed
      • Replenishment rate too low: Users will be heavily penalized

  18. Experimental setup
      • Genie simulator written in C++
      • Input: social graph and user activity trace
      • Output: allowed/flagged for each activity
      • Normal user activity trace from Renren
      • Generated multiple synthetic traces for other graphs
      • We model a strong and efficient crawler
      • Crawler controls compromised user accounts
      • Each good user profile is crawled once
      • Crawlers try to crawl as many profiles as possible

  19. Does Genie limit crawlers?
      • Only 2.7% of the network is crawled in 1 week
      • The crawlers are slowed down ~3000 times
      [Figure: % of users crawled per week vs. credits/week per link]

  20. Does Genie penalize good users?
      • 2.6% of total activities, from 0.8% of users, are flagged
      [Figure: % of user activity flagged vs. credit/week per link]

  21. Does Genie penalize good users?
      [Figure: % of user activity flagged and % of users crawled per week vs. credit/week per link, with the trade-off point marked]

  22. Who are these flagged users?
      • 3 users with a very high number of random profile views
      • They show crawler-like behavior
      • 70% of the flagged activity is by these users
      • Users with a normal number of profile views but very few friends
      • 99% of flagged users have fewer than 5 friends
      • Adding 4 more friends unflags 97% of these users

  23. Efficiency of Genie
      • In our Genie simulator
      • To scale up Genie we used the Canal library [EuroSys ’12]
      • Multithreaded implementation
      • Used a machine with 24 cores and 48 GB of physical memory for evaluation
      • For a million-node social graph
      • Memory overhead: 5 GB
      • Each view request is processed in 0.65 ms on average

  24. Summary
      • We propose rate-limiting links to defend against crawlers
      • We strengthen our defense using the differences between normal user and crawler activity
      • We evaluated Genie on a real-world user activity trace

  25. Thank you
