
Privately Querying Location-based Services with SybilQuery


Presentation Transcript


  1. Privately Querying Location-based Services with SybilQuery Pravin Shankar, Vinod Ganapathy, and Liviu Iftode Department of Computer Science Rutgers University {spravin, vinodg, iftode}@cs.rutgers.edu

  2. Location-based Services (LBSes) How is the traffic in the road ahead? Where is my nearest restaurant? Implicit assumption: • Users agree to reveal their locations for access to services IBM Frontiers of Cloud Computing 2010

  3. Privacy concerns while querying an LBS • With two weeks of GPS data from a user’s car, we can infer the user’s home address (median error < 60 m) [Krumm ’07] • 5% of people are uniquely identified by their home and work locations, even if these are known only at the census tract level [Golle and Partridge ’09]

  4. Querying an LBS [Diagram: on a trip from Home to Work, the client sends its locations loc1, loc2, ..., locn to the LBS]

  5. Basic Idea [Diagram: alongside the real Home-to-Work trip, the client reports Sybil trips Home'-to-Work' and Home''-to-Work''; the LBS receives (loc1, loc1', loc1''), (loc2, loc2', loc2''), ..., (locn, locn', locn'')]

  6. What the LBS sees Which of these is the real user?

  7. Outline • Introduction • SybilQuery Overview • Design Challenges • Implementation • Evaluation and Results • Conclusions and Future Work

  8. SybilQuery Overview • Basic Idea: Achieves privacy using synthetic (Sybil) queries • For each real user trip, the system generates • k-1 Sybil start and end points (termed endpoints) • k-1 Sybil paths • For each real query made, the system generates • k-1 Sybil queries
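
The per-query step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_query_batch`, the coordinate tuples, and the decision to shuffle the batch are all assumptions made for the example.

```python
import random

def build_query_batch(real_location, sybil_locations, rng=random):
    """Return the real location mixed with the k-1 Sybil locations.

    Shuffling is an assumption of this sketch: it keeps the real
    location from always occupying the same slot in the batch.
    """
    batch = [real_location] + list(sybil_locations)
    rng.shuffle(batch)
    return batch

# Hypothetical coordinates, k = 4 (one real location plus three Sybil ones).
real = (37.7749, -122.4194)
sybils = [(37.8044, -122.2712), (37.6879, -122.4702), (37.5630, -122.3255)]
batch = build_query_batch(real, sybils, random.Random(0))
```

Each entry of `batch` would then be sent to the LBS as its own query, so the service sees k locations per real query.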

  9. SybilQuery Design

  10. Outline • Introduction • SybilQuery Overview • Design Challenges • Implementation • Evaluation and Results • Conclusions and Future Work

  11. SybilQuery Challenges • Endpoint generation: • How to automatically generate synthetic endpoints similar to a pair of real endpoints? • Path generation: • How to choose the waypoints of the Sybil path? • Query generation: • How to simulate motion along the Sybil path?

  12. Endpoint Generator • Produces synthetic endpoints that resemble the real source and destination • High-level idea: • Tag locations with features • Identify clusters of locations that share similar features • Feature used in SybilQuery: traffic statistics

  13. Tagging locations with traffic statistics • Naïve approach: Annotate locations with descriptive tags • E.g., “parking lot”, “downtown office building”, “freeway” • Laborious manual task • Our approach: Automatically compute features using a database of regional traffic statistics • Dataset: Month-long GPS traces from the San Francisco Cabspotter project - 530 unique cabs; 529,533 trips • Compute traffic density τ_l for each location l from the dataset
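
A rough sketch of computing a per-location traffic density from trip records. The slide does not define the density, so this example assumes it is the share of trip endpoints that fall in a grid cell; the fixed grid, the cell size, and the names `cell_of` and `traffic_density` are likewise assumptions.

```python
from collections import Counter

def cell_of(lat, lon, cell_deg=0.01):
    """Map a coordinate to a grid cell index (assumed fixed-size grid)."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def traffic_density(trips, cell_deg=0.01):
    """Fraction of all trip endpoints that fall in each grid cell."""
    counts = Counter()
    for start, end in trips:
        counts[cell_of(start[0], start[1], cell_deg)] += 1
        counts[cell_of(end[0], end[1], cell_deg)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

# Two hypothetical trips with nearly identical start and end points.
trips = [
    ((37.7749, -122.4194), (37.6213, -122.3790)),
    ((37.7751, -122.4190), (37.6215, -122.3795)),
]
density = traffic_density(trips)
```

Cells with similar density values would then be candidates for Sybil endpoints that resemble the real ones.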

  14. Path Generator • Consults an off-the-shelf navigation service • Our implementation uses the Microsoft Multimap API to obtain waypoints • Users may not always follow the shortest path to the destination • Detours, road closures, user intention • Computes multiple paths to the destination (with varying lengths) • Uses a probability distribution to choose a path
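
Choosing among candidate paths with a probability distribution can be sketched with a weighted random draw. The weights below are invented for illustration; the slides do not give the actual distribution, and `choose_path` is a hypothetical helper.

```python
import random

def choose_path(paths, weights, rng=random):
    """Pick one candidate path; a higher weight means it is picked more often."""
    return rng.choices(paths, weights=weights, k=1)[0]

# Hypothetical candidates ordered from shortest to longest, each a list of waypoints.
paths = [
    [(0, 0), (5, 5)],
    [(0, 0), (5, 0), (5, 5)],
    [(0, 0), (0, 9), (5, 9), (5, 5)],
]
# Assumed weights: the shortest path is most likely, detours are occasional.
chosen = choose_path(paths, [0.7, 0.2, 0.1], random.Random(42))
```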

  15. Query Generator • Triggered each time the user queries the LBS • Simulates the motion of users along the Sybil paths • Uses current traffic conditions to more accurately simulate user movement • E.g., simulate slower movement if traffic is congested
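
Simulated movement along a Sybil path can be sketched as linear interpolation between waypoints. The example assumes planar coordinates in metres, a constant base speed, and a single `congestion` factor (1.0 for free flow, smaller when traffic is slow); none of these details come from the slides.

```python
import math

def position_at(waypoints, speed, elapsed, congestion=1.0):
    """Position after `elapsed` seconds along a polyline of (x, y) waypoints."""
    remaining = speed * congestion * elapsed  # distance travelled so far
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and remaining <= seg:
            f = remaining / seg
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        remaining -= seg
    return waypoints[-1]  # the trip is complete

route = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
free_flow = position_at(route, speed=10.0, elapsed=15.0)
congested = position_at(route, speed=10.0, elapsed=10.0, congestion=0.5)
```

Each time the real user issues a query, a function like this would supply the location reported for each Sybil user.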

  16. Endpoint caching • Attack 1: If a real path P frequented by the user (e.g., commuter paths) is associated with multiple Sybil paths: • P can be statistically identified as the real path • Attack 2: After arriving at the first destination, when a user travels to a new location shortly afterwards: • Since the real paths share an endpoint, they could be identified • Solution: Endpoint caching • For most common trips, Sybil endpoints are cached • If the user makes multiple trips from one common endpoint (e.g., home/office), the corresponding Sybil endpoints are cached • When the user embarks on a multi-destination trip, the endpoint of a trip is the same as the startpoint of the following trip
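
The first kind of caching (reusing Sybil endpoints for a repeated trip) can be sketched as a small memoizing wrapper. The class `EndpointCache` and the `generate` callback are hypothetical; the sketch does not cover the shared-endpoint and multi-destination cases.

```python
class EndpointCache:
    """Return the same Sybil endpoints whenever the same real trip recurs."""

    def __init__(self, generate):
        self._generate = generate  # assumed callback: (source, destination, k) -> endpoints
        self._cache = {}

    def sybil_endpoints(self, source, destination, k):
        key = (source, destination, k)
        if key not in self._cache:
            self._cache[key] = self._generate(source, destination, k)
        return self._cache[key]

calls = []

def fake_generator(source, destination, k):
    """Stand-in generator that records how often it is invoked."""
    calls.append((source, destination))
    return [("sybil-src-%d" % i, "sybil-dst-%d" % i) for i in range(k - 1)]

cache = EndpointCache(fake_generator)
first = cache.sybil_endpoints("home", "work", 4)
second = cache.sybil_endpoints("home", "work", 4)
```

Because the commute always maps to the same Sybil endpoints, repeating it does not associate the real path with new Sybil paths.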

  17. Providing path continuity • Attack: If a real trip ends before some Sybil trips end • The system stops sending queries • The LBS can differentiate the real path from Sybil paths • SybilQuery guards against this by being an “always on” tool • continues to simulate movement along Sybil paths even when the user’s real trip is complete

  18. Outline • Introduction • SybilQuery Overview • Design Challenges • Implementation • Evaluation and Results • Conclusions and Future Work

  19. SybilQuery Implementation • An interface akin to navigation systems • Input: • The source and destination address for the trip • A security parameter k • Number of Sybil users • Query interface: • Integrated with Yahoo! Local Search

  20. Outline • Introduction • SybilQuery Overview • Design Challenges • Implementation • Evaluation and Results • Conclusions and Future Work

  21. Evaluation Goals • Privacy • How indistinguishable are Sybil queries from real queries? • Performance • Can Sybil queries be efficiently generated?

  22. Evaluation: Privacy • User Study • Give the working system to adversarial users, who try to break it by finding real user paths hidden among Sybil paths • 15 volunteers • Methodology • Pick real paths from the Cabspotter traces • Use SybilQuery to generate Sybil paths with different values of k

  23. Results from user study

  24. User approaches to distinguish queries • Contrasting rationale to guess real users • “Circuitous paths” • “Prominent start/end location” • “Odd man out”

  25. Evaluation: Performance • Setup: • Server: • 2.33 GHz Core2 Duo, 3 GB RAM, 250 GB SATA (7200 RPM) • Client: • 1.73 GHz Pentium-M laptop, 512 MB RAM, Linux 2.6 • Privacy parameter k = 4 (unless otherwise specified) • Micro-benchmarks • One-time and once-per-trip costs • Query-response latency of SybilQuery • Comparison with Spatial Cloaking for Yahoo! Local Search

  26. One-time and once-per-trip costs • One-time cost – preprocessing of traffic database • 2 hours 16 mins (processed 529,533 trips) • Once-per-trip costs – endpoint generation and path generation * Includes network latency to query the Microsoft MultiMap API

  27. Query-response latency of SybilQuery • Scales linearly with k (number of Sybil users) • Sub-second latency for typical values of k

  28. Conclusions and Future Work • SybilQuery: Efficient decentralized technique to hide user location from LBSes • Experimental results demonstrate: • Sybil queries can be generated efficiently • Sybil queries resemble real user queries • Future Work • Enhance SybilQuery to achieve stronger privacy guarantees, such as l-diversity, t-closeness and differential privacy

  29. My research on location in mobile computing • Privacy: Users may not want to reveal their private locations for accessing location-based services. SybilQuery – Ubicomp 2009. • Querying mobile phones for real-time location-based state. SocialTelescope – Internship at IBM, Summer 2010. • Incentives for sharing in social networks – WINE 2009. • Rapid change of client location affects network connectivity and performance. Context-Aware Rate Selection (CARS) – a solution for improving network performance by using client location – ICNP 2008.

  30. Thank You! Pravin Shankar spravin@cs.rutgers.edu

  31. Related Work • Synthetic Locations for Privacy [Krumm ’09, Kido ’05] • Spatial Cloaking [Gruteser and Grunwald ’03, and others] • Peer-to-peer Schemes [Chow ’06, Ghinita ’07] • Private Information Retrieval (PIR) [Ghinita ’08] A detailed list is available in the paper

  32. Spatial Cloaking • Spatial Cloaking – k-anonymity solution that uses anonymizers • Users send their location to the anonymizer • Anonymizer computes a cloaked region • Region where at least k users are present [Diagram: client, anonymizer, server]
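
One way an anonymizer could compute a region holding at least k users is repeated quadrant subdivision, sketched below. This is a simplified illustration under assumed planar coordinates, not the algorithm of any specific cited system; `cloaked_region` and its parameters are hypothetical.

```python
def cloaked_region(user, users, k, area, min_size=1.0):
    """Shrink `area` toward `user` while the next quadrant still holds >= k users.

    `users` includes the querying user; `area` is (x0, y0, x1, y1).
    """
    x0, y0, x1, y1 = area
    while (x1 - x0) > min_size:
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Select the quadrant that contains the querying user.
        qx0, qx1 = (x0, mx) if user[0] < mx else (mx, x1)
        qy0, qy1 = (y0, my) if user[1] < my else (my, y1)
        count = sum(1 for (x, y) in users if qx0 <= x < qx1 and qy0 <= y < qy1)
        if count < k:
            break  # a smaller region would no longer hide the user among k
        x0, y0, x1, y1 = qx0, qy0, qx1, qy1
    return (x0, y0, x1, y1)

users = [(1, 1), (2, 2), (3, 3), (9, 9), (12, 3)]
region_k3 = cloaked_region((1, 1), users, k=3, area=(0, 0, 16, 16))
region_k4 = cloaked_region((1, 1), users, k=4, area=(0, 0, 16, 16))
```

A larger k or sparser users yields a larger region, which means more results for the LBS to return.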

  33. Performance Comparison with Spatial Cloaking Response Size as users travel • Cloaked regions grow as users travel • SybilQuery overhead constant

  34. Prior techniques (1/2) [Diagram: client, anonymizer, server] • Spatial Cloaking • Need for Anonymizer - Trusted Third Party • Single point of failure • Scalability and performance bottleneck

  35. Prior techniques (2/2) • Peer-to-peer schemes • Rely on participating peers • Private Information Retrieval (PIR) • Computationally inefficient

  36. Tagging locations with traffic statistics (2/2) • Locations represented as QuadTree • Balances precision with scalability [Figure: San Francisco Airport; black blocks have higher densities]
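
A minimal point quadtree shows how the representation trades precision for scalability: cells split only where points are dense. The capacity, depth limit, and class layout are assumptions made for this sketch.

```python
class QuadTree:
    """Cell that splits into four children once it holds more than `capacity` points."""

    def __init__(self, x0, y0, x1, y1, capacity=2, depth=0, max_depth=8):
        self.bounds = (x0, y0, x1, y1)
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points = []
        self.children = None

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        # The depth limit stops endless splitting on duplicate points.
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        pending, self.points = self.points, []
        for px, py in pending:
            self._child_for(px, py).insert(px, py)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

tree = QuadTree(0, 0, 8, 8)
for point in [(1, 1), (1.5, 1.5), (2, 2), (7, 7)]:
    tree.insert(*point)
leaf_cells = tree.leaves()
```

Here the crowded lower-left corner ends up in small cells while the sparse remainder stays coarse.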

  37. Finding suitable endpoints using reverse geocoding • Real endpoints do not start in non-driveable terrain [Diagram: reverse geocoding maps a random point in a geographic location to the street address closest to that point]
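
The snap-to-street step can be sketched with an in-memory address list standing in for a real reverse-geocoding service. The addresses, coordinates, and function names are hypothetical, and straight-line distance in degrees is used only to keep the example short.

```python
import math
import random

def random_point(bbox, rng=random):
    """Uniform random (lat, lon) inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return (rng.uniform(min_lat, max_lat), rng.uniform(min_lon, max_lon))

def snap_to_address(point, addresses):
    """Return the (name, (lat, lon)) entry closest to `point`."""
    return min(
        addresses,
        key=lambda a: math.hypot(a[1][0] - point[0], a[1][1] - point[1]),
    )

# Hypothetical stand-in for a reverse-geocoding service.
addresses = [
    ("1 Example St", (37.70, -122.45)),
    ("2 Sample Ave", (37.80, -122.40)),
]
point = random_point((37.70, -122.50, 37.80, -122.40), random.Random(7))
endpoint = snap_to_address(point, addresses)
```

Snapping keeps a Sybil endpoint from landing in water or other non-driveable terrain.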

  38. Our goals • Performance • Autonomy • Ease of deployment

  39. Basic design of SybilQuery

  40. Design enhancements • Endpoint Generator • Endpoint caching • Path Generator • Randomizing path selection • Query Generator • Providing path continuity • Adding GPS sensor noise • Handling active adversaries

  41. Endpoint caching (1/2) • Attack 1: If a real path P frequented by the user (e.g., commuter paths) is associated with multiple sets of Sybil paths: • P can be statistically identified as the real path • Attack 2: After arriving at the first destination, when a user travels to a new location shortly afterwards: • Since the real paths share an endpoint, they could be distinguished from the Sybil paths

  42. Endpoint caching (2/2) • Solution: SybilQuery employs three types of caching • For most common trips, Sybil endpoints are cached • If the user makes multiple trips from one common endpoint (e.g., home/office), the corresponding Sybil endpoints are cached • When the user embarks on a multi-destination trip, the start points of the Sybil trips are cached • i.e. the endpoint of a trip is the same as the startpoint of the following trip

  43. Randomizing path selection • Real users may not always follow the shortest path to the destination • Detours, road closures, user intention • Path generator computes multiple paths to the destination (each with a different length) • Uses a probability distribution (of the frequency with which users choose paths other than the shortest path) to choose an appropriate path

  44. Handling active adversaries • An actively adversarial LBS may return doctored query responses to differentiate Sybil paths from a client’s real path • For example, it falsely reports traffic congestion at the query location. • SybilQuery handles active adversaries using N-variant queries to multiple LBSes • Unless all the LBSes collude, the adversarial LBS can be detected
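
Comparing answers from several LBSes can be sketched as a simple agreement check. The sketch assumes responses to the same query are directly comparable values; naming the dissenting service additionally assumes that most services are honest. The service names are invented.

```python
from collections import Counter

def check_responses(responses):
    """Return (majority_response, services_that_disagree).

    `responses` maps a service name to its answer for the same query.
    Any non-empty list of dissenters signals possible tampering.
    """
    majority, _ = Counter(responses.values()).most_common(1)[0]
    dissenters = sorted(name for name, r in responses.items() if r != majority)
    return majority, dissenters

# Hypothetical answers to one traffic query sent to three services.
answers = {"lbs_a": "clear", "lbs_b": "clear", "lbs_c": "congested"}
majority, dissenters = check_responses(answers)
```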

  45. Implementation • SybilQuery implemented as a Python client • Endpoint generator: • Uses a PostgreSQL database with PostGIS spatial extensions to process regional traffic information • Path generator: • Queries the Microsoft Multimap API for waypoints • Query generator: • Interfaced with Yahoo! Local API to simulate movement under the constraints of current traffic
