
Privacy Preserving Data Mining on Moving Object Trajectories


Presentation Transcript


  1. Privacy Preserving Data Mining on Moving Object Trajectories Győző Gidófalvi Geomatic ApS Center for Geoinformatik Xuegang Harry Huang Torben Bach Pedersen Aalborg University

  2. Outline
  • Need for location privacy
  • Quantifying location privacy
  • Existing methods / systems for location privacy
  • Undesired loss of privacy
  • Privacy-preserving, grid-based framework for collecting and mining location data
  • System architecture
  • Anonymization / partitioning policies
  • Mining of anonymized trajectories
  • Experiments and results
  • Conclusions
  MDM, Mannheim 2007

  3. Need for Location Privacy
  • “New technologies can pinpoint your location at any time and place. They promise safety and convenience but threaten privacy and security”
  • Cover story, IEEE Spectrum, July 2003

  4. Location-Based Services
  • With all their privacy threats, why do users still use location-detection devices?
  • Widespread adoption of LBSs:
  • Location-based store finders
  • Location-based traffic reports
  • Location-based advertisements
  • LBSs rely on the implicit assumption that users agree to reveal their private locations
  • LBSs trade their services for privacy
  • LBSs make heavy use of context, including patterns in the location data, which are extracted using data mining methods

  5. Quantifying Location Privacy
  • User perception of privacy: “One should only be able to infer a region R that I am in”
  • Things to consider: external spatio-temporal knowledge
  • Location and movement of objects is limited in space and time
  • Road networks and other geographical constraints
  • Locations of businesses
  • Opening hours
  • Locations of other moving objects in a given region
  • Common measures for location privacy:
  • The area of R >= A_min
  • k-anonymity: there should be at least (k-1) other moving objects in R
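The two measures above can be combined into one check. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
def satisfies_privacy(region_area, users_in_region, a_min, k):
    """A region R is acceptable if its area is >= A_min and it contains
    the user plus at least k-1 other users (k-anonymity)."""
    return region_area >= a_min and users_in_region >= k

# A 200 m x 200 m region containing 5 users, with A_min = 10,000 m^2 and k = 5:
print(satisfies_privacy(200 * 200, 5, 10_000, 5))  # True
```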

  6. Existing Methods / Systems for Location Privacy
  • Considering k-anonymity only:
  • Adaptive-Interval Cloaking Algorithm (MobiSys’03): divide the entire system area into quadrants of equal size, until the quadrant includes the user and k-1 other users
  • Clique-Cloak Algorithm (ICDCS’05): a clique graph is constructed to search for a minimum bounding rectangle that includes the user and k-1 other users
  • Considering both k-anonymity and minimum area:
  • CASPER: Adaptive Location Anonymizer + Privacy-Aware Query Processor (VLDB’06): grid-based pyramid structure to put the exact locations of mobile users into cloaking rectangles made of grid cells
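The adaptive-interval cloaking idea can be sketched as a recursive quadrant descent: keep subdividing into the quadrant containing the user, and stop as soon as the next subdivision would hold fewer than k users. This is a simplified illustration under assumed data structures (points as (x, y) tuples at distinct locations), not the MobiSys’03 implementation:

```python
def cloak(user, others, x0, y0, x1, y1, k):
    """Return a cloaking rectangle (x0, y0, x1, y1) containing >= k users."""
    # midpoints split the current area into four equal quadrants
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    # pick the quadrant that contains the user
    qx0, qx1 = (x0, mx) if user[0] < mx else (mx, x1)
    qy0, qy1 = (y0, my) if user[1] < my else (my, y1)
    # count users (including the querying user) inside that quadrant
    inside = [p for p in others + [user]
              if qx0 <= p[0] < qx1 and qy0 <= p[1] < qy1]
    if len(inside) < k:          # subdividing further would break k-anonymity,
        return (x0, y0, x1, y1)  # so report the current (larger) quadrant
    return cloak(user, [p for p in inside if p != user],
                 qx0, qy0, qx1, qy1, k)

# In an 8x8 area with k = 3, the user at (1, 1) is cloaked to the
# quadrant (0, 0, 4, 4), which also contains (2, 2), (3, 1), (1, 3):
print(cloak((1, 1), [(2, 2), (3, 1), (1, 3), (6, 6)], 0, 0, 8, 8, 3))
```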

  7. Problems with Existing Systems / Methods
  • Require trusted middleware
  • Providing k-anonymity in environments with untrusted components is an open problem and likely computationally prohibitive
  • The notion of location privacy guaranteed by k-anonymity may not be satisfactory (a large number of users may be in a small area where they do not want to be observed)
  • Non-deterministically or probabilistically reporting differently sized and/or positioned cloaking rectangles for the same location sacrifices location privacy
  • Traditional mining methods cannot be easily / efficiently extended to the anonymized location data

  8. Undesired Loss of Privacy: Example
  • By studying the “anonymized” locations of a single user, one can conclude and quantify the likelihood that the user is in the intersection of the cloaked rectangles returned on subsequent visits
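A minimal illustration of this attack, with assumed coordinates: two different cloaking rectangles reported for the same frequently visited location intersect in a much smaller region, so an observer can narrow down where the user actually is.

```python
def intersect(r1, r2):
    """Intersection of two axis-aligned rectangles (x0, y0, x1, y1)."""
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

visit1 = (0, 0, 4, 4)   # cloaking rectangle from the first visit
visit2 = (2, 2, 6, 6)   # different rectangle from a later visit
print(intersect(visit1, visit2))  # (2, 2, 4, 4) -- a 4x smaller region
```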

  9. Privacy-Preserving, Grid-Based Framework for Collecting and Mining Location Data
  • Goals:
  • Avoid privacy loss
  • Avoid using a trusted middleware component
  • Allow users to specify their desired level of privacy
  • Obtain detailed and accurate patterns from anonymized data
  • Approach:
  • Deterministic mapping from exact locations to anonymization rectangles, based on a single predefined 2D grid and user privacy settings
  • Smart clients are responsible for constructing anonymization rectangles by partitioning the space based on the 2D grid
  • Anonymization rectangles are constructed so that one can only infer with probability <= maxLocProb which grid cell the user is in
  • Mine probabilistic patterns from the probabilistic location data
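The deterministic mapping can be sketched as follows, under assumptions not stated on the slide (parameter names like `cell_size` and `max_loc_prob` are illustrative, and a regular square partitioning is assumed): the client snaps its exact location to the enclosing fixed block of f x f grid cells, where f is chosen so that 1 / f² <= maxLocProb. Because the partitioning is fixed, the same location always yields the same rectangle, which is what defeats the intersection attack above.

```python
import math

def anonymize(x, y, cell_size, max_loc_prob):
    """Map an exact location to its fixed anonymization rectangle."""
    f = math.ceil(math.sqrt(1.0 / max_loc_prob))  # block side, in cells
    block = f * cell_size                         # block side, in meters
    bx, by = math.floor(x / block), math.floor(y / block)
    # report the whole block: the observer only learns P(cell) = 1 / f^2
    return (bx * block, by * block, (bx + 1) * block, (by + 1) * block)

# With 100 m cells and maxLocProb = 1/16, every location maps to a fixed
# 400 m x 400 m block of 4 x 4 cells:
print(anonymize(150, 250, 100, 1 / 16))  # (0, 0, 400, 400)
print(anonymize(199, 399, 100, 1 / 16))  # same block -> same rectangle
```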

  10. System Architecture

  11. Partitioning Policies (figure: “Home” / “Office” example locations)
  • CRP: Common Regular Partitioning
  • All users use a common regular partitioning
  • Guarantees the same, in-space uniform privacy level for all users
  • IRP: Individual Regular Partitioning
  • Every user uses his/her own regular partitioning
  • Guarantees in-space uniform, individual privacy levels to users
  • IIP: Individual Irregular Partitioning
  • Every user specifies a few private locations with corresponding desired levels of privacy, and constructs a partitioning that meets these requirements
  • Guarantees in-space non-uniform, individual privacy levels to users

  12. Mining of Anonymized Trajectories
  • Probabilistic data:
  • A location of an object is reported as an anonymization rectangle (i.e., a set of n grid cells)
  • The object is inside grid cell c_i with location probability c_i.prob (= 1/n)
  • Probabilistic Dense Spatio-Temporal Area Query: during a time period, find all grid cells where
  • the maximum number of objects is >= min_count, and
  • the average location probability of the cell is >= min_prob
  • Evaluation of query results, given the set of true patterns D:
  • False negatives (N): the error of not finding a pattern that does exist in the data ⇒ FNR = |N| / |D|
  • False positives (P): the error of finding a “pattern” that does not exist in the data ⇒ FPR = |P| / |D|
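The query and error measures above can be sketched as follows (a simplified illustration for a single time period; data layout and names are assumptions): each reported rectangle contributes probability 1/n to each of its n grid cells, and a cell qualifies if its object count meets min_count and its average location probability meets min_prob.

```python
from collections import defaultdict

def dense_cells(reports, min_count, min_prob):
    """reports: list of sets of grid-cell ids, one set per object report."""
    count = defaultdict(int)
    prob_sum = defaultdict(float)
    for cells in reports:
        p = 1.0 / len(cells)            # uniform location probability 1/n
        for c in cells:
            count[c] += 1
            prob_sum[c] += p
    return {c for c in count
            if count[c] >= min_count and prob_sum[c] / count[c] >= min_prob}

def fnr_fpr(found, true_patterns):
    """False-negative and false-positive rates relative to true patterns D."""
    fn = len(true_patterns - found)   # |N|: true patterns not found
    fp = len(found - true_patterns)   # |P|: found "patterns" not in the data
    return fn / len(true_patterns), fp / len(true_patterns)

# Three reports: a 2-cell rectangle, an exact cell, and a 4-cell rectangle.
found = dense_cells([{1, 2}, {1}, {1, 3, 4, 5}], min_count=2, min_prob=0.5)
print(found)                 # {1}
print(fnr_fpr(found, {1, 2}))  # (0.5, 0.0): pattern 2 was missed
```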

  13. Experiment Setup
  • Data: 600–3000 trajectories from Brinkhoff’s network-based generator of moving objects
  • 4 types of partitioning based on the 3 policies:
  • 2x2 CRP and 4x4 CRP
  • IRP (at most 4x4 grid cells / partition)
  • IIP (at most 4x4 grid cells at the start and end of trajectories, otherwise 1x1)
  • Evaluation of mining accuracy (FPR & FNR) under varying:
  • Parameter settings for min_count and min_prob
  • Grid size
  • Time span
  • Number of trajectories

  14. Results
  • Few false negatives
  • The number of false positives increases with min_count and grid size, and decreases with time span and the number of trajectories
  • For each policy, min_prob can be tuned to find an optimal trade-off (FP vs. FN)

  15. Conclusions
  • Non-deterministic generalization can lead to privacy loss
  • Presented a grid-based framework for anonymized data collection and mining that:
  • Does not require trusted middleware
  • Maps exact locations to anonymization rectangles in a deterministic way, according to one of three anonymization policies: CRP, IRP, IIP
  • Avoids privacy loss
  • Allows users to specify individual desired privacy levels
  • Allows traditional data mining methods to be extended to discover accurate patterns while meeting the privacy requirements
  • Is computationally effective and conceptually simple

  16. Thank you for your attention!
