
k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure


Presentation Transcript


  1. k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure. Wen Ming Liu (1), Lingyu Wang (1), and Lei Zhang (2). (1) Concordia University, (2) George Mason University. ICDT 2010, March 23, 2010. CIISE / CSIS

  2. Agenda • Background • k-Jump Strategy • Data Utility Comparison • Conclusion

  3. Agenda • Background • Example • Algorithms a_naive and a_safe • k-Jump Strategy • Data Utility Comparison • Conclusion

  4. Example – Data Holder’s View

  5. Example – Data Holder’s View. Goal: release the table so that it satisfies 2-diversity. The generalization algorithm considers generalization function g1 and then g2, in order, and releases the first generalization that satisfies 2-diversity. • Name: identifier. • DoB: quasi-identifier. • Condition: sensitive attribute.
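Where the deck shows this loop pictorially, a minimal Python sketch may help. Everything here is illustrative: the table and group representations, the helper names, and the example values are assumptions, not the paper's notation.

```python
def satisfies_2_diversity(groups):
    """2-diversity as used in this example: every group must contain
    at least two distinct sensitive values."""
    return all(len(set(values)) >= 2 for _, values in groups)

def release(table, generalizations):
    """The data holder's loop on this slide: try g1, then g2, in order,
    and release the first generalization that satisfies 2-diversity."""
    for g in generalizations:
        groups = g(table)   # a generalization maps the table to
                            # (members, sensitive values) groups
        if satisfies_2_diversity(groups):
            return groups   # Released!
    return None             # no generalization can be released

# Hypothetical data: one group of three people with two distinct conditions.
g1 = lambda t: [(("Alice", "Bob", "Carol"), ("flu", "flu", "cold"))]
print(release({}, [g1]))    # two distinct values in the group: released
```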

  6. Example (cont.) – Adversary’s View

  7. Example (cont.) – Adversary’s View. Goal: guess the micro-data. The adversary knows: • the released generalization • public knowledge • the privacy property. What can the adversary infer? The three persons in each group may have the three conditions in any order; the set of candidate tables obtained this way is the permutation set. • Name: identifier. • DoB: quasi-identifier. • Condition: sensitive attribute.

  8. Example (cont.) The permutation set would be the adversary’s best guess of the micro-data table if the released generalization were his/her only knowledge. However…
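As a sketch of what "any order" means computationally, the permutation set can be enumerated directly. The representation (a table as a sorted tuple of (person, value) pairs, chosen so tables can live in Python sets) is my assumption; groups have the same shape as in the earlier sketch.

```python
from itertools import permutations, product

def permutation_set(groups):
    """All micro-data tables consistent with a released generalization:
    within each group, the sensitive values may be assigned to the
    members in any order. A table is a sorted tuple of (person, value)
    pairs so that it is hashable."""
    options = [{tuple(zip(members, order)) for order in permutations(values)}
               for members, values in groups]
    return {tuple(sorted(sum(combo, ()))) for combo in product(*options)}

# A group of three people and three distinct conditions yields 3! = 6 guesses.
group = (("p1", "p2", "p3"), ("c1", "c2", "c3"))
print(len(permutation_set([group])))  # 6
```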

  9. Example (cont.) – Adversary Simulating the Algorithm. However, the adversary also knows the generalization algorithm, and can simulate the algorithm to further exclude some invalid guesses.

  10. Example (cont.) – Adversary Simulating the Algorithm. Is a given table a valid guess of the micro-data? Let’s check it by simulating the algorithm on it: a candidate is invalid if the simulated algorithm would not have produced the released generalization for it (the privacy check succeeds or fails at the wrong step). The surviving candidates form the adversary’s refined mental image, the disclosure set, a subset of the permutation set.

  11. Decision Process of Safe and Unsafe Algorithms. Most existing generalization algorithms (which do not consider this problem) follow a_naive: at the i-th iteration, evaluate the privacy property on the permutation set per_i, the adversary’s mental image of the micro-data table without knowledge of the algorithm, and release gi(t0) at the first success. Safe generalization algorithms (Zhang ’07 CCS, …) follow a_safe: at the i-th iteration, evaluate the privacy property on the disclosure set ds_i instead, the adversary’s mental image after simulating the algorithm. [Figure: evaluation paths of a_naive and a_safe over g1(t0), g2(t0), …, gn(t0); each box is the i-th iteration and each diamond an evaluation of the privacy property; per = permutation set, ds = disclosure set.]
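The two decision processes can be written down almost verbatim. This is a definitional sketch only (exponential in cost, reusing permutation_set from the sketch above); p is the privacy property evaluated on a set of candidate tables, and gs the ordered generalization functions:

```python
def a_naive(t0, gs, p):
    """Unsafe: evaluate the privacy property on the permutation set only,
    and release at the first success."""
    for g in gs:
        if p(permutation_set(g(t0))):
            return g(t0)
    return None

def disclosure_set(t, i, gs, p):
    """ds_i(t): the permutation set of gs[i](t) minus every candidate on
    which the safe algorithm would already have released at some earlier
    iteration j < i. Note ds_1 = per_1, as stated on slide 14."""
    return {u for u in permutation_set(gs[i](t))
            if not any(p(disclosure_set(u, j, gs, p)) for j in range(i))}

def a_safe(t0, gs, p):
    """Safe: evaluate the privacy property on the disclosure set instead."""
    for i, g in enumerate(gs):
        if p(disclosure_set(t0, i, gs, p)):
            return g(t0)
    return None
```

The recursion in disclosure_set is exactly the cost highlighted on slide 14: ds_i(t0) requires ds_j(u) for every candidate u and every earlier iteration j.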

  12. Agenda • Background • k-Jump Strategy • The Algorithm Family a_jump(k) • Properties of a_jump(k) • Data Utility Comparison • Conclusion

  13. The Algorithm Family a_jump(k). • Naive strategy: evaluate the privacy property on the permutation set only (efficient but unsafe). • Safe strategy: evaluate the privacy property on the disclosure set directly (safe but costly). • k-jump strategy: evaluate the cheap permutation-set check first; when it passes but the disclosure-set check fails, penalize by jumping over the next k-1 iterations. [Figure: evaluation path of a_jump(k) over g1(t0), g2(t0), …, gn(t0); after the disclosure-set check fails at g2, the next function evaluated is g2+k.]
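A sketch of the jump itself, under the same assumed representations, and simplified to a single constant jump distance k rather than a full distance vector: the cheap permutation-set check comes first, the expensive disclosure-set check only when it passes, and a failed expensive check skips k-1 functions. The adversary's simulation must simulate a_jump itself, not a_safe:

```python
def jump_ds(t, i, gs, p, k):
    """Disclosure set of a_jump(k) at iteration i: the permutation set of
    gs[i](t) minus every candidate on which a simulation of a_jump(k)
    would already have released before reaching iteration i."""
    return {u for u in permutation_set(gs[i](t))
            if not stops_before(u, i, gs, p, k)}

def stops_before(t, limit, gs, p, k):
    """Would a_jump(k) on input t release at some iteration < limit?"""
    i = 0
    while i < limit:
        if not p(permutation_set(gs[i](t))):
            i += 1                        # cheap check failed: next function
        elif p(jump_ds(t, i, gs, p, k)):
            return True                   # it would have released here
        else:
            i += k                        # penalty jump
    return False

def a_jump(t0, gs, p, k):
    """The k-jump decision process sketched on this slide (k >= 1)."""
    i = 0
    while i < len(gs):
        if not p(permutation_set(gs[i](t0))):
            i += 1
        elif p(jump_ds(t0, i, gs, p, k)):
            return gs[i](t0)              # released
        else:
            i += k
    return None
```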

  14. Properties of a_jump(k). • Computation of the disclosure set: under a_safe, to compute ds(gi(t0)) one must first compute ds(gj(t)) for all t in per(gi(t0)) and j = 1, 2, …, i-1; under a_jump, to compute ds(gi(t0)) (2 < i < 2+k) one no longer needs to compute ds(g2(t)) for all t in per(gi(t0)). • ds(g1(t0)) and ds(g2(t0)): ds(g1(t0)) = per(g1(t0)), and ds(g2(t0)) is independent of the jump distance vector. • Size of the family: there are (n-1)! different jump distance vectors.
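The (n-1)! count can be sanity-checked by enumeration, under my reading that iteration i (1-based) of n admits jump distances 1 through n-i, and that the last iteration needs no jump:

```python
from itertools import product
from math import factorial

def jump_vectors(n):
    """Enumerate all jump distance vectors for n generalization functions:
    one distance in {1, ..., n-i} per iteration i = 1, ..., n-1."""
    return list(product(*(range(1, n - i + 1) for i in range(1, n))))

assert len(jump_vectors(5)) == factorial(5 - 1)  # (n-1)! = 24 for n = 5
```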

  15. Agenda • Background • k-Jump Strategy • Data Utility Comparison • Construction for Theorem 1: 1-jump and i-jump (1 < i) incomparable • Construction for Theorem 2: i-jump and j-jump (1 < i < j) incomparable • Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable • Construction for Proposition 2: reusing generalization functions • Results on a_safe and a_jump(1) • Conclusion

  16. Construction for Theorem 1: 1-jump and i-jump (1 < i) incomparable. Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. • To compute ds3(t0) under a_jump(k): (1) exclude any table t for which p(per1(t)) = true; every such table belongs to one of four disjoint sets.
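Slide 20 evaluates this ratio over a disclosure set (5/9 for individual I and value C6), so I read the property as a ratio over a set of candidate tables; under that reading, and with the table representation assumed in the earlier sketches, the check might look like this:

```python
from collections import Counter

def ratio_property(tables, threshold=0.5):
    """p(S) for a set S of candidate tables: for every (person, value)
    pair, the fraction of tables in S containing that pair must not
    exceed the threshold. Returning False on an empty set is a
    convention assumed here, not taken from the paper."""
    tables = list(tables)
    if not tables:
        return False
    counts = Counter(pair for t in tables for pair in t)
    return all(c / len(tables) <= threshold for c in counts.values())
```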

  17. Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i). Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. • To compute ds3(t0) under a_jump(k): (1) exclude any table t for which p(per1(t)) = true; (2) consider generalizing these tables using g2: S2, S3, and S4 cannot be disclosed under g2.

  18. Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i). Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. • To compute ds3(t0) under a_jump(k): (1) exclude any table t for which p(per1(t)) = true; (2a) the subsets of S1 in which both N and O have C7, C8, or C9 cannot be disclosed under g2.

  19. Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i). Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. • To compute ds3(t0) under a_jump(k): (1) exclude any table t for which p(per1(t)) = true; (2b) for a_jump(i), all tables in S1\S1’ are excluded from its ds3(t0), so the privacy property is satisfied.

  20. Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i). Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2. • To compute ds3(t0) under a_jump(k): (1) exclude any table t for which p(per1(t)) = true; (2c) for a_jump(1), the disclosure sets of the tables in S1\S1’ under g2 do not satisfy the privacy property: the ratio of I being associated with C6 is 5/9, so the property is violated.

  21. Construction for Theorem 2: i-jump and j-jump (1 < i < j) incomparable. The evaluation paths are shown in the figures.

  22. Construction for Theorem 2 (cont.): i-jump and j-jump (1 < i < j). • The case where i-jump has better utility than j-jump is relatively easy to construct; we only show the construction for the other case. • In this construction, generalization gj+2 is released by j-jump, while gj+i+1 or a later generalization is released by i-jump.

  23. Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable.

  24. Construction for Proposition 2: Reusing Generalization Functions. Setting: the jump distance is 1; the privacy property is that the highest ratio of a sensitive value in a group must be no greater than 1/2. Without reusing g2, the table leads to disclosing nothing: (1) every candidate table belongs to one of three disjoint sets, none of which can be disclosed under g1(.) or g3(.); (2) when computing ds2, the privacy property is violated.

  25. Construction for Proposition 2 (cont.): Reusing Generalization Functions. g2 is reused as g2’. • To calculate ds2’, the tables that can be disclosed under g1, g2, or g3 must be excluded from per2’: (1) S1, S2, and S3 cannot be disclosed under g2, as mentioned above; (2) S2 and S3 cannot be disclosed under g3.

  26. Construction for Proposition 2 (cont.): Reusing Generalization Functions. g2 is reused as g2’. • To calculate ds2’, the tables that can be disclosed under g1, g2, or g3 must be excluded from per2’: (1) S1, S2, and S3 cannot be disclosed under g2, as mentioned above; (2) S2 and S3 cannot be disclosed under g3; (3) S1 can be further divided into three disjoint subsets: (3a) S12 and S13 cannot be disclosed under g3.

  27. Construction for Proposition 2 (cont.): Reusing Generalization Functions. g2 is reused as g2’. (3b) The tables in subset S11 can be disclosed under g3. To compute ds3(t0) for t0 in S11: (A) exclude any table t for which p(per1(t)) = true; every such table belongs to one of two disjoint sets, neither of which can be disclosed under g2; (B) one instance is shown.

  28. Construction for Proposition 2 (cont.): Reusing Generalization Functions. g2 is reused as g2’. • The ratios of D and E being associated with C3 are 0.5, which is the highest ratio, so the privacy property (highest ratio no greater than 1/2, with jump distance 1) is satisfied.

  29. Results on a_safe and a_jump(1). 1. When the privacy property is either set-monotonic or based on the highest ratio of sensitive values: • Lemma 3: p(per(t0)) = false ⇒ p(any of its subsets) = false. • Corollary 1: the algorithm a_safe has the same data utility as a_jump(1). 2. In all other cases: • Lemma 4: ds3 under a_safe is a subset of ds3 under a_jump(1). • Theorem 5: the data utility of a_safe and a_jump(1) is generally incomparable.

  30. Agenda • Background • k-Jump Strategy • Data Utility Comparison • Conclusion

  31. Conclusion • We have proposed a novel k-jump strategy for micro-data disclosure. • It transforms a given generalization algorithm into a large family of safe algorithms. • We show that the data utility of these algorithms is generally incomparable by constructing counter-examples. • Practical impact: the data holder can make a secret choice among the safe algorithms.

  32. Further Results and Future Work • Further results in the extended version of this paper: • computational complexity; • making a secret choice among unsafe algorithms does not yield a safe solution. • Future studies: • study more efficient safe algorithms; • employ statistical methods to compare different k-jump algorithms; • further investigate the opportunity of reusing generalization functions.

  33. Thank you!

  34. Example – Data Holder’s View. Goal: release the table so that it satisfies 2-diversity. The generalization algorithm considers generalization function g1 and then g2, in order. • Name: identifier. • DoB: quasi-identifier. • Condition: sensitive attribute.

  35. Toy Example. The attacker knows: • the generalization • external data • the privacy property. The data holder releases a generalized table satisfying 2-diversity. What can the attacker infer? The permutation set. • Name: identifier. • DoB: quasi-identifier. • Condition: sensitive attribute.
