Privacy Preserving Serial Data Publishing By Role Composition

Presentation Transcript

  1. Privacy Preserving Serial Data Publishing By Role Composition • Yingyi Bu (1), Ada Wai-Chee Fu (1), Raymond Chi-Wing Wong (2), Lei Chen (2), Jiuyong Li (3) • (1) The Chinese University of Hong Kong; (2) The Hong Kong University of Science and Technology; (3) University of South Australia • Prepared and presented by Raymond Chi-Wing Wong

  2. Outline • Sequential Releases • Existing Privacy Models • m-invariance • Privacy breaches • Our Proposed Privacy Model • l-scarcity • Experiments • Conclusion

  3. 1. Sequential Releases • Time = 1: the hospital releases a published version of its medical data to the public. • The published table satisfies some privacy requirement (e.g., m-invariance).

  4. 1. Sequential Releases • Time = 1 and Time = 2: after insertions, deletions and updates to the medical data, the hospital releases a new published table to the public. • Each published table satisfies some privacy requirement (e.g., m-invariance).

  5. 1. Sequential Releases • Time = 1, Time = 2 and Time = 3: after further insertions, deletions and updates, the hospital releases a third published table to the public. • Each published table satisfies some privacy requirement (e.g., m-invariance).

  6. 1. Sequential Releases • Problem: At the current time t, we want to generate a table which satisfies some privacy requirement (e.g., m-invariance) with respect to all published tables at any time <= t.

  7. 2. Existing Privacy Models • Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006: considers insertions only; does not consider deletions and updates. • Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008: considers insertions only; does not consider deletions and updates. • Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007: considers insertions and deletions only; does not consider updates. • Updates cannot simply be regarded as “a deletion and then an insertion” when privacy is considered. • Our proposed privacy model (l-scarcity): considers insertions, deletions and updates together.

  8. 2. Existing Privacy Models • Sensitive diseases are of two kinds. • Transient diseases: curable, e.g., flu, fever. If an individual is linked to flu in a published table, s/he may or may not be linked to flu in a later published table. • Permanent diseases: incurable, e.g., HIV. If an individual is linked to HIV in a published table, s/he MUST be linked to HIV in every later published table in which s/he appears. • We are the first to study these two kinds of sensitive values.

  9. 2. Existing Privacy Models • Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006: considers insertions only; does not consider deletions and updates. • Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008: considers insertions only; does not consider deletions and updates. • Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007: considers insertions and deletions only; does not consider updates. • Existing models: do not consider transient/permanent values. • Contributions: we consider a more realistic setting of sequential releases, with insertions, deletions and updates, and with transient/permanent values. We cannot simply adapt these existing privacy models to this realistic setting. • Our proposed privacy model (l-scarcity): considers insertions, deletions and updates together; also considers transient/permanent values.

  10. 2. Existing Privacy Models • Problem (m-invariance): At the current time t, we want to generate a table such that the probability that an individual is linked to a sensitive value, with respect to all published tables at any time <= t, is at most 1/m. • Problem (l-scarcity): At the current time t, we want to generate a table such that the probability that an individual is linked to a sensitive value, with respect to all published tables at any time <= t, is at most 1/l. • Byun et al., “Secure Anonymization for Incremental Datasets”, Secure Data Management, 2006 • Fung et al., “Anonymity for Continuous Data Publishing”, EDBT, 2008 • Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007 • Our proposed privacy model (l-scarcity): considers insertions, deletions and updates together.
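
The two problem statements on slide 10 have the same shape and differ only in the bound. One way to write them down is sketched here; the symbols o (an individual), s (a sensitive value) and T_1, ..., T_t (the tables published up to the current time t) are notation assumed for this note and do not appear on the slides.

```latex
% Assumed notation: o = individual, s = sensitive value,
% T_1, ..., T_t = all tables published at any time <= t.
\[
  \Pr\bigl[\, o \text{ is linked to } s \;\bigm|\; T_1, \dots, T_t \,\bigr] \;\le\; \frac{1}{m}
  \qquad \text{($m$-invariance)}
\]
\[
  \Pr\bigl[\, o \text{ is linked to } s \;\bigm|\; T_1, \dots, T_t \,\bigr] \;\le\; \frac{1}{l}
  \qquad \text{($l$-scarcity)}
\]
```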

  11. Release the data set to public. [Diagram: the hospital publishes its medical data as “Medical Data + Some Useful Attributes”; the public also has a voter registration list.]

  12. Release the data set to public. [Animation frame of the same diagram: hospital medical data, published “Medical Data + Some Useful Attributes”, voter registration list.]

  13. Release the data set to public • The hospital publishes a generalized table (“Medical Data + Some Useful Attributes”); the public also has a voter registration list. • 3-diversity: each individual is linked to “HIV” with probability at most 1/3 in THIS PUBLISHED TABLE. 3-diversity only focuses on ONE-TIME publishing. • 3-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 3-diversity. Idea: each individual is linked to “HIV” with probability at most 1/3 with respect to MULTIPLE PUBLISHED TABLES.
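
The one-time guarantee on slide 13 can be checked mechanically. The sketch below assumes the simple frequency reading of 3-diversity stated on the slide (no sensitive value accounts for more than 1/3 of the tuples in any group); the function names and both sample tables are hypothetical and only serve as illustration.

```python
from collections import Counter
from fractions import Fraction


def max_linkage_probability(group_values):
    """Largest share of one sensitive value among the tuples of a group."""
    counts = Counter(group_values)
    return Fraction(max(counts.values()), len(group_values))


def satisfies_l_diversity(groups, l):
    """True if no sensitive value exceeds probability 1/l in any group."""
    return all(max_linkage_probability(g) <= Fraction(1, l) for g in groups)


# Hypothetical one-time releases: each inner list holds one group's sensitive values.
diverse_table = [["Flu", "HIV", "Fever"], ["Flu", "HIV", "Fever"]]
skewed_table = [["HIV", "HIV", "Flu"], ["Flu", "HIV", "Fever"]]

print(satisfies_l_diversity(diverse_table, 3))  # True
print(satisfies_l_diversity(skewed_table, 3))   # False
```

This check looks at a single table only, which is exactly the limitation the slide points out: it says nothing about what can be inferred by combining several releases.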

  14. Release the data set to public, Time = 1, 3-invariance. [Diagram: hospital medical data, published “Medical Data + Some Useful Attributes”, voter registration list.]

  15. Release the data set to public, Time = 1, 3-invariance. In the published table, p1, p2 and p3 are shown together, as are p4, p5 and p6; each of the six individuals has the signature {Flu, HIV, Fever}.

  16. Release the data set to public, Time = 1, 3-invariance. [Animation frame: each of the six tuples carries the signature {Flu, HIV, Fever}.]

  17. Release the data set to public, Time = 1, 3-invariance. [Animation frame: each of the six tuples carries the signature {Flu, HIV, Fever}.]

  18. Release the data set to public, Time = 1, 3-invariance. [Animation frame: each of the six tuples carries the signature {Flu, HIV, Fever}.]

  19. Release the data set to public, Time = 1, 3-invariance. [Animation frame: each of the six tuples carries the signature {Flu, HIV, Fever}.]

  20. Release the data set to public, Time = 1, 3-invariance. [Animation frame: the Time = 1 table is released to the public.]

  21. Release the data set to public, Time = 1, 3-invariance. [Animation frame: the Time = 1 table is released to the public.]

  22. Release the data set to public, Time = 2, 3-invariance. [Diagram: the Time = 1 table is already public; the hospital prepares the Time = 2 release.]

  23. Release the data set to public, Time = 2, 3-invariance • Published groups at Time = 2: {p2, p3, p6} and {p1, p4, p5}. • This table satisfies 3-invariance, because each individual is linked to the SAME signature. • Idea of 3-invariance: each individual is linked to the SAME signature in each published table.

  24. Release the data set to public, Time = 2, 3-invariance. [Animation frame: the Time = 2 table is shown alongside the public Time = 1 table.]

  25. Release the data set to public, Time = 2, 3-invariance. [Animation frame: the Time = 2 table is shown alongside the public Time = 1 table.]

  26. Release the data set to public, Time = 2, 3-invariance. [Animation frame: the Time = 1 and Time = 2 tables are both public.]

  27. Release the data set to public, Time = 2, 3-invariance. [Animation frame: the Time = 1 and Time = 2 tables are both public.]

  28. Release the data set to public, Time = 2, 3-invariance. [Animation frame: the Time = 1 and Time = 2 tables are both public.]

  29. Release the data set to public, Time = 3, 3-invariance. [Diagram: the Time = 1 and Time = 2 tables are already public; the hospital prepares the Time = 3 release.]

  30. Release the data set to public, Time = 3, 3-invariance • Published groups at Time = 3: {p2, p3, p5} and {p1, p4, p6}. • This table satisfies 3-invariance, because each individual is linked to the SAME signature.

  31. Release the data set to public, Time = 3, 3-invariance. [Animation frame: the Time = 3 table is shown alongside the public Time = 1 and Time = 2 tables.]

  32. Release the data set to public, Time = 3, 3-invariance. [Animation frame: the Time = 3 table is shown alongside the public Time = 1 and Time = 2 tables.]

  33. Release the data set to public, Time = 3, 3-invariance. [Animation frame: the Time = 1, Time = 2 and Time = 3 tables are all public.]

  34. Release the data set to public, Time = 3, 3-invariance. [Animation frame: the Time = 1, Time = 2 and Time = 3 tables are all public.]

  35. Release the data set to public, Time = 3, 3-invariance. [Animation frame: the Time = 1, Time = 2 and Time = 3 tables are all public.]

  36. 3-invariance: the three published tables • Time = 1: {p1, p2, p3}, {p4, p5, p6} • Time = 2: {p2, p3, p6}, {p1, p4, p5} • Time = 3: {p2, p3, p5}, {p1, p4, p6}
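
The "same signature in each published table" idea can be tested against the three releases listed on slide 36. The sketch below is an illustration under stated assumptions: a release is modelled as a list of (members, sensitive values) pairs, the groupings are read off the slides, every group is taken to have the signature {Flu, HIV, Fever}, and the function names are hypothetical. It also checks that each group has at least m distinct sensitive values, following the usual statement of m-invariance by Xiao et al.

```python
def signatures(release):
    """Map each individual to the signature (set of sensitive values) of their group."""
    sig = {}
    for members, values in release:
        for person in members:
            sig[person] = frozenset(values)
    return sig


def is_m_invariant(releases, m):
    """Check per-group distinctness and per-individual signature stability."""
    seen = {}
    for release in releases:
        for _members, values in release:
            # Each group needs at least m tuples, all with distinct sensitive values.
            if len(values) < m or len(set(values)) != len(values):
                return False
        for person, sig in signatures(release).items():
            # An individual must keep the same signature in every release.
            if seen.setdefault(person, sig) != sig:
                return False
    return True


SIG = ["Flu", "HIV", "Fever"]  # signature assumed for every group, as on slide 15
releases = [
    [(["p1", "p2", "p3"], SIG), (["p4", "p5", "p6"], SIG)],  # Time = 1
    [(["p2", "p3", "p6"], SIG), (["p1", "p4", "p5"], SIG)],  # Time = 2
    [(["p2", "p3", "p5"], SIG), (["p1", "p4", "p6"], SIG)],  # Time = 3
]
print(is_m_invariant(releases, 3))  # True

# Hypothetical variant: one Time = 3 group changes its signature.
broken = releases[:2] + [
    [(["p2", "p3", "p5"], SIG), (["p1", "p4", "p6"], ["Flu", "HIV", "Cough"])]
]
print(is_m_invariant(broken, 3))  # False
```

Passing this check means the releases are 3-invariant in the sense above; it does not by itself rule out the inference shown on the following slides.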

  37. 3-invariance: what the adversary knows • Knowledge 1 (the published tables): Time = 1: {p1, p2, p3}, {p4, p5, p6}; Time = 2: {p2, p3, p6}, {p1, p4, p5}; Time = 3: {p2, p3, p5}, {p1, p4, p6}. • Knowledge 2: I know all voter registration lists.

  38. 3-invariance: what the adversary knows • Knowledge 1: the published tables at Time = 1, 2 and 3. • Knowledge 2: I know all voter registration lists. • Knowledge 3: I know that HIV is a permanent sensitive value. • Knowledge 4: There are TWO HIVs in the published table. • I can deduce that p1 and p6 cannot be linked to HIV.

  39. Knowledge 1 to 4 as before. I can deduce that p1 and p6 cannot be linked to HIV. Proof by contradiction: suppose p1 is linked to HIV. [Animation frame: the published tables are annotated with Yes/No marks under this assumption.]

  40. Knowledge 1 to 4 as before. Proof by contradiction: suppose p1 is linked to HIV. [Animation frame: further No marks are added to the published tables.]

  41. Knowledge 1 to 4 as before. Proof by contradiction: suppose p1 is linked to HIV. [Animation frame: further No marks are added to the published tables.]

  42. Knowledge 1 to 4 as before. Proof by contradiction: suppose p1 is linked to HIV. Contradiction! p1 CANNOT be linked to HIV.

  43. Knowledge 1 to 4 as before. I can deduce that p1 and p6 cannot be linked to HIV. Proof by contradiction: suppose p6 is linked to HIV. [Animation frame: the published tables are annotated with Yes/No marks under this assumption.]

  44. Knowledge 1 to 4 as before. Proof by contradiction: suppose p6 is linked to HIV. [Animation frame: further No marks are added to the published tables.]

  45. Knowledge 1 to 4 as before. Proof by contradiction: suppose p6 is linked to HIV. [Animation frame: further No marks are added to the published tables.]

  46. Knowledge 1 to 4 as before. Proof by contradiction: suppose p6 is linked to HIV. Contradiction! p6 CANNOT be linked to HIV.

  47. Privacy breaches under 3-invariance • Knowledge 1: the published tables (Time = 1: {p1, p2, p3}, {p4, p5, p6}; Time = 2: {p2, p3, p6}, {p1, p4, p5}; Time = 3: {p2, p3, p5}, {p1, p4, p6}). • Knowledge 2: I know all voter registration lists. • Knowledge 3: I know that HIV is a permanent sensitive value. • Knowledge 4: There are TWO HIVs in the published table. • Problem (m-invariance): At the current time t, we want to generate a table such that the probability that an individual is linked to a sensitive value, with respect to all published tables at any time <= t, is at most 1/m. • I can deduce that p1 and p6 cannot be linked to HIV. Since the Time = 3 group {p1, p4, p6} has the signature {Flu, HIV, Fever}, I can deduce that p4 MUST be linked to HIV, which exceeds the 1/3 bound. • Privacy breaches! Why?
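
The adversary's deduction can be reproduced by brute force. The sketch below enumerates every way to pick the HIV-holders and keeps only those consistent with Knowledge 1 to 4. It rests on assumptions that should be read as such: the groupings are those read off the slides, every group contains exactly one HIV tuple (signature {Flu, HIV, Fever}), the set of HIV-holders is the same at all three times because HIV is permanent, and all consistent assignments are treated as equally likely. All names in the code are hypothetical.

```python
from itertools import combinations

PEOPLE = ["p1", "p2", "p3", "p4", "p5", "p6"]

# Knowledge 1 and 2: published groups, with members identified via voter lists.
RELEASES = [
    [{"p1", "p2", "p3"}, {"p4", "p5", "p6"}],  # Time = 1
    [{"p2", "p3", "p6"}, {"p1", "p4", "p5"}],  # Time = 2
    [{"p2", "p3", "p5"}, {"p1", "p4", "p6"}],  # Time = 3
]
HIV_PER_TABLE = 2  # Knowledge 4: two HIVs in the published table


def consistent(holders):
    """Knowledge 3: the same holders must give exactly one HIV per group at every time."""
    return all(len(group & holders) == 1
               for release in RELEASES for group in release)


def possible_worlds():
    """All holder sets that agree with every published table."""
    return [set(c) for c in combinations(PEOPLE, HIV_PER_TABLE)
            if consistent(set(c))]


def linkage_probability(person):
    """Share of consistent worlds in which the person holds HIV (uniform assumption)."""
    worlds = possible_worlds()
    return sum(person in w for w in worlds) / len(worlds)


print([sorted(w) for w in possible_worlds()])  # [['p2', 'p4'], ['p3', 'p4']]
for person in PEOPLE:
    print(person, linkage_probability(person))
```

Under these assumptions only two assignments survive, p4 holds HIV in both, and p1 and p6 hold it in neither, which matches the conclusion on slide 47: the linkage probability for p4 is 1 rather than at most 1/3.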

  48. Why? • Original medical data at Time = 1: p2 is an HIV-holder; p1 is an HIV-decoy; p3 is an HIV-decoy. • HIV-decoys (i.e., p1 and p3) are used to reduce the strong linkage between p2 and HIV. • Yet from the published tables at Time = 1, 2 and 3, I can deduce that p4 MUST be linked to HIV. Privacy breaches!

  49. Why? • Original medical data at Time = 1: p2 is an HIV-holder; p1 is an HIV-decoy; p3 is an HIV-decoy. • [Diagram: Cohort 1, Cohort 2 and Cohort 3, containing p2 (HIV-holder) and p1 and p3 (HIV-decoys).] • I can deduce that p4 MUST be linked to HIV. Privacy breaches!

  50. Why? • Original medical data at Time = 1: p4 is an HIV-holder; p5 is an HIV-decoy; p6 is an HIV-decoy. • [Diagram: Cohort 1, Cohort 2 and Cohort 3, containing p2, p1 and p3 together with p4, p6 and p5, each labelled HIV-holder or HIV-decoy.] • I can deduce that p4 MUST be linked to HIV. Privacy breaches!