Versatile Publishing For Privacy Preservation

Xin Jin, Mingyang Zhang, Nan Zhang (George Washington University); Gautam Das (University of Texas at Arlington). Outline: Introduction; Inference for Multiple Privacy Rules; Guardian Normal Form; GD and UAD Algorithms; Experimental Results.


Presentation Transcript


  1. Versatile Publishing For Privacy Preservation Xin Jin, Mingyang Zhang, Nan Zhang George Washington University Gautam Das University of Texas at Arlington

  2. Outline • Introduction • Inference For Multiple Privacy Rules • Guardian Normal Form • GD and UAD Algorithms • Experimental Results • Conclusion

  3. Privacy Preserving Data Publishing • QI → SA, i.e., an adversary who knows the quasi-identifier (QI) attributes of a tuple cannot infer its sensitive attribute (SA) beyond a privacy guarantee. • A privacy guarantee example: l-diversity. • [Figure: a published table satisfying 2-diversity]
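As a rough illustration of the guarantee named on this slide, the sketch below checks the simplest variant, distinct l-diversity: every group of tuples sharing the same QI values must contain at least l distinct SA values. The function name, the attribute names, and the sample rows are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

def is_l_diverse(rows, qi, sa, l):
    """Return True if every group of rows sharing the same QI values
    contains at least l distinct values of the sensitive attribute."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in qi)
        groups[key].add(row[sa])
    return all(len(values) >= l for values in groups.values())

# Hypothetical published table with generalized QI values.
published = [
    {"age": "20-30", "zipcode": "760**", "disease": "flu"},
    {"age": "20-30", "zipcode": "760**", "disease": "HIV"},
    {"age": "30-40", "zipcode": "761**", "disease": "flu"},
    {"age": "30-40", "zipcode": "761**", "disease": "cancer"},
]

two_diverse = is_l_diverse(published, ["age", "zipcode"], "disease", 2)
three_diverse = is_l_diverse(published, ["age", "zipcode"], "disease", 3)
```

Here each QI group holds two distinct diseases, so the table passes the check for l = 2 and fails it for l = 3.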

  4. A Sneak Peek at a Real Application • The Texas Department of State Health Services publishes every year a table of all patients discharged from more than 450 state-licensed hospitals. • www.Dshs.state.tx.us/thcic/Hospitals/HospitalsData.shtm • Defines 9 privacy requirements. • Examples: • If a hospital has fewer than five discharges of a particular gender, then suppress the zipcode of its patients of that gender. • Race is changed to ‘Other’ and ethnicity is suppressed if a hospital has fewer than ten discharges of a race. • The entire zipcode and gender code are suppressed if the ICD code indicates alcohol or drug use or an HIV diagnosis. • …

  5. Texas Inpatient Discharge Data • Example: If a hospital has fewer than five discharges of a particular gender, then suppress the zipcode of its patients of that gender. • As a privacy rule: hospital, gender → zipcode
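The suppression requirement quoted on this slide can be sketched as a small filter. This is only an illustration under assumptions: the field names, the use of `None` to mark a suppressed value, and the sample records are hypothetical, and the actual procedure used for the Texas data is not described in the slides.

```python
from collections import Counter

def suppress_zipcode(rows, threshold=5):
    """Blank out the zipcode of every record whose (hospital, gender)
    group has fewer than `threshold` discharges."""
    counts = Counter((r["hospital"], r["gender"]) for r in rows)
    released = []
    for r in rows:
        copy = dict(r)  # leave the input records untouched
        if counts[(r["hospital"], r["gender"])] < threshold:
            copy["zipcode"] = None  # None stands for "suppressed"
        released.append(copy)
    return released

# Hypothetical data: six female and two male discharges at hospital H1.
discharges = (
    [{"hospital": "H1", "gender": "F", "zipcode": "76019"}] * 6
    + [{"hospital": "H1", "gender": "M", "zipcode": "76010"}] * 2
)
released = suppress_zipcode(discharges)
```

The female group has six discharges and keeps its zipcodes; the male group has only two, so its zipcodes are suppressed.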

  6. Multiple SA Publishing • [MKGV06] defines privacy for multiple SA attributes: each Si in turn is treated as the sole SA attribute, and {Q1, Q2, …, Qm, S1, …, Si-1, Si+1, …, Sn} is treated as QI. • Lack of flexibility: provides a stronger privacy definition than necessary. • Example with SA race and state: age, ICD, state, gender → race; age, ICD, hospital, race → state

  7. A Novel Problem: Versatile Publishing • Allows the privacy requirement of publishing a table to be defined as an arbitrary set of privacy rules. • Each rule has the form {Q1, Q2, …, Qp} → {S1, S2, …, Sr}, with the LHS attributes on the left and the RHS attributes on the right. • Ensures that an adversary who learns the LHS attributes cannot learn the RHS attributes beyond a pre-defined privacy guarantee such as l-diversity or t-closeness.

  8. A Running Example • Rule #1: age, ICD → race • Rule #2: gender, ICD → state • Rule #3: hospital, race → state • Privacy guarantee: 2-diversity

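  9. Simple Solution #1: Straight Decomposition • Publish one anonymized table per rule: age, ICD → race; gender, ICD → state; hospital, race → state. • [Figure: two published tables and their join] In one table Asian is linked with TX or MN; in the other, Asian is linked with TX or CA. • Intersection attack [GKS08]: combining the two tables yields Asian → TX, violating 2-diversity.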
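The intersection attack on this slide comes down to a set intersection. The sketch below is a minimal illustration: only the TX/MN and TX/CA links come from the slide, while the table layout and the helper name `candidate_values` are assumptions.

```python
def candidate_values(table, known, target):
    """Values of `target` that are consistent with the attribute values
    the adversary already knows."""
    return {
        row[target]
        for row in table
        if all(row[attr] == value for attr, value in known.items())
    }

# Two separately published tables, each 2-diverse on its own.
table_a = [{"race": "Asian", "state": "TX"}, {"race": "Asian", "state": "MN"}]
table_b = [{"race": "Asian", "state": "TX"}, {"race": "Asian", "state": "CA"}]

known = {"race": "Asian"}
from_a = candidate_values(table_a, known, "state")
from_b = candidate_values(table_b, known, "state")

# An adversary holding both tables keeps only the states allowed by both.
combined = from_a & from_b
```

Each table alone leaves two candidate states, but the intersection leaves a single one, so the 2-diversity guarantee no longer holds for the pair.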

  10. Multiple SA Publishing Method • Defines as SA all attributes that appear on the RHS of at least one privacy rule, and as QI the set of all other attributes. • Rule #1: age, ICD → race • Rule #2: gender, ICD → state • Rule #3: hospital, race → state • With Rules #1 to #3 there are 2 SA: race, state. • Rule #4: hospital, age → ICD • Rule #5: gender, race → hospital • With Rules #1 to #5 there are 4 SA: ICD, state, race, hospital. • Curse of dimensionality.
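The SA/QI split described on this slide can be reproduced directly from the rule set. In the sketch below, the representation of a rule as an (LHS, RHS) pair of sets and the function name are illustrative assumptions; the rules and attributes are the ones listed on the slide.

```python
def split_sa_qi(rules, all_attributes):
    """SA = attributes on the RHS of at least one rule; QI = the rest."""
    sa = set()
    for _lhs, rhs in rules:
        sa |= set(rhs)
    return sa, set(all_attributes) - sa

attributes = {"age", "ICD", "gender", "hospital", "race", "state"}

rules_1_to_3 = [
    ({"age", "ICD"}, {"race"}),
    ({"gender", "ICD"}, {"state"}),
    ({"hospital", "race"}, {"state"}),
]
rules_4_to_5 = [
    ({"hospital", "age"}, {"ICD"}),
    ({"gender", "race"}, {"hospital"}),
]

sa_small, qi_small = split_sa_qi(rules_1_to_3, attributes)
sa_large, qi_large = split_sa_qi(rules_1_to_3 + rules_4_to_5, attributes)
```

Adding two rules doubles the number of sensitive attributes from two to four and leaves only age and gender as QI, which is the growth the slide labels the curse of dimensionality.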

  11. Traditional Data Normalization • Step 1: Obtain the irreducible functional dependencies (FDs). • Step 2: Test whether any FD violates the normal form over the large table. • Step 3: If there is a violation, decompose the table to remove it.

  12. Inference For Multiple Rules • Inference on multiple privacy rules. • Example: AB → C implies that A → C and B → C. • Completeness of inference rules.
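The one inference stated on this slide (shrinking the LHS of a rule preserves it) can be sketched as a subset test. The transcript does not list the full axiom set, so the code below covers only this single case; the function name and the rule representation are assumptions.

```python
def implied_by_lhs_subset(rule, known_rules):
    """Return True if `rule` follows from some known rule with the same
    RHS whose LHS is a superset of the candidate's LHS."""
    lhs, rhs = rule
    return any(
        set(lhs) <= set(known_lhs) and set(rhs) == set(known_rhs)
        for known_lhs, known_rhs in known_rules
    )

known = [({"A", "B"}, {"C"})]  # AB -> C

a_implies_c = implied_by_lhs_subset(({"A"}, {"C"}), known)
b_implies_c = implied_by_lhs_subset(({"B"}, {"C"}), known)
d_implies_c = implied_by_lhs_subset(({"D"}, {"C"}), known)
```

Intuitively, an adversary who knows only A knows less than one who knows both A and B, so a guarantee that holds against the latter also holds against the former.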

  13. Guardian Normal Form (GNF) • Non-triviality: a privacy rule satisfied by each of two anonymized tables might be broken by their combination, due to the intersection attack. • Guardian Normal Form (GNF): a normal form for the schema of the published tables which ensures that all privacy rules hold over the whole collection of published tables. • GNF is defined at the schema level of the published tables rather than at the tuple level.

  14. An Example • Rule #1: age, ICD → race • [Figure: graph over the attributes hospital, state, race, age, ICD, gender]

  15. An Example • Rule #1: age, ICD → race • race is unreachable from age or ICD. • [Figure: graph over the attributes hospital, state, race, age, ICD, gender]

  16. An Example • Rule #2: gender, ICD → state • state is reachable from either gender or ICD. • [Figure: graph over the attributes hospital, state, race, age, ICD, gender]

  17. Guardian Decomposition Algorithm • Similar in spirit to the database normalization algorithm [EN03] (decomposition into BCNF). • Find a privacy rule which violates GNF, decompose the existing sub-tables to address the privacy rule, and continue until no offending privacy rule remains. • Greedily add attributes as long as GNF still holds. • End: no further decomposition is needed; publish T11 and T12. • [Figure: step-by-step decomposition of the table]

  18. Utility Aware Decomposition Algorithm • Leverages the link between utility optimization and the MIN-VERTEX-COLORING problem.
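The slide does not say how utility optimization maps onto vertex coloring, and the sketch below is not the UAD algorithm. It only shows a standard greedy heuristic for the coloring problem itself, on a hypothetical graph, to make concrete what a vertex coloring is: adjacent vertices must receive different colors, and the goal is to use few colors.

```python
def greedy_coloring(graph):
    """Color vertices in order of decreasing degree, giving each vertex
    the smallest color not already used by one of its neighbors.
    This is a heuristic: it does not always find the minimum."""
    colors = {}
    for v in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        used = {colors[u] for u in graph[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# Hypothetical undirected graph as an adjacency mapping.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coloring = greedy_coloring(graph)
```

The triangle a, b, c forces at least three colors, and the greedy pass uses exactly three on this graph. Because minimum vertex coloring is NP-hard in general, heuristics of this kind are a common substitute for exact solutions.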

  19. Experimental Results

  20. Conclusion • Defined the novel problem of versatile publishing, which captures the real-world requirement of multiple privacy rules. • Derived a sound and complete set of inference axioms for privacy rules. • Defined Guardian Normal Form (GNF). • Developed two decomposition algorithms, GD and UAD, and conducted comprehensive experiments.

  21. References • [1] Texas Department of State Health Services. User Manual of Texas Hospital Inpatient Discharge Public Use Data File, 2008. • [2] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006. (Cited as [MKGV06].) • [3] S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, 2008. (Cited as [GKS08].) • [4] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems, 4th Edition. Addison Wesley, 2003. (Cited as [EN03].)

  22. Thank You
