

  1. Simulatability: “The enemy knows the system” (Claude Shannon)
  CompSci 590.03, Lecture 6, Fall 2012. Instructor: Ashwin Machanavajjhala.

  2. Announcements
  • Please meet with me at least 2 times before you finalize your project (deadline Sep 28).

  3. Recap – L-Diversity
  • The link between identity and attribute value is the sensitive information: “Does Bob have Cancer? Heart disease? Flu?” “Does Umeko have Cancer? Heart disease? Flu?”
  • The adversary knows ≤ L−2 negation statements, e.g. “Umeko does not have Heart Disease.”
  • The data publisher may not know the exact adversarial knowledge.
  • Privacy is breached when identity can be linked to an attribute value with high probability: Pr[“Bob has Cancer” | published table, adversarial knowledge] > t

  4. Recap – 3-Diverse Table
  L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
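As a quick concrete reading of the principle, here is a minimal checker. One common formalization of “roughly equal proportions” is chosen here as an assumption: no sensitive value may exceed a 1/L fraction of its group.

```python
# Minimal L-diversity checker (the 1/L-fraction reading of "roughly
# equal proportions" is an assumption of this sketch).
from collections import Counter

def is_l_diverse(groups, L):
    """groups: one list of sensitive values per Q-ID group."""
    for g in groups:
        counts = Counter(g)
        if len(counts) < L or max(counts.values()) > len(g) / L:
            return False
    return True

print(is_l_diverse([["cancer", "flu", "heart disease"]], L=3))  # True
print(is_l_diverse([["cancer", "cancer", "flu"]], L=3))         # False
```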

  5. Outline
  • Simulatable Auditing
  • Minimality Attack in anonymization
  • Simulatable algorithms for anonymization

  6. Query Auditing
  • The database has numeric values (say, salaries of employees).
  • The database either truthfully answers a question or denies answering.
  • Queries: MIN, MAX, SUM over subsets of the database.
  • Question: when should we allow or deny queries?
  [Diagram: a researcher sends a query; the auditor asks “Safe to publish?” and either answers (yes) or denies (no).]

  7. Why should we deny queries?
  • Q1: Ben’s sensitive value? DENY
  • Q2: Max sensitive value of males? ANSWER: 2
  • Q3: Max sensitive value of 1st-year PhD students? ANSWER: 3
  • But Q3 + Q2 => Xi = 3

  8. Value-Based Auditing
  • Let a1, a2, …, ak be the answers to previous queries Q1, Q2, …, Qk, and let ak+1 be the answer to Qk+1.
  • ai = f(ci1x1, ci2x2, …, cinxn), i = 1 … k+1, where cim = 1 if Qi depends on xm (and 0 otherwise).
  • Check whether any xj has a unique solution.
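For SUM queries this check is linear algebra: xj is uniquely determined by the answered system C x = a exactly when the unit vector ej lies in the row space of the coefficient matrix C. A minimal sketch of that test (my construction using numpy, not code from the lecture):

```python
# Sketch of the value-based check for SUM queries: x_j is pinned by the
# answered system C x = a iff e_j lies in the row space of C, which we
# test with a least-squares solve of C^T w = e_j.
import numpy as np

def compromised_indices(C):
    """Indices j whose value x_j is uniquely determined by C x = a."""
    C = np.asarray(C, dtype=float)
    _, n = C.shape
    pinned = []
    for j in range(n):
        e_j = np.zeros(n)
        e_j[j] = 1.0
        w, *_ = np.linalg.lstsq(C.T, e_j, rcond=None)  # solve C^T w = e_j
        if np.allclose(C.T @ w, e_j):                  # e_j in row space of C
            pinned.append(j)
    return pinned

# Q1 = x1+x2, Q2 = x2+x3, Q3 = x1+x3: together they pin all three values.
print(compromised_indices([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))  # [0, 1, 2]
```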

  9. Value-based Auditing
  • Data values: {x1, x2, x3, x4, x5}. Queries: MAX.
  • Allow a query only if the value of no xi can be inferred.

  10. Value-based Auditing
  • Query: max(x1, x2, x3, x4, x5). Answer: 10.
  • The attacker now knows −∞ ≤ x1 … x5 ≤ 10.

  11. Value-based Auditing
  • Query: max(x1, x2, x3, x4). True answer: 8.
  • Answering would reveal −∞ ≤ x1 … x4 ≤ 8, and hence x5 = 10, so the auditor DENIES.

  12. Value-based Auditing
  • A denial means some value could be compromised!

  13. Value-based Auditing
  • The attacker reasons: what could max(x1, x2, x3, x4) be?

  14. Value-based Auditing
  • From the first answer, max(x1, x2, x3, x4) ≤ 10.

  15. Value-based Auditing
  • If max(x1, x2, x3, x4) = 10, then answering causes no privacy breach, so the auditor would not have denied.

  16. Value-based Auditing
  • Hence max(x1, x2, x3, x4) < 10, which implies x5 = 10!

  17. Value-based Auditing
  • Denials leak information.
  • The attack occurred because the privacy analysis did not assume that the attacker knows the auditing algorithm.
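The attacker’s inversion of the denial can be spelled out in a few lines. This is a toy encoding of this specific example (the setup values are from the slides; the function name is mine):

```python
# Toy encoding of the attacker's reasoning: first query answered 10,
# second query over x1..x4 denied. The attacker knows the value-based
# rule: deny exactly when the true answer would pin some value, i.e.
# exactly when max(x1..x4) < 10.

def value_based_denies(second_answer, first_answer=10):
    # Answering max(x1..x4) = second_answer < first_answer would pin
    # x5 = first_answer, so the value-based auditor denies.
    return second_answer < first_answer

# Which hidden answers for max(x1..x4) are consistent with the denial?
consistent = [a for a in range(11) if value_based_denies(a)]
print(consistent)        # 0..9, i.e. max(x1..x4) < 10
print("hence x5 =", 10)  # the denial pins x5 exactly
```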

  18. Simulatable Auditing [Kenthapadi et al., PODS ’05]
  • An auditor is simulatable if the decision to deny a query Qk is made based only on information already available to the attacker:
  • it can use the queries Q1, Q2, …, Qk and the answers a1, a2, …, ak−1;
  • it cannot use ak or the actual data to make the decision.
  • Denials provably do not leak information, because the attacker could equivalently determine whether the query would be denied: the attacker can mimic, or simulate, the auditor.

  19. Simulatable Auditing Algorithm
  • Before computing the answer to max(x1, x2, x3, x4), enumerate its possible answers:
  • Ans > 10: not possible, given the first answer.
  • Ans = 10: implies only −∞ ≤ x1 … x4 ≤ 10 (SAFE).
  • Ans < 10: implies x5 = 10 (UNSAFE).
  • Since some possible answer is unsafe, DENY, without ever looking at the data.
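A minimal sketch of this auditor (function names and the discretized 0..10 answer space are assumptions): it enumerates every answer consistent with past constraints before computing the real one, and denies if any of them would be unsafe, so the attacker can reproduce the decision exactly.

```python
# Minimal sketch of a simulatable MAX auditor. The denial decision uses
# only past queries and answers, never the new answer or the data.

def pins_a_value(past, query, answer):
    # If a previously answered query covered exactly one extra index and
    # this answer is strictly smaller, that extra index is pinned to the
    # previous answer (the deduction from slides 13-16).
    for q_prev, a_prev in past:
        extra = set(q_prev) - set(query)
        if set(query) < set(q_prev) and len(extra) == 1 and answer < a_prev:
            return True
    return False

def simulatable_audit(past, query, possible_answers):
    # Deny iff SOME consistent answer would be unsafe; decided before the
    # true answer is ever computed, hence simulatable by the attacker.
    return any(pins_a_value(past, query, a) for a in possible_answers)

past = [((1, 2, 3, 4, 5), 10)]                      # max(x1..x5) = 10
deny = simulatable_audit(past, (1, 2, 3, 4), range(11))
print("DENY" if deny else "ANSWER")                 # DENY, whatever the data
```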

  20. Summary of Simulatable Auditing
  • The decision to deny must be based only on past queries and answers, which in some (many!) cases means denying queries that would in fact have been safe to answer.
  • Denials can leak information if the adversary does not know all the information used to decide whether to deny the query.

  21. Outline
  • Simulatable Auditing
  • Minimality Attack in anonymization
  • Simulatable algorithms for anonymization

  22. Minimality Attack on Generalization Algorithms
  • Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility.
  • They find a minimally generalized table in the lattice that satisfies privacy and maximizes utility.
  • But … the attacker also knows this algorithm!

  23. Example: Minimality Attack [Wong et al., VLDB ’07]
  • Dataset with one quasi-identifier taking 2 values, q1 and q2; both q1 and q2 generalize to Q.
  • Sensitive attribute: Cancer, yes/no.
  • We want to ensure P[Cancer = yes] < ½. It is OK to learn that an individual does not have Cancer.
  • Published table: [not reproduced in the transcript]

  24. Which input datasets could have led to the published table?
  • Output dataset: {q1, q2} → Q (“2-diverse”).
  • Possible input dataset: 3 occurrences of q1.

  25. Which input datasets could have led to the published table?
  • With 3 occurrences of q1, a less generalized table would do: “This is a better generalization!” A minimality-driven algorithm would have published that instead, so this input is ruled out.

  26. Which input datasets could have led to the published table?
  • Possible input dataset: 1 occurrence of q1.

  27. Which input datasets could have led to the published table?
  • With 1 occurrence of q1 there is likewise a better generalization (“This is a better generalization!”), so this input is ruled out too.

  28. Which input datasets could have led to the published table?
  • Hence there must be exactly two tuples with q1 in the input.

  29. Which input datasets could have led to the published table?
  • Possible input dataset: 2 occurrences of q1, where the table already satisfies privacy. The algorithm would then have published it without generalization, so this input is ruled out.

  30. Which input datasets could have led to the published table?
  • Possible input dataset: 2 occurrences of q1 with Cancer = no. Learning Cancer = NO is OK, hence this table is private as-is, and this input is ruled out as well.

  31. Which input datasets could have led to the published table?
  • The only remaining possibility: 2 occurrences of q1, both with Cancer, so P[Cancer = yes | q1] = 1.
  • This is the ONLY input that results in the output!
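The elimination argument of slides 24–31 can be written down mechanically. The sketch below is a schematic encoding, not a re-implementation of the anonymizer: the candidate list and the two elimination rules are taken from the slides, and everything else (names, the abstract summaries) is an assumption.

```python
# Schematic encoding of the minimality attack's elimination argument.
# Each candidate input is summarized by its number of q1 tuples and by
# whether its q1 tuples have Cancer; the concrete tables are not
# reproduced in the transcript, so the rules are stated abstractly.

candidates = [
    {"q1_count": 3, "q1_cancer": "irrelevant"},  # slides 24-25
    {"q1_count": 1, "q1_cancer": "irrelevant"},  # slides 26-27
    {"q1_count": 2, "q1_cancer": "mixed"},       # slide 29
    {"q1_count": 2, "q1_cancer": "none"},        # slide 30
    {"q1_count": 2, "q1_cancer": "all"},         # slide 31
]

def produces_observed_output(c):
    # Rule 1: with any q1 count other than 2, a strictly less generalized
    # table exists, so a minimal algorithm would publish that instead.
    if c["q1_count"] != 2:
        return False
    # Rule 2: if the input is already private ungeneralized (no Cancer
    # among q1, or a privacy-satisfying mix), a minimal algorithm would
    # not generalize at all.
    return c["q1_cancer"] == "all"

survivors = [c for c in candidates if produces_observed_output(c)]
print(survivors)   # the single survivor forces P[Cancer = yes | q1] = 1
```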

  32. Outline
  • Simulatable Auditing
  • Minimality Attack in anonymization
  • Transparent Anonymization: Simulatable algorithms for anonymization

  33. Transparent Anonymization
  • Assume the adversary knows the algorithm that is being used.
  • I: all possible input tables. O: the output table. I(O, A): the input tables that result in O under algorithm A.

  34. Transparent Anonymization
  • Privacy must be guaranteed with respect to I(O, A): the probability must be computed assuming I(O, A) is the actual set of all possible input tables.
  • What is an efficient algorithm for transparent anonymization, e.g. for L-diversity?
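Stated as code, the requirement is a posterior computed over I(O, A). This brute-force sketch is illustrative only, since enumerating all inputs is exponential (which is why practical algorithms must satisfy the property by construction); `inputs`, `A`, `O`, and `predicate` are placeholder names, and a uniform prior over inputs is assumed.

```python
# Brute-force transparent-privacy posterior (illustrative; assumes a
# uniform prior over the possible input tables).

def posterior(inputs, A, O, predicate):
    """P[predicate(T) | T in I(O, A)], for an adversary who knows the
    anonymization algorithm A and has observed the output O."""
    i_of_o = [T for T in inputs if A(T) == O]   # I(O, A)
    return sum(predicate(T) for T in i_of_o) / len(i_of_o)
```

Transparent L-diversity then asks that this posterior stay below the privacy threshold (e.g. 1/L) for every individual and sensitive value.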

  35. Ace Algorithm [Xiao et al., TODS ’10]
  • Step 1, Assign: based only on the sensitive values, construct (in a randomized fashion) an intermediate L-diverse generalization.
  • Step 2, Split: based only on the quasi-identifier values (and without looking at sensitive values), deterministically refine the intermediate solution to maximize utility.

  36. Step 1: Assign
  • Input table: [not reproduced in the transcript]

  37. Step 1: Assign
  • St is the set of all tuples, grouped by sensitive value.
  • Iteratively: remove α tuples each from the β (≥ L) most frequent sensitive values to form a bucket.

  38. Step 1: Assign
  • 1st iteration: β = 2, α = 2.

  39. Step 1: Assign
  • 2nd iteration: β = 2, α = 1.

  40. Step 1: Assign
  • 3rd iteration: β = 2, α = 1.
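A simplified sketch of Assign follows (my reconstruction: function and variable names are mine, and α and β are fixed parameters here, whereas Xiao et al. choose them per iteration, e.g. α = 2 in the first iteration above). Each bucket takes α tuples from each of the β most frequent remaining sensitive values, so every bucket is L-diverse with equal proportions by construction.

```python
# Simplified sketch of the Ace Assign step: buckets are built from
# sensitive values alone, with the only randomness being which tuples
# of a given sensitive value are drawn.
import random
from collections import defaultdict

def assign(tuples, L, alpha=1, beta=None):
    """tuples: list of (tuple_id, sensitive_value) pairs."""
    beta = beta or L
    pool = defaultdict(list)
    for tid, s in tuples:
        pool[s].append(tid)
    for ids in pool.values():
        random.shuffle(ids)                   # the randomized part of Assign
    buckets = []
    while any(pool.values()):
        # the beta most frequent remaining sensitive values
        top = sorted((s for s in pool if pool[s]),
                     key=lambda s: len(pool[s]), reverse=True)[:beta]
        if len(top) < L:
            raise ValueError("remaining tuples cannot form an L-diverse bucket")
        take = min(alpha, *(len(pool[s]) for s in top))
        buckets.append({s: [pool[s].pop() for _ in range(take)] for s in top})
    return buckets

rows = [(1, "flu"), (2, "flu"), (3, "flu"), (4, "dyspepsia"),
        (5, "dyspepsia"), (6, "bronchitis")]
print(assign(rows, L=2))   # 3 buckets, each with 2 distinct diseases
```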

  41. Intermediate Generalization
  [Intermediate L-diverse buckets produced by Assign; not reproduced in the transcript.]

  42. Step 2: Split
  • If a bucket B contains α > 1 tuples of each sensitive value, split it into two buckets Ba and Bb such that:
  • 1 ≤ αa < α tuples of each sensitive value in B go to Ba, and the remaining tuples go to Bb;
  • the division (Ba, Bb) is optimal in terms of utility.
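A sketch of Split under the same caveats: names and the utility measure are assumptions, and for brevity only contiguous divisions of a QI-sorted order are searched, whereas the actual optimization may consider more candidates. Because each sensitive value keeps at least one tuple in both Ba and Bb, each half remains L-diverse.

```python
# Sketch of the Ace Split step. Only quasi-identifiers are consulted,
# never sensitive values, which is what keeps the step simulatable.

def split(bucket, qi, utility):
    """bucket: {sensitive_value: [tuple_ids]} with alpha ids per value.
    qi: dict tuple_id -> quasi-identifier value (e.g. age).
    utility: scores a sub-bucket, higher is better."""
    alpha = len(next(iter(bucket.values())))
    if alpha <= 1:
        return [bucket]                       # nothing to split
    # Deterministic: order each value's tuples by quasi-identifier.
    ordered = {s: sorted(ids, key=lambda t: qi[t]) for s, ids in bucket.items()}
    best, best_score = [bucket], float("-inf")
    for alpha_a in range(1, alpha):           # pick 1 <= alpha_a < alpha per value
        ba = {s: ids[:alpha_a] for s, ids in ordered.items()}
        bb = {s: ids[alpha_a:] for s, ids in ordered.items()}
        score = utility(ba, qi) + utility(bb, qi)
        if score > best_score:
            best, best_score = [ba, bb], score
    return best

def neg_qi_range(bucket, qi):
    # A hypothetical utility: tighter QI ranges generalize less, so score
    # a sub-bucket by the negated spread of its quasi-identifiers.
    vals = [qi[t] for ids in bucket.values() for t in ids]
    return -(max(vals) - min(vals))
```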

  43. Why does the Ace algorithm satisfy Transparent L-Diversity?
  • Privacy must be guaranteed with respect to I(O, A): the probability must be computed assuming I(O, A) is the actual set of all possible input tables.
  • O: the output table. I: all possible input tables. I(O, A): the input tables that result in O under algorithm A.

  44. Ace algorithm analysis
  Lemma 1: The Assign step satisfies transparent L-diversity.
  Proof (sketch):
  • Consider an intermediate output Int.
  • Suppose there is some input table T such that Assign(T) = Int.
  • Any other table T′ in which the sensitive values of 2 individuals in the same group are swapped also leads to the same intermediate output Int.

  45. Ace algorithm analysis
  • Both tables (T and T′) result in the same intermediate output. [Example tables not reproduced in the transcript.]

  46. Ace algorithm analysis
  Lemma 1 (continued):
  • Since any within-group swap of two individuals’ sensitive values leads to the same intermediate output, the set of input tables I(Int, A) contains all possible assignments of diseases to individuals within each group of Int.

  47. Ace algorithm analysis
  Lemma 1 (continued):
  • The set of tables I(Int, A) contains all possible assignments of diseases to individuals in each group of Int, so within a group every assignment is equally likely.
  • P[Ann has dyspepsia | I(Int, A) and Int] = 1/2
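Slide 47’s probability can be checked by direct enumeration, a toy check under the assumption that Ann’s group carries the sensitive multiset {dyspepsia, dyspepsia, flu, flu} over the four people named on slide 49:

```python
# Enumerate all assignments of {dyspepsia, dyspepsia, flu, flu} to
# {Ann, Bob, Ed, Gill}. Every assignment lands in I(Int, A), and Ann
# has dyspepsia in exactly half of them.
from itertools import permutations

people = ["Ann", "Bob", "Ed", "Gill"]
assignments = set(permutations(["dyspepsia", "dyspepsia", "flu", "flu"]))
p = sum(a[0] == "dyspepsia" for a in assignments) / len(assignments)
print(p)   # 0.5: P[Ann has dyspepsia | I(Int, A) and Int] = 1/2
```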

  48. Ace algorithm analysis
  Lemma 2: The Split phase also satisfies transparent L-diversity.
  Proof (sketch):
  • I(Int, Assign) contains all tables in which each individual is assigned an arbitrary sensitive value within its group in Int.
  • Suppose some input table T ∈ I(Int, Assign) results in the final output O after Split.

  49. Ace algorithm analysis
  • Split does not depend on the sensitive values.
  [Figure: two tables assigning dyspepsia/flu differently to Ann, Bob, Ed, and Gill within the same groups both result in the same split output.]

  50. Ace algorithm analysis
  • If T ∈ I(Int, Assign) and T results in O after Split, then T′ ∈ I(Int, Assign) and T′ also results in O after Split. [Tables T and T′ not reproduced in the transcript.]
