
Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies


Presentation Transcript


  1. Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies Ashwin Machanavajjhala (ashwin@cs.duke.edu) Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke) Summer @ Census, 8/15/2013

  2. Overview of the talk • There is an inherent trade-off between the privacy (confidentiality) of individuals and the utility of statistical analyses over data collected from those individuals. • Differential privacy has revolutionized how we reason about privacy • A nice tuning knob, ε, for trading off privacy and utility

  3. Overview of the talk • However, differential privacy only captures a small part of the privacy-utility trade-off space • No Free Lunch Theorem • Differentially private mechanisms may not ensure sufficient utility • Differentially private mechanisms may not ensure sufficient privacy

  4. Overview of the talk • I will present a new privacy framework that allows data publishers to more effectively trade off privacy for utility • Better control over what to keep secret and who the adversaries are • Can ensure more utility than differential privacy in many cases • Can ensure privacy where differential privacy fails

  5. Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]

  6. Data Privacy Problem [Figure: individuals 1 through N each contribute a record r1, …, rN to a server's database DB. Utility: the server supports useful statistical analyses. Privacy: no breach about any individual.]

  7. Data Privacy in the real world

  8. Many definitions & several attacks • Attacks: linkage attack, background knowledge attack, minimality/reconstruction attack, de Finetti attack, composition attack • Definitions: K-Anonymity [Sweeney et al., IJUFKS ’02], L-Diversity [Machanavajjhala et al., TKDD ’07], T-Closeness [Li et al., ICDE ’07], E-Privacy [Machanavajjhala et al., VLDB ’09], Differential Privacy [Dwork et al., ICALP ’06]

  9. Differential Privacy For every pair of inputs D1, D2 that differ in one value, and for every output O, an adversary should not be able to distinguish between D1 and D2 based on O: | log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) | < ε (ε > 0)

  10. Algorithms • No deterministic algorithm guarantees differential privacy. • Random sampling does not guarantee differential privacy. • Randomized response satisfies differential privacy.
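
As a concrete illustration (not from the slides), here is a minimal sketch of randomized response for a single binary attribute; the function names and the debiasing estimator are assumptions of this sketch.

```python
import math
import random

def randomized_response(true_bit: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    The ratio of report probabilities for the two inputs is exactly e^eps,
    so each individual's report satisfies eps-differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else not true_bit

def estimate_fraction(reports, epsilon):
    """Debias the noisy reports to estimate the true fraction of 1s."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

reports = [randomized_response(True, 1.0) for _ in range(10_000)]
print(estimate_fraction(reports, 1.0))   # close to 1.0
```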

  11. Laplace Mechanism Database D, query q, true answer q(D); the researcher receives q(D) + η. The noise η is drawn from a Laplace distribution, h(η) ∝ exp(−|η| / λ), with mean 0 and variance 2λ². Privacy depends on the parameter λ.

  12. Laplace Mechanism [Dwork et al., TCC 2006] Thm: If the sensitivity of the query is S(q), then adding Laplace noise with λ = S(q)/ε guarantees ε-differential privacy. Sensitivity: the smallest number S(q) such that for any D, D’ differing in one entry, || q(D) – q(D’) ||₁ ≤ S(q).
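
A minimal sketch of the theorem in code; numpy's Laplace sampler is standard, while the example query and its sensitivity are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(true_answer: np.ndarray, sensitivity: float,
                      epsilon: float) -> np.ndarray:
    """Add Laplace noise with scale lambda = S(q)/eps to each coordinate,
    which guarantees eps-differential privacy by the theorem above."""
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(0.0, scale,
                                           size=np.shape(true_answer))

# Example: a single count query has sensitivity 1, since changing one
# record changes the count by at most 1.
noisy = laplace_mechanism(np.array([42.0]), sensitivity=1.0, epsilon=0.5)
```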

  13. Contingency tables [Figure: a database D in which each tuple takes one of k = 4 different values; the release is the contingency table of counts Count( , ) for each value.]

  14. Laplace Mechanism for Contingency Tables Sensitivity = 2 (changing one tuple's value moves one cell count down by 1 and another up by 1). Each cell of D is released as its true count + Lap(2/ε); e.g., a cell with true count 8 is released with mean 8 and variance 8/ε².
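
A quick empirical check of the numbers on this slide (assumed setup: one cell with true count 8 and noise Lap(2/ε)):

```python
import numpy as np

epsilon, true_count = 1.0, 8
scale = 2.0 / epsilon                  # lambda = sensitivity / epsilon = 2/eps
noisy = true_count + np.random.laplace(0.0, scale, size=100_000)
print(noisy.mean())                    # approx. 8
print(noisy.var(), 2 * scale**2)       # both approx. 8 / epsilon**2
```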

  15. Composition Property If algorithms A1, A2, …, Ak use independent randomness and each Ai satisfies εi-differential privacy, respectively, then outputting all the answers together satisfies differential privacy with ε = ε1 + ε2 + … + εk. This motivates the notion of a privacy budget.
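
A minimal sketch of sequential composition as a budget tracker; the class and method names are illustrative, not an API from the talk.

```python
class PrivacyBudget:
    """Track cumulative privacy loss under sequential composition:
    releasing eps_1-, ..., eps_k-DP answers together is (sum eps_i)-DP."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon_i: float) -> None:
        """Record one eps_i-DP release against the budget."""
        if self.spent + epsilon_i > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon_i

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.3)   # first query
budget.charge(0.3)   # second query; 0.4 of the budget remains
```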

  16. Differential Privacy • Privacy definition that is independent of the attacker’s prior knowledge. • Resists many attacks that other definitions are susceptible to. • Avoids composition attacks • Claimed to be tolerant against adversaries with arbitrary background knowledge. • Allows simple, efficient and useful privacy mechanisms • Used in LEHD’s OnTheMap [M et al ICDE ’08]

  17. Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]

  18. Differential Privacy & Utility • Differentially private mechanisms may not ensure sufficient utility for many applications. • Sparse Data: the Integrated Mean Square Error due to the Laplace mechanism can be worse than returning a random contingency table for typical values of ε (around 1) • Social Networks [M et al PVLDB 2011]

  19. Differential Privacy & Privacy • Differentially private algorithms may not limit the ability of an adversary to learn sensitive information about individuals when records in the data are correlated. • Correlations across individuals occur in many ways: • Social Networks • Data with pre-released constraints • Functional Dependencies

  20. Laplace Mechanism and Correlations Auxiliary marginals of D are published exactly for the following reasons: • Legal: the 2002 Supreme Court case Utah v. Evans • Contractual: advertisers must know exact demographics at coarse granularities Does the Laplace mechanism still guarantee privacy?

  21. Laplace Mechanism and Correlations [Figure: each cell of D is released as its true count plus Laplace noise, e.g., 2 + Lap(2/ε). Combining the noisy cells with the exactly known marginals yields several estimates of a sensitive count: Count( , ) = 8 + Lap(2/ε), 8 – Lap(2/ε), 8 – Lap(2/ε), 8 + Lap(2/ε), …]

  22. Laplace Mechanism and Correlations Averaging the k independent estimates gives mean 8 and variance 8/(kε²), so an adversary can reconstruct the table with high precision for large k.
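
A small simulation of this averaging attack (the setup mirrors the slides' example; the code is an illustration, assuming k independently-noised releases of the same count):

```python
import numpy as np

epsilon, k, true_count = 1.0, 100, 8
# Each combination of a noisy release with the exact marginals gives an
# independent estimate true_count + Lap(2/eps); the average of k of them
# has variance 8 / (k * eps^2), which vanishes as k grows.
estimates = true_count + np.random.laplace(0.0, 2.0 / epsilon, size=k)
print(estimates.mean())               # concentrates near 8
print(8 / (k * epsilon**2))           # predicted variance of the average
```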

  23. No Free Lunch Theorem It is not possible to guarantee any utility in addition to privacy, without making assumptions about • the data generating distribution • the background knowledge available to an adversary [Kifer-M SIGMOD ‘11] [Dwork-Naor JPC ‘10]

  24. To sum up … • Differential privacy only captures a small part of the privacy-utility trade-off space • No Free Lunch Theorem • Differentially private mechanisms may not ensure sufficient privacy • Differentially private mechanisms may not ensure sufficient utility

  25. Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]

  26. Pufferfish Framework

  27. Pufferfish Semantics • What is being kept secret? • Who are the adversaries? • How is information disclosure bounded? • (similar to epsilon in differential privacy)

  28. Sensitive Information • Secrets: let S be a set of potentially sensitive statements • “individual j’s record is in the data, and j has Cancer” • “individual j’s record is not in the data” • Discriminative Pairs (Spairs): mutually exclusive pairs of secrets • (“Bob is in the table”, “Bob is not in the table”) • (“Bob has cancer”, “Bob has diabetes”)

  29. Adversaries • We assume a Bayesian adversary who can be completely characterized by his/her prior information about the data • We do not assume computational limits • Data Evolution Scenarios: the set D of all probability distributions that could have generated the data (… think adversary’s prior) • No assumptions: all probability distributions over data instances are possible • I.I.D.: set of all f such that P(data = {r1, r2, …, rk}) = f(r1) × f(r2) × … × f(rk)

  30. Information Disclosure • A mechanism M satisfies ε-Pufferfish(S, Spairs, D) if, for every output ω, every discriminative pair (s, s’) ∈ Spairs, and every prior θ ∈ D under which both s and s’ have nonzero probability, Pr[M(Data) = ω | s, θ] ≤ e^ε · Pr[M(Data) = ω | s’, θ] (and symmetrically with s and s’ swapped)

  31. Pufferfish Semantic Guarantee For every output ω and prior θ, the posterior odds of s vs. s’ are within a factor e^ε of the prior odds: e^(−ε) ≤ [Pr(s | ω, θ) / Pr(s’ | ω, θ)] ÷ [Pr(s | θ) / Pr(s’ | θ)] ≤ e^ε
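
Typeset, the condition from slide 30 and the odds-ratio guarantee it implies via Bayes' rule (a restatement of the two slides above):

```latex
\Pr[M(\mathrm{Data}) = \omega \mid s, \theta]
  \;\le\; e^{\varepsilon}\,\Pr[M(\mathrm{Data}) = \omega \mid s', \theta]
\qquad\Longrightarrow\qquad
e^{-\varepsilon} \;\le\;
  \frac{\Pr[s \mid \omega, \theta] \,/\, \Pr[s' \mid \omega, \theta]}
       {\Pr[s \mid \theta] \,/\, \Pr[s' \mid \theta]}
  \;\le\; e^{\varepsilon}
```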

  32. Pufferfish & Differential Privacy • Spairs: pairs (s_i^x, s_i^y), where s_i^x = “record i takes the value x” • Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.

  33. Pufferfish & Differential Privacy • Data evolution: all θ = [ f1, f2, f3, …, fk ] with P(data = {r1, …, rk} | θ) = f1(r1) × f2(r2) × … × fk(rk) • The adversary’s prior may be any distribution that makes records independent

  34. Pufferfish & Differential Privacy • Spairs: pairs (s_i^x, s_i^y), where s_i^x = “record i takes the value x” • Data evolution: all θ = [ f1, f2, f3, …, fk ] Thm: A mechanism M satisfies ε-differential privacy if and only if it satisfies ε-Pufferfish instantiated using these Spairs and this set of priors

  35. Summary of Pufferfish • A semantic approach to defining privacy • Enumerates the information that is secret and the set of adversaries. • Bounds the odds ratio of pairs of mutually exclusive secrets • Helps understand assumptions under which privacy is guaranteed • Differential privacy is one specific choice of secret pairs and adversaries • How should a data publisher use this framework? • Algorithms?

  36. Outline • Background • Differential privacy • No Free Lunch [Kifer-M SIGMOD ’11] • No `one privacy notion to rule them all’ • Pufferfish Privacy Framework [Kifer-M PODS ’12] • Navigating the space of privacy definitions • Blowfish: Practical privacy using policies [ongoing work]

  37. Blowfish Privacy • A special class of Pufferfish instantiations Both pufferfish and blowfish are marine fish of the Tetraodontidae family

  38. Blowfish Privacy • A special class of Pufferfish instantiations • Extends differential privacy using policies • Specification of sensitive information • Allows more utility • Specification of publicly known constraints in the data • Ensures privacy in correlated data • Satisfies the composition property


  40. Sensitive Information • Secrets: let S be a set of potentially sensitive statements • “individual j’s record is in the data, and j has Cancer” • “individual j’s record is not in the data” • Discriminative Pairs (Spairs): mutually exclusive pairs of secrets • (“Bob is in the table”, “Bob is not in the table”) • (“Bob has cancer”, “Bob has diabetes”)

  41. Sensitive information in Differential Privacy • Spairs: pairs (s_i^x, s_i^y), where s_i^x = “record i takes the value x” • Attackers should not be able to significantly distinguish between any two values from the domain for any individual record.

  42. Other notions of Sensitive Information • Medical Data • OK to infer whether an individual is healthy or not • E.g., (“Bob is healthy”, “Bob has diabetes”) is not a discriminative pair of secrets for any individual • Partitioned Sensitive Information: the domain is partitioned, and only pairs of values within the same partition (e.g., two different diseases) are discriminative

  43. Other notions of Sensitive Information • Geospatial Data • Do not want the attacker to distinguish between “close-by” points in the space • May distinguish between “far-away” points • Distance-based Sensitive Information: only pairs of points within a threshold distance of each other form discriminative pairs

  44. Generalization as a graph • Consider a graph G = (V, E), where V is the set of values that an individual’s record can take. • E encodes the set of discriminative pairs • Same for all records.

  45. Blowfish Privacy + “Policy of Secrets” • A mechanism M satisfies Blowfish privacy w.r.t. policy G if, • for every set of outputs S of the mechanism, and • for every pair of datasets D1, D2 that differ in one record, with values x and y s.t. (x, y) ∈ E, Pr[M(D1) ∈ S] ≤ e^ε · Pr[M(D2) ∈ S]

  46. Blowfish Privacy + “Policy of Secrets” • The edge condition above implies a graded guarantee: for any x and y in the domain, Pr[M(D1) ∈ S] ≤ e^(ε · d(x, y)) · Pr[M(D2) ∈ S], where d(x, y) is the shortest distance between x and y in G

  47. Blowfish Privacy + “Policy of Secrets” • The adversary is allowed to distinguish between x and y that appear in different disconnected components of G (there, d(x, y) = ∞); see the sketch below
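
A minimal sketch of a policy graph and its shortest-path distance, which controls how the guarantee degrades; the graph encoding and helper name are assumptions of this sketch, assuming an unweighted policy graph.

```python
from collections import deque

def policy_distance(edges, x, y):
    """BFS shortest-path distance between values x and y in the policy graph
    G = (V, E); float('inf') means different components (distinguishable)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, frontier = {x}, deque([(x, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == y:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return float("inf")

# Line-graph policy over an ordered domain: only adjacent values form
# discriminative pairs, so the (x1, x4) guarantee degrades to exp(3 * eps).
edges = [("x1", "x2"), ("x2", "x3"), ("x3", "x4")]
print(policy_distance(edges, "x1", "x4"))   # 3
```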

  48. Algorithms for Blowfish • Consider an ordered 1-D attribute with Dom = {x1, x2, x3, …, xd} • E.g., ranges of Age, Salary, etc. • Suppose our policy is: the adversary should not distinguish whether an individual’s value is xj or xj+1 [Figure: the line graph x1 - x2 - x3 - … - xd]

  49. Algorithms for Blowfish • Suppose we want to release the histogram C(x1), …, C(xd) privately (the number of individuals in each age range) • Any differentially private algorithm also satisfies Blowfish • Can use the Laplace mechanism (with sensitivity 2)

  50. Ordered Mechanism • We can answer a different set of queries, the cumulative counts S1, S2, S3, …, Sd (where Sj = C(x1) + … + C(xj)), to get a different private estimator for the histogram, as sketched below.
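
A sketch of this idea under the line-graph policy (assumed details: the mechanism noises the cumulative counts, which have sensitivity 1 because moving one record between adjacent values changes exactly one Sj; the exact estimator in the talk may differ):

```python
import numpy as np

def ordered_mechanism(counts: np.ndarray, epsilon: float) -> np.ndarray:
    """Release a histogram by noising cumulative counts S_j = C(x1)+...+C(xj).

    Under the line-graph policy, switching one record between adjacent
    values x_j and x_{j+1} changes only S_j (by 1), so the cumulative
    query has sensitivity 1, versus 2 for the plain histogram.
    """
    cumulative = np.cumsum(counts)
    noisy_cum = cumulative + np.random.laplace(0.0, 1.0 / epsilon,
                                               size=cumulative.shape)
    return np.diff(noisy_cum, prepend=0.0)   # difference back to a histogram

hist = ordered_mechanism(np.array([10.0, 20.0, 15.0, 5.0]), epsilon=1.0)
```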
