
Towards Privacy in Public Databases


Presentation Transcript


  1. Towards Privacy in Public Databases
  Shuchi Chawla, Cynthia Dwork, Frank McSherry, Adam Smith, Hoeteck Wee

  2. Database Privacy
  • A “Census” problem
    • Individuals provide information
    • The Census Bureau publishes sanitized records
  • Privacy is legally mandated; what utility can we achieve?
    • There is an inherent privacy vs. utility trade-off
  • Our goal:
    • Find a middle path: preserve macroscopic properties while “disguising” individual records (which contain private information)
    • Establish a framework for meaningful comparison of techniques

  3. What about Secure Function Evaluation?
  • Secure Function Evaluation (SFE) [Yao, GMW] allows parties to collaboratively compute a function f of their private inputs: y = f(a, b, c, …), e.g., y = sum(a, b, c, …)
  • Each player learns only what can be deduced from y and her own input to f
  • SFE and privacy are complementary problems: one does not imply the other
    • SFE: given what must be preserved, protect everything else
    • Privacy: given what must be protected, preserve as much as you can

  4. This talk…
  • A formalism for privacy
    • What we mean by privacy
    • A good sanitization procedure
  • Results
    • Histograms and perturbations
  • Subsequent work; open problems

  5. What do we mean by Privacy?
  • [Ruth Gavison] Protection from being brought to the attention of others
    • Inherently valuable
    • Attention invites further privacy loss
  • Privacy is assured to the extent that one blends in with the crowd
  • An appealing definition, and one that can be converted into a precise mathematical statement…

  6. The basic model – a geometric approach
  • The database consists of points in the high-dimensional space R^d
    • Samples from some underlying distribution
    • Points are unlabeled: you are your collection of attributes
    • (Relative) distance is everything: points that are closer are more similar, and vice versa
  • A “real” database RDB, controlled by a central authority
    • n points in d-dimensional space
    • Think of d as the number of sensitive attributes
  • A “sanitized” database SDB, released to the world
    • Information about fake individuals, a summary of the real data, or a combination of both

  7. The adversary, or Isolator
  • On input SDB and auxiliary information, the adversary outputs a point q ∈ R^d
  • q “isolates” a real point x if q is much closer to x than to x’s neighbors, i.e., if B(q, cδ), with δ = ||q − x||, contains fewer than T other points from RDB (see the sketch below)
  • c and T are privacy parameters; e.g., c = 4, T = 100
  [Figure: x is isolated when the ball of radius cδ around q contains few other RDB points, and not isolated otherwise]
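A minimal sketch of this isolation check in Python (my own illustration, not from the paper; the function name, the numpy implementation, and the default parameters are assumptions):

```python
import numpy as np

def isolates(q, rdb, c=4.0, T=100):
    """Check whether the adversary's guess q isolates some point of rdb.

    q isolates x if B(q, c*delta), with delta = ||q - x||, contains
    fewer than T points of rdb other than x itself.
    """
    dists = np.linalg.norm(rdb - q, axis=1)        # ||q - y|| for every y in rdb
    for delta in dists:                            # delta = ||q - x|| for each candidate x
        others = np.sum(dists <= c * delta) - 1    # points in B(q, c*delta), excluding x
        if others < T:
            return True                            # q is much closer to x than to x's neighbors
    return False
```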

  8. Requirement for the sanitizer
  • No way of obtaining privacy if AUX already reveals too much!
  • Sanitization compromises privacy if giving the adversary access to the SDB considerably increases its probability of success
    • The definition of “considerably” can be forgiving, e.g., ε = 1/1000 (ε = 2^-d in our results)
  • Rigorously: ∀ D, ∀ I, ∃ I′ such that, w.h.p. over RDB ~ D, ∀ aux:
    • ∀ S ⊆ RDB: | Pr[ ∃ x ∈ S: I(SDB, aux) isolates x ] − Pr[ ∃ x ∈ S: I′(aux) isolates x ] | ≤ ε
    • ∀ x ∈ RDB: | Pr[ I(SDB, aux) isolates x ] − Pr[ I′(aux) isolates x ] | ≤ ε
  • This provides a framework for describing the power of a sanitization method, and hence for comparisons
  • AUX is going to cause trouble; ignore it for now

  9. A “bad” sanitizer that passes [Abhinandan Das, Cornell]
  • Disguise one attribute extremely well; leave the others in the clear
  • Without information about the special attribute, the adversary cannot “isolate” any point
  • However, he knows all other attributes exactly!
  • What goes wrong? The assumption that “distance is everything”: no isolation ⇒ no privacy breach, even if the adversary knows a lot of information

  10. Utility goals
  • Desirable results:
    • Macroscopic properties (e.g., means) should be preserved
    • Running statistical tests or data-analysis algorithms should return results similar to those obtained from the real data
  • We show concrete point-wise results on histograms and clustering algorithms

  11. This talk…
  • A formalism for privacy
    • What we mean by privacy
    • A good sanitization procedure
  • Results
    • Histograms and perturbations
  • Subsequent work; open problems

  12. Two techniques for sanitization
  • Recursive histograms (sketched in code below)
    • Assume the universe is the d-dimensional hypercube [-1,1]^d
    • As long as a cell contains ≥ T points, subdivide it into 2^d hypercubes by splitting each side evenly
    • Recurse until all cells have < T points
    • Output a list of cells and counts
  [Figure: recursive subdivision with d=2, T=3; leaf cell counts 1 1 1 1 1 2 2 1 1 2 2 1 1]
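As a concrete reading of the recursion, here is a short Python sketch (my own illustration; the slide gives the procedure only informally, and the half-open cell boundaries are a simplifying assumption):

```python
import itertools
import numpy as np

def recursive_histogram(points, low, high, T):
    """Subdivide the cell [low, high) until every cell holds fewer than
    T points; return a list of (low, high, count) leaf cells."""
    count = len(points)
    if count < T:
        return [(low, high, count)]
    mid = (low + high) / 2.0
    d = len(low)
    cells = []
    # Split into 2^d subcells: each coordinate keeps its lower or upper half.
    for choice in itertools.product([0, 1], repeat=d):
        bits = np.array(choice)
        sub_low = np.where(bits == 0, low, mid)
        sub_high = np.where(bits == 0, mid, high)
        inside = np.all((points >= sub_low) & (points < sub_high), axis=1)
        cells += recursive_histogram(points[inside], sub_low, sub_high, T)
    return cells

# Example mirroring the figure: 2-d data in [-1,1]^2 with T = 3.
rng = np.random.default_rng(0)
data = rng.uniform(-1, 1, size=(20, 2))
for low, high, count in recursive_histogram(
        data, np.array([-1.0, -1.0]), np.array([1.0, 1.0]), T=3):
    print(low, high, count)
```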

  13. Two techniques for sanitization
  • Recursive histograms
  • Perturbation (sketched in code below)
    • For every point x, compute its T-radius t_x: |B(x, t_x)| = T
    • Add a random vector to x of length proportional to t_x; this doesn’t work by itself
  [Figure: perturbation with T=1]
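A minimal sketch of the perturbation step (again my own Python illustration; the uniformly random direction drawn via a normalized Gaussian, and the scale parameter, are assumptions the slide does not specify):

```python
import numpy as np

def perturb(points, T, scale=1.0, seed=None):
    """Move each point x by a random vector whose length is proportional
    to its T-radius t_x, the smallest r with |B(x, r)| = T (x included).
    Assumes T <= number of points."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    out = np.empty_like(points)
    for i, x in enumerate(points):
        dists = np.sort(np.linalg.norm(points - x, axis=1))
        t_x = dists[T - 1]                       # radius at which B(x, t_x) holds T points
        direction = rng.standard_normal(d)
        direction /= np.linalg.norm(direction)   # uniformly random unit vector
        out[i] = x + scale * t_x * direction     # noise magnitude proportional to t_x
    return out
```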

  14. Two techniques for sanitization
  • Recursive histograms
  • Perturbation, combined with histograms
  • Results on privacy:
    • Rely on randomness in the distribution and in the sanitization
    • Do not use any computational assumptions
    • When D = uniform over a hypercube, c = O(1), and T is arbitrary, the adversary’s probability of success is ε ≤ 2^-d
    • Better results for special cases

  15. Key results on utility
  • Perturbation-based sanitization allows various clustering algorithms to perform nearly as well as on the real data
    • Spectral techniques
    • Diameter-based clusterings
  • Histograms: a popular summarization technique in statistics
    • Recursive histograms offer the benefit of providing more detail where required
    • They provide density information even without the counts
    • No randomness involved!

  16. A brief proof of privacy
  • Recall recursive histograms
  • Simplifying assumption: the input distribution is uniform over the hypercube
  • Intuition:
    • The adversary’s view is a product of uniform distributions over histogram cells
    • The uniform distribution is “well-spread-out”: the adversary cannot conclusively single out a point in it

  17. A brief proof of privacy
  • Case 1: Sparse cell
    • The expected distance ||q − x|| is proportional to the diameter of the cell
    • c times this distance is larger than the diameter of the parent cell
    • Therefore, B(q, c·||q − x||) contains at least T points
  • Case 2: Dense cell
    • Consider the balls B(q, r) and B(q, cr) for some radius r
    • The adversary wins if Pr[ ∃ x ∈ B(q, r) ] is large and Pr[ ≥ T points in B(q, cr) ] is small
    • However, we show: Pr[ ∃ x ∈ B(q, cr) ] >> Pr[ ∃ x ∈ B(q, r) ]

  18. A brief proof of privacy
  • Lemma: Let c be a large enough constant. For any cell and any r < diam(cell)/c,
    Pr[ x ∈ B(q, cr) ∩ cell ] ≥ 2^d · Pr[ x ∈ B(q, r) ∩ cell ]
  • Proof idea:
    • Pr[ x ∈ B(q, r) ∩ cell ] ∝ Vol( B(q, r) ∩ cell )
    • Vol( B(q, cr) ∩ cell ) > 2^d · Vol( B(q, r) ∩ cell ); this uses arguments about Normal and Uniform random variables (a numerical sanity check follows below)
  • Corollary: the probability of success for the adversary is < 2^-d
  [Figure: B(q, r) inside B(q, cr), both intersected with the cell]
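The volume comparison can be sanity-checked by Monte Carlo. The sketch below (purely my own construction, with arbitrary constants, and covering only the easy case where q sits at the cell’s center so neither ball is clipped by the cell) estimates both intersection volumes by rejection sampling:

```python
import numpy as np

def ball_cell_ratio(q, r, c, d, samples=200_000, seed=1):
    """Estimate Vol(B(q, cr) ∩ cell) / Vol(B(q, r) ∩ cell) for the cell
    [0,1]^d by sampling uniformly from the cell and counting ball hits."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0.0, 1.0, size=(samples, d))  # uniform over the cell
    dists = np.linalg.norm(pts - q, axis=1)
    hits_small = np.sum(dists <= r)        # samples landing in B(q, r) ∩ cell
    hits_big = np.sum(dists <= c * r)      # samples landing in B(q, cr) ∩ cell
    return hits_big / max(hits_small, 1)

d, c = 2, 4.0
q = np.full(d, 0.5)   # q at the cell's center, so both balls lie inside the cell
r = 0.05              # some r < diam(cell)/c = sqrt(2)/4
print(ball_cell_ratio(q, r, c, d), ">=", 2**d)  # unclipped case: ratio is about c^d = 16
```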

  19. This talk…
  • A formalism for privacy
    • What we mean by privacy
    • A good sanitization procedure
  • Results
    • Histograms and perturbations
  • Subsequent work; open problems

  20. Follow-up work
  • Isolation in few dimensions
    • The adversary must be more and more accurate in fewer dimensions
  • Randomized recursive histograms [Chawla, Dwork, McSherry, Talwar]
    • Similar privacy guarantees for “nearly-uniform” distributions over “well-rounded” universes
    • Preserve distances between pairs of points to a reasonable accuracy (additive error depending on T)
  • General-case impossibility
    • Cannot allow arbitrary AUX: ∀ utility goals and ∀ definitions of privacy, ∃ AUX that prevents privacy-preserving sanitization

  21. What about the real world?
  • Lessons from the abstract model:
    • High dimensionality is our friend
    • Histograms are powerful; spherical perturbations are promising
    • Different attributes need to be scaled appropriately, so that the data is well-rounded
  • Moving towards real data:
    • Outliers: our notion of c-isolation deals with them, though their existence may be disclosed
    • Discrete attributes: a possible solution is to convert them into real-valued attributes by adding noise
    • The low-dimensional case: is it inherently impossible? Dinur and Nissim show impossibility for 1-dimensional data

  22. Questions?
