
Privacy without Noise



  1. Privacy without Noise Yitao Duan NetEase Youdao R&D Beijing China duan@rd.netease.com CIKM 2009

  2. The Problem • Given a database d, consisting of records about individual users, we wish to release some statistical information f(d) without compromising any individual’s privacy

  3. Our Results • The mainstream approach relies on additive noise. We show that this alone is neither sufficient nor, for some types of queries, necessary for privacy • The inherent uncertainty associated with unknown quantities is enough to provide the same privacy without external noise • We provide the first mathematical proof, and conditions, for the widely accepted heuristic that aggregates are private

  4. Preliminaries • A database is d = (d1, …, dn) ∈ Dn, where D is an arbitrary domain • Each di is drawn i.i.d. from a public distribution • Hamming distance H(d, d′) between two databases d, d′ = the number of entries on which they differ • Query: g(di) = [g1(di), …, gm(di)]T, with each gj: D → [0, 1]
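To fix notation for the sketches in later slides’ notes, here is a minimal, hypothetical Python encoding of these preliminaries; the concrete domain, distribution, and query are illustrative choices, not the paper’s:

```python
import numpy as np

# Hypothetical setup: a database d in D^n with records drawn i.i.d.
# from a public distribution, and a vector query g: D -> [0, 1]^m.
rng = np.random.default_rng(0)
n, m = 1000, 2
d = rng.random(n)  # illustrative domain D = [0, 1], public uniform distribution

def g(d_i):
    # Example query with m = 2 components, each g_j(d_i) in [0, 1]
    return np.array([d_i, d_i ** 2])

def hamming(d1, d2):
    # H(d, d'): the number of entries on which the two databases differ
    return int(np.sum(d1 != d2))
```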

  5. The Power of Addition • A large number of popular algorithms can be run with addition-only steps • Linear algorithms: voting and summation; nonlinear algorithms: regression, classification, SVD, PCA, k-means, ID3, EM, etc. • All algorithms in the statistical query model • Many other gradient-based numerical algorithms • The addition-only framework has very efficient private implementations in cryptography and admits efficient zero-knowledge proofs (ZKPs)
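As a concrete illustration of the addition-only idea (a sketch, not code from the paper), the snippet below computes a mean, the building block of k-means-style updates, while the aggregator only ever sees a sum of per-user contributions:

```python
import numpy as np

# Sketch: each user i contributes a vector; the aggregator sees only the sum.
rng = np.random.default_rng(1)
records = rng.random((1000, 3))  # n = 1000 users, 3 attributes in [0, 1]

total = np.zeros(3)
for d_i in records:
    total += d_i                 # the only operation is addition

mean = total / len(records)      # derived from the released sum alone
print(mean)
```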

  6. Notions of Privacy • But what do we mean by privacy? • I don’t know how much you weigh, but I can find out that its leading digit is 2 • Or, I don’t know whether you drink or not, but I can find out that people who drink are happier • The definition must meet people’s expectations • And allow for rigorous mathematical reasoning

  7. Differential Privacy The risk to my privacy should not substantially increase as a result of participating in a statistical database:

  8. Differential Privacy A gives ε-differential privacy if, for all values of DB and Me and all transcripts t: Pr[A(DB − Me) = t] ≤ e^ε · Pr[A(DB + Me) = t]

  9. Differential Privacy • No perceptible risk is incurred by joining DB • Any info the adversary can obtain, it could obtain without Me (my data)

  10. Differential Privacy w/ Additive Noise [diagram: response = Σ g(di) + noise] • Noise must be (1) independently generated for each query and (2) of sufficiently large variance; it can be Laplace, Gaussian, or Binomial • But … the variance of independent noise can be reduced via averaging • Fix: restrict the total number of queries, i.e., the dimensionality of f (to m)
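A minimal sketch of the additive-noise mechanism and the averaging attack just described, assuming the standard Laplace mechanism for a sum of [0, 1]-valued records (epsilon and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.random(1000)
true_sum = data.sum()
epsilon = 0.1

def noisy_sum():
    # A sum of [0, 1]-valued records has sensitivity 1, so Laplace noise
    # with scale 1/epsilon gives epsilon-differential privacy per query.
    return true_sum + rng.laplace(scale=1.0 / epsilon)

# The averaging attack: repeating the same query with independent noise
# lets the adversary average the noise away.
answers = [noisy_sum() for _ in range(10_000)]
print(abs(noisy_sum() - true_sum))        # error of a single answer
print(abs(np.mean(answers) - true_sum))   # much smaller after averaging
```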

  11. But It Is Not Effective [diagram: a record dj shared by two databases, each allowing m queries, is exposed to 2m queries] • If a user profile is shared among multiple databases, one could get more queries about the user than differential privacy allows

  12. And It Is Not Necessary Either • There is another source of randomness that could provide protection similar to external noise – the data itself • Some functions are insensitive to small perturbations of the input

  13. Aggregates of n Random Variables • Probability theory has many established results on the asymptotic behavior of aggregates of n random variables • Under certain conditions, when n is sufficiently large, the aggregates converge in some way to a distribution independent of the individual samples except for a few distributional parameters.

  14. Central Limit Theorem
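The statement on the original slide was an image; a standard formulation of the theorem it refers to:

```latex
% Classical CLT: for i.i.d. X_1, \dots, X_n with mean \mu and variance \sigma^2,
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu)
  \;\xrightarrow{d}\;
\mathcal{N}(0, \sigma^2)
\quad \text{as } n \to \infty .
```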

  15. Differential Privacy: An Individual’s Perspective • Privacy is defined in terms of perturbation to an individual data record • Existing solutions achieve this via external noise • Each element is independently perturbed

  16. Sum Queries • With sum queries f(d) = Σi g(di), when n is large, for each k the quantity Δk = Σi≠k g(di) converges in distribution to a Gaussian (CLT) • Since f(d) = g(dk) + Δk for every k, can Δk provide similar protection? • Compared against Lemma 1, the difference is that the perturbations to each element of g(dk) are not independent
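An illustrative simulation (not from the paper) of the last point: Δk is approximately Gaussian for large n, but its components are correlated because they are built from the same records; the query g is the hypothetical two-component example from the preliminaries note:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 1000, 5000

def g(d_i):
    # Components are dependent: the second is a function of the first.
    return np.array([d_i, d_i ** 2])

# Delta_k = sum over i != k of g(d_i), resampled many times (k = 0 here).
samples = rng.random((trials, n))
delta_k = np.array([g(row[1:]).sum(axis=1) for row in samples])

# Off-diagonal entries are far from 0: the perturbation of g(d_k)'s
# elements is Gaussian-like but NOT independent across coordinates.
print(np.cov(delta_k.T))
```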

  17. Privacy without Noise x2 x2 σ σ g(dk) g(dk) x1 x1 • Independent and (b) non-independent gaussian perturbations • in 2-dimensional case. (b) has variance σ2 along its minor axis. • Note how the perturbation in (b) “envelops” that in (a).

  18. Main Result • The aggregate perturbation Δk provides protection at least as strong as independent Gaussian noise with variance λ, where λ is the smallest eigenvalue of V, the covariance matrix of the perturbation
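A sketch of how the key quantity might be estimated in practice: λ, the smallest eigenvalue of V, the covariance of g(di) under the public distribution; the query and sample size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def g(d_i):
    return np.array([d_i, d_i ** 2])   # hypothetical query from earlier notes

# Estimate V, the covariance matrix of g(d_i) under the public distribution.
draws = np.array([g(d_i) for d_i in rng.random(100_000)])
V = np.cov(draws.T)

# The smallest eigenvalue controls the weakest direction of the aggregate's
# "self-noise": larger lambda means stronger protection in every direction.
lam = np.linalg.eigvalsh(V).min()      # eigvalsh: V is symmetric
print(V)
print(lam)
```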

  19. A Simple Necessary Condition • Suppose we have answered k queries, which are all deemed safe; stack them as the rows of a query matrix • For the (k+1)-th query xk+1 to be safe, a condition on the singular values of the query matrix must hold • Adding a new row means appending xk+1T to the matrix

  20. A Simple Necessary Condition • We know σk+1(·) = 0 for the matrix of the first k queries • xk+1 must be “large” enough to perturb that singular value away from 0 by a sufficient amount; using matrix perturbation theory (Weyl’s theorem), we obtain the necessary condition on xk+1
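A hypothetical sketch of the resulting test: before answering, check that the new query row moves the smallest singular value of the stacked query matrix sufficiently far from 0. The function name, threshold, and example queries are illustrative, not the paper’s notation:

```python
import numpy as np

def safe_to_answer(prev_queries, x_new, threshold):
    # Stack the k answered queries with the candidate (k+1)-th query
    # and test the (k+1)-th (i.e., smallest) singular value.
    q = np.vstack([prev_queries, x_new])
    sigma = np.linalg.svd(q, compute_uv=False)
    return sigma[-1] >= threshold

prev = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
print(safe_to_answer(prev, np.array([1.0, 1.0, 0.0]), 0.5))  # False: in the span
print(safe_to_answer(prev, np.array([0.0, 0.0, 1.0]), 0.5))  # True: new direction
```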

  21. Query Auditing • Instead of perturbing the responses, query auditing restricts the queries that could cause a privacy breach [diagram: a query q is answered with q(d) or DENY] • Must be careful with denials

  22. Simulatability • Key idea: if the adversary can simulate the output of the auditor using only public information, then nothing more is leaked • Denials: if the decision to deny or grant query answers is based on information that can be approximated by the adversary, then the decision itself does not reveal more info

  23. Simulatable Query Auditing • Previous schemes achieve simulatability by not using the data • Verifying privacy with our condition in online query auditing is simulatable • Even though the data is used in the decision-making process, the information is still simulatable

  24. Simulatable Query Auditing • The auditor: evaluates the privacy condition on the actual data d • The simulator: evaluates the same condition on records drawn from the public distribution

  25. Simulatable Query Auditing • Using the law of large numbers and Weyl’s theorem (again!), we can prove that when n is large, the simulator’s decision agrees with the auditor’s with high probability for any query sequence
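An illustrative simulation of why the decision is simulatable: the auditor evaluates a concentrated statistic on the real data, the simulator evaluates the same statistic on a fresh draw from the public distribution, and by the law of large numbers the two values (hence the grant/deny decisions) coincide for large n. The statistic below is a stand-in, not the paper’s actual condition:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
real_data = rng.random(n)    # the auditor's view: the actual database
simulated = rng.random(n)    # the adversary's view: a public-distribution draw

def statistic(d):
    # Stand-in for the singular-value/eigenvalue condition: any empirical
    # average concentrates around the same population value.
    return np.mean(d ** 2)

threshold = 0.3
print(statistic(real_data) >= threshold)   # auditor's decision
print(statistic(simulated) >= threshold)   # simulator's decision: same w.h.p.
```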

  26. Issue of Shared Records • We are not totally immune to this vulnerability, but our privacy condition is actually stronger than simply restricting the number of queries, even though we do not add noise • An adversary gets less information about individual records from the same number of queries

  27. More info: duan@rd.netease.com Full version of the paper: http://bid.berkeley.edu/projects/p4p/papers/pwn-full.pdf
