1 / 28

Post-processing outputs for better utility

Post-processing outputs for better utility. CompSci 590.03 Instructor: Ashwin Machanavajjhala. Announcement. Project proposal submission deadline is Fri, Oct 12 noon. Recap: Differential Privacy. For every pair of inputs that differ in one value. For every output …. D 1. D 2. O.

leland
Download Presentation

Post-processing outputs for better utility

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Post-processing outputs for better utility CompSci 590.03Instructor: AshwinMachanavajjhala Lecture 10 : 590.03 Fall 12

  2. Announcement • Project proposal submission deadline is Fri, Oct 12 noon. Lecture 10 : 590.03 Fall 12

  3. Recap: Differential Privacy For every pair of inputs that differ in one value For every output … D1 D2 O Adversary should not be able to distinguish between any D1 and D2 based on any O Pr[A(D1) = O] Pr[A(D2) = O] . log < ε (ε>0) Lecture 10 : 590.03 Fall 12

  4. Recap: Laplacian Distribution Database Query q True answer q(d) q(d) + η Researcher Privacy depends on the λ parameter η h(η) α exp(-η / λ) Mean: 0, Variance: 2 λ2 Lecture 10 : 590.03 Fall 12

  5. Recap: Laplace Mechanism Thm: If sensitivity of the query is S, then the following guarantees ε-differential privacy. λ = S/ε Sensitivity: Smallest number s.t. for any d, d’ differing in one entry, || q(d) – q(d’) || ≤ S(q) Histogram query: Sensitivity = 2 • Variance / error on each entry = 2x4/ε2 = O(1/ε2) Lecture 10 : 590.03 Fall 12

  6. This class • What is the optimal method to answer a batch of queries? Lecture 10 : 590.03 Fall 12

  7. How to answer a batch of queries? • Database of values {x1, x2, …, xk} • Query Set: • Value of x1 η1 = x1 + δ1 • Value of x2 η2 = x2 + δ2 • Value of x1 + x2 η3 = x1 + x2 + δ3 • But we know that η1 and η2 should sum up to η3! Lecture 10 : 590.03 Fall 12

  8. Two Approaches • Constrained inference • Ensure that the returned answers are consistent with each other. • Query Strategy • Answer a different set of strategy queries A • Answer original queries using A • Universal Histograms • Wavelet Mechanism • Matrix Mechanism Lecture 10 : 590.03 Fall 12

  9. Two Approaches • Constrained inference • Ensure that the returned answers are consistent with each other. • Query Strategy • Answer a different set of strategy queries A • Answer original queries using A • Universal Histograms • Wavelet Mechanism • Matrix Mechanism Lecture 10 : 590.03 Fall 12

  10. Constrained Inference Lecture 10 : 590.03 Fall 12

  11. Constrained Inference • Let x1 and x2 be the original values. We observe noisy values η1, η2 and η3 • We would like to reconstruct the best estimators y1 (for x1/) and y2 (for x2) from the noisy values. • That is, we want to find the values of y1, y2 such that: min (y1-η1)2 + (y2 – η2)2 + (y3 – η3)2s.t., y1 + y2 = y3 Lecture 10 : 590.03 Fall 12

  12. Constrained Inference [Hay et al VLDB 10] Lecture 10 : 590.03 Fall 12

  13. Sorted Unattributed Histograms • Counts of diseases • (without associating a particular count to the corresponding disease) • Degree sequence: List of node degrees • (without associating a degree to a particular node) • Constraint: The values are sorted Lecture 10 : 590.03 Fall 12

  14. Sorted Unattributed Histograms True Values 20, 10, 8, 8, 8, 5, 3, 2 Noisy Values 25, 9, 13, 7, 10, 6, 3, 1 (noise from Lap(1/ε)) Proof:? Lecture 10 : 590.03 Fall 12

  15. Sorted Unattributed Histograms Lecture 10 : 590.03 Fall 12

  16. Sorted Unattributed Histograms • n: number of values in the histogram • d: number of distinct values in the histogram • ni: number of times ith distinct value appears in the histogram. Lecture 10 : 590.03 Fall 12

  17. Two Approaches • Constrained inference • Ensure that the returned answers are consistent with each other. • Query Strategy • Answer a different set of strategy queries A • Answer original queries using A • Universal Histograms • Wavelet Mechanism • Matrix Mechanism Lecture 10 : 590.03 Fall 12

  18. Query Strategy Strategy Query Workload Original Query Workload A W I Differential Privacy ~ ~ A(I) A(I) W(I) Noisy Strategy Answers Noisy Workload Answers Private Data Lecture 10 : 590.03 Fall 12

  19. Range Queries • Given a set of values {x1, x2, …, xn} • Range query: q(j,k) = xj + … + xk Q: Suppose we want to answer all range queries? Strategy 1: Answer all range queries using Laplace mechanism • O(n2/ε2) total error. • May reduce using constrained optimization … Lecture 10 : 590.03 Fall 12

  20. Range Queries • Given a set of values {x1, x2, …, xn} • Range query: q(j,k) = xj + … + xk Q: Suppose we want to answer all range queries? Strategy 1: Answer all range queries using Laplace mechanism • Sensitivity = O(n2) • O(n4/ε2) total error across all range queries. • May reduce using constrained optimization … Lecture 10 : 590.03 Fall 12

  21. Range Queries • Given a set of values {x1, x2, …, xn} • Range query: q(j,k) = xj + … + xk Q: Suppose we want to answer all range queries? Strategy 2: Answer all xi queries using Laplace mechanism Answer range queries using noisy xi values. • O(1/ε2) error for each xi. • Error(q(1,n)) = O(n/ε2) • Total error on all range queries : O(n3/ε2) Lecture 10 : 590.03 Fall 12

  22. Universal Histograms for Range Queries [Hay et al VLDB 2010] Strategy 3: Answer sufficient statistics using Laplace mechanismAnswer range queries using noisy sufficient statistics. x1-8 x1234 x5678 x12 x34 x56 x78 x1 x2 x3 x4 x5 x6 x7 x8 Lecture 10 : 590.03 Fall 12

  23. Universal Histograms for Range Queries • Sensitivity: log n • q(2,6) = x2+x3+x4+x5+x6 Error = 2 x 5log2n/ε2 = x2 + x34 + x56 Error = 2 x 3log2n/ε2 x1-8 x1234 x5678 x12 x34 x56 x78 x1 x2 x3 x4 x5 x6 x7 x8 Lecture 10 : 590.03 Fall 12

  24. Universal Histograms for Range Queries • Every range query can be answered by summing at most log n different noisy answers • Maximum error on any range query = O(log3n / ε2) • Total error on all range queries = O(n2 log3n / ε2) x1-8 x1234 x5678 x12 x34 x56 x78 x1 x2 x3 x4 x5 x6 x7 x8 Lecture 10 : 590.03 Fall 12

  25. Universal Histograms & Constrained Inference [Hay et al VLDB 2010] • Can further reduce the error by enforcing constraintsx1234 = x12 + x34 = x1 + x2 + x3 + x4 • 2-pass algorithm to compute a consistent version of the counts Lecture 10 : 590.03 Fall 12

  26. Universal Histograms & Constrained Inference [Hay et al VLDB 2010] • Pass 1: (Bottom Up) • Pass 2: (Top down) Lecture 10 : 590.03 Fall 12

  27. Universal Histograms & Constrained Inference • Resulting consistent counts • Have lower error than noisy counts (upto 10 times smaller in some cases) • Unbiased estimators • Have the least error amongst all unbiased estimators Lecture 10 : 590.03 Fall 12

  28. Next Class • Constrained inference • Ensure that the returned answers are consistent with each other. • Query Strategy • Answer a different set of strategy queries A • Answer original queries using A • Universal Histograms • Wavelet Mechanism • Matrix Mechanism Lecture 10 : 590.03 Fall 12

More Related