
Foundations of Privacy Lecture 8



Presentation Transcript


  1. Foundations of Privacy, Lecture 8. Lecturer: Moni Naor

  2. Recap of last week’s lecture • Counting queries • Online algorithm with good accuracy • Started on changing (dynamic) data

  3. What if the data is dynamic? • Want to handle situations where the data keeps changing • Not all data is available at the time of sanitization [Figure: curator/sanitizer]

  4. Google Flu Trends “We've found that certain search terms are good indicators of flu activity. Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time.”

  5. Example of Utility: Google Flu Trends

  6. What if the data is dynamic? • Want to handle situations where the data keeps changing • Not all data is available at the time of sanitization. Issues: • When does the algorithm produce an output? • What does the adversary get to examine? • How do we define the individual we should protect? (D + Me) • Efficiency measures of the sanitizer

  7. Data Streams • Data is a stream of items • Sanitizer sees each item and updates its internal state • Produces output: either on-the-fly or at the end [Figure: data stream → sanitizer (state) → output]

  8. Three new issues/concepts • Continual Observation: the adversary gets to examine the output of the sanitizer all the time • Pan-Privacy: the adversary gets to examine the internal state of the sanitizer (once? several times? all the time?) • “User” vs. “Event” level protection: are the items “singletons” or are they related?

  9. Randomized Response • Randomized Response Technique [Warner 1965] • Method for polling stigmatizing questions • Idea: lie with known probability • Specific answers are deniable • Aggregate results are still valid • The data is never stored “in the plain”: “trust no-one” • Popular in the DB literature [Mishra and Sandler] [Figure: each bit is reported as bit + noise]
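The randomized-response idea on this slide can be sketched in a few lines of Python. This is an illustrative sketch: the truth probability 0.75 and the `estimate_fraction` debiasing helper are choices made here, not taken from the slides.

```python
import random

def randomized_response(bit, p_truth=0.75):
    # Report the true bit with probability p_truth, otherwise flip it:
    # any individual answer is deniable.
    return bit if random.random() < p_truth else 1 - bit

def estimate_fraction(responses, p_truth=0.75):
    # Invert the known lying probability:
    # E[reported mean] = f*p + (1-f)*(1-p), so f = (mean - (1-p)) / (2p - 1).
    mean = sum(responses) / len(responses)
    return (mean - (1 - p_truth)) / (2 * p_truth - 1)

random.seed(0)
true_bits = [1] * 300 + [0] * 700      # true fraction of 1's is 0.3
reports = [randomized_response(b) for b in true_bits]
estimate = estimate_fraction(reports)
```

The aggregate estimate concentrates around the true fraction even though every individual report is plausibly a lie.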

  10. The Dynamic Privacy Zoo (petting) [Diagram relating: differentially private; randomized response; pan private; continual observation; user-level private; user-level continual-observation pan-private]

  11. Continual Output Observation • Data is a stream of items • Sanitizer sees each item, updates its internal state • Produces an output observable to the adversary [Figure: sanitizer (state) → output]

  12. Continual Observation • Alg: an algorithm working on a stream of data, mapping prefixes of the data stream to outputs; at step i it produces output σi • Adjacent data streams: can get from one to the other by changing one element, e.g. S = acgtbxcde and S’ = acgtbycde • Alg is ε-differentially private against continual observation if for all adjacent data streams S and S’ and for all prefixes t: e^−ε ≤ Pr[Alg(S) = σ1 σ2 … σt] / Pr[Alg(S’) = σ1 σ2 … σt] ≤ e^ε ≈ 1 + ε

  13. The Counter Problem • 0/1 input stream: 011001000100000011000000100101 • Goal: a publicly observable counter approximating the total number of 1’s so far • Continual output: each time period, output the total number of 1’s • Want to hide individual increments while providing reasonable accuracy

  14. Counters with Continual Output Observation • Data is a stream of 0/1 values • Sanitizer sees each xi, updates its internal state • Produces a value observable to the adversary [Figure: input bits 1 0 0 1 0 0 1 1 0 0 0 1 → sanitizer (state) → outputs 1 1 1 2 …]

  15. Counters with Continual Output Observation • Continual output: each time period, output the total number of 1’s (T = total number of time periods) • Initial idea: at each time period, on input xi ∈ {0,1}, update the counter by xi and add independent Laplace noise with magnitude 1/ε • Privacy: each increment is protected by Laplace noise, so the output is differentially private whether xi is 0 or 1 • Accuracy: noise cancels out; error Õ(√T) • For sparse streams this error is too high
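The slide's initial idea can be sketched directly, assuming a running sum that accumulates both the inputs and the per-step noise (sampling Laplace noise as the difference of two exponentials):

```python
import random

def laplace(scale):
    # Difference of two i.i.d. exponentials is Laplace-distributed with this scale.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def naive_counter(stream, eps=1.0):
    # Every step: add the input bit plus fresh Laplace(1/eps) noise, so each
    # increment is individually protected.  The T independent noises
    # accumulate, giving error on the order of sqrt(T).
    outputs, running = [], 0.0
    for x in stream:
        running += x + laplace(1 / eps)
        outputs.append(running)
    return outputs

random.seed(1)
stream = [1, 0] * 5000          # 10,000 steps, 5,000 ones
outs = naive_counter(stream)
```

After 10,000 steps the final output is off by roughly √T, i.e. on the order of a hundred, which is exactly the Õ(√T) error the slide complains about.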

  16. Why So Inaccurate? • Operates essentially as randomized response • No utilization of the state • Problem: we do the same operations when the stream is sparse as when it is dense • Want to act differently when the stream is dense • The times at which the counter is updated are a potential leakage

  17. Delayed Updates • Main idea: update the output value only when there is a large gap between the actual count and the output • We have a good way of outputting the value of the counter once: the actual count plus noise • Maintain: the actual count At (+ noise) and the current output outt (+ noise) • D: update threshold

  18. Delayed Output Counter • Outt: current output; At: count since last update; Dt: noisy threshold • If At − Dt > fresh noise then: Outt+1 ← Outt + At + fresh noise; At+1 ← 0; Dt+1 ← D + fresh noise • Noise: independent Laplace noise with magnitude 1/ε • Accuracy: for threshold D, w.h.p. about N/D updates • Total error: (N/D)^1/2 · noise + D + noise + noise • Set D = N^1/3 ⇒ accuracy ~ N^1/3
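The update rule above can be sketched as follows. Variable names and the choice to start the public output at 0 are assumptions of this sketch:

```python
import random

def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def delayed_counter(stream, D, eps=1.0):
    # Delayed-output counter: keep an internal count A_t since the last
    # update and a noisy threshold D_t; publish a new output only when
    # A_t exceeds the threshold by more than fresh noise.
    out = 0.0                        # Out_t: current public output
    acc = 0                          # A_t: count since last update
    thresh = D + laplace(1 / eps)    # D_t: noisy threshold
    outputs = []
    for x in stream:
        acc += x
        if acc - thresh > laplace(1 / eps):
            out = out + acc + laplace(1 / eps)   # Out_{t+1} <- Out_t + A_t + noise
            acc = 0                              # A_{t+1} <- 0
            thresh = D + laplace(1 / eps)        # D_{t+1} <- D + fresh noise
        outputs.append(out)
    return outputs

random.seed(2)
outs = delayed_counter([1] * 1000, D=10)   # N = 1000, D = 10 ~ N^(1/3)
```

With N = 1000 and D = N^(1/3) = 10, the output tracks the true count up to an error around N^(1/3), far better than per-step noising on this dense stream.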

  19. Privacy of Delayed Output • The test At − Dt > fresh noise and the updates Dt+1 ← D + fresh noise, Outt+1 ← Outt + At + fresh noise must protect both the update time and the update value • For any two adjacent sequences, e.g. 101101110001 and 101101010001, we can pair up the noise vectors λ1 λ2 … λk−1 λk λk+1 … and λ1 λ2 … λk−1 λ’k λk+1 …, identical in all locations except one: λ’k = λk + 1, where k is the first update after the difference occurred (so Dt and D’t match) • Probability ratio ≈ e^ε

  20. Dynamic from Static • Idea: apply the conversion of static algorithms into dynamic ones [Bentley-Saxe 1980] • Run many accumulators in parallel; each accumulator counts the number of 1’s in a fixed segment of time, plus noise (an accumulator is measured while the stream is in its time frame; only finished segments are used) • Value of the output counter at any point in time: sum of the accumulators of a few segments • Accuracy: depends on the number of segments in the summation and on the accuracy of the accumulators • Privacy: depends on the number of accumulators that a single point xt influences

  21. The Segment Construction • Based on the bit representation: each point t is in ⌈log t⌉ segments • Σi=1..t xi is the sum of at most log t accumulators • By setting ε’ ≈ ε / log T we get the desired privacy • Accuracy: with all but negligible (in T) probability, the error at every step t is at most O((log^1.5 T)/ε), thanks to the noise canceling
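The segment construction can be sketched as a streaming "binary mechanism": one noisy accumulator per dyadic segment, merged as segments close. The bookkeeping with two arrays indexed by level is an implementation choice of this sketch, not dictated by the slide:

```python
import math
import random

def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def segment_counter(stream, eps=1.0):
    # Each input lands in at most ~log T segments, so each accumulator can
    # use noise of magnitude (log T)/eps (i.e. eps' ~ eps/log T), and the
    # running count at time t sums at most ~log t accumulators.
    T = len(stream)
    L = max(1, math.ceil(math.log2(T + 1)))   # number of levels
    exact = [0.0] * (L + 1)                   # exact sums of open segments
    noisy = [0.0] * (L + 1)                   # their noise-protected versions
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1         # level of the segment closing at t
        exact[i] = sum(exact[:i]) + x         # merge the finished sub-segments
        for j in range(i):
            exact[j] = noisy[j] = 0.0
        noisy[i] = exact[i] + laplace(L / eps)
        # The count up to t is the sum of the segments covering [1, t]:
        outputs.append(sum(noisy[j] for j in range(L + 1) if (t >> j) & 1))
    return outputs

random.seed(3)
outs = segment_counter([1] * 127)
```

Since at most L ≈ log T noisy values, each with noise of magnitude L/ε, are summed at any step, the error is on the order of (log^1.5 T)/ε, matching the slide's bound.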

  22. Synthetic Counter • Can make the counter synthetic: • Monotone • Each round the counter goes up by at most 1 • Applies to any monotone function

  23. Lower Bound on Accuracy • Theorem: additive inaccuracy of log T is essential for ε-differential privacy, even for ε = 1 • Consider the stream 0^T compared to the collection of T/b streams of the form Sj = 0^jb 1^b 0^(T−(j+1)b), e.g. Sj = 000000001111000000000000 (a block of b ones) • Call an output sequence correct if it is a b/3-approximation at all points in time

  24. …Lower Bound on Accuracy • Sj = 0^jb 1^b 0^(T−(j+1)b), e.g. 000000001111000000000000 • Important properties: • For any output, the ratio of probabilities under stream Sj and under 0^T is at least e^−εb (hybrid argument from differential privacy) • Any output sequence (a b/3-approximation at all points in time) is correct for at most one of the Sj or 0^T • Say the probability of a good output sequence is at least γ; then under 0^T, each event “output good for Sj” has probability at least γ · e^−εb, and these T/b events are disjoint • With b = (1/2) log T and γ = 1/2: (T/b) · e^−εb · γ > 1 − γ, a contradiction

  25. Hybrid Proof • Let Sj^i = 0^jb 1^i 0^(T−jb−i), so Sj^0 = 0^T and Sj^b = Sj • Want to show that for any event B: Pr[A(Sj^0) ∈ B] ≥ e^−εb · Pr[A(Sj^b) ∈ B] • Since Sj^i and Sj^i+1 are adjacent, differential privacy gives Pr[A(Sj^i) ∈ B] / Pr[A(Sj^i+1) ∈ B] ≥ e^−ε • Telescoping: Pr[A(Sj^0) ∈ B] / Pr[A(Sj^b) ∈ B] = (Pr[A(Sj^0) ∈ B] / Pr[A(Sj^1) ∈ B]) · … · (Pr[A(Sj^b−1) ∈ B] / Pr[A(Sj^b) ∈ B]) ≥ e^−εb

  26. What shall we do with the counter? • Privacy-preserving counting is a basic building block in more complex environments • General characterizations and transformations: an event-level pan-private continual-output algorithm for any low-sensitivity function • Following expert advice privately: track experts over time and choose whom to follow; need to track how many times each expert was correct

  27. Following Expert Advice [Hannan 1957; Littlestone-Warmuth 1989] • n experts; in every time period each gives 0/1 advice • Pick which expert to follow, then learn the correct answer (also 0/1) • Goal: over time, be competitive with the best expert in hindsight • Example: Expert 1: 1 1 1 0 1; Expert 2: 0 1 1 0 0; Expert 3: 0 0 1 1 1; Correct: 0 1 1 0 0

  28. Following Expert Advice • n experts; in every time period each gives 0/1 advice; pick which expert to follow, then learn the correct answer (0/1) • Goal: #mistakes of the chosen experts ≈ #mistakes made by the best expert in hindsight • Want a 1 + o(1) approximation

  29. Following Expert Advice, Privately • n experts; in every time period each gives 0/1 advice • Pick which expert to follow, then learn the correct answer (0/1) • Goal: over time, competitive with the best expert in hindsight • New concern: protect the privacy of the experts’ opinions and of the outcomes (was the expert consulted at all?) • User-level privacy: lower bound, no non-trivial algorithm • Event-level privacy: counting gives a 1 + o(1)-competitive algorithm

  30. Algorithm for Following Expert Advice • Idea: use the counter to count a privacy-preserving # of mistakes • Follow the perturbed leader [Kalai-Vempala]: for each expert keep a perturbed # of mistakes, and follow the expert with the lowest perturbed count • Problem: not every perturbation works; need a counter with a well-behaved noise distribution • Theorem [follow the privacy-perturbed leader]: for n experts, over T time periods, the # of mistakes is within ≈ poly(log n, log T, 1/ε) of the best expert
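The follow-the-perturbed-leader loop can be sketched as below. Note the hedge: this sketch perturbs with plain fresh Laplace noise for illustration, whereas the theorem on the slide requires the private counter's well-behaved noise, which is not implemented here.

```python
import random

def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def follow_perturbed_leader(rounds, eps=0.1):
    # Each round is (advice, truth), where advice is a list of 0/1
    # predictions, one per expert.  Keep a mistake count per expert and
    # follow the expert whose perturbed count is lowest.
    n = len(rounds[0][0])
    mistakes = [0] * n            # true mistake counts per expert
    my_mistakes = 0
    for advice, truth in rounds:
        leader = min(range(n), key=lambda i: mistakes[i] + laplace(1 / eps))
        if advice[leader] != truth:
            my_mistakes += 1
        for i in range(n):
            mistakes[i] += int(advice[i] != truth)
    return my_mistakes, min(mistakes)

random.seed(4)
# Expert 0 is always right, expert 1 always wrong, expert 2 right half the time.
rounds = [([t % 2, 1 - t % 2, t % 4 // 2], t % 2) for t in range(200)]
mine, best = follow_perturbed_leader(rounds)
```

After an initial noisy phase the perturbed counts separate, so the algorithm locks onto the best expert and its regret stays bounded.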

  31. List Update Problem • There are n distinct elements A = {a1, a2, …, an}; we maintain them in a list (some permutation) • Given a request sequence r1, r2, …, each ri ∈ A • For request ri the cost is how far ri is in the current permutation • Can rearrange the list between requests • Want to minimize the total cost for the request sequence; the sequence is not known in advance • Our goal: do it while providing privacy for the request sequence, assuming the list order is public — for each request ri, one should not be able to tell whether ri is in the sequence or not

  32. List Update Problem • In general the cost can be very high • The first problem analyzed in the competitive framework, by Sleator and Tarjan (1985): compare to the best algorithm that knows the sequence in advance • Best algorithms: 2-competitive deterministic; better randomized, ~1.5 • Assume free rearrangements between requests • Bad news: cannot be better than (1/ε)-competitive if we want to keep privacy; cannot act until 1/ε requests to an element appear

  33. Lower Bound for Deterministic Algorithms • Bad schedule: always ask for the last element in the list • Cost of the online algorithm: n · t • Cost of the best fixed list: sort the list according to popularity • Average cost: ≤ n/2 • Total cost: ≤ (n · t)/2

  34. List Update Problem: Static Optimality • A more modest performance goal: compete with the best algorithm that fixes the permutation in advance • Blum-Chowla-Kalai: can be 1 + o(1)-competitive w.r.t. the best static algorithm (probabilistic) • The BCK algorithm is based on the number of times each element has been requested • Algorithm: • Start with random weights ri in the range [1, c] • At all times wi = ri + ci, where ci is the # of times element ai was requested • At any point in time, arrange the elements according to their weights

  35. Privacy with Static Optimality • Run BCK with private counters: • Start with random weights ri in the range [1, c] • At any point in time wi = ri + ci, where ci is the # of times element ai was requested • Arrange the elements according to their weights • Privacy: follows from the privacy of the counters; the list depends only on the counters plus randomness • Accuracy: the BCK proof can be modified to handle approximate counts as well • What about efficiency?
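The BCK arrangement rule is tiny in code. A sketch, where the element names, the range constant c, and the example counts are illustrative; the exact counts below would be replaced by differentially private counters so that the list depends on the data only through them:

```python
import random

def bck_order(random_offsets, counts):
    # Arrange elements by decreasing weight w_i = r_i + c_i, as in BCK:
    # frequently requested elements drift toward the front of the list.
    weights = {a: random_offsets[a] + counts[a] for a in counts}
    return sorted(weights, key=weights.get, reverse=True)

random.seed(5)
elements = ["a", "b", "c"]
c = 3
offsets = {a: random.uniform(1, c) for a in elements}   # r_i in [1, c]
counts = {"a": 9, "b": 4, "c": 0}                       # c_i: requests so far
order = bck_order(offsets, counts)
```

Because the random offsets lie in a bounded range, an element clearly more popular than another always ends up ahead of it, which is what drives the static-optimality argument.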

  36. The Multi-Counter Problem • How to run n counters for T time steps • In each round a few counters are incremented • The identity of the incremented counter is kept private • Work per increment: logarithmic in n and T • Idea: arrange the n counters in a binary tree with n leaves • Output counters are associated with the leaves • For each internal node, maintain a counter corresponding to the sum of the leaves in its subtree

  37. The Multi-Counter Problem • Idea: arrange the n counters in a binary tree with n leaves; output counters are associated with the leaves • For each internal node maintain: • A counter corresponding to the sum of the leaves in its subtree • A register with the number of increments since the last output update (the pair (internal, output) determines when to update the subtree) • When a leaf counter is updated: all log n nodes up to the root are incremented, and the internal state of the root is updated • If the output of a parent node is updated, the internal states of its children are updated
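The tree layout can be sketched with an implicit array-based binary tree. This sketch only shows the structural idea; the registers, noise, and delayed output updates described on the slide are deliberately omitted:

```python
class TreeCounters:
    # n leaf counters stored in an implicit binary tree; each internal node
    # holds the sum of the leaves in its subtree, so one leaf increment
    # touches only the ~log n nodes on the path to the root.
    def __init__(self, n):
        self.n = n
        self.node = [0] * (2 * n)     # leaves live at indices n .. 2n-1

    def increment(self, i):
        j = self.n + i
        while j >= 1:                 # walk the log n ancestors up to the root
            self.node[j] += 1
            j //= 2

    def leaf(self, i):
        return self.node[self.n + i]  # per-counter value

    def root(self):
        return self.node[1]           # total number of increments

t = TreeCounters(4)
for i in [0, 0, 2, 3]:
    t.increment(i)
```

The subtree sums are what make the delayed-update machinery possible: a parent can decide whether any child below it needs its output refreshed without examining every leaf.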

  38. Tree of Counters [Figure: binary tree; internal nodes hold (counter, register) pairs, leaves hold the output counters]

  39. The Multi-Counter Problem • Work per increment: log n increments plus the number of counters whose output needs updating • Amortized complexity is O(n log n / k), where k is the number of times we expect to increment a counter until its output is updated • Privacy: each increment of a leaf counter affects log n counters • Accuracy: we have introduced some delay; after t ≥ k log n increments, all nodes on the path have been updated

  40. Pan-Privacy (“think of the children”) • In the privacy literature the data curator is trusted; in reality, even a well-intentioned curator is subject to mission creep, subpoena, security breach… • Examples: pro baseball’s anonymous drug tests; Facebook policies to protect users from application developers; Google accounts hacked • Goal: the curator accumulates statistical information, but never stores sensitive data about individuals • Pan-privacy: the algorithm is private inside and out; the internal state is privacy-preserving

  41. Randomized Response [Warner 1965] • Method for polling stigmatizing questions; idea: participants lie with known probability • Specific answers are deniable, yet aggregate results are still valid • Data is never stored “in the clear”; popular in the DB literature [MiSa06] • Strong guarantee: no trust in the curator • Makes sense when each user’s data appears only once; otherwise, limited utility • New idea: the curator aggregates statistical information, but never stores sensitive data about individuals [Figure: user data bits reported as bit + noise]

  42. Aggregation Without Storing Sensitive Data? • Streaming algorithms: small storage, but the information stored can still be sensitive • “My data”: many appearances, arbitrarily interleaved with those of others (“user level”) • Pan-private algorithm: private “inside and out” • Even the internal state completely hides the appearance pattern of any individual: presence, absence, frequency, etc.

  43. Pan-Privacy Model • Data is a stream of items; each item belongs to a user • Data of different users is interleaved arbitrarily • The curator sees the items, updates its internal state, and produces an output at the end of the stream • Pan-privacy: for every possible behavior of a user in the stream, the joint distribution of the internal state at any single point in time and the final output is differentially private • Can also consider multiple intrusions

  44. Adjacency: User Level • Universe U of users whose data is in the stream; x ∈ U • Streams are x-adjacent if they have the same projections onto U \ {x} • Example: axbxcxdxxxex and abcdxe are x-adjacent; both project to abcde • This gives a notion of “corresponding locations” in x-adjacent streams • U-adjacent: ∃ x ∈ U for which they are x-adjacent; simply “adjacent” if U is understood • Note: streams of different lengths can be adjacent

  45. Example: Stream Density / # Distinct Elements • Universe U of users; estimate how many distinct users in U appear in the data stream • Application: # of distinct users who searched for “flu” • Ideas that don’t work: • Naïve: keep a list of the users that appeared (bad privacy and space) • Streaming: track a random sub-sample of users (bad privacy), or hash each user and track the minimal hash (bad privacy)

  46. Pan-Private Density Estimator • Inspired by randomized response • Store for each user x ∈ U a single bit bx • Initially all bx are drawn from distribution D0: 0 w.p. ½, 1 w.p. ½ • When encountering x, redraw bx from distribution D1: 0 w.p. ½ − ε, 1 w.p. ½ + ε • Final output: [(fraction of 1’s in the table − ½)/ε] + noise • Pan-privacy: if a user never appeared, the entry is drawn from D0; if the user appeared any # of times, the entry is drawn from D1; D0 and D1 are 4ε-differentially private
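The estimator can be sketched as follows. The debiasing identity follows from E[fraction of 1's] = ½ + ε · (#distinct users)/|U|; the magnitude of the final output noise is an assumption of this sketch, since the slide only says "+ noise":

```python
import random

def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def pan_private_density(stream, universe, eps=0.25):
    # One bit b_x per user: initially drawn from D0 (uniform); whenever x
    # appears, b_x is redrawn from D1 (bias eps toward 1).  An intruder who
    # sees the table observes, for each user, a sample from one of two
    # eps-close distributions, so the state itself is privacy-preserving.
    bits = {x: random.random() < 0.5 for x in universe}        # D0
    for x in stream:
        bits[x] = random.random() < 0.5 + eps                  # D1, redraw
    frac = sum(bits.values()) / len(universe)
    # Debias: E[frac] = 1/2 + eps * (#distinct users) / |U|.
    return (frac - 0.5) / eps * len(universe) + laplace(1 / eps)

random.seed(6)
universe = range(1000)
stream = [x % 300 for x in range(900)]     # 300 distinct users, some repeated
estimate = pan_private_density(stream, universe)
```

Note that re-appearances of a user just redraw the same bit from D1, which is why the guarantee is user-level: the state distribution is identical whether a user appeared once or many times.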

  47. Pan-Private Density Estimator • Same estimator: store for each user x ∈ U a single bit bx, initially drawn from D0 (0 w.p. ½, 1 w.p. ½); when encountering x, redraw bx from D1 (0 w.p. ½ − ε, 1 w.p. ½ + ε); final output [(fraction of 1’s in the table − ½)/ε] + noise • Improved accuracy and storage: • Multiplicative accuracy using hashing • Small storage using sub-sampling

  48. Pan-Private Density Estimator • Theorem [density estimation streaming algorithm]: ε pan-privacy, multiplicative error α, space poly(1/α, 1/ε)

  49. The Dynamic Privacy Zoo (petting) [Diagram relating: differentially private outputs; privacy under continual observation; pan privacy; continual pan-privacy; sketch vs. stream; user-level privacy]
