When Random Sampling Preserves Privacy

Presentation Transcript


  1. When Random Sampling Preserves Privacy • Kamalika Chaudhuri, U.C. Berkeley • Nina Mishra, U. Virginia

  2. The Problem • Diagram: Database → Sanitizer → Sanitized Database • Setting: • Table: Set of rows • Sanitizer: Releases each row with probability p • What are the conditions under which this sanitizer preserves privacy?
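The sanitizer described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `sanitize`, the list-of-rows representation of the table, and the optional `rng` argument are all assumptions made for the example.

```python
import random

def sanitize(table, p, rng=None):
    """Release each row of `table` independently with probability p.

    `table` is assumed to be a list of rows; `rng` is an optional
    random.Random instance so that runs can be made reproducible.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    rng = rng or random.Random()
    # random() returns a float in [0, 1), so p = 0 releases nothing
    # and p = 1 releases every row.
    return [row for row in table if rng.random() < p]

# Hypothetical usage: sample roughly 10% of a toy table.
table = ["flu"] * 50 + ["cold"] * 50
released = sanitize(table, 0.1, rng=random.Random(42))
```

The expected size of the released table is p times the number of rows; the rest of the talk asks when this output is safe to publish.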

  3. Search Data • AOL released user search data: • Replaced usernames with random ids

  4. Search Data Kamalika Cynthia Nina “Berkeley restaurants” “Low degree spanning trees” “Tickets to India” “Privacy sampling” “Airfare Santa Barbara” “Traffic on 101N” “Restaurants Mountain View” “Rank Aggregation” “Memory bound functions” “Crypto registration” “Falafel Charlottesville” “Query Auditing” “Clustering streaming” “Tickets to SFO” “Privacy sampling”

  5. U.S. Census Data • Random sample of preprocessed data: • Removing unique values • Merging cells with fewer than a threshold number of individuals

  6. Privacy Definition [DMNS06,…] • ε-Indistinguishability • Two tables T, T’ that differ by a single row • S : Output of the sanitizer • Pr[S | T] ≤ (1 + ε) Pr[S | T’]

  7. An Example • Cannot always get ε-Indistinguishability with random sampling • T : n rows with value 0 • T’ : n-1 rows with value 0, 1 row with value 1 • S : 1 row with value 1, s – 1 rows with value 0
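The counterexample can be checked numerically. The sketch below assumes that tables and samples are treated as multisets of values, so that under independent release with probability p the count of each value in S is binomial; the helper name `sample_probability` and the concrete numbers (n = 10, s = 3, p = 0.5) are chosen only for illustration.

```python
from collections import Counter
from math import comb

def sample_probability(sample, table, p):
    """Pr[S | T] when each row of T is released independently with
    probability p, treating S and T as multisets of values."""
    t_counts = Counter(table)
    s_counts = Counter(sample)
    prob = 1.0
    for value in set(t_counts) | set(s_counts):
        n_v, s_v = t_counts[value], s_counts[value]
        if s_v > n_v:
            return 0.0  # S contains a row that T cannot produce
        # Binomial probability of releasing s_v of the n_v rows.
        prob *= comb(n_v, s_v) * p**s_v * (1 - p) ** (n_v - s_v)
    return prob

n, s, p = 10, 3, 0.5
T = [0] * n                    # n rows with value 0
T_prime = [0] * (n - 1) + [1]  # one row changed to value 1
S = [1] + [0] * (s - 1)        # sample that contains the value 1

print(sample_probability(S, T, p))        # 0.0
print(sample_probability(S, T_prime, p))  # 0.03515625
```

Because Pr[S | T] is zero while Pr[S | T’] is positive, no finite ε can bound the ratio between them, which is why the definition is relaxed on the next slide.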

  8. Privacy Definition [DKMMiNa06, BDMN05] • (ε, δ)-Indistinguishability: • Two tables T, T’ that differ by a single row • S : Output of the sanitizer • With probability at least 1 - δ, • Pr[S | T] ≤ (1 + ε) Pr[S | T’]

  9. An Example • Cannot always get (ε, δ)-Indistinguishability for all tables • A table where all rows have unique values

  10. When does Random Sampling preserve Privacy? • Parameters: • (ε, δ)-indistinguishability • k : number of distinct values in T • t : number of values which occur at most log(k/δ)/ε times in T • Theorem: This can be guaranteed if • p < ε (if t = 0) • p < Õ(εδ/t)
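The two table parameters in the theorem are easy to compute from a histogram of the table. The following is a sketch under stated assumptions: the table is a list of hashable values, the logarithm is taken to be natural (the slides do not state the base), and the function name `table_parameters` is hypothetical. The Õ bound hides logarithmic factors, so the sketch reports k and t rather than a concrete limit on p.

```python
from collections import Counter
from math import log

def table_parameters(table, eps, delta):
    """Return (k, t, threshold) for a table of values.

    k is the number of distinct values, threshold is log(k/delta)/eps
    (natural log assumed), and t is the number of values that occur
    at most `threshold` times.
    """
    if not table:
        raise ValueError("table must be non-empty")
    counts = Counter(table)
    k = len(counts)
    threshold = log(k / delta) / eps
    t = sum(1 for c in counts.values() if c <= threshold)
    return k, t, threshold

# Hypothetical table: two frequent values and one value seen once.
table = ["a"] * 100 + ["b"] * 100 + ["c"]
k, t, threshold = table_parameters(table, eps=0.5, delta=0.1)
# Here k = 3 and t = 1: only "c" falls below the threshold (about 6.8),
# so the stricter condition on p (the t > 0 case) applies.
```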

  11. log(k/)/ log(k/)/p Number of rows with value v Classification of Values For (, )-indistinguishability: Rare Value Infrequent Value Common Value

  12. Rare Values • If a rare value v is observed in a random sample, • Pr[S|T’] > (1 + ε/log(k/δ)) Pr[S|T]

  13. Common Values (more than log(k/δ)/p rows) • For a common value v, • Pr[S|T] ≈ Pr[S|T’] • Typically, the number of rows with a common value is close to its expectation

  14. Infrequent Values (between log(k/δ)/ε and log(k/δ)/p rows) • For an infrequent value v, • Pr[S|T] ≈ Pr[S|T’] • Typically, the number of rows with an infrequent value is at most log(k/δ) away from its expected value

  15. Properties of a Good Sample • A sample S is ε-indistinguishable if: • No rare values • The number of rows with common value v is within a constant factor of expectation • The number of rows with infrequent value v is at most an additive O(log(k/δ)) more than its expected value

  16. When does Random Sampling preserve Privacy? • Such a sample occurs with probability at least 1 - δ if • p < ε (if t = 0) • p < Õ(εδ/t)

  17. Utility of Random Sampling • Assuming no rare values: • Error in the frequency of each value : additive 1/√n • [DMNS06] Estimates histogram with an additive error of 1/n in each frequency • Sampling may give a compact representation of the histogram
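The frequency error of a sampled table can be observed empirically. The sketch below is only an illustration: the table contents, the sampling rate p = 0.5, the fixed seed, and the helper name `estimated_frequencies` are assumptions, and a single run shows one realization of the error rather than a bound.

```python
import random
from collections import Counter

def estimated_frequencies(sample):
    """Estimate the relative frequency of each value from a sample."""
    total = len(sample)
    if total == 0:
        return {}
    return {value: count / total for value, count in Counter(sample).items()}

rng = random.Random(0)                  # fixed seed for reproducibility
table = ["a"] * 3000 + ["b"] * 7000     # true frequency of "a" is 0.3
p = 0.5
sample = [row for row in table if rng.random() < p]

est = estimated_frequencies(sample)
error = abs(est["a"] - 0.3)
# Compare the observed error with the 1/sqrt(n) scale from the slide.
print(error, 1 / len(table) ** 0.5)
```

The observed error can be compared against 1/√n (0.01 for n = 10000); repeating the run with different seeds gives a sense of how it varies.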

  18. Conclusions • Random sampling preserves privacy only when there are few rare values • With rare values, the probability of failure can be high • δ on the order of 1/n as opposed to 1/2^n [DKMMiNa06, BDMN05] • Error in estimating the frequency of each value can be high • Additive 1/√n as opposed to 1/n of [DMNS06]

  19. Thank You

  20. The Problem • What are the conditions under which this sanitizer preserves privacy?
