Foundations of privacy lecture 4

Foundations of Privacy, Lecture 4

Lecturer: Moni Naor


Recap of last week’s lecture

  • Differential Privacy

  • Sensitivity:

    • Global sensitivity of query $q: U^n \to \mathbb{R}^d$

      $GS_q = \max_{D,D'} \|q(D) - q(D')\|_1$, over adjacent $D, D'$

    • Local sensitivity of query $q$ at point $D$

      $LS_q(D) = \max_{D'} \|q(D) - q(D')\|_1$, over $D'$ adjacent to $D$

    • Smooth sensitivity

      $S_f^*(X) = \max_Y \{ LS_f(Y) \, e^{-\varepsilon \cdot \mathrm{dist}(X,Y)} \}$

  • Histograms

  • Differential privacy of median

  • Exponential Mechanism


Histograms

Inputs $x_1, x_2, \ldots, x_n$ in domain $U$. Domain $U$ is partitioned into $d$ disjoint bins $S_1, \ldots, S_d$.

$q(x_1, x_2, \ldots, x_n) = (n_1, n_2, \ldots, n_d)$ where

$n_j = \#\{i : x_i \text{ in the } j\text{-th bin}\}$

Can view as $d$ queries: $q_j$ counts the number of points in set $S_j$

For adjacent D, D’, only one answer can change - it can change by 1

Global sensitivity of answer vector is 1

Sufficient to add Lap(1/ε) noise to each query, still get ε-privacy
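The noisy histogram described on this slide can be sketched in a few lines. This is only an illustration: the function names, the `bin_of` callback, and the trick of sampling Laplace noise as a difference of two exponential variables are choices made here, not something the slides specify.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two independent
    exponential variables with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_histogram(xs, bin_of, d, epsilon, rng):
    """Count the points in each of d bins, then add independent
    Lap(1/epsilon) noise to every count.

    bin_of(x) must return the bin index (0 .. d-1) of point x.
    """
    counts = [0] * d
    for x in xs:
        counts[bin_of(x)] += 1
    return [c + laplace_noise(1.0 / epsilon, rng) for c in counts]

# Toy usage: two bins over [0, 1), split at 0.5.
rng = random.Random(0)
print(noisy_histogram([0.1, 0.2, 0.7], lambda x: int(x * 2), 2, 0.5, rng))
```

The noise scale does not depend on the number of bins d, which is the point of the slide: the whole answer vector has global sensitivity 1.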


The Exponential Mechanism [McSherry Talwar]

A general mechanism that yields

  • Differential privacy

  • May yield utility/approximation

  • Is defined and evaluated by considering all possible answers

    The definition does not yield an efficient way of evaluating it

    Application/original motivation:

    Approximate truthfulness of auctions

  • Collusion resistance

  • Compatibility


Side bar: Digital Goods Auction

  • Some product with 0 cost of production

  • n individuals with valuations $v_1, v_2, \ldots, v_n$

  • Auctioneer wants to maximize profit


Example of the Exponential Mechanism

  • Data: $x_i$ = website visited by student $i$ today

  • Range: Y = {website names}

  • For each name $y$, let $q(y, X) = \#\{i : x_i = y\}$, the size of the subset of students who visited $y$

    Goal: output the most frequently visited site

  • Procedure: Given $X$, output website $y$ with probability proportional to $e^{\varepsilon q(y,X)}$

  • Popular sites exponentially more likely than rare ones

    Website scores don’t change too quickly


Setting

  • For input $D \in U^n$ want to find $r \in R$

  • Base measure $\mu$ on $R$ - usually uniform

  • Score function $q': U^n \times R \to \mathbb{R}$

    assigns any pair $(D,r)$ a real value

    • Want to maximize it (approximately)

      The exponential mechanism

    • Assign output $r \in R$ with probability proportional to

      $e^{\varepsilon q'(D,r)} \mu(r)$

      Normalizing factor $\int_r e^{\varepsilon q'(D,r)} \mu(r) \, dr$ (a sum when $R$ is discrete)
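For a finite output range, the mechanism in this setting can be sketched as follows. The function names, the `base` argument for the base measure, and the website data are hypothetical choices for illustration; subtracting the maximum score before exponentiating is only a numerical-stability step and cancels in the normalization.

```python
import math
import random

def exponential_mechanism_probs(D, outputs, score, epsilon, base=None):
    """Return Pr[r] proportional to exp(epsilon * score(D, r)) * base(r)."""
    if base is None:
        base = lambda r: 1.0  # uniform base measure
    scaled = [epsilon * score(D, r) for r in outputs]
    shift = max(scaled)  # cancels after normalization
    weights = [math.exp(s - shift) * base(r) for s, r in zip(scaled, outputs)]
    total = sum(weights)  # the normalizing factor
    return [w / total for w in weights]

def exponential_mechanism(D, outputs, score, epsilon, rng, base=None):
    """Sample one output according to the probabilities above."""
    probs = exponential_mechanism_probs(D, outputs, score, epsilon, base)
    return rng.choices(outputs, weights=probs, k=1)[0]

# Toy usage with the website example: score = number of visits to a site.
visits = ["a.org", "b.org", "a.org", "c.org", "a.org"]
sites = ["a.org", "b.org", "c.org"]
count = lambda X, y: sum(1 for x in X if x == y)
probs = exponential_mechanism_probs(visits, sites, count, epsilon=1.0)
pick = exponential_mechanism(visits, sites, count, 1.0, random.Random(1))
```

Computing the probabilities requires touching every possible output, which matches the earlier remark that the definition does not by itself give an efficient way to evaluate the mechanism.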


The exponential mechanism is private

  • Let $\Delta = \max_{D,D',r} |q'(D,r) - q'(D',r)|$

    Claim: The exponential mechanism yields a $2 \cdot \varepsilon \cdot \Delta$ differentially private solution

    • Prob [output = r on input D]

      $= e^{\varepsilon q'(D,r)} \mu(r) / \int_r e^{\varepsilon q'(D,r)} \mu(r) \, dr$

    • Prob [output = r on input D’]

      $= e^{\varepsilon q'(D',r)} \mu(r) / \int_r e^{\varepsilon q'(D',r)} \mu(r) \, dr$

For adjacent D, D’ the ratio is bounded by $e^{\varepsilon \Delta} \cdot e^{\varepsilon \Delta}$
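The bound on the ratio can be spelled out in two steps. The derivation below is a sketch that writes ε for the privacy parameter, Δ for the largest change of the score between adjacent inputs, and μ for the base measure (symbols reconstructed from the slide text); one factor comes from the numerators and one from the normalizing factors.

```latex
\begin{align*}
\frac{\Pr[\text{output} = r \mid D]}{\Pr[\text{output} = r \mid D']}
  &= \frac{e^{\varepsilon q'(D,r)}\,\mu(r)}{e^{\varepsilon q'(D',r)}\,\mu(r)}
     \cdot
     \frac{\int_s e^{\varepsilon q'(D',s)}\,\mu(s)\,ds}
          {\int_s e^{\varepsilon q'(D,s)}\,\mu(s)\,ds} \\
  &\le e^{\varepsilon \Delta} \cdot e^{\varepsilon \Delta}
   = e^{2 \varepsilon \Delta}.
\end{align*}
% First factor: q'(D,r) - q'(D',r) \le \Delta.
% Second factor: q'(D',s) \le q'(D,s) + \Delta for every s, so the
% numerator integral is at most e^{\varepsilon\Delta} times the denominator.
```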


Laplace Noise as Exponential Mechanism

  • On query $q: U^n \to \mathbb{R}$ let $q'(D,r) = -|q(D) - r|$

  • Prob noise = $y$:

    $\frac{e^{-\varepsilon |y|}}{2 \int_{y \ge 0} e^{-\varepsilon y} \, dy} = \frac{\varepsilon}{2} e^{-\varepsilon |y|}$

    Laplace distribution $Y = Lap(b)$ has density function

    $\Pr[Y = y] = \frac{1}{2b} e^{-|y|/b}$

[Figure: density of the Laplace distribution, plotted against y from -4 to 5]
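A quick numerical check of the identity on this slide: normalizing $e^{-\varepsilon |y|}$ should give the Lap(1/ε) density. The sketch below approximates the normalizing integral with a midpoint Riemann sum; the integration window and step count are arbitrary choices made for this illustration.

```python
import math

def exp_mech_noise_density(y, epsilon, lo=-60.0, hi=60.0, steps=120_000):
    """Density of the noise y = r - q(D) under the score -|q(D) - r|,
    normalized numerically over the window [lo, hi]."""
    h = (hi - lo) / steps
    normalizer = h * sum(
        math.exp(-epsilon * abs(lo + (i + 0.5) * h)) for i in range(steps)
    )
    return math.exp(-epsilon * abs(y)) / normalizer

def laplace_density(y, b):
    """Closed-form density of Lap(b)."""
    return math.exp(-abs(y) / b) / (2.0 * b)

# Compare the two at a few points for epsilon = 0.5, i.e. b = 1/epsilon = 2.
for y in (-2.0, 0.0, 1.5):
    print(y, exp_mech_noise_density(y, 0.5), laplace_density(y, 2.0))
```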


Any Differentially Private Mechanism is an instance of the Exponential Mechanism

  • Let $M$ be a differentially private mechanism

    Take $q'(D,r)$ to be $\log \Pr[M(D) = r]$

    Remaining issue: Accuracy


Private Ranking

  • Each element $i \in \{1, \ldots, n\}$ has a real valued score $S_D(i)$ based on a data set $D$.

  • Goal: Output $k$ elements with highest scores.

  • Privacy

  • Data set $D$ consists of $n$ entries in domain $\mathcal{D}$.

    • Differential privacy: Protects privacy of entries in $D$.

  • Condition: Insensitive Scores

    • for any element $i$, for any data sets $D, D'$ that differ in one entry: $|S_D(i) - S_{D'}(i)| \le 1$


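Approximate ranking

  • Let $S_k$ be the $k$-th highest score based on data set $D$.

  • An output list is $\alpha$-useful if:

    Soundness: No element in the output has score less than $S_k - \alpha$

    Completeness: Every element with score greater than $S_k + \alpha$ is in the output.

[Figure: the score axis in three regions. Score ≤ $S_k - \alpha$: not in the output. $S_k - \alpha$ ≤ Score ≤ $S_k + \alpha$: may or may not be in the output. $S_k + \alpha$ ≤ Score: in the output.]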
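The two usefulness conditions translate directly into a small checker. In this sketch the accuracy parameter is called `alpha` (an assumed name, since the symbol is lost in the transcript) and the scores are held in a plain dictionary, which is an illustrative choice.

```python
def is_useful(scores, output, k, alpha):
    """Check soundness and completeness of an output list.

    scores: dict mapping element id -> true score on the data set
    output: iterable of element ids returned by the mechanism
    """
    s_k = sorted(scores.values(), reverse=True)[k - 1]  # k-th highest score
    chosen = set(output)
    # Soundness: nothing in the output scores below S_k - alpha.
    sound = all(scores[i] >= s_k - alpha for i in chosen)
    # Completeness: everything scoring above S_k + alpha is in the output.
    complete = all(i in chosen for i, s in scores.items() if s > s_k + alpha)
    return sound and complete

scores = {"a": 10, "b": 9, "c": 5, "d": 1}
print(is_useful(scores, ["a", "b"], k=2, alpha=1))  # exact top-2
print(is_useful(scores, ["a", "c"], k=2, alpha=1))  # "c" is too far below S_k
```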


Two Approaches

Each input affects all scores

  • Score perturbation

    • Perturb the scores of the elements with noise

    • Pick the top k elements in terms of noisy scores.

    • Fast and simple implementation

      Question: what sort of noise should be added?

      What sort of guarantees?

  • Exponential sampling

    • Run the exponential mechanism k times.

    • more complicated and slower implementation

      What sort of guarantees?

Homework
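Both approaches can be sketched side by side. The slide deliberately leaves the noise distribution and the guarantees open, so everything quantitative here is an assumption: Laplace noise with a caller-chosen `noise_scale` for perturbation, and a caller-chosen `epsilon_per_round` with selection without replacement for exponential sampling. The sketch makes no privacy claim for either parameter choice.

```python
import math
import random

def top_k_score_perturbation(scores, k, noise_scale, rng):
    """Add (assumed) Laplace noise to every score, keep the k largest."""
    rate = 1.0 / noise_scale
    noisy = {
        i: s + rng.expovariate(rate) - rng.expovariate(rate)
        for i, s in scores.items()
    }
    return sorted(noisy, key=noisy.get, reverse=True)[:k]

def top_k_exponential_sampling(scores, k, epsilon_per_round, rng):
    """Run the exponential mechanism k times, removing each winner."""
    remaining = dict(scores)
    chosen = []
    for _ in range(k):
        ids = list(remaining)
        shift = max(remaining.values())  # numerical stability only
        weights = [
            math.exp(epsilon_per_round * (remaining[i] - shift)) for i in ids
        ]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen

scores = {"a": 10, "b": 9, "c": 5, "d": 1}
rng = random.Random(0)
print(top_k_score_perturbation(scores, 2, noise_scale=1.0, rng=rng))
print(top_k_exponential_sampling(scores, 2, epsilon_per_round=1.0, rng=rng))
```

Perturbation needs one pass and a sort, while exponential sampling recomputes a distribution in each of the k rounds, which is the speed difference the slide points to.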


Exponential Mechanism: Simple Example (almost free) private lunch

Database of n individuals, lunch options {1…k}, each individual likes or dislikes each option (1 or 0)

Goal: output a lunch option that many like

For each lunch option $j \in [k]$, $\ell(j)$ is the number of individuals who like $j$

Exponential Mechanism: output $j$ with probability proportional to $e^{\varepsilon \ell(j)}$

Actual probability: $e^{\varepsilon \ell(j)} / \sum_i e^{\varepsilon \ell(i)}$, where the denominator is the normalizer
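The lunch example is small enough to compute the output distribution exactly and to compare it on two adjacent databases. The preference matrix below is made up for illustration. Changing one individual's row moves every count ℓ(j) by at most 1, so each output probability should change by a factor of at most $e^{2\varepsilon}$.

```python
import math

def lunch_probabilities(likes, epsilon):
    """likes[i][j] is 1 if individual i likes option j, else 0.
    Returns Pr[j] = exp(epsilon * l(j)) / sum_i exp(epsilon * l(i))."""
    k = len(likes[0])
    ell = [sum(row[j] for row in likes) for j in range(k)]
    weights = [math.exp(epsilon * l) for l in ell]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

# Hypothetical data: 4 individuals, 3 lunch options.
likes = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
neighbor = likes[:-1] + [[1, 1, 0]]  # the last individual changes their row
eps = 0.5

p = lunch_probabilities(likes, eps)
p_neighbor = lunch_probabilities(neighbor, eps)
worst_ratio = max(max(a / b, b / a) for a, b in zip(p, p_neighbor))
print(p, p_neighbor, worst_ratio, math.exp(2 * eps))
```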


Synthetic DB: Output is a DB

[Figure: a Sanitizer sits between the Database and the user, who poses query 1, query 2, ... and receives answer 1, answer 2, answer 3]

Synthetic DB: output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating the query on the output DB

Software and people compatible

Consistent answers


Answering More Queries

Using the exponential mechanism

Differential privacy for every set $C$ of counting queries

Error is $\tilde{O}(n^{2/3} \log|C|)$

Remarkable

Hope for rich private analysis of small DBs!

Quantitative: #queries >> DB size

Qualitative: output of sanitizer is a synthetic DB, i.e. the output is a DB itself


Counting Queries

Database $D$ of size $n$

  • Queries with low sensitivity

    Counting queries

    $C$ is a set of predicates $c: U \to \{0,1\}$

    Query: how many participants in $D$ satisfy $c$?

    Relaxed accuracy:

    answer each query within $\alpha$ additive error w.h.p.

    Not so bad: error is anyway inherent in statistical analysis

    Assume all queries are given in advance (non-interactive)

[Figure: a query $c$ drawn as a subset of the universe $U$]


Utility and Privacy Can’t Always Be Achieved Simultaneously

Impossibility results for counting queries:

DB with n participants

can’t have $o(\sqrt{n})$ error with $O(n)$ queries [DiNi, DwMcTa07, DwYe08]

In all these cases, strong privacy violation: almost the entire DB is compromised

What can we do?


Huge DBs [Dwork Nissim]

DB of size n >> # queries |C|:

Add independent noise to the answer on every query

Noise per query ~ #queries

For accuracy, need #queries ≤ n

May be reasonable for huge internet-scale DBs, privacy “for free”
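The "independent noise per query" idea can be sketched as follows. The concrete noise choice is an assumption made here: each counting query has sensitivity 1, so Laplace noise of scale |C|/ε per query gives ε-differential privacy overall by basic composition. The analysis cited on the slide may use a different noise distribution; the function name and toy data are hypothetical.

```python
import random

def answer_counting_queries(db, predicates, epsilon, rng):
    """Answer every counting query with independent Laplace noise whose
    scale grows linearly with the number of queries."""
    scale = len(predicates) / epsilon  # noise per query ~ #queries
    rate = 1.0 / scale
    answers = []
    for c in predicates:
        true_count = sum(1 for x in db if c(x))
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        answers.append(true_count + noise)
    return answers

# Toy usage: 10 records, 2 queries.
db = list(range(10))
predicates = [lambda x: x % 2 == 0, lambda x: x > 6]
print(answer_counting_queries(db, predicates, 1.0, random.Random(0)))
```

With scale |C|/ε the noise stays small relative to n only while the number of queries is well below n, which is the regime this slide addresses.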


What about smaller DBs?

DB of size n < #queries |C|,

impossibility results: can’t have $o(\sqrt{n})$ error

Error must be $\Omega(\sqrt{n})$


The BLR Algorithm [Blum Ligett Roth 08]

For DBs $F$ and $D$: $\mathrm{dist}(F,D) = \max_{q \in C} |q(F) - q(D)|$

Intuition: far away DBs get smaller probability

Algorithm on input DB $D$:

Sample from a distribution on DBs of size $m$ ($m < n$): DB $F$ gets picked w.p. $\propto e^{-\varepsilon \cdot \mathrm{dist}(F,D)}$
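For a toy universe the sampling step can be carried out by brute-force enumeration. The sketch rests on assumptions the slide does not state: candidate databases are treated as multisets of size m, and counts on the small database F are rescaled by n/m so that they are comparable with counts on D. All names are illustrative.

```python
import itertools
import math
import random

def blr_sample(D, universe, queries, m, epsilon, rng):
    """Pick a size-m database F with probability proportional to
    exp(-epsilon * dist(F, D)), enumerating every candidate."""
    n = len(D)

    def scaled_count(db, c):
        # Fraction of db satisfying c, rescaled to a count out of n (assumption).
        return n * sum(1 for x in db if c(x)) / len(db)

    def dist(F):
        return max(abs(scaled_count(F, c) - scaled_count(D, c)) for c in queries)

    candidates = list(itertools.combinations_with_replacement(universe, m))
    dists = [dist(F) for F in candidates]
    shift = min(dists)  # numerical stability only
    weights = [math.exp(-epsilon * (d - shift)) for d in dists]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy usage: universe of 4 values, 2 counting queries, synthetic DB of size 2.
D = [0, 0, 1, 1, 0, 0, 1, 1]
queries = [lambda x: x < 2, lambda x: x % 2 == 0]
print(blr_sample(D, [0, 1, 2, 3], queries, m=2, epsilon=1.0, rng=random.Random(0)))
```

The candidate list has size on the order of $|U|^m$, so this only runs for very small universes and very small m.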


The BLR Algorithm

Idea:

  • In general: Do not use large DB

    • Sample and answer accordingly

  • DB of size $m$ guarantees hitting each query with sufficient accuracy


The BLR Algorithm: 2ε-Privacy

For adjacent $D, D'$, for every $F$: $|\mathrm{dist}(F,D) - \mathrm{dist}(F,D')| \le 1$

Probability of $F$ by $D$: $e^{-\varepsilon \cdot \mathrm{dist}(F,D)} / \sum_{G \text{ of size } m} e^{-\varepsilon \cdot \mathrm{dist}(G,D)}$

Probability of $F$ by $D'$: numerator and denominator can each change by an $e^{\varepsilon}$ factor, giving 2ε-privacy


The BLR Algorithm: Error $\tilde{O}(n^{2/3} \log|C|)$

There exists $F_{good}$ of size $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$ s.t. $\mathrm{dist}(F_{good}, D) \le \alpha$

$\Pr[F_{good}] \sim e^{-\varepsilon \alpha}$

For any $F_{bad}$ with $\mathrm{dist} \ge 2\alpha$, $\Pr[F_{bad}] \sim e^{-2 \varepsilon \alpha}$

Union bound: $\sum_{\text{bad DB } F_{bad}} \Pr[F_{bad}] \sim |U|^m e^{-2 \varepsilon \alpha}$

For $\alpha = \tilde{O}(n^{2/3} \log|C|)$, $\Pr[F_{good}] \gg \sum \Pr[F_{bad}]$
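The step from the union bound to the $n^{2/3}$ rate can be made explicit. The following is a back-of-the-envelope sketch that ignores constants and the logarithmic factors hidden in the Õ notation; the exponents it produces on the log and 1/ε terms are a rough accounting and are stated more loosely on the slides.

```latex
\begin{align*}
e^{-\varepsilon\alpha} \gg |U|^{m} e^{-2\varepsilon\alpha}
  &\iff \varepsilon\alpha \gg m \ln|U| \\
  &\iff \varepsilon\alpha \gg \left(\frac{n}{\alpha}\right)^{2} \log|C| \, \log|U|
        \qquad \text{using } m \approx (n/\alpha)^{2}\log|C| \\
  &\iff \alpha^{3} \gg \frac{n^{2} \log|C| \, \log|U|}{\varepsilon} \\
  &\iff \alpha \gg n^{2/3}
        \left(\frac{\log|C| \, \log|U|}{\varepsilon}\right)^{1/3}.
\end{align*}
```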


The BLR Algorithm: Running Time

Generating the distribution by enumeration: need to enumerate every size-$m$ database, where $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$

Running time ≈ $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$


Conclusion

Offline algorithm, 2ε-Differential Privacy for any set $C$ of counting queries

Error $\alpha$ is $\tilde{O}(n^{2/3} \log|C| / \varepsilon)$

Super-poly running time: $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$


Can we Efficiently Sanitize?

The good news

If the universe is small, can sanitize efficiently: time poly(|C|,|U|)

The bad news

Cannot do much better, namely sanitize in time sub-poly(|C|) AND sub-poly(|U|)


How Efficiently Can We Sanitize?

                 |C| subpoly    |C| poly
  |U| subpoly         ?             ?
  |U| poly            ?             ?  (Good news!)


The Good News: Can Sanitize When Universe is Small

Efficient Sanitizer for query set C

  • DB size $n \ge \tilde{O}(|C|^{o(1)} \log|U|)$

  • error is ~ $n^{2/3}$

  • Runtime poly(|C|,|U|)

    Output is a synthetic database

    Compare to [Blum Ligett Roth]:

    $n \ge \tilde{O}(\log|C| \log|U|)$, runtime super-poly(|C|,|U|)


Recursive Algorithm

Start with DB $D$ and large query set $C$

Repeatedly choose random subset $C_{i+1}$ of $C_i$: shrink query set by (small) factor

End recursion: sanitize $D$ w.r.t. small query set $C_b$

Output is good for all queries in small set $C_{i+1}$

Extract utility on almost-all queries in large set $C_i$

Fix remaining “underprivileged” queries in large set $C_i$

[Figure: nested query sets $C_0 = C$, $C_1$, $C_2$, ..., $C_b$]

