# Foundations of Privacy Lecture 4

### Foundations of Privacy Lecture 4

Lecturer: Moni Naor

Recap of last week’s lecture
• Differential Privacy
• Sensitivity:
• Global sensitivity of query $q: U^n \to \mathbb{R}^d$: $GS_q = \max_{\text{adjacent } D, D'} \|q(D) - q(D')\|_1$
• Local sensitivity of query $q$ at point $D$: $LS_q(D) = \max_{D' \text{ adjacent to } D} \|q(D) - q(D')\|_1$
• Smooth sensitivity: $S_f^*(X) = \max_Y \{ LS_f(Y)\, e^{-\varepsilon \cdot \mathrm{dist}(X,Y)} \}$
• Histograms
• Differential privacy of median
• Exponential Mechanism

Histograms

Inputs $x_1, x_2, \ldots, x_n$ in domain $U$

Domain $U$ partitioned into $d$ disjoint bins $S_1, \ldots, S_d$

$q(x_1, x_2, \ldots, x_n) = (n_1, n_2, \ldots, n_d)$ where $n_j = \#\{i : x_i \text{ in the } j\text{-th bin}\}$

Can view as $d$ queries: $q_i$ counts the number of points in set $S_i$

For adjacent $D, D'$, only one answer can change, and it can change by 1

Global sensitivity of the answer vector is 1

Sufficient to add $\mathrm{Lap}(1/\varepsilon)$ noise to each query and still get $\varepsilon$-privacy
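
The histogram release described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `laplace` and `private_histogram`, the representation of bins as Python ranges, and the sample data are all hypothetical and not from the lecture. Laplace noise with scale 1/ε is drawn as the difference of two exponential variables.

```python
import random

def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_histogram(data, bins, epsilon, rng=None):
    """Count the points falling in each disjoint bin, then add Lap(1/epsilon) to every count."""
    rng = rng or random.Random()
    counts = [0] * len(bins)
    for x in data:
        for j, s in enumerate(bins):
            if x in s:          # bins are assumed disjoint, so at most one matches
                counts[j] += 1
                break
    return [c + laplace(1.0 / epsilon, rng) for c in counts]

# Hypothetical example: ages bucketed into three bins.
bins = [range(0, 10), range(10, 20), range(20, 30)]
ages = [3, 12, 15, 27, 8, 19]
noisy = private_histogram(ages, bins, epsilon=0.5, rng=random.Random(0))
```

Because one entry affects only one bin, the same noise scale is used for every bin regardless of the number of bins d.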

The Exponential Mechanism [McSherry Talwar]

A general mechanism that yields

• Differential privacy
• May yield utility/approximation
• Is defined and evaluated by considering all possible answers

The definition does not yield an efficient way of evaluating it

Application/original motivation:

Approximate truthfulness of auctions

• Collusion resistance
• Compatibility

Sidebar: Digital Goods Auction
• Some product with 0 cost of production
• n individuals with valuations $v_1, v_2, \ldots, v_n$
• Auctioneer wants to maximize profit

Example of the Exponential Mechanism
• Data: $x_i$ = website visited by student $i$ today
• Range: $Y$ = {website names}
• For each name $y$, let $q(y, X) = \#\{i : x_i = y\}$ (the size of the subset of students who visited $y$)

Goal: output the most frequently visited site

• Procedure: given $X$, output website $y$ with probability proportional to $e^{\varepsilon q(y,X)}$
• Popular sites are exponentially more likely than rare ones

Website scores don’t change too quickly: changing one student’s entry changes each $q(y, X)$ by at most 1
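
A minimal sketch of the website example follows, assuming the score of a site is its visit count and that ε is supplied by the caller. The function name `exponential_mechanism` and the site names are hypothetical. The maximum score is subtracted before exponentiating, which leaves the sampling distribution unchanged but avoids overflow.

```python
import math
import random
from collections import Counter

def exponential_mechanism(scores, epsilon, rng=None):
    """Pick a key of `scores` with probability proportional to exp(epsilon * score)."""
    rng = rng or random.Random()
    top = max(scores.values())
    # Shifting all scores by the same constant does not change the normalized probabilities.
    weights = {y: math.exp(epsilon * (s - top)) for y, s in scores.items()}
    total = sum(weights.values())
    u = rng.random() * total
    acc = 0.0
    for y, w in weights.items():
        acc += w
        if u < acc:
            return y
    return y  # guard against floating-point round-off

# Hypothetical data: one visited site per student.
visits = ["a.example", "b.example", "a.example", "c.example", "a.example"]
scores = Counter(visits)
site = exponential_mechanism(scores, epsilon=1.0, rng=random.Random(1))
```

With a small ε the output is close to uniform over sites; with a large ε it concentrates on the most visited site, at the cost of a weaker privacy guarantee.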

Setting
• For input $D \in U^n$, want to find $r \in R$
• Base measure $\mu$ on $R$, usually uniform
• Score function $q': U^n \times R \to \mathbb{R}$ assigns any pair $(D, r)$ a real value
• Want to maximize it (approximately)

The exponential mechanism

• Assign output $r \in R$ with probability proportional to $e^{\varepsilon q'(D,r)} \mu(r)$
• Normalizing factor: $\sum_{r' \in R} e^{\varepsilon q'(D,r')} \mu(r')$

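The exponential mechanism is private
• Let $\Delta = \max_{D, D', r} |q'(D,r) - q'(D',r)|$, where $D, D'$ range over adjacent inputs

Claim: The exponential mechanism yields a $2\varepsilon\Delta$-differentially private solution

• $\Pr[\text{output} = r \text{ on input } D] = e^{\varepsilon q'(D,r)} \mu(r) / \sum_{r'} e^{\varepsilon q'(D,r')} \mu(r')$
• $\Pr[\text{output} = r \text{ on input } D'] = e^{\varepsilon q'(D',r)} \mu(r) / \sum_{r'} e^{\varepsilon q'(D',r')} \mu(r')$

The ratio is bounded by $e^{\varepsilon\Delta} \cdot e^{\varepsilon\Delta} = e^{2\varepsilon\Delta}$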
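
The bound on the ratio can be spelled out as follows. The notation is assumed here for readability: ε is the privacy parameter, μ the base measure on R, q' the score function, and Δ the largest change in any score between adjacent inputs D and D'. The first factor compares the numerators, the second compares the normalizing sums term by term.

```latex
\begin{align*}
\frac{\Pr[\text{output} = r \mid D]}{\Pr[\text{output} = r \mid D']}
  &= \frac{e^{\varepsilon q'(D,r)}\,\mu(r)}{e^{\varepsilon q'(D',r)}\,\mu(r)}
     \cdot
     \frac{\sum_{r'} e^{\varepsilon q'(D',r')}\,\mu(r')}{\sum_{r'} e^{\varepsilon q'(D,r')}\,\mu(r')} \\
  &\le e^{\varepsilon\,|q'(D,r) - q'(D',r)|}
     \cdot
     \max_{r'} e^{\varepsilon\,|q'(D',r') - q'(D,r')|} \\
  &\le e^{\varepsilon\Delta} \cdot e^{\varepsilon\Delta} = e^{2\varepsilon\Delta}.
\end{align*}
```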

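Laplace Noise as Exponential Mechanism
• On query $q: U^n \to \mathbb{R}$ let $q'(D,r) = -|q(D) - r|$
• Probability that the noise equals $y$: $e^{-\varepsilon|y|} / \left(2\int_{0}^{\infty} e^{-\varepsilon t}\,dt\right) = \frac{\varepsilon}{2} e^{-\varepsilon|y|}$

Laplace distribution $Y = \mathrm{Lap}(b)$ has density function $\Pr[Y = y] = \frac{1}{2b} e^{-|y|/b}$

[Figure: density of the Laplace distribution plotted against $y$, for $y$ from $-4$ to $5$]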
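
As a sanity check on this slide, the sketch below normalizes the weight exp(-ε|y|) numerically and compares it with the density of Lap(b) for b = 1/ε. The function names, the truncation of the integral to a finite window, and the midpoint rule are all choices made for this illustration, not part of the lecture.

```python
import math

def laplace_density(y, b):
    """Density of Lap(b) at y."""
    return math.exp(-abs(y) / b) / (2.0 * b)

def normalized_exp_weight(y, epsilon, half_width=60.0, steps=200_000):
    """exp(-epsilon*|y|) divided by a midpoint-rule estimate of its integral over the real line.

    The integral is truncated to [-half_width, half_width]; the tails are negligible
    for the parameters used below.
    """
    h = 2.0 * half_width / steps
    total = h * sum(
        math.exp(-epsilon * abs(-half_width + (i + 0.5) * h)) for i in range(steps)
    )
    return math.exp(-epsilon * abs(y)) / total

epsilon = 0.5
gap = abs(normalized_exp_weight(1.3, epsilon) - laplace_density(1.3, 1.0 / epsilon))
```

The two values agree up to numerical integration error, matching the claim that the exponential mechanism with score $-|q(D) - r|$ adds Laplace noise of scale 1/ε.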

Any Differentially Private Mechanism is an instance of the Exponential Mechanism
• Let M be a differentially private mechanism

Take $q'(D,r)$ to be $\log \Pr[M(D) = r]$

Remaining issue: Accuracy

Private Ranking
• Each element $i \in \{1, \ldots, n\}$ has a real-valued score $S_D(i)$ based on a data set $D$.
• Goal: Output $k$ elements with highest scores.
• Privacy
• Data set $D$ consists of $n$ entries in domain $\mathcal{D}$.
• Differential privacy: Protects privacy of entries in $D$.
• Condition: Insensitive Scores
• For any element $i$ and any data sets $D, D'$ that differ in one entry: $|S_D(i) - S_{D'}(i)| \le 1$

Approximate ranking
• Let $S_k$ be the $k$th highest score based on data set $D$.
• An output list is $\alpha$-useful if:

Soundness: No element in the output has score less than $S_k - \alpha$

Completeness: Every element with score greater than $S_k + \alpha$ is in the output.

[Figure: score axis split into three regions: Score $\le S_k - \alpha$; $S_k - \alpha \le$ Score $\le S_k + \alpha$; $S_k + \alpha \le$ Score]
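
The usefulness condition can be checked mechanically. In the sketch below the slack parameter is called `alpha` (a name assumed here), `scores` maps each element to its true score, and `is_alpha_useful` is a hypothetical helper that tests soundness and completeness exactly as stated on the slide.

```python
def is_alpha_useful(scores, output, k, alpha):
    """Return True if `output` is sound and complete with slack `alpha` for the top-k task."""
    s_k = sorted(scores.values(), reverse=True)[k - 1]   # k-th highest true score
    chosen = set(output)
    # Soundness: nothing in the output scores below S_k - alpha.
    sound = all(scores[i] >= s_k - alpha for i in chosen)
    # Completeness: everything scoring above S_k + alpha is in the output.
    complete = all(i in chosen for i, s in scores.items() if s > s_k + alpha)
    return sound and complete

# Hypothetical scores for four elements.
scores = {"a": 10, "b": 9, "c": 5, "d": 1}
ok = is_alpha_useful(scores, ["a", "b"], k=2, alpha=1)
bad = is_alpha_useful(scores, ["b", "c"], k=2, alpha=1)
```

Elements whose score lies within `alpha` of the k-th highest score may be included or left out without violating either condition.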

Two Approaches

Each input affects all scores

• Score perturbation
• Perturb the scores of the elements with noise
• Pick the top k elements in terms of noisy scores.
• Fast and simple implementation

Question: what sort of noise should be added?

What sort of guarantees?

• Exponential sampling
• Run the exponential mechanism k times.
• More complicated and slower implementation

What sort of guarantees?
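
The two approaches can be sketched side by side. The slide leaves the noise distribution and the guarantees as open questions, so everything below is an assumption for illustration only: Laplace noise with a caller-supplied `noise_scale` for score perturbation, and sampling without replacement for the k runs of the exponential mechanism. The function names are hypothetical, and no privacy level is claimed for either function.

```python
import math
import random

def top_k_perturbed(scores, k, noise_scale, rng):
    """Score perturbation: add Laplace noise to every score, keep the k largest noisy scores."""
    def lap():
        return rng.expovariate(1.0 / noise_scale) - rng.expovariate(1.0 / noise_scale)
    noisy = {i: s + lap() for i, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]

def top_k_exponential(scores, k, epsilon, rng):
    """Exponential sampling: run the exponential mechanism k times, removing each pick."""
    remaining = dict(scores)
    picked = []
    for _ in range(k):
        top = max(remaining.values())
        items = list(remaining)
        # Weights proportional to exp(epsilon * score); shifted by the max for stability.
        weights = [math.exp(epsilon * (remaining[i] - top)) for i in items]
        choice = rng.choices(items, weights=weights, k=1)[0]
        picked.append(choice)
        del remaining[choice]
    return picked

# Hypothetical scores; both calls require k <= len(scores).
scores = {"a": 100, "b": 90, "c": 10, "d": 0}
by_noise = top_k_perturbed(scores, 2, noise_scale=0.001, rng=random.Random(0))
by_sampling = top_k_exponential(scores, 2, epsilon=10.0, rng=random.Random(0))
```

The first function makes one pass over the scores, while the second recomputes a sampling distribution k times, which reflects the speed difference noted on the slide.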

Homework

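Database of $n$ individuals, lunch options $\{1, \ldots, k\}$; each individual likes or dislikes each option (1 or 0)

Goal: output a lunch option that many like

For each lunch option $j \in [k]$, $\ell(j)$ is the number of individuals who like $j$

Exponential Mechanism: Output $j$ with probability proportional to $e^{\varepsilon \ell(j)}$

Actual probability: $e^{\varepsilon \ell(j)} / \sum_{i} e^{\varepsilon \ell(i)}$, where the denominator is the normalizer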
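
For experimenting with the homework, the sketch below computes the exact output probabilities for a tiny database and compares them with those of a neighboring database. The 0/1 matrix layout, the function name `lunch_probabilities`, and the sample data are assumptions. Since changing one individual moves every count by at most 1, the largest probability ratio should stay within $e^{2\varepsilon}$.

```python
import math

def lunch_probabilities(likes, epsilon):
    """likes[i][j] is 1 if individual i likes option j, else 0. Returns Pr[output = j] for each j."""
    k = len(likes[0])
    ell = [sum(row[j] for row in likes) for j in range(k)]   # number of individuals who like j
    weights = [math.exp(epsilon * c) for c in ell]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

# Hypothetical databases that differ only in the last individual.
db = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
neighbor = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
eps = 0.3
p = lunch_probabilities(db, eps)
p_neighbor = lunch_probabilities(neighbor, eps)
worst_ratio = max(max(a / b, b / a) for a, b in zip(p, p_neighbor))
```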

Synthetic DB: Output is a DB

[Figure: the Database is fed to a Sanitizer; the user asks query 1, query 2, ...]

Synthetic DB: the output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating each query on the output DB

Software and people compatible

Using the exponential mechanism

Differential privacy for every set $C$ of counting queries

Error is $\tilde{O}(n^{2/3} \log|C|)$

Remarkable

Hope for rich private analysis of small DBs!

Quantitative: #queries >> DB size

Qualitative: the output of the sanitizer (the synthetic DB) is itself a DB

Counting Queries

Database $D$ of size $n$

• Queries with low sensitivity

Counting queries

$C$ is a set of predicates $c: U \to \{0,1\}$

Query: how many participants in $D$ satisfy $c$?

Relaxed accuracy:

Not so bad: error is anyway inherent in statistical analysis

Assume all queries are given in advance (non-interactive)

[Figure: query $c$ drawn as a subset of the universe $U$]

Utility and Privacy Can’t Always Be Achieved Simultaneously

Impossibility results for counting queries:

DB with n participants can’t have o(√n) error with O(n) queries [DiNi, DwMcTa07, DwYe08]

In all these cases, strong privacy violation: almost entire DB compromised

What can we do?

Huge DBs [Dwork Nissim]

DB of size n >> #queries |C|:

Noise per query ~ #queries

For accuracy, need #queries ≤ n

DB of size n < #queries |C|:

Impossibility results: can’t have o(√n) error

Error must be Ω(√n)

The BLR Algorithm [Blum Ligett Roth 08]

For DBs $F$ and $D$: $\mathrm{dist}(F,D) = \max_{q \in C} |q(F) - q(D)|$

Intuition: far away DBs get smaller probability

Algorithm on input DB $D$:

Sample from a distribution on DBs of size $m$ ($m < n$): DB $F$ gets picked with probability $\propto e^{-\varepsilon \cdot \mathrm{dist}(F,D)}$

The BLR Algorithm

Idea:

• In general: Do not use large DB
• DB of size m guaranteeing hitting each query with sufficient accuracy
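
The BLR Algorithm: $2\varepsilon$-Privacy

For adjacent $D, D'$ and every $F$: $|\mathrm{dist}(F,D) - \mathrm{dist}(F,D')| \le 1$

Probability of $F$ by $D$: $e^{-\varepsilon \cdot \mathrm{dist}(F,D)} / \sum_{G \text{ of size } m} e^{-\varepsilon \cdot \mathrm{dist}(G,D)}$

Probability of $F$ by $D'$: numerator and denominator can each change by an $e^{\varepsilon}$ factor, which gives $2\varepsilon$-privacy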

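The BLR Algorithm: Error $\tilde{O}(n^{2/3} \log|C|)$

There exists $F_{good}$ of size $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$ s.t. $\mathrm{dist}(F_{good}, D) \le \alpha$

$\Pr[F_{good}] \sim e^{-\varepsilon\alpha}$

For $\alpha = \tilde{O}(n^{2/3} \log|C|)$, $\Pr[F_{good}] \gg \sum \Pr[F_{bad}]$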

The BLR Algorithm: Running Time

Generating the distribution by enumeration: need to enumerate every size-$m$ database, where $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$

Running time $\approx |U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$
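
The enumeration approach can be sketched for a toy universe. Several details below are assumptions, not taken from the slides: databases are treated as multisets, counts on the small database F are rescaled by n/m so they are comparable with counts on D, and the names `blr_distance` and `blr_sample` are hypothetical. The candidate list grows roughly like $|U|^m$, which is why this is only feasible for tiny inputs.

```python
import itertools
import math
import random

def blr_distance(f, db, queries):
    """Largest gap between a counting query on db and its rescaled count on f."""
    scale = len(db) / len(f)   # assumption: rescale counts on f to the size of db
    return max(abs(scale * sum(map(q, f)) - sum(map(q, db))) for q in queries)

def blr_sample(db, universe, queries, m, epsilon, rng=None):
    """Sample a size-m DB F with probability proportional to exp(-epsilon * dist(F, db))."""
    rng = rng or random.Random()
    # Enumerate every multiset of size m over the universe.
    candidates = list(itertools.combinations_with_replacement(universe, m))
    weights = [math.exp(-epsilon * blr_distance(f, db, queries)) for f in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical toy instance: universe {0,1,2,3}, two counting queries given as predicates.
universe = range(4)
db = [0, 0, 1, 3, 3, 3]
queries = [lambda x: x >= 2, lambda x: x % 2 == 0]
synthetic = blr_sample(db, universe, queries, m=3, epsilon=50.0, rng=random.Random(0))
```

A user would then answer any query in the set by evaluating it on `synthetic` and rescaling, without touching the original database again.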

Conclusion

Offline algorithm, $2\varepsilon$-differential privacy for any set $C$ of counting queries

Error $\alpha$ is $\tilde{O}(n^{2/3} \log|C| / \varepsilon)$

Super-poly running time: $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$

Can we Efficiently Sanitize?

The good news

If the universe is small, can sanitize efficiently: time poly(|C|, |U|)

Cannot do much better, namely sanitize in time sub-poly(|C|) AND sub-poly(|U|)

How Efficiently Can We Sanitize?

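| | $\lvert C \rvert$ subpoly | $\lvert C \rvert$ poly |
| --- | --- | --- |
| $\lvert U \rvert$ subpoly | ? | ? |
| $\lvert U \rvert$ poly | ? | ? |

Good news!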

The Good News: Can Sanitize When Universe is Small

Efficient Sanitizer for query set C

• DB size $n \ge \tilde{O}(|C|^{o(1)} \log|U|)$
• Error is $\sim n^{2/3}$
• Runtime poly(|C|, |U|)

Output is a synthetic database

Compare to [Blum Ligett Roth]:

$n \ge \tilde{O}(\log|C| \log|U|)$, runtime super-poly(|C|, |U|)

Recursive Algorithm

Repeatedly choose a random subset $C_{i+1}$ of $C_i$: shrink the query set by a (small) factor

[Figure: successively smaller query sets $C_0 = C$, $C_1$, $C_2$, ..., $C_b$]

End recursion: sanitize $D$ w.r.t. small query set $C_b$

Output is good for all queries in small set $C_{i+1}$

Extract utility on almost all queries in large set $C_i$

Fix remaining “underprivileged” queries in large set $C_i$
