Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research
The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan
Boosting [Schapire, 1989]
• General method for improving the accuracy of any given learning algorithm
• Example: learning to recognize spam e-mail
• "Base learner" receives labeled examples, outputs a heuristic
• Labels are {+1, -1}
• Run many times; combine the resulting heuristics
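To make the loop concrete, here is a minimal Python sketch of the generic boosting skeleton just described; `base_learner`, `update`, and `combine` are assumed interfaces standing in for the pieces the talk fills in later, not code from the talk.

```python
def boost(examples, base_learner, combine, update, rounds):
    """Generic boosting loop. examples: list of (x, y) with y in {+1, -1}."""
    # Start with the uniform distribution over the labeled examples.
    dist = [1.0 / len(examples)] * len(examples)
    hypotheses = []
    for _ in range(rounds):
        # The base learner sees examples weighted by the current
        # distribution and returns a heuristic beating random guessing.
        h = base_learner(examples, dist)
        hypotheses.append(h)
        # Reweight: emphasize the examples the heuristic got wrong.
        dist = update(dist, h, examples)
    # Combine the weak heuristics (e.g., by majority vote).
    return combine(hypotheses)
```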
[Flowchart: the boosting loop. A set S of labeled examples drawn from D is given to the base learner, which outputs a hypothesis A doing well on a ½ + η fraction of D; the hypotheses A1, A2, … are combined, D is updated, and a termination test decides whether to iterate. A second copy of the slide asks: how should each step be done?]
Boosting for People [variant of AdaBoost, FS95]
• Initial distribution D is uniform on database rows
• S is always a set of k examples drawn from D^k (k i.i.d. draws from D)
• Combiner is majority vote
• Weight update:
  • If correctly classified by the current A, decrease weight by a factor of e ("subtract 1 from the exponent")
  • If incorrectly classified by the current A, increase weight by a factor of e ("add 1 to the exponent")
• Renormalize to obtain the updated D
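A Python sketch of one round of this variant follows; the hypothesis interface and the sampling step are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def boosting_for_people_round(rows, labels, dist, hypothesis, k):
    """One round of the AdaBoost variant above: sample S,
    multiply weights by e^{-1} (correct) or e^{+1} (wrong), renormalize."""
    # S: k examples sampled i.i.d. from the current distribution D.
    sample = random.choices(range(len(rows)), weights=dist, k=k)
    new_weights = []
    for i, row in enumerate(rows):
        correct = hypothesis(row) == labels[i]
        # "Subtract 1 from the exponent" when correct, "add 1" when wrong.
        new_weights.append(dist[i] * math.exp(-1.0 if correct else 1.0))
    # Renormalize to obtain the updated distribution D.
    total = sum(new_weights)
    return sample, [w / total for w in new_weights]

def combine_majority(hypotheses):
    """Combiner: majority vote over the accumulated hypotheses."""
    def combined(row):
        votes = sum(h(row) for h in hypotheses)  # labels are +/-1
        return 1 if votes >= 0 else -1
    return combined
```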
Why Does It Work?
Let c_t(i) = +1 if A_t classifies example i correctly, and c_t(i) = −1 otherwise.
Update rule: multiply the weight of i by exp(−c_t(i)), then renormalize by N_t:
  D_{t+1}(i) = D_t(i) exp(−c_t(i)) / N_t
so
  N_t D_{t+1}(i) = D_t(i) exp(−c_t(i))
and, telescoping over the rounds,
  N_t N_{t−1} ⋯ N_1 D_{t+1}(i) = D_1(i) exp(−∑_s c_s(i)).
Summing over the m examples (D_1 is uniform, and D_{t+1} sums to 1):
  ∏_s N_s = (1/m) ∑_i exp(−∑_s c_s(i))
∏_s N_s = (1/m) ∑_i exp(−∑_s c_s(i))
• ∏_s N_s shrinks exponentially (at a rate depending on η):
  • The normalizers are sums of weights, which sum to 1 at the start of each round
  • Because the base learner is good, "more" weight decreases (exponent shrinks) than increases
• ∑_i exp(−∑_s c_s(i)) = ∑_i exp(−y_i ∑_s A_s(i))
• This is an upper bound on the number of incorrectly classified examples:
  • If y_i ≠ sign[∑_s A_s(i)] ( = majority{A_1(i), A_2(i), …}), then y_i ∑_s A_s(i) < 0, so exp(−y_i ∑_s A_s(i)) ≥ 1
• Therefore the number of incorrectly classified examples is exponentially small in t
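To fill in the shrinkage step, here is a short LaTeX sketch; the √(1−4η²) bound is the standard AdaBoost accounting with a tuned step size [FS95], quoted as background rather than taken from the slide.

```latex
% Each normalizer is the total weight after one +/-1 update:
%   N_s = (mass correct) * e^{-1} + (mass wrong) * e^{+1},
% and the base learner is correct on mass at least 1/2 + \eta, so
\[
  N_s \;\le\; \Big(\tfrac12+\eta\Big)e^{-1} + \Big(\tfrac12-\eta\Big)e^{+1},
\]
% a constant strictly below 1 once \eta is large enough. With AdaBoost's
% tuned round-dependent step size one gets the standard cleaner bound
\[
  N_s \;\le\; \sqrt{1-4\eta^2} \;\le\; e^{-2\eta^2}
  \quad\Longrightarrow\quad
  \#\{\text{misclassified}\} \;\le\; m\prod_{s=1}^{t} N_s \;\le\; m\,e^{-2\eta^2 t}.
\]
```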
[Flowchart, annotated for privacy: initially D is uniform on the DB rows; S, labeled examples drawn from D, goes to the base learner, which outputs A doing well on a ½ + η fraction of D; combine A1, A2, … by majority; update D with the ±1 exponents and renormalize; test for termination. Each step is flagged: privacy?]
Private Boosting for People
• The base learner must be differentially private
• The main concern is rows whose weight grows too large
  • Affects the termination test, sampling, and renormalizing
• Similar to a problem arising when learning in the presence of noise
• Similar solution: smooth boosting (sketched below)
  • Remove (give up on) elements that become too heavy
  • Carefully! Removing one heavy element and renormalizing may cause another element to become heavy…
  • Ensure this is rare (else we give up on too many elements and hurt accuracy)
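A minimal sketch of the "remove heavy elements, carefully" loop, assuming a weight cap as the heaviness criterion; the threshold and the repeat-until-stable structure are my reading of the bullets, not the talk's algorithm.

```python
def smooth_reweight(weights, cap):
    """Drop elements whose renormalized weight exceeds `cap` (an
    assumed threshold), repeating because renormalization can push
    new elements over the cap."""
    active = list(range(len(weights)))
    while active:
        total = sum(weights[i] for i in active)
        # An element is "too heavy" if its renormalized weight exceeds the cap.
        heavy = {i for i in active if weights[i] / total > cap}
        if not heavy:
            # Nothing is heavy under the current normalization: done.
            return {i: weights[i] / total for i in active}
        # Give up on the heavy elements; the analysis must ensure this
        # cascade is rare, or accuracy suffers.
        active = [i for i in active if i not in heavy]
    return {}  # everything was given up on (accuracy is lost)
```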
Iterative Smoothing
• Not today.
Boosting for Queries?
• Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q we can extract from O an approximation of q(DB)
• Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D:
  Pr_{q∼D} [ |q(O) − q(DB)| ≤ λ ] ≥ 1/2 + η
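As a sanity check on what this base-learner guarantee means operationally, here is a small Python sketch that estimates Pr_{q∼D}[|q(O) − q(DB)| ≤ λ] by sampling; all names are illustrative assumptions.

```python
import random

def base_learner_does_well(queries, dist, obj, db, lam, eta, trials=10_000):
    """Monte Carlo estimate of the guarantee
    Pr_{q~D}[ |q(obj) - q(db)| <= lam ] >= 1/2 + eta.
    `queries` is a list of callables q(x)."""
    hits = 0
    for q in random.choices(queries, weights=dist, k=trials):
        if abs(q(obj) - q(db)) <= lam:
            hits += 1
    return hits / trials >= 0.5 + eta
```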
[Flowchart, adapted to queries: initially D is uniform on Q; S, a set of queries sampled from D, goes to the base learner, which outputs A doing well on a ½ + η fraction of D; combine A1, A2, … by taking medians; update D with the ±1 exponents and renormalize; test for termination. Privacy flag: an individual can affect many queries at once!]
Privacy is Problematic
• In smooth boosting for people, at each round an individual has only a small effect on the probability distribution
• In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q
• As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's
• Slightly ameliorated by sampling (with only a few samples, maybe we can avoid the q's on the edge?)
• How can we make the re-weighting less sensitive?
Private Boosting for Queries [variant of AdaBoost]
• Initial distribution D is uniform on the queries in Q
• S is always a set of k queries drawn from Q according to D^k
• Combiner is the median [viz. Freund92]
• Weight update for queries (see the code sketch below):
  • If very well approximated by A_t, decrease weight by a factor of e ("−1")
  • If very poorly approximated by A_t, increase weight by a factor of e ("+1")
  • In between, scale the exponent with the distance from the midpoint (down or up):
    2 ( |q(DB) − q(A_t)| − (λ + μ/2) ) / μ, clipped to [−1, +1]   (sensitivity 2ρ/μ)
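A Python sketch of this update, as I read the slide (the clipping to [−1, +1] and the dictionary interface are assumptions):

```python
import math

def reweight_score(err, lam, mu):
    """Score in [-1, +1] for one query, given err = |q(DB) - q(A_t)|:
    -1 when well approximated (err <= lam), +1 when poorly
    approximated (err >= lam + mu), linear in between. For
    rho-sensitive queries the score has sensitivity 2*rho/mu."""
    raw = 2.0 * (err - (lam + mu / 2.0)) / mu
    return max(-1.0, min(1.0, raw))  # clip to [-1, +1]

def update_query_distribution(dist, errs, lam, mu):
    """One reweighting round: dist maps query -> weight, errs maps
    query -> |q(DB) - q(A_t)|. Multiply by e^score, renormalize."""
    new = {q: w * math.exp(reweight_score(errs[q], lam, mu))
           for q, w in dist.items()}
    total = sum(new.values())
    return {q: w / total for q, w in new.items()}
```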
Theorem (minus some parameters)
• Let all q ∈ Q have sensitivity ≤ ρ
• Run the query-boosting algorithm for T = log|Q| / η² rounds with μ = ( (log|Q| / η²)² ρ √k ) / ε
• The resulting object O is ((ε + Tε₀), Tδ₀)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q
• Better privacy (small ε) gives worse utility (larger μ)
• A better base learner (smaller k, larger η) helps
Proving Privacy
• Technique #1: Pay Your Debt and Move On
  • Fix A_1, A_2, …, A_t, recording the D vs. D' confidence gain: "pay your debt"
  • Focus on the gain in the selection of S ∈ Q^k in round t+1: "move on"
  • Based on the distributions D_{t+1} and D'_{t+1} determined in round t; call them D, D'
• Technique #2: Evolution of Confidence [DiDwN03]
  • "Delay payment until the final reckoning"
  • Choose q_1, q_2, …, in turn
  • For each q ∈ Q, bound |ln( D[q] / D'[q] )| by A and the expectation |E_{q∼D} ln( D[q] / D'[q] )| by B
  • Pr_{q_1,…,q_k} [ |∑_i ln( D[q_i] / D'[q_i] )| > z√k (A + B) + kB ] < exp(−z²/2)
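One way to read the concentration bound as a privacy statement, in a short LaTeX gloss; the choice of z in terms of δ is my addition, in the spirit of [DiDwN03], not a line from the talk.

```latex
% With per-query privacy loss |ln(D[q]/D'[q])| <= A and expected
% loss <= B, over the k sampled queries the tail bound reads
\[
  \Pr_{q_1,\dots,q_k}\Big[\Big|\sum_{i=1}^{k}\ln\frac{D[q_i]}{D'[q_i]}\Big|
     \;>\; z\sqrt{k}\,(A+B) + kB\Big] \;<\; e^{-z^2/2},
\]
% so taking z = \sqrt{2\ln(1/\delta)} makes the round's sampling step
% (\varepsilon, \delta)-dp with
% \varepsilon = \sqrt{2k\ln(1/\delta)}\,(A+B) + kB.
```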
Bounding E_{q∼D} ln( D[q] / D'[q] )
Assume D, D' are A-dp with respect to one another, for A < 1. Then 0 ≤ E_{q∼D} ln[ D(q)/D'(q) ] ≤ 2A² (that is, B ≤ 2A²).
KL(D||D') = ∑_q D(q) ln[ D(q)/D'(q) ]; always ≥ 0.
So KL(D||D') ≤ KL(D||D') + KL(D'||D)
  = ∑_q D(q) ( ln[ D(q)/D'(q) ] + ln[ D'(q)/D(q) ] ) + ( D'(q) − D(q) ) ln[ D'(q)/D(q) ]
  ≤ ∑_q 0 + |D'(q) − D(q)| · A
  = A ∑_q [ max(D(q), D'(q)) − min(D(q), D'(q)) ]
  ≤ A ∑_q [ e^A min(D(q), D'(q)) − min(D(q), D'(q)) ]
  = A (e^A − 1) ∑_q min(D(q), D'(q))
  ≤ 2A² when A < 1.
Compare [DiDwN03].
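A quick empirical sanity check of this bound in Python; the distributions and perturbations are synthetic, chosen only so that D and D' are A-dp with respect to one another.

```python
import math
import random

def check_kl_bound(n=50, a=0.5, trials=200):
    """Check: if D, D' are A-dp w.r.t. one another with A < 1, then
    0 <= KL(D||D') = E_{q~D} ln(D(q)/D'(q)) <= 2*A^2."""
    for _ in range(trials):
        d = [random.random() + 0.1 for _ in range(n)]
        s = sum(d)
        d = [x / s for x in d]
        # Perturb each log-weight by at most A/2; renormalizing then
        # keeps every log-ratio within A, so D' is A-dp w.r.t. D.
        d2 = [x * math.exp(random.uniform(-a / 2, a / 2)) for x in d]
        s2 = sum(d2)
        d2 = [x / s2 for x in d2]
        kl = sum(p * math.log(p / q) for p, q in zip(d, d2))
        assert 0.0 <= kl <= 2 * a * a, kl
    return True
```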
Motivation and Application
• Boosting for People
  • Logistic regression for 3000+-dimensional data
  • A slight twist on CM did pretty well (ε = 1.5)
  • Thought about alternatives
• Boosting for Queries
  • Reducing the dependence on the concept class in the work on synthetic databases in [DNRRV09] (Salil's talk)
  • We over-interpreted the poly-time [DiNi]-style attacks (we were spoiled): can't have cn queries with error o(√n)
  • [BLR08]: can have cn queries with error O(n^{2/3})
  • [DNRRV09]: O(n^{1/2} |Q|^{o(1)})
  • Now: O(n^{1/2} log² |Q|)
• The result is more general, but we only know of a base learner for counting queries