Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research
The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan
Boosting [Schapire, 1989]
• General method for improving the accuracy of any given learning algorithm
• Example: learning to recognize spam e-mail
• "Base learner" receives labeled examples, outputs a heuristic
• Labels are {+1, -1}
• Run many times; combine the resulting heuristics
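To make the loop concrete, here is a minimal Python sketch of the generic boosting skeleton just described; `base_learner`, `update`, and `combine` are assumed interfaces standing in for the pieces the talk fills in later, not code from the talk.

```python
def boost(examples, base_learner, combine, update, rounds):
    """Generic boosting loop. examples: list of (x, y) with y in {+1, -1}."""
    # Start with the uniform distribution over the labeled examples.
    dist = [1.0 / len(examples)] * len(examples)
    hypotheses = []
    for _ in range(rounds):
        # The base learner sees examples weighted by the current
        # distribution and returns a heuristic beating random guessing.
        h = base_learner(examples, dist)
        hypotheses.append(h)
        # Reweight: emphasize the examples the heuristic got wrong.
        dist = update(dist, h, examples)
    # Combine the weak heuristics (e.g., by majority vote).
    return combine(hypotheses)
```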
[Flowchart: the boosting loop. A set S of labeled examples drawn from D is given to the base learner, which outputs a hypothesis A doing well on a ½ + η fraction of D; the hypotheses A1, A2, … are combined, D is updated, and a termination test decides whether to iterate. A second copy of the slide asks: how should each step be done?]
Boosting for People [variant of AdaBoost, FS95]
• Initial distribution D is uniform on database rows
• S is always a set of k examples drawn from D^k (k i.i.d. draws from D)
• Combiner is majority vote
• Weight update:
  • If correctly classified by the current A, decrease weight by a factor of e ("subtract 1 from the exponent")
  • If incorrectly classified by the current A, increase weight by a factor of e ("add 1 to the exponent")
• Renormalize to obtain the updated D
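A Python sketch of one round of this variant follows; the hypothesis interface and the sampling step are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def boosting_for_people_round(rows, labels, dist, hypothesis, k):
    """One round of the AdaBoost variant above: sample S,
    multiply weights by e^{-1} (correct) or e^{+1} (wrong), renormalize."""
    # S: k examples sampled i.i.d. from the current distribution D.
    sample = random.choices(range(len(rows)), weights=dist, k=k)
    new_weights = []
    for i, row in enumerate(rows):
        correct = hypothesis(row) == labels[i]
        # "Subtract 1 from the exponent" when correct, "add 1" when wrong.
        new_weights.append(dist[i] * math.exp(-1.0 if correct else 1.0))
    # Renormalize to obtain the updated distribution D.
    total = sum(new_weights)
    return sample, [w / total for w in new_weights]

def combine_majority(hypotheses):
    """Combiner: majority vote over the accumulated hypotheses."""
    def combined(row):
        votes = sum(h(row) for h in hypotheses)  # labels are +/-1
        return 1 if votes >= 0 else -1
    return combined
```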
Why Does It Work?
Let c_t(i) = +1 if A_t classifies example i correctly, and c_t(i) = −1 otherwise.
Update rule: multiply the weight of i by exp(−c_t(i)), then renormalize by N_t:
  D_{t+1}(i) = D_t(i) exp(−c_t(i)) / N_t
so
  N_t D_{t+1}(i) = D_t(i) exp(−c_t(i))
and, telescoping over the rounds,
  N_t N_{t−1} ⋯ N_1 D_{t+1}(i) = D_1(i) exp(−∑_s c_s(i)).
Summing over the m examples (D_1 is uniform, and D_{t+1} sums to 1):
  ∏_s N_s = (1/m) ∑_i exp(−∑_s c_s(i))
∏_s N_s = (1/m) ∑_i exp(−∑_s c_s(i))
• ∏_s N_s shrinks exponentially (at a rate depending on η):
  • The normalizers are sums of weights, which sum to 1 at the start of each round
  • Because the base learner is good, "more" weight decreases (exponent shrinks) than increases
• ∑_i exp(−∑_s c_s(i)) = ∑_i exp(−y_i ∑_s A_s(i))
• This is an upper bound on the number of incorrectly classified examples:
  • If y_i ≠ sign[∑_s A_s(i)] ( = majority{A_1(i), A_2(i), …}), then y_i ∑_s A_s(i) < 0, so exp(−y_i ∑_s A_s(i)) ≥ 1
• Therefore the number of incorrectly classified examples is exponentially small in t
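To fill in the shrinkage step, here is a short LaTeX sketch; the √(1−4η²) bound is the standard AdaBoost accounting with a tuned step size [FS95], quoted as background rather than taken from the slide.

```latex
% Each normalizer is the total weight after one +/-1 update:
%   N_s = (mass correct) * e^{-1} + (mass wrong) * e^{+1},
% and the base learner is correct on mass at least 1/2 + \eta, so
\[
  N_s \;\le\; \Big(\tfrac12+\eta\Big)e^{-1} + \Big(\tfrac12-\eta\Big)e^{+1},
\]
% a constant strictly below 1 once \eta is large enough. With AdaBoost's
% tuned round-dependent step size one gets the standard cleaner bound
\[
  N_s \;\le\; \sqrt{1-4\eta^2} \;\le\; e^{-2\eta^2}
  \quad\Longrightarrow\quad
  \#\{\text{misclassified}\} \;\le\; m\prod_{s=1}^{t} N_s \;\le\; m\,e^{-2\eta^2 t}.
\]
```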
[Flowchart, annotated for privacy: initially D is uniform on the DB rows; S, labeled examples drawn from D, goes to the base learner, which outputs A doing well on a ½ + η fraction of D; combine A1, A2, … by majority; update D with the ±1 exponents and renormalize; test for termination. Each step is flagged: privacy?]
Private Boosting for People
• The base learner must be differentially private
• The main concern is rows whose weight grows too large
  • Affects the termination test, sampling, and renormalizing
• Similar to a problem arising when learning in the presence of noise
• Similar solution: smooth boosting (sketched below)
  • Remove (give up on) elements that become too heavy
  • Carefully! Removing one heavy element and renormalizing may cause another element to become heavy…
  • Ensure this is rare (else we give up on too many elements and hurt accuracy)
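A minimal sketch of the "remove heavy elements, carefully" loop, assuming a weight cap as the heaviness criterion; the threshold and the repeat-until-stable structure are my reading of the bullets, not the talk's algorithm.

```python
def smooth_reweight(weights, cap):
    """Drop elements whose renormalized weight exceeds `cap` (an
    assumed threshold), repeating because renormalization can push
    new elements over the cap."""
    active = list(range(len(weights)))
    while active:
        total = sum(weights[i] for i in active)
        # An element is "too heavy" if its renormalized weight exceeds the cap.
        heavy = {i for i in active if weights[i] / total > cap}
        if not heavy:
            # Nothing is heavy under the current normalization: done.
            return {i: weights[i] / total for i in active}
        # Give up on the heavy elements; the analysis must ensure this
        # cascade is rare, or accuracy suffers.
        active = [i for i in active if i not in heavy]
    return {}  # everything was given up on (accuracy is lost)
```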
Iterative Smoothing
• Not today.
Boosting for Queries?
• Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q we can extract from O an approximation of q(DB)
• Assume the existence of an (ε₀, δ₀)-dp base learner producing an object O that does well on more than half of D:
  Pr_{q∼D} [ |q(O) − q(DB)| ≤ λ ] ≥ 1/2 + η
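As a sanity check on what this base-learner guarantee means operationally, here is a small Python sketch that estimates Pr_{q∼D}[|q(O) − q(DB)| ≤ λ] by sampling; all names are illustrative assumptions.

```python
import random

def base_learner_does_well(queries, dist, obj, db, lam, eta, trials=10_000):
    """Monte Carlo estimate of the guarantee
    Pr_{q~D}[ |q(obj) - q(db)| <= lam ] >= 1/2 + eta.
    `queries` is a list of callables q(x)."""
    hits = 0
    for q in random.choices(queries, weights=dist, k=trials):
        if abs(q(obj) - q(db)) <= lam:
            hits += 1
    return hits / trials >= 0.5 + eta
```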
[Flowchart, adapted to queries: initially D is uniform on Q; S, a set of queries sampled from D, goes to the base learner, which outputs A doing well on a ½ + η fraction of D; combine A1, A2, … by taking medians; update D with the ±1 exponents and renormalize; test for termination. Privacy flag: an individual can affect many queries at once!]
Privacy is Problematic
• In smooth boosting for people, at each round an individual has only a small effect on the probability distribution
• In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q
• As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's
• Slightly ameliorated by sampling (with only a few samples, maybe we can avoid the q's on the edge?)
• How can we make the re-weighting less sensitive?
Private Boosting for Queries [variant of AdaBoost]
• Initial distribution D is uniform on the queries in Q
• S is always a set of k queries drawn from Q according to D^k
• Combiner is the median [viz. Freund92]
• Weight update for queries (see the code sketch below):
  • If very well approximated by A_t, decrease weight by a factor of e ("−1")
  • If very poorly approximated by A_t, increase weight by a factor of e ("+1")
  • In between, scale the exponent with the distance from the midpoint (down or up):
    2 ( |q(DB) − q(A_t)| − (λ + μ/2) ) / μ, clipped to [−1, +1]   (sensitivity 2ρ/μ)
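A Python sketch of this update, as I read the slide (the clipping to [−1, +1] and the dictionary interface are assumptions):

```python
import math

def reweight_score(err, lam, mu):
    """Score in [-1, +1] for one query, given err = |q(DB) - q(A_t)|:
    -1 when well approximated (err <= lam), +1 when poorly
    approximated (err >= lam + mu), linear in between. For
    rho-sensitive queries the score has sensitivity 2*rho/mu."""
    raw = 2.0 * (err - (lam + mu / 2.0)) / mu
    return max(-1.0, min(1.0, raw))  # clip to [-1, +1]

def update_query_distribution(dist, errs, lam, mu):
    """One reweighting round: dist maps query -> weight, errs maps
    query -> |q(DB) - q(A_t)|. Multiply by e^score, renormalize."""
    new = {q: w * math.exp(reweight_score(errs[q], lam, mu))
           for q, w in dist.items()}
    total = sum(new.values())
    return {q: w / total for q, w in new.items()}
```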
Theorem (minus some parameters)
• Let all q ∈ Q have sensitivity ≤ ρ
• Run the query-boosting algorithm for T = log|Q| / η² rounds with μ = ( (log|Q| / η²)² ρ √k ) / ε
• The resulting object O is ((ε + Tε₀), Tδ₀)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q
• Better privacy (small ε) gives worse utility (larger μ)
• A better base learner (smaller k, larger η) helps
Proving Privacy
• Technique #1: Pay Your Debt and Move On
  • Fix A_1, A_2, …, A_t, recording the D vs. D' confidence gain: "pay your debt"
  • Focus on the gain in the selection of S ∈ Q^k in round t+1: "move on"
  • Based on the distributions D_{t+1} and D'_{t+1} determined in round t; call them D, D'
• Technique #2: Evolution of Confidence [DiDwN03]
  • "Delay payment until the final reckoning"
  • Choose q_1, q_2, …, in turn
  • For each q ∈ Q, bound |ln( D[q] / D'[q] )| by A and the expectation |E_{q∼D} ln( D[q] / D'[q] )| by B
  • Pr_{q_1,…,q_k} [ |∑_i ln( D[q_i] / D'[q_i] )| > z√k (A + B) + kB ] < exp(−z²/2)
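One way to read the concentration bound as a privacy statement, in a short LaTeX gloss; the choice of z in terms of δ is my addition, in the spirit of [DiDwN03], not a line from the talk.

```latex
% With per-query privacy loss |ln(D[q]/D'[q])| <= A and expected
% loss <= B, over the k sampled queries the tail bound reads
\[
  \Pr_{q_1,\dots,q_k}\Big[\Big|\sum_{i=1}^{k}\ln\frac{D[q_i]}{D'[q_i]}\Big|
     \;>\; z\sqrt{k}\,(A+B) + kB\Big] \;<\; e^{-z^2/2},
\]
% so taking z = \sqrt{2\ln(1/\delta)} makes the round's sampling step
% (\varepsilon, \delta)-dp with
% \varepsilon = \sqrt{2k\ln(1/\delta)}\,(A+B) + kB.
```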
Bounding E_{q∼D} ln( D[q] / D'[q] )
Assume D, D' are A-dp with respect to one another, for A < 1. Then 0 ≤ E_{q∼D} ln[ D(q)/D'(q) ] ≤ 2A² (that is, B ≤ 2A²).
KL(D||D') = ∑_q D(q) ln[ D(q)/D'(q) ]; always ≥ 0.
So KL(D||D') ≤ KL(D||D') + KL(D'||D)
  = ∑_q D(q) ( ln[ D(q)/D'(q) ] + ln[ D'(q)/D(q) ] ) + ( D'(q) − D(q) ) ln[ D'(q)/D(q) ]
  ≤ ∑_q 0 + |D'(q) − D(q)| · A
  = A ∑_q [ max(D(q), D'(q)) − min(D(q), D'(q)) ]
  ≤ A ∑_q [ e^A min(D(q), D'(q)) − min(D(q), D'(q)) ]
  = A (e^A − 1) ∑_q min(D(q), D'(q))
  ≤ 2A² when A < 1.
Compare [DiDwN03].
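A quick empirical sanity check of this bound in Python; the distributions and perturbations are synthetic, chosen only so that D and D' are A-dp with respect to one another.

```python
import math
import random

def check_kl_bound(n=50, a=0.5, trials=200):
    """Check: if D, D' are A-dp w.r.t. one another with A < 1, then
    0 <= KL(D||D') = E_{q~D} ln(D(q)/D'(q)) <= 2*A^2."""
    for _ in range(trials):
        d = [random.random() + 0.1 for _ in range(n)]
        s = sum(d)
        d = [x / s for x in d]
        # Perturb each log-weight by at most A/2; renormalizing then
        # keeps every log-ratio within A, so D' is A-dp w.r.t. D.
        d2 = [x * math.exp(random.uniform(-a / 2, a / 2)) for x in d]
        s2 = sum(d2)
        d2 = [x / s2 for x in d2]
        kl = sum(p * math.log(p / q) for p, q in zip(d, d2))
        assert 0.0 <= kl <= 2 * a * a, kl
    return True
```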
Motivation and Application
• Boosting for People
  • Logistic regression for 3000+-dimensional data
  • A slight twist on CM did pretty well (ε = 1.5)
  • Thought about alternatives
• Boosting for Queries
  • Reducing the dependence on the concept class in the work on synthetic databases in [DNRRV09] (Salil's talk)
  • We over-interpreted the poly-time [DiNi]-style attacks (we were spoiled): can't have cn queries with error o(√n)
  • [BLR08]: can have cn queries with error O(n^{2/3})
  • [DNRRV09]: O(n^{1/2} |Q|^{o(1)})
  • Now: O(n^{1/2} log² |Q|)
• The result is more general, but we only know of a base learner for counting queries