
Near-Optimal Private Approximation Protocols via a Black Box Transformation



  1. Near-Optimal Private Approximation Protocols via a Black Box Transformation David Woodruff IBM Almaden

  2. Outline • Communication Protocols and Goals • Private Approximation Protocols • Previous Work • Our Results • Proof of our Main Transformation

  3. t-Party Communication Model [Figure: t parties holding inputs x1, x2, x3, …, xt-1, xt] What is f(x1, x2, …, xt)?

  4. Application – IP Session Data AT&T collects 100+ GBs of NetFlow data every day

  5. Application – IP Session Data • AT&T needs to process a massive stream of network data • Traffic estimation: What fraction of network IP addresses are active? (distinct elements computation) • Traffic analysis: What are the 100 IP addresses with the most traffic? (frequent items computation) • Security/Denial of Service: Are there any IP addresses witnessing a spike in traffic? (skewness computation)

  6. Application – Secure Datamining • For medical research, hospitals wish to mine their joint data • Patient confidentiality imposes strict laws on what information can be shared. Mining cannot leak anything sensitive

  7. Protocol Goals • Communication Complexity: Minimize total number of bits exchanged between the parties • Round Complexity: Minimize total number of messages exchanged between the parties • Computational Complexity: Minimize workload of the parties • Privacy: No party should learn unnecessary information about another party’s input

  8. Outline • Communication Protocols and Goals • Private Approximation Protocols • Previous Work • Our Results • Proof of our Main Transformation

  9. Initial Observations Computing many functions exactly with deterministic parties requires a huge amount of communication. How do we cope? Allow randomness and a small chance of error. Even if the parties are randomized, unless they output approximate answers, the communication is large. How do we cope? Settle for an approximation. This helps with communication, round, and computational complexity, but what is a private randomized approximation?

  10. Privacy Definition First, what does privacy mean for computing a function f? Minimal Requirement: ∀ i, Party i does not learn anything about xj, j ≠ i, other than what follows from xi and f(x1, …, xt). What does privacy mean for approximating a function f? ∀ i, Party i does not learn anything about xj, j ≠ i, other than what follows from xi and the approximation to f(x1, …, xt). Does this work? Not sufficient!

  11. Privacy Definition Party 1 holds x1 ∈ {0,1}^n, Party 2 holds x2 ∈ {0,1}^n. What is the Hamming distance f(x1, x2) between x1 and x2? Set the LSB of the approximation f'(x1, x2) to be the LSB of x2, and the remaining bits of f'(x1, x2) to agree with those of f(x1, x2). Then f'(x1, x2) is a ±1 approximation to f(x1, x2), but Alice learns the LSB of x2, which does not follow from x1 and f(x1, x2).
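Slide 11's counterexample can be made concrete in a small sketch (illustrative Python, not from the talk; the particular bit-packing of the approximation is a hypothetical choice):

```python
# Sketch of slide 11's leaky approximation: f is the exact Hamming
# distance; f_leaky is a +/-1 approximation whose low-order bit is
# copied from x2, so it reveals a bit of x2 that does not follow
# from x1 and f(x1, x2).

def hamming(x1, x2):
    return sum(a != b for a, b in zip(x1, x2))

def f_leaky(x1, x2):
    d = hamming(x1, x2)
    # Clear the LSB of d, then set it to the last bit of x2
    # (a hypothetical stand-in for "the LSB of x2").
    return (d & ~1) | x2[-1]

x1 = [0, 1, 1, 0]
x2 = [1, 1, 0, 1]
approx = f_leaky(x1, x2)
assert abs(approx - hamming(x1, x2)) <= 1   # still a +/-1 approximation
assert approx & 1 == x2[-1]                 # but the output leaks a bit of x2
```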

  12. New Privacy Definition [FIMNSW] What does privacy mean for approximating a function f? ∀ i, Party i does not learn anything about xj, j ≠ i, other than what follows from xi and f(x1, …, xt). New Requirement: f'(x1, …, xt) is determined by f(x1, …, xt) and the randomness. Implications: we allow approximation to reduce communication, but we define privacy with respect to the exact computation.

  13. Simplifications for This Talk • We only consider two parties in the rest of the talk • Their names are Alice and Bob • Their inputs are x and y

  14. What Can Alice and Bob Do to Breach Privacy? Alice holds x, Bob holds y. Semi-honest: parties follow their instructions but try to learn more than what is prescribed. Malicious: parties deviate from the protocol arbitrarily - use a different input - force the other party to output a wrong answer - abort before the other party learns the answer. Difficult to achieve security in the malicious model…

  15. Reductions [Yao, GMW, NN] A protocol secure in the semi-honest model can be transformed into a protocol secure in the malicious model, with the efficiency of the new protocol matching that of the old protocol. So it suffices to design protocols in the semi-honest model: the parties follow the instructions of the protocol, and we don't need to worry about "weird" behavior. Just make sure neither party learns anything about the other party's input, other than what follows from the exact function value.

  16. More Simplifications [Figure: Alice (input x, random string rA) and Bob (input y, random string rB) run a complicated protocol and each output f'(x,y)] Using known techniques, we just need efficient simulators SA and SB, where SA depends only on x and f(x,y), and SB depends only on y and f(x,y).

  17. Simulators SA(x, f(x,y)) ≈_negl(n) (rA, x, f'(x,y)) SB(y, f(x,y)) ≈_negl(n) (rB, y, f'(x,y))

  18. Outline • Communication Protocols and Goals • Private Approximation Protocols • Previous Work • Our Results • Proof of our Main Transformation

  19. Known Private Approximations “Even functions that are efficiently computable for moderately sized data sets are often not efficiently computable for massive data sets.” [FIMNSW]

  20. What about all of these problems? • Lp-norm for p > 2 and p = 0 • Lp-heavy hitters for every p • Lp-sampling • Max Dominance Norm • Distinct Summation • Empirical Entropy • Cascaded Moments • Subspace Approximation • L2-distance to independence • Etc.

  21. Other Related Work • Can privately approximate the permanent of a matrix [FIMNSW] • Some NP-hard problems can be privately approximated if they leak a few bits [HKKN] • Many NP-hard problems cannot be privately approximated even when leaking a large number of bits [BHN] • If the answer is not unique, e.g., for a search problem, private approximations are even harder to come by [BCNW]

  22. Outline • Communication Protocols and Goals • Private Approximation Protocols • Previous Work • Our Results • Proof of our Main Transformation

  23. Our Main Transformation • Suppose f = Σi=1^n g(xi, yi), where g is non-negative and efficiently computable • Let Π be an arbitrary non-private protocol for approximating f up to a (1 ± 1/log n)-factor with probability ≥ 2/3 • Then there is a private approximation protocol Π' for approximating f up to a (1 ± ε)-factor with probability ≥ 2/3 • The communication, round, and computational complexity of Π' agree with those of Π up to a poly(log n / ε) factor

  24. Near-Optimal Private Approximation Protocols

  25. Other Private Approximations • Also obtain near-optimal bounds for: cascaded frequency moments; L2-distance to independence • Using [BO], we get O*(1) communication for any g(xi, yi) = h(xi - yi) where h has "at most quadratic growth"

  26. Weaker Assumptions • If the non-private protocol Π is a "simultaneous protocol", then it is enough to assume symmetrically private information retrieval with polylog(n) communication [CMS, NP]

  27. Outline • Communication Protocols and Goals • Private Approximation Protocols • Previous Work • Our Results • Proof of our Main Transformation

  28. Main Transformation • Given a non-private approximation protocol Π for approximating f(x,y) = Σi=1^n g(xi, yi), we design a private approximation protocol Π' • Main Theorem: There is a low-communication importance sampling procedure: if B is an upper bound on f(x,y), then Alice and Bob can sample from a distribution μ on [n] ∪ {⊥} with μ(i) = g(xi, yi)/B for all i ∈ [n], and μ(⊥) = 1 - f(x,y)/B • How do we design Π' given such a procedure?

  29. Private Approximation Protocol • Let B be an upper bound on f(x,y) • The importance sampling procedure obtains samples from [n] ∪ {⊥}; the protocol outputs a bit c with c = 1 iff the sample is not ⊥, so Pr[c = 1] = 1 - Pr[obtain ⊥] = f(x,y)/B ≤ 1 • Since c is a bit, it is determined by its expectation; thus this probability depends only on f(x,y)! • Repeat a few times to get concentration • If most repetitions return c = 0, replace B with B/2 and repeat • The process of halving B depends only on f(x,y), which helps for simulation • Once B < 2f(x,y), with very high probability enough coin tosses are 1
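Slide 29's halving loop can be sketched as follows (illustrative Python, not the talk's protocol: the coin c would really come from the importance sampling procedure, which is simulated here using the exact value f; the repetition count and stopping threshold are hypothetical choices):

```python
import random

def coin(f, B):
    """One Bernoulli coin with Pr[c = 1] = f/B (requires f <= B);
    stands in for one run of the importance sampling procedure."""
    return 1 if random.random() < f / B else 0

def estimate(f, B, reps=4000):
    """Halve B until enough coins come up 1, then output B times the
    fraction of 1s, which concentrates around f."""
    while True:
        ones = sum(coin(f, B) for _ in range(reps))
        if ones >= reps // 4:      # "most repetitions" are not 0
            return B * ones / reps
        B /= 2                     # halving depends only on f and randomness

random.seed(0)
f_true = 37.0
est = estimate(f_true, B=2**20)    # start from a crude upper bound
assert abs(est - f_true) / f_true < 0.2
```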

  30. What's Left? • Need an importance sampling procedure, and to show our overall approximation protocol is simulatable • We can't sample exactly from μ on [n] ∪ {⊥} with μ(i) = g(xi, yi)/B for all i ∈ [n] and μ(⊥) = 1 - f(x,y)/B • But we can sample from a distribution with negl(n) distance from μ

  31. Notation • For input vectors x and y, let f[a,b] = Σi=a^b g(xi, yi)

  32. Importance Sampling Π is a non-private protocol for (1/log n, negl(n))-approximating f = Σi=1^n g(xi, yi); Alice has (x, rA) and Bob has (y, rB). Use Π to estimate f[1, n/2], obtaining f*[1, n/2], a (1 ± 1/log n)-approximation to f[1, n/2]. Use Π to estimate f[n/2+1, n], obtaining f*[n/2+1, n]. Recurse on [1, n/2] with probability f*[1, n/2]/(f*[1, n/2] + f*[n/2+1, n]); else recurse on [n/2+1, n].
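A minimal sketch of this descent (illustrative Python; the exact subsums stand in for Π's estimates f*[a,b], so the sampling is exactly proportional to g rather than up to the constants the estimates introduce):

```python
import random

def sample_index(g_vals):
    """Descend a binary tree over [0, n), going left or right in
    proportion to the (estimated) weight of each half; returns an
    index i with probability g_vals[i] / sum(g_vals)."""
    lo, hi = 0, len(g_vals)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left = sum(g_vals[lo:mid])    # stand-in for f*[lo+1, mid]
        right = sum(g_vals[mid:hi])   # stand-in for f*[mid+1, hi]
        if random.random() < left / (left + right):
            hi = mid                  # recurse on the left half
        else:
            lo = mid                  # recurse on the right half
    return lo

random.seed(1)
g_vals = [1.0, 3.0, 0.0, 4.0]         # g(x_i, y_i) for each coordinate
counts = [0] * 4
for _ in range(8000):
    counts[sample_index(g_vals)] += 1
# Empirical frequencies track g_vals / f(x,y) = [1/8, 3/8, 0, 1/2].
assert counts[2] == 0
assert abs(counts[3] / 8000 - 0.5) < 0.05
```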

  33. Importance Sampling Example (n = 8): the root is f[1,8], with children f[1,4] and f[5,8]; f[1,4] has children f[1,2] and f[3,4]; and f[3,4] has leaves g(x3, y3) and g(x4, y4). With probability f*[1,4]/(f*[1,4] + f*[5,8]) go left, else go right; then with probability f*[1,2]/(f*[1,2] + f*[3,4]) go left, else go right; then with probability g(x3, y3)/(g(x3, y3) + g(x4, y4)) go left, else go right. Pr[g(x3, y3) chosen] = f*[1,4]/(f*[1,4]+f*[5,8]) × f*[3,4]/(f*[1,2]+f*[3,4]) × g(x3, y3)/(g(x3, y3)+g(x4, y4)) = C·g(x3, y3)/f(x,y)

  34. Importance Sampling • The procedure gives a way to sample from a distribution ρ: ρ(i) = Ci · g(xi, yi)/f(x,y), where Ci ∈ [1/2, 2] • If i is sampled, then we know the probability ρ(i) with which we chose it • We can also obtain g(xi, yi) efficiently • With probability g(xi, yi)/(ρ(i)·B), output i; else output ⊥ • Pr[don't output ⊥] = Σi ρ(i)·g(xi, yi)/(ρ(i)·B) = f(x,y)/B • Hence, we sample from μ: μ(i) = g(xi, yi)/B for all i ∈ [n], μ(⊥) = 1 - f(x,y)/B (up to negl(n), due to the small probability that Π fails)
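The rejection step can be sketched as follows (illustrative Python; `rho_sampler` is a hypothetical stand-in for the tree-descent procedure, simulated here with all constants C_i = 1, and `BOT` stands for the symbol ⊥):

```python
import random

BOT = None  # stands for the failure symbol "bot"

def rho_sampler(g_vals):
    """Return (i, rho(i)) with rho(i) = g_i / f; simulates the
    tree-descent sampler with C_i = 1."""
    f = sum(g_vals)
    r = random.random() * f
    for i, g in enumerate(g_vals):
        if r < g:
            return i, g / f
        r -= g
    return len(g_vals) - 1, g_vals[-1] / f

def mu_sample(g_vals, B):
    """Accept i with probability g_i / (rho(i) * B), turning rho into
    the target mu: mu(i) = g_i / B, mu(BOT) = 1 - f/B."""
    i, rho_i = rho_sampler(g_vals)
    if random.random() < g_vals[i] / (rho_i * B):
        return i
    return BOT

random.seed(2)
g_vals = [2.0, 6.0]                   # so f(x,y) = 8
B = 16.0                              # upper bound on f(x,y)
hits = sum(mu_sample(g_vals, B) is not BOT for _ in range(10000))
# Pr[not BOT] = f/B = 0.5
assert abs(hits / 10000 - 0.5) < 0.05
```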

  35. Simulators • For f'(x,y), SA generates random coins with expectation f(x,y)/B, and keeps halving B until enough coin tosses equal 1 • For rA, SA outputs a random rA • SA outputs (rA, x, f'(x,y)), which equals the distribution in Π' except with negl(n) probability: SA(x, f(x,y)) ≈_negl(n) (rA, x, f'(x,y))

  36. Conclusions • Any non-private approximation protocol for a function f = Σi=1^n g(xi, yi) can be transformed into a private one with an O*(1) blowup in complexity • Many problems can be expressed this way (e.g., lp-norms), even non-obvious ones (e.g., entropy), for which no technique for achieving a private approximation was previously known • What about other functions?
