
Private Analysis of Data Sets

Benny Pinkas, HP Labs, Princeton.


Presentation Transcript


  1. Private Analysis of Data Sets. Benny Pinkas, HP Labs, Princeton.

  2. A story. “We’re experiencing a lot of fraud lately…” “Here too.” “I can’t find a pattern to recognize fraud in advance.” “Neither can I.” “Maybe we should share information.” • But what about • Patients’ privacy • Business secrets “Have you heard of ‘secure function evaluation’?” “This is all ‘theory’. It can’t be efficient.”

  3. New Opportunities for Interaction Between • Enterprises and government agencies holding sensitive data • P2P users • Mobile wireless crowds (PDAs, cell phones) • What about privacy? • A bidirectional approach: • Finding what is actually needed • Designing useful and efficient cryptographic tools

  4. Cryptographic Protocols for Privacy Preserving Computation • Input: one party holds x, the other holds y • Output: F(x,y) and nothing else • As if a trusted party received x and y and returned F(x,y) to each party

  5. Does the trusted party scenario make sense? [Diagram: the parties send x and y to a trusted party and each receives F(x,y).] • We cannot hope for more privacy • Does the trusted party scenario make sense? • Are the parties motivated to submit their true inputs? • Can they tolerate the disclosure of F(x,y)? • If so, we can implement the scenario without a trusted party.

  6. Secure Function Evaluation [Yao, GMW, BGW] • Input: x and y • Output: one party learns C(x,y) and nothing else; the other party learns nothing • F(x,y): a public function. • Represented as a Boolean circuit C(x,y). • Implementation: • O(|X|) “oblivious transfers”. O(|C|) communication. • Pretty efficient for small circuits! (But what about larger circuits?)

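  7. An equality circuit • Output: 1 if x = y, 0 otherwise • [Diagram: each bit pair (x1, y1), (x2, y2), …, (xn, yn) feeds an “=” gate, and a single AND gate combines the n results.]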
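The equality circuit on this slide can be sketched in plain Python as follows. This is only an evaluation of the circuit in the clear, not a secure evaluation of it; the helper names `to_bits` and `equality_circuit` are illustrative and do not come from the talk.

```python
def to_bits(value, n):
    """Return the n low-order bits of a non-negative integer, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def equality_circuit(x_bits, y_bits):
    """Evaluate the circuit: one '=' (XNOR) gate per bit pair, then one AND over all of them."""
    if len(x_bits) != len(y_bits):
        raise ValueError("both inputs must have the same number of bits")
    eq_gates = [1 - (xi ^ yi) for xi, yi in zip(x_bits, y_bits)]  # 1 iff the two bits agree
    out = 1
    for gate in eq_gates:
        out &= gate  # the final AND gate
    return out
```

The circuit has n comparison gates and one n-input AND, so its size grows linearly in the bit length of the inputs, which is why it counts as a "small circuit" for secure function evaluation.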

  8. Cryptographic methods vs. randomization methods [statistical disclosure, AS] • [Diagram: a trade-off between overhead, inaccuracy and lack of privacy, with “Cryptographic methods”, “Randomization methods” and “Our goal…” marked.]

  9. Examples of Simple Privacy Preserving Primitives (with reasonable solutions) • Is X = Y? Is X > Y? • What is X ∩ Y? What is the median of X ∪ Y? • Auctions (negotiations). Many parties, private bids. Compute the winning bidder and the sale price, but nothing else. [NPS] • Voting • Add privacy to data mining algorithms (ID3, [LP])

  10. Private Set Intersection (with Mike Freedman, NYU, and Kobbi Nissim, MSR)

  11. Applications of Set Intersection • Two government agencies, A and B: one holds a list of people on welfare, the other a list of expensive car buyers • Compute the intersection and nothing else

  12. Computing the Intersection • Private Equality Test (PET) • Alice: x. Bob: y. • Output: 1 iff x = y • Privacy preserving solutions: • Cannot use hash functions alone • Yao, [FNW], [NP] • Generalization: list intersection • X = {x1, …, xn}, Y = {y1, …, yn}
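A small sketch of why hashing alone does not give a private equality test: if Alice simply sends a hash of x and the domain of possible values is small, Bob can hash every candidate and recover x even when x differs from y. The function names and the four-digit domain below are illustrative assumptions, not part of the talk.

```python
import hashlib


def digest_of(value):
    """SHA-256 digest (hex) of the decimal representation of value."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def naive_equality_message(x):
    """What Alice would send in the naive 'compare hashes' protocol."""
    return digest_of(x)


def dictionary_attack(received_digest, domain):
    """Bob's attack: hash every candidate in the domain until one matches."""
    for candidate in domain:
        if digest_of(candidate) == received_digest:
            return candidate
    return None
```

For example, `dictionary_attack(naive_equality_message(4271), range(10000))` recovers Alice's value after at most 10,000 hash evaluations, regardless of what Bob's own input is.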

  13. The basic tool: Homomorphic Encryption • Semantically secure public key encryption • Given Enc(M1), Enc(M2), can compute (without knowing the decryption key) • Enc(M1+M2) • Enc(c·M1) for any constant c. • I.e. Enc(a0) + Enc(a1)·x + … + Enc(an)·x^n = Enc(P(x)) • Examples: ElGamal, Paillier, DJ.
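To make the two homomorphic operations concrete, here is a toy Paillier instance, one of the schemes the slide lists. It is a sketch under loud assumptions: the primes 10007 and 10009 are far too small for any real use, the generator is fixed to n + 1, and the helper names (`enc`, `dec`, `add`, `scale`, `eval_encrypted_poly`) are mine. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import secrets

# Toy parameters (assumption: illustration only, trivially breakable).
P_PRIME, Q_PRIME = 10007, 10009
N = P_PRIME * Q_PRIME
N2 = N * N
PHI = (P_PRIME - 1) * (Q_PRIME - 1)
MU = pow(PHI, -1, N)  # modular inverse of PHI mod N


def enc(m):
    """Paillier encryption with generator g = N + 1: (1 + m*N) * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def dec(c):
    """Paillier decryption: L(c^PHI mod N^2) * MU mod N, with L(u) = (u - 1) // N."""
    return ((pow(c, PHI, N2) - 1) // N) * MU % N


def add(c1, c2):
    """Enc(M1), Enc(M2) -> Enc(M1 + M2): multiply the ciphertexts."""
    return (c1 * c2) % N2


def scale(c, k):
    """Enc(M), constant k -> Enc(k * M): raise the ciphertext to the power k."""
    return pow(c, k % N, N2)


def eval_encrypted_poly(enc_coeffs, x):
    """Given [Enc(a0), ..., Enc(an)] and a plaintext x, return Enc(P(x))."""
    result = enc(0)
    for i, c in enumerate(enc_coeffs):
        result = add(result, scale(c, pow(x, i, N)))
    return result
```

All arithmetic on plaintexts is modulo N, so the example only behaves like ordinary integer arithmetic while the values stay below N.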

  14. The Scenario • Client: X = {x1, …, xn} • Server: Y = {y1, …, yn} • Output: • Client learns X ∩ Y. • Server learns nothing.

  15. The Protocol • Client defines a polynomial of degree n whose roots are x1, …, xn • P(y) = (x1-y)·(x2-y)·…·(xn-y) = an·y^n + … + a1·y + a0 • Sends to server homomorphic encryptions of the coefficients • Enc(an), …, Enc(a0) • (only the client can decrypt)

  16. …The Protocol • Server uses the homomorphic properties to compute, for every y ∈ Y, Enc(r·P(y) + y) (r is random) • If y ∈ X ∩ Y the result is Enc(r·0 + y) = Enc(y); otherwise the result is Enc(random). • Server sends the (permuted) results to the client C. • C decrypts and compares the values to its list.
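The two protocol slides can be traced end to end with a toy simulation. This is a sketch, not the authors' implementation: both roles run in one process, the encryption is a toy Paillier instance with tiny primes (10007 and 10009, an assumption for illustration only), a fresh r is drawn for every y, and all function names are hypothetical. Items must be integers in the range 0 to N - 1. Python 3.8+ is assumed for `pow(x, -1, n)`.

```python
import math
import secrets

# Toy Paillier parameters (assumption: illustration only, not secure).
P_PRIME, Q_PRIME = 10007, 10009
N = P_PRIME * Q_PRIME
N2 = N * N
PHI = (P_PRIME - 1) * (Q_PRIME - 1)
MU = pow(PHI, -1, N)
_rng = secrets.SystemRandom()


def enc(m):
    """Encrypt m mod N under the client's key."""
    while True:
        r = _rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def dec(c):
    """Decrypt; only the client is supposed to be able to do this."""
    return ((pow(c, PHI, N2) - 1) // N) * MU % N


def client_encrypt_polynomial(xs):
    """Expand P(y) = (x1 - y)...(xn - y) and encrypt a0..an (lowest degree first)."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] = (nxt[i] + a * x) % N       # a * x contributes to y^i
            nxt[i + 1] = (nxt[i + 1] - a) % N   # a * (-y) contributes to y^(i+1)
        coeffs = nxt
    return [enc(a) for a in coeffs]


def server_respond(enc_coeffs, ys):
    """For each y, compute Enc(r * P(y) + y) with a fresh random r, then permute."""
    replies = []
    for y in ys:
        acc = enc_coeffs[-1]                          # Enc(an)
        for c in reversed(enc_coeffs[:-1]):           # Horner's rule on ciphertexts
            acc = (pow(acc, y % N, N2) * c) % N2      # Enc(acc * y + a_i)
        r = _rng.randrange(1, N)
        replies.append((pow(acc, r, N2) * enc(y)) % N2)
    _rng.shuffle(replies)
    return replies


def client_intersect(xs, replies):
    """Decrypt every reply and keep the values that appear in the client's list."""
    return set(xs) & {dec(c) for c in replies}


def private_set_intersection(xs, ys):
    return client_intersect(xs, server_respond(client_encrypt_polynomial(xs), ys))
```

For instance, `private_set_intersection([3, 17, 42, 99], [17, 5, 99, 1000])` yields the set containing 17 and 99. With a modulus this small a non-matching y could decrypt to a client item by chance with probability of roughly n/N per item; a real modulus makes that negligible.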

  17. Security • Bad server? The server only sees semantically secure encryptions. Learning about C’s input = breaking the encryption. • Bad client? The client can, given only the output X ∩ Y, simulate her “view” in the protocol. (I.e. she generates encryptions of the items in X ∩ Y, and of random items.)

  18. Efficiency • Client encrypts and decrypts n values • Communication is O(n) • Server: • For each input computes Enc(r·P(y)+y), i.e. n exponentiations. • Total O(n^2) exponentiations • Can use hashing to reduce overhead to O(n ln ln n).
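The slide does not say which hashing scheme is meant. As one possible reading, the sketch below assumes a two-choice ("balanced allocation") scheme: the client places each item in the less loaded of two candidate bins and would then build one low-degree polynomial per bin, so the server only evaluates the polynomials of the two bins a given y can map to. The salts `"h1"` and `"h2"` and all function names are hypothetical.

```python
import hashlib


def bin_index(item, salt, num_bins):
    """Deterministic hash of an item into one of num_bins bins."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def candidate_bins(item, num_bins):
    """The (at most two) bins an item may be stored in."""
    return {bin_index(item, "h1", num_bins), bin_index(item, "h2", num_bins)}


def allocate(items, num_bins):
    """Client side: put each item in the currently less loaded of its two bins."""
    bins = [[] for _ in range(num_bins)]
    for item in items:
        i = bin_index(item, "h1", num_bins)
        j = bin_index(item, "h2", num_bins)
        target = i if len(bins[i]) <= len(bins[j]) else j
        bins[target].append(item)
    return bins
```

With maximum bin load M, the server's work per input drops from n exponentiations to about 2·M, since it evaluates two polynomials of degree at most M instead of one of degree n.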

  19. Is Approximation easier? • Can we approximate the size of the intersection (i.e. a scalar product) with sublinear overhead? • Lower bound: • Approximating |X ∩ Y| within a 1 ± ε factor requires Ω(n) communication (constant ε). • True even for randomized algorithms. • Proof: reduction to Razborov’s lower bound for Disjointness. • Upper bound: protocols with matching overhead.

  20. Secure Computation of the Kth-ranked element (with Gagan Aggarwal, Stanford, and Nina Mishra, HPL)

  21. Secure Computation of the Kth-ranked element • Inputs: • A: SA. B: SB. • Large sets of unique items (from a domain D). • There’s also the multi-party scenario • Output: x ∈ SA ∪ SB s.t. |{y | y < x, y ∈ SA ∪ SB}| = k-1 • Median: k = (|SA| + |SB|) / 2
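As a reference point, the output defined on this slide can be computed in the clear with a few lines of Python. This is the functionality a trusted party would compute, not a private protocol, and the function name is illustrative.

```python
def kth_ranked(sa, sb, k):
    """Return the x in SA ∪ SB with exactly k - 1 smaller elements in the union."""
    merged = sorted(set(sa) | set(sb))
    if not 1 <= k <= len(merged):
        raise ValueError("k must be between 1 and the size of the union")
    return merged[k - 1]
```

For SA = [1, 5, 9] and SB = [2, 7, 8], the median uses k = (3 + 3) / 2 = 3, and `kth_ranked` returns 5, which has exactly two smaller elements in the union.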

  22. Motivation • Basic statistical analysis of distributed data • E.g. a histogram of salaries in competing businesses in the same area • Sometimes the parties might want to hide the size of their inputs

  23. Some information is always revealed • The Kth-ranked element reveals some information • Suppose SA = {x1, …, x1000} • Median of SA ∪ SB = x400 • Party A now learns that SB contains at least 200 elements smaller than x400 • But she shouldn’t learn more

  24. Results and previous work • Previous work: generic constructions, with overhead at least linear in k. • New results: • Two-party: log k secure comparisons of (log D)-bit numbers. • Multi-party: log D simple computations with (log D)-bit numbers.

  25. An (insecure) two-party median protocol • [Diagram: SA is split around its median mA into LA and RA; SB is split around its median mB into LB and RB.] • If mA < mB: LA lies below the median, RB lies above the median. • New median is the same as the original median. • Recursion → need log n rounds (suppose each set contains 2^i items)

  26. Secure two-party median protocol • A finds the median of SA, call it mA. B finds the median of SB, call it mB. • Secure comparison (e.g. a small circuit): is mA < mB? • YES: A deletes x ∈ SA s.t. x < mA. B deletes x ∈ SB s.t. x > mB. • NO: A deletes x ∈ SA s.t. x > mA. B deletes x ∈ SB s.t. x < mB. • The parties then repeat on the remaining items.
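The flow of this protocol can be simulated in a single process as below. It is a sketch under stated assumptions: the plain `<` in `secure_less_than` stands in for the secure comparison (in a real run only that one bit would be revealed per round), each party's median is taken to be the lower median, and each party drops exactly half of its items per round so both sets stay at the same power-of-two size. Those tie-breaking details are my choice, not something the slide spells out.

```python
def secure_less_than(a, b):
    """Stand-in for the secure comparison; a real protocol reveals only this bit."""
    return a < b


def two_party_median(sa, sb):
    """Lower median of SA ∪ SB for two equal-size sets whose size is a power of 2."""
    a, b = sorted(sa), sorted(sb)
    n = len(a)
    if n == 0 or n != len(b) or n & (n - 1):
        raise ValueError("both sets must have the same power-of-two size")
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]  # each party's local (lower) median
        if secure_less_than(m_a, m_b):
            a = a[half:]   # A drops its lower half, which lies below the median
            b = b[:half]   # B drops its upper half, which lies above the median
        else:
            a = a[:half]   # symmetric case
            b = b[half:]
    # One item left on each side: the lower median of the pair is the smaller one.
    return a[0] if secure_less_than(a[0], b[0]) else b[0]
```

With SA = [1, 5, 9, 12] and SB = [2, 7, 8, 20] the loop runs two rounds and the final comparison returns 7, the fourth smallest of the eight items. The number of comparisons is log n + 1 for sets of size n.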

  27. Proof of security • Simulation: Given the protocol’s output, each party can simulate the execution of the protocol • [Diagram: SA with the output median marked. First comparison: mA < mB. Second comparison: mA > mB.]

  28. + - + + Arbitrary inputs, arbitrary k SA K 2i SB Now, compute the median of two sets of size k Size should be a power of 2 median of new inputs = kth element of original inputs

  29. Conclusions • Efficient privacy preserving primitives for basic tasks • Open problems • Intersection: approximate matching? • Median: clustering? • Theory and applications can and should interact • Tools from the theory of cryptography (e.g. SFE) can be used in applications • Applications can benefit from rigorous analysis • There’s a lot more to be done…
