Cryptographic methods for privacy aware computing: applications


Presentation Transcript


  1. Cryptographic methods for privacy aware computing: applications

  2. Outline • Review: three basic methods • Two applications • Distributed decision tree with horizontally partitioned data • Distributed k-means with vertically partitioned data

  3. Three basic methods • 1-out-of-K Oblivious Transfer • Random shares (sketched below) • Homomorphic encryption * Cost is the major concern
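To make the random-share method concrete: a value x is split into pieces that look uniformly random on their own but sum to x. A minimal sketch in Python, assuming an agreed modulus and two parties (all names here are illustrative, not from the slides):

```python
import secrets

MODULUS = 2**61 - 1  # illustrative; real protocols agree on a modulus in advance

def make_shares(x: int) -> tuple[int, int]:
    """Split x into two additive shares; either share alone is uniform noise."""
    r = secrets.randbelow(MODULUS)
    return r, (x - r) % MODULUS

def reconstruct(s1: int, s2: int) -> int:
    return (s1 + s2) % MODULUS

s1, s2 = make_shares(42)
assert reconstruct(s1, s2) == 42
```

Shares can be added locally, which is why intermediate results in the protocols below are held as random shares.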

  4. Two example protocols • The basic ideas: • do not release the original data • exchange only intermediate results • apply the three basic methods to combine the intermediate results securely

  5. Building decision trees over horizontally partitioned data • Horizontally partitioned data • Entropy-based information gain • Major ideas in the protocol

  6. Horizontally Partitioned Data • A table with a key and d attributes X1…Xd, where the rows are split across r sites: Site 1 holds keys k1…ki, Site 2 holds keys ki+1…kj, …, Site r holds keys km+1…kn • Every site stores the full attribute set X1…Xd for its own rows

  7. Review of the decision tree algorithm (ID3) • Find the cut that maximizes gain • For a given attribute Ai, sort its values v1…vn (with labels l1…ln) and consider each candidate cut value • For categorical data the test is Ai = vi; for numerical data it is Ai < vi • E(·) is the entropy of the label distribution • Choose the attribute/value pair that gives the highest gain! [Figure: a tree of yes/no test nodes such as Ai < vi and Aj < vj; a sketch of the gain computation follows below]
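As a concrete (non-secure) reference point, here is a minimal sketch of entropy-based information gain for one numeric cut; the function names are illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = -sum_c p_c * log2(p_c) over the label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, cut):
    """Gain of the test A < cut: E(S) minus the weighted entropy of both sides."""
    left = [l for v, l in zip(values, labels) if v < cut]
    right = [l for v, l in zip(values, labels) if v >= cut]
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

print(info_gain([1, 2, 3, 4], ["a", "a", "b", "b"], cut=3))  # 1.0, a perfect split
```

The secure protocol computes exactly these quantities, but over counts that are split between the parties.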

  8. Key points • Calculating the entropy reduces to terms of the form x log x, where x is a count summed over the two parties P1 and P2, i.e., x = x1 + x2 • The computation is decomposed into several steps • In each step, each party learns only a random share of the result (see the formula below)
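Written out in standard notation (consistent with the slides), the class counts in a horizontally partitioned node are sums of the parties' local counts, and the entropy is built entirely from x log x terms:

```latex
% Each class count is split across the two parties' rows:
%   x_c = x_c^{(1)} + x_c^{(2)}
E(S) = -\sum_{c} \frac{x_c}{|S|}\,\log\frac{x_c}{|S|}
\qquad\Longrightarrow\qquad
|S|\,E(S) = |S|\log|S| - \sum_{c} x_c \log x_c
```

This is why a secure sub-protocol for (x1 + x2) ln(x1 + x2) is the workhorse of the whole construction.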

  9. Steps • Step 1: compute random shares w1 + w2 = (x1 + x2) ln(x1 + x2) * a dedicated sub-protocol is used to compute ln(x1 + x2) • Step 2: for a condition (Ai, vi), compute random shares of E(S), E(S1), and E(S2) • Step 3: repeat steps 1 and 2 for all candidate (Ai, vi) pairs • Step 4: a secure circuit evaluation determines which (Ai, vi) pair yields the maximum gain • [Figure: P1 holds x1 and shares w11, w12; P2 holds x2 and shares w21, w22; the output is the (Ai, vi) with maximum gain; a simulation of Step 1 follows below]
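The real Step 1 runs oblivious-transfer-based sub-protocols; the following is only a trusted-dealer simulation showing what the output shares look like, with all names illustrative:

```python
import random
from math import log

def simulate_step1(x1: int, x2: int, scale: float = 1e6):
    """Trusted-dealer simulation of Step 1: returns (w1, w2) with
    w1 + w2 = (x1 + x2) * ln(x1 + x2). The actual protocol produces the
    same shares without either party ever seeing x1 + x2 in the clear."""
    target = (x1 + x2) * log(x1 + x2)
    w1 = random.uniform(-scale, scale)  # P1's share: random noise on its own
    w2 = target - w1                    # P2's share completes the sum
    return w1, w2

w1, w2 = simulate_step1(30, 12)
assert abs((w1 + w2) - 42 * log(42)) < 1e-6
```

Because the shares add up to the desired value, Steps 2–4 can keep combining them linearly without reconstructing any intermediate count.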

  10. 2. K-means over vertically partitioned data • Vertically partitioned data • The normal k-means algorithm • Applying secure sum and secure comparison among multiple sites in the secure distributed algorithm

  11. Vertically Partitioned Data • A table with a key and r groups of attributes, where the columns are split across r sites: every site holds all keys, but Site 1 holds only X1…Xi, Site 2 holds Xi+1…Xj, …, Site r holds Xm+1…Xd

  12. Motivation • Naïve approach: send all data to a trusted site and run k-means clustering there • Costly, and requires a trusted third party • Preferable: a distributed privacy-preserving k-means

  13. Basic k-means algorithm • 4 main steps (a plain sketch follows below): • Step 1: randomly select k initial cluster centers (the k means) • Repeat: • Step 2: assign each point to its closest cluster center • Step 3: recalculate the k means from the new point assignment • Until (Step 4): the k means do not change
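For reference, a minimal single-site (non-private) implementation of these four steps; names are illustrative:

```python
import random

def kmeans(points, k, max_iter=100):
    """Plain k-means over a list of equal-length tuples."""
    means = random.sample(points, k)                   # Step 1: initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                               # Step 2: closest center
            d = [sum((a - b) ** 2 for a, b in zip(p, m)) for m in means]
            clusters[d.index(min(d))].append(p)
        new_means = [                                  # Step 3: recompute means
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else means[i]
            for i, cl in enumerate(clusters)
        ]
        if new_means == means:                         # Step 4: convergence test
            break
        means = new_means
    return means

print(kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))
```

The distributed version keeps exactly this control flow; only the distance computation and the minimum-finding are replaced by secure protocols.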

  14. Distributed k-means • Why k-means works over vertically partitioned data: • all 4 steps are decomposable! • the most costly parts (steps 2 and 3) can be done locally • We will focus on step 2 (assign each point to its closest cluster center)

  15. Step 1 • All sites share the indices of k randomly chosen records, which serve as the initial centroids µ1 … µk • Each centroid is itself vertically partitioned: for each µl, Site 1 holds µl1…µli, Site 2 holds µli+1…µlj, …, Site r holds µlm+1…µld

  16. Step 2 • Assign each point x to its closest cluster center • 1. Calculate the distance of point X = (X1, X2, …, Xd) to each cluster center µk; each distance calculation is decomposable into partial distances d1 + d2 + … computed at Site 1, Site 2, …: d² = [(X1 − µk1)² + … + (Xi − µki)²] + [(Xi+1 − µki+1)² + … + (Xj − µkj)²] + … • 2. Compare the k full distances to find the minimum one • For each X, each site holds a k-element vector of its partial distances to the k centroids, denoted Xi (see the sketch below)
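A tiny demonstration of the decomposition, with a hypothetical split where site 1 holds the first two attributes and site 2 the last two:

```python
def partial_sq_dist(point_slice, centroid_slice):
    """Squared Euclidean distance restricted to one site's attribute slice."""
    return sum((x - m) ** 2 for x, m in zip(point_slice, centroid_slice))

point = (1.0, 2.0, 3.0, 4.0)
centroid = (0.0, 0.0, 0.0, 0.0)

d1 = partial_sq_dist(point[:2], centroid[:2])   # computed locally at site 1
d2 = partial_sq_dist(point[2:], centroid[2:])   # computed locally at site 2
full = partial_sq_dist(point, centroid)

assert d1 + d2 == full  # squared distance is a sum of per-site partial distances
```

No site needs another site's attribute values; only the partial sums d1, d2, … must eventually be combined, which is what the secure sum protects.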

  17. Privacy concerns for step 2 • Some concerns: • the partial distances d1, d2, … may breach privacy (they expose the Xi and µki) – need to hide them • the distance of a point to each cluster may breach privacy – need to hide it • Basic ideas to ensure security: • disguise the partial distances • compare distances so that only the comparison result is learned • permute the order of the clusters so the real meaning of the comparison results is unknown • Requires 3 non-colluding sites (P1, P2, Pr)

  18. Secure computing of step 2 • Stage 1: prepare for a secure sum of the partial distances • P1 generates V1 + V2 + … + Vr = 0, where each Vi is a random k-element vector used to hide the partial distances of site i • Homomorphic encryption performs the randomization: Ei(Xi)Ei(Vi) = Ei(Xi + Vi) (see the sketch below) • Stage 2: calculate the secure sum over r − 1 parties • P1, P3, P4, …, Pr−1 send their perturbed and permuted partial distances to Pr • Pr sums up the r − 1 partial-distance vectors (including its own part)
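The slides do not name the cryptosystem; a common choice with the required additive property E(a)·E(b) = E(a+b) is Paillier. A toy sketch (tiny primes, illustrative only):

```python
from math import gcd, lcm
import secrets

# Toy Paillier keys (real deployments use large primes). With g = n + 1,
# decryption simplifies to the formula below.
p, q = 11, 13
n, n2 = p * q, (p * q) ** 2
lam = lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n) * mu % n

x, v = 25, 100            # a partial distance and its blinding value (mod n)
blinded = (encrypt(x) * encrypt(v)) % n2
assert decrypt(blinded) == (x + v) % n   # Ei(Xi) * Ei(Vi) = Ei(Xi + Vi)
```

Because the Vi sum to zero, the blinding terms cancel once Pr adds all perturbed vectors, leaving exactly the true distance sums.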

  19. Secure computing of step 2 (protocol diagram) • [Figure: the message flow of Stage 1 and Stage 2] * Xi contains the partial distances to the k partial centroids at site i * Ei(Xi)Ei(Vi) = Ei(Xi + Vi): homomorphic encryption, where Ei is site i's public key * π(Xi): a permutation function that perturbs the order of the elements in Xi * V1 + V2 + … + Vr = 0; the Vi are used to hide the partial distances

  20. Stage 3: secure_add_and_compare to find the minimum distance • Involves only Pr and P2 • Uses a standard secure multiparty computation protocol to find the result; the minimum of k distances takes k − 1 comparisons • Stage 4: • the index of the minimum distance (a permuted cluster id) is sent back to P1 • P1 knows the permutation function and thus recovers the original cluster id • P1 broadcasts the cluster id to all parties (a plaintext sketch follows below)
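A plaintext stand-in for stages 3 and 4; in the real protocol the k − 1 comparisons happen inside secure computation between Pr and P2, and only P1 can invert the permutation (all names here are illustrative):

```python
import random

def argmin(xs):
    return min(range(len(xs)), key=xs.__getitem__)

k = 4
perm = random.sample(range(k), k)        # P1's secret permutation of cluster ids
true_dists = [9.0, 1.0, 4.0, 16.0]       # full distances to the k centroids
permuted = [true_dists[perm[i]] for i in range(k)]  # what Pr/P2 operate on

j = argmin(permuted)     # stage 3: Pr and P2 learn only this permuted index
cluster_id = perm[j]     # stage 4: P1 maps it back to the real cluster id
assert cluster_id == 1   # the point is assigned to the truly closest centroid
```

The permutation means Pr and P2 learn which permuted slot won, but not which actual cluster the point joined.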

  21. Step 3 can also be done locally • Update the partial means µi locally according to the new cluster assignments • [Figure: each site updates its own attribute slice of every centroid from the cluster labels attached to its rows X1 … Xn]

  22. Extra communication cost • O(nrk) • n : # of records • r: # of parties • k: # of means • Also depends on # of iterations

  23. Conclusion • Cryptographic privacy-preserving protocols are appealing • Their cost is the major concern • That cost can be reduced with novel algorithms
