
Other Perturbation Techniques



  1. Other Perturbation Techniques

  2. Outline
  • Randomized Responses
  • Sketch
  • Project ideas

  3. Randomized Responses
  • Problem description
    • A provides the answer to B’s question
    • A wants to preserve his/her privacy
    • The question/answer can be sensitive
  • The method
    • Assume the answer can be “yes” or “no”
    • A answers honestly with probability θ, and gives the opposite answer with probability 1−θ
    • The real probabilities of “yes” and “no” can then be estimated from the randomized responses

  4. Notations
  • O(yes): observed probability of “yes” among the randomized responses (# of yes / total # of responses)
  • P(yes): real probability of “yes”
  • Inference:
    O(yes) = P(yes)*θ + P(no)*(1−θ) = P(yes)*θ + (1−P(yes))*(1−θ)
    ⇒ P(yes) = (O(yes) + θ − 1) / (2θ − 1)
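The inversion above can be checked with a quick simulation. This is my own illustrative sketch in plain Python, using the flip-with-probability-1−θ model that the formula corresponds to:

```python
import random

def randomized_response(truth, theta, rng):
    """Answer honestly with probability theta; otherwise give the opposite
    answer (this flip model is what the slide's formula corresponds to)."""
    return truth if rng.random() < theta else not truth

def estimate_p_yes(o_yes, theta):
    """Invert O(yes) = P(yes)*theta + (1 - P(yes))*(1 - theta)."""
    return (o_yes + theta - 1) / (2 * theta - 1)

rng = random.Random(42)
theta, true_p, n = 0.8, 0.3, 200_000
answers = [randomized_response(rng.random() < true_p, theta, rng)
           for _ in range(n)]
o_yes = sum(answers) / n
est = estimate_p_yes(o_yes, theta)
print(abs(est - true_p) < 0.01)  # estimate recovers P(yes) up to sampling noise
```

Note that the estimator blows up as θ approaches 0.5 (the denominator 2θ−1 vanishes): maximum privacy leaves no signal to invert.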

  5. Extend to multiple categories
  • The answer ci has probability θij of being changed to cj
  • O(c1, c2, …, cn): observed probabilities of the categories
  • P(c1, c2, …, cn): real probabilities of the categories
  • The relationship between O and P: O(cj) = Σi P(ci)*θij, i.e., O = Θᵀ P
  • Note: when Θ is invertible, use matrix inversion to solve for P; otherwise, use iterative methods similar to those in Rakesh’s paper
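When Θ is invertible, recovering P is just a linear solve. A small numpy illustration with a hypothetical 3-category perturbation matrix (the matrix values are my own example):

```python
import numpy as np

# Hypothetical 3-category perturbation matrix: theta[i, j] = probability that
# true answer c_i is reported as c_j (each row sums to 1).
theta = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])

p_true = np.array([0.5, 0.3, 0.2])   # real category probabilities
o = theta.T @ p_true                 # observed probabilities: O = Theta^T P

# Theta is invertible here, so recover P by solving the linear system.
p_est = np.linalg.solve(theta.T, o)
print(np.allclose(p_est, p_true))    # exact recovery (no sampling noise here)
```

With real data, O comes from finite counts, so p_est only approaches p_true as the number of responses grows; when Θ is singular or ill-conditioned, the iterative methods mentioned on the slide are needed instead.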

  6. Different perturbation matrices can be used. Which one is the best?
  • Balance between privacy and utility:
    • The identity matrix: zero privacy is preserved, while full data utility is preserved
    • Uniform randomization: privacy is fully preserved, while no data utility is left

  7. Optimizing both privacy & utility
  • Read paper 33
  • Privacy: similar to the previous discussion, based on the accuracy of estimation
  • A Bayes method:
    • C = {c1, c2, …, cn}
    • Y is the perturbed value, X is the original value, and X^ is the estimated value
  • Accuracy of estimation: can be calculated by comparing the original data, the perturbed data, and the estimated data

  8. Privacy
  • Privacy
    • Average: 1 − (accuracy of estimation)
    • Worst case:
  • Utility
    • P(ci) is the original probability, O(ci) the probability on the perturbed data, and P^(ci) the estimated probability
    • Utility depends on the difference between the original and the estimated probabilities
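For the binary case, the Bayes (MAP) estimate X^ and the resulting accuracy and average privacy can be computed exactly. The following worked example is my own; the values θ = 0.8 and P(yes) = 0.3 are illustrative, not from the paper:

```python
def bayes_estimate(y, p_yes, theta):
    """MAP estimate of X given perturbed answer Y, for the binary flip model:
    pick the x maximizing P(X=x) * P(Y=y | X=x)."""
    post_yes = p_yes * (theta if y else 1 - theta)
    post_no = (1 - p_yes) * ((1 - theta) if y else theta)
    return post_yes >= post_no

def estimation_accuracy(p_yes, theta):
    """P(X^ == X), computed exactly over the four (X, Y) combinations."""
    acc = 0.0
    for x, p_x in ((True, p_yes), (False, 1 - p_yes)):
        for y in (True, False):
            p_y_given_x = theta if y == x else 1 - theta
            if bayes_estimate(y, p_yes, theta) == x:
                acc += p_x * p_y_given_x
    return acc

acc = estimation_accuracy(0.3, 0.8)
print(round(acc, 3))      # with these values the MAP guess is just Y itself
print(round(1 - acc, 3))  # average privacy = 1 - (accuracy of estimation)
```

Here the adversary's best strategy is simply to believe the reported answer, so accuracy equals θ and average privacy equals 1−θ; with more skewed priors or multiple categories the MAP estimate can differ from Y.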

  9. Optimization algorithm
  • Find the perturbation matrix that balances the two metrics
  • The evolutionary algorithm:
    • Start with a set of initial RR matrices
    • Repeat the following steps in each iteration:
      • Mating: select two RR matrices from the pool
      • Crossover: exchange several columns between the two RR matrices
      • Mutation: change some values in an RR matrix
      • Meet the privacy bound: filter out resultant matrices that violate it
      • Evaluate the fitness value of the new RR matrices
  • Note: the fitness value is defined in terms of the privacy and utility metrics
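The loop above can be sketched in code. This is a structural illustration only: the privacy and utility proxies below (distance of rows from one-hot, and condition number of Θᵀ) and the privacy threshold are my own placeholders, not the metrics from paper 33:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rr_matrix(n):
    """A random row-stochastic n x n randomized-response matrix."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

def crossover(a, b, k=1):
    """Exchange k randomly chosen columns between two RR matrices, then renormalize."""
    cols = rng.choice(a.shape[1], size=k, replace=False)
    a2, b2 = a.copy(), b.copy()
    a2[:, cols], b2[:, cols] = b[:, cols], a[:, cols]
    return a2 / a2.sum(axis=1, keepdims=True), b2 / b2.sum(axis=1, keepdims=True)

def mutate(m, scale=0.05):
    """Perturb entries slightly and renormalize so rows still sum to 1."""
    m2 = np.clip(m + rng.normal(0.0, scale, m.shape), 1e-6, None)
    return m2 / m2.sum(axis=1, keepdims=True)

def privacy(m):
    """Placeholder privacy proxy: rows far from one-hot leak less."""
    return 1.0 - m.max(axis=1).mean()

def fitness(m):
    """Placeholder fitness: privacy plus a utility proxy (a well-conditioned
    matrix lets P be estimated from O more accurately)."""
    return privacy(m) + 1.0 / np.linalg.cond(m.T)

pool = [random_rr_matrix(3) for _ in range(8)]
for _ in range(50):
    i, j = rng.choice(len(pool), size=2, replace=False)    # mating
    c1, c2 = crossover(pool[i], pool[j])                   # crossover
    children = [mutate(c) for c in (c1, c2)]               # mutation
    children = [c for c in children if privacy(c) >= 0.1]  # privacy-bound filter
    pool = sorted(pool + children, key=fitness, reverse=True)[:8]

best = pool[0]
print(np.allclose(best.sum(axis=1), 1.0))  # the winner is still row-stochastic
```

The key invariant is that crossover and mutation always renormalize, so every candidate stays a valid RR matrix; the filter discards children below the privacy bound before they enter the pool.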

  10. Summary
  • Randomized response is the basic technique for perturbing categorical data
    • Boolean
    • Multi-category

  11. Sketch
  • Addresses the problem of high-dimensional sparse data, which is difficult for:
    • Multiplicative perturbation
    • Randomized responses
  • Examples of such data:
    • Market basket data
    • Bag of words

  12. Definition of sketch
  • Similar to projection perturbation: map d-dimensional data to r-dimensional data, r << d
  • Difference: the mapping matrix is different for each record
  • Definition: for X = (x1, …, xd), the sketch S = (s1, …, sr) has components sj = Σk xk*ξjk, where each ξjk is randomly drawn from {−1, +1}

  13. Property
  • The dot product of the original data X and Y can be approximated from their sketches: X·Y ≈ (1/r)·S(X)·S(Y)
  • The dot product is important in calculating Euclidean distances!
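A minimal numeric check of this property (my own illustration, not code from the slides). For simplicity the example shares one ±1 sign matrix across both records, whereas the slides note that each record uses its own (in practice derived from hash functions or per-record seeds):

```python
import numpy as np

rng = np.random.default_rng(7)
d, r = 2000, 1000   # in practice r << d; r is kept large here to show convergence

# Random {-1, +1} sign matrix, shared across records for simplicity.
R = rng.choice([-1.0, 1.0], size=(r, d))

def sketch(x):
    """r-dimensional sketch: s_j = sum_k x_k * xi_jk."""
    return R @ x

# Two sparse boolean records (e.g., market-basket rows with 30 items each)
x = np.zeros(d); x[rng.choice(d, 30, replace=False)] = 1.0
y = np.zeros(d); y[rng.choice(d, 30, replace=False)] = 1.0

est = sketch(x) @ sketch(y) / r   # unbiased estimate of the dot product x . y
print(abs(est - x @ y) < 5.0)     # close to the true value for this r
```

The estimator is unbiased because E[ξjk·ξjm] = 0 for k ≠ m, so cross terms cancel in expectation; its variance shrinks as r grows, which is exactly the accuracy/privacy trade-off on the next slide.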

  14. Accuracy of the dot product estimation
  • Large r → smaller variance → better quality; however, also lower privacy

  15. Privacy
  • The original data values can be estimated
  • Sparse data: most components cancel out in the sketch
  • Estimate of xk:

  16. Privacy
  •  -anonymity: suppress the record if this condition is not satisfied
  • Another concept: K-variance; see paper 29 for more details

  17. Applications
  • Dot product estimation
  • Determining the length of a sparse transaction (# of non-zero items in a boolean vector)
  • Determining Euclidean distance
  • Averaging a set of records (the centroid of a cluster)
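As an illustration of the Euclidean-distance application (again my own sketch, with a shared sign matrix for simplicity): because sketching is a linear map, S(X) − S(Y) = S(X − Y), so the squared norm of the sketch difference, divided by r, estimates the squared Euclidean distance directly.

```python
import numpy as np

rng = np.random.default_rng(11)
d, r = 2000, 1000
R = rng.choice([-1.0, 1.0], size=(r, d))  # shared ±1 sign matrix, for illustration

def sketch(x):
    return R @ x

x, y = rng.random(d), rng.random(d)

true_sq = float(np.sum((x - y) ** 2))
# Sketching is linear: S(x) - S(y) = S(x - y), and ||S(z)||^2 / r is an
# unbiased estimate of ||z||^2 under a random sign matrix.
est_sq = float(np.sum((sketch(x) - sketch(y)) ** 2) / r)
print(abs(est_sq - true_sq) / true_sq < 0.2)  # within 20% for this r
```

The same trick gives the transaction-length application: for a boolean vector X, its length (# of non-zero items) equals X·X = ||X||², which is estimable as ||S(X)||²/r.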
