
A Privacy-Preserving Framework for Personalized Social Recommendations


Presentation Transcript


  1. A Privacy-Preserving Framework for Personalized Social Recommendations. Zach Jorgensen (1) and Ting Yu (1,2). (1) NC State University, Raleigh, NC, USA. (2) Qatar Computing Research Institute, Doha, Qatar. EDBT, March 24-28, 2014, Athens, Greece.

  2. Motivation. Social recommendation task: predict items a user might like based on the items his/her friends like. [Figure: a social recommendation system takes item preferences (items i1-i5) and social relations as input and produces recommendations.]

  3. Motivation. Model: top-n social recommender. Let μ(i, u) denote the utility of recommending item i to user u. Input: items, users, the social graph, the preference graph, and the number of recommendations n. For every item i and every user u, compute μ(i, u); then, for every user u, sort the items by utility and recommend the top n. Output: a personalized list of the top n items (by utility) for each user.
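
To make the recommender's structure concrete, here is a minimal Python sketch of the loop above; the utility function μ is passed in as a callable, and all names are illustrative rather than taken from the authors' implementation.

```python
# Minimal sketch of the top-n social recommender loop from the slide.
import heapq

def top_n_recommendations(items, users, mu, n):
    """Return {user: [(utility, item), ...]} with the n highest-utility items per user."""
    utilities = {u: [] for u in users}
    for i in items:                                   # for every item i
        for u in users:                               # for every user u
            utilities[u].append((mu(i, u), i))        # compute mu(i, u)
    # For every user, sort items by utility and keep the top n.
    return {u: heapq.nlargest(n, scored, key=lambda t: t[0])
            for u, scored in utilities.items()}
```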

  4. Motivation. μ(i, u) = Σ_v x_{v,i} · s(u, v) is the utility of recommending item i to user u, where x_{v,i} = 1 if the preference edge (v, i) exists and 0 otherwise, and s(u, v) is a social similarity measure computed on the social graph (e.g., Common Neighbors).
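
A hedged sketch of this utility computation, using Common Neighbors as the similarity s(u, v); the data structures (prefers: item to set of users, friends: user to set of friends) are assumptions of this illustration, not the authors' code.

```python
# Utility mu(i, u) = sum_v x_{v,i} * s(u, v), with Common Neighbors as s.

def common_neighbors(friends, u, v):
    """s(u, v) = number of friends that u and v share in the social graph."""
    return len(friends[u] & friends[v])

def mu(prefers, friends, i, u):
    """Sum the similarity s(u, v) over all users v who prefer item i."""
    return sum(common_neighbors(friends, u, v) for v in prefers[i] if v != u)
```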

  5. Motivation • Many existing structural similarity measures could be used [Survey: Lu & Zhou, 2011] • We considered • Common Neighbors • Adamic-Adar • Graph Distance • Katz

  6. Motivation Two main privacy problems: • Protect privacy of user data from malicious service provider (i.e., the recommender) • Protect privacy of user data from malicious/curious users • Our focus: preventing disclosure of individual item preferences through the output

  7. Motivation. A simple attack on Common Neighbors. [Figure: example graph with users Alice and Bob in which the recommendations reveal that Bob listens to Bieber.]

  8. Motivation Adversary • Knowledge of all preferences except target edge • Observes all recommendations • Knowledge of the algorithm Goal: to deduce the presence/absence of a single preference edge (the target edge)

  9. Motivation Differential Privacy [Dwork, 2006] • Provides strong, formal privacy guarantees • Informally: guarantees that recommendations will be (almost) the same with/without any one preference edge in the input

  10. Motivation. Related work: Machanavajjhala et al. (VLDB 2011). • Task: for each node, recommend the node with the highest social similarity (Common Neighbors, Katz). • No distinction between users/items or between preference/social edges. • Negative theoretical results.

  11. Motivation. • We assume that the social graph is public. • This is often true in practice.

  12. Motivation. • Main contribution: a framework that enables differential privacy guarantees for preference edges. • We demonstrate on real data sets that accurate, private social recommendation is feasible.

  13. Outline • Motivation • Differential Privacy • Our Approach • Experimental Results • Conclusions

  14. Differential Privacy. A randomized algorithm A gives ε-differential privacy if for any neighboring data sets D, D' and any S ⊆ Range(A): Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S]. Neighboring data sets differ in a single record. [Dwork, 2006]

  15. Achieving Differential Privacy. Laplace mechanism: release A(D) + Lap(Δ_A / ε) instead of A(D). Global sensitivity of A: Δ_A = max over neighboring D, D' of |A(D) - A(D')|; here Δ_A = 1. Theorem: A(D) + Lap(Δ_A / ε) satisfies ε-differential privacy. Smaller ε = more noise / more privacy.
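
A minimal sketch of the Laplace mechanism just described, using numpy; the sensitivity and ε values in the example are illustrative only.

```python
# Laplace mechanism: release the true value plus Lap(delta / eps) noise.
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value, delta, eps):
    """Release true_value + Lap(delta / eps); smaller eps means larger noise."""
    return true_value + rng.laplace(loc=0.0, scale=delta / eps)

# Example: a query with global sensitivity 1 released at eps = 0.1.
print(laplace_mechanism(42.0, delta=1.0, eps=0.1))
```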

  16. Properties of Differential Privacy. • Sequential composition: running algorithms with privacy parameters ε_1, ..., ε_k on the same data set gives (ε_1 + ... + ε_k)-differential privacy. • Parallel composition: running ε-differentially private algorithms on disjoint subsets of the data gives ε-differential privacy.

  17. Outline • Motivation • Differential Privacy • Our Approach • Simplifying observations • Naïve Approaches • Our Approach • Experimental Results • Conclusions

  18. Simplifying Observations. The iterations over items use disjoint inputs, and sorting and selecting the top n items is post-processing. For every item i, for every user u, compute μ(i, u); for every user u, sort items by utility and recommend the top n items. Our focus: an ε-differentially private procedure for computing μ(i, u), for all users u and a given item i.

  19. Naïve Approaches. Approach 1: Noise-on-Utilities. For each item i and every user u, compute a noisy utility, μ(i, u) plus Laplace noise; for each user u, sort items by noisy utility and recommend the top n items. This satisfies ε-differential privacy, but it destroys accuracy!

  20. Naïve Approaches. Approach 2: Noise-on-Edges. • Add Laplace noise independently to each preference edge weight. • Run the non-private algorithm on the resulting sanitized preference graph. (Example figure omitted.) The noise will destroy accuracy!
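
For concreteness, a hedged sketch of both naive baselines, assuming binary preference weights and a per-item privacy budget ε; the exact noise scales used in the paper may differ, so treat this only as an illustration of the idea.

```python
# The two naive baselines: noise on utilities vs. noise on edges.
import numpy as np

rng = np.random.default_rng(0)

def noise_on_utilities(mu_values, sensitivity, eps):
    """Approach 1: add Laplace noise directly to each utility mu(i, u)."""
    return {u: val + rng.laplace(scale=sensitivity / eps)
            for u, val in mu_values.items()}

def noise_on_edges(pref_weights, eps):
    """Approach 2: add Laplace noise (sensitivity 1) to every preference edge weight."""
    return {edge: w + rng.laplace(scale=1.0 / eps)
            for edge, w in pref_weights.items()}
```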

  21. Our Approach. Cluster the preference edges for item i using a strategy S. [Figure: preference edges with weights 0/1 from users u1-u8 to item i, partitioned into clusters c1, c2, c3.] For now, assume S randomly assigns edges to clusters.

  22. Our Approach. For each cluster, compute the noisy average edge weight (the true average plus Laplace noise). [Figure: the clusters c1, c2, c3, each labeled with its average weight plus noise.]

  23. Our Approach. Replace each edge weight with the noisy average of its respective cluster. [Figure: the same clusters, with every edge now carrying its cluster's noisy average weight.]

  24. Our Approach. Run the recommender on the sanitized edge weights as before: for every item i, for each user u, compute μ(i, u); for each user u, sort items by utility and recommend the top n items.
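
Putting slides 21 through 24 together, here is a minimal sketch of the cluster-and-average sanitization for a single item, assuming binary edge weights, disjoint clusters, and a per-item budget ε; the names and data structures are illustrative, not the authors' code.

```python
# Cluster-and-average sanitization of one item's preference edge weights.
import numpy as np

rng = np.random.default_rng(0)

def sanitize_item_weights(weights, clusters, eps):
    """Replace each user's weight for one item with its cluster's noisy average.

    weights:  {user: 0 or 1} preference weights for a single item
    clusters: list of disjoint sets of users covering all users
    eps:      privacy budget for this item
    """
    sanitized = {}
    for cluster in clusters:
        avg = np.mean([weights[u] for u in cluster])
        # One edge changes the average by at most 1/|cluster|, so Laplace noise
        # with scale 1/(eps * |cluster|) suffices; clusters are disjoint, so
        # parallel composition keeps the total cost at eps per item.
        noisy_avg = avg + rng.laplace(scale=1.0 / (eps * len(cluster)))
        for u in cluster:
            sanitized[u] = noisy_avg
    return sanitized
```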

  25. Our Approach: Rationale. • Adding or removing a single preference edge affects one cluster average by at most 1/|c_i|. • The noise added to the average of cluster c_i is Lap(1 / (ε · |c_i|)). • The bigger the cluster, the smaller the noise. Example: let ε = 0.1 and |c| = 50 edges; the noise scale is 1 / (0.1 · 50) = 0.2. Intuition: the bigger the cluster, the less sensitive its average weight is to any one preference edge.
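
A quick numeric check of the example; the comparison with the per-edge noise scale of the naive noise-on-edges baseline is added here for illustration and is not part of the slide.

```python
# With eps = 0.1, averaging over a cluster of 50 edges shrinks the Laplace
# scale from 1/eps = 10 (per-edge noise) to 1/(eps * 50) = 0.2.
eps, cluster_size = 0.1, 50
print("per-edge noise scale:  ", 1 / eps)                   # 10.0
print("cluster-average scale: ", 1 / (eps * cluster_size))  # 0.2
```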

  26. Our Approach: Rationale • The catch – averaging introduces approximation error! • Need a better clustering strategy that will keep approx. error relatively low • Strategy must not leak privacy.

  27. Our Approach: Clustering Strategy. Cluster the users based on the natural community structure of the public social graph. [Figure: community detection on the social graph partitions users u1-u8 into communities c0 and c1.]

  28. Our Approach: Clustering Strategy. For each item, derive clusters for the preference edges from the user clusters. [Figure: the same user communities c0 and c1.]

  29. Our Approach: Clustering Strategy. Note: we only need to cluster the social graph once; the resulting clusters are used for all items.

  30. Our Approach: Clustering Strategy. Key point: clustering based on the public social graph does not leak privacy!

  31. Our Approach: Clustering Strategy • Louvain Method [Blondel et al. 2008] • Greedy modularity maximization • Well-studied and known to produce good communities • Fast enough for graphs with millions of nodes • No parameters to tune
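
A hedged sketch of this clustering step with networkx, which ships a Louvain implementation (louvain_communities, available in networkx 2.8 and later); the paper may have used a different Louvain implementation, and the toy graph below is only for the usage example.

```python
# Community detection on the public social graph via the Louvain method.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def cluster_users(social_graph, seed=0):
    """Return a list of user communities found by the Louvain method."""
    return louvain_communities(social_graph, seed=seed)

# Example usage with a toy graph; the resulting communities would then be
# reused as edge clusters for every item, as noted on slide 29.
G = nx.karate_club_graph()
communities = cluster_users(G)
print(len(communities), "communities found")
```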

  32. Outline • Motivation • Preliminaries • Our Approach • Experimental Results • Conclusions

  33. Data Sets (publicly available). Last.fm <http://ir.ii.uam.es/hetrec2011/datasets>: 1,892 users; 17,632 items; avg. user degree = 13.4 (std. 17.3); avg. prefs per user = 48.7 (std. 6.9). Flixster <http://www.sfu.ca/~sja25/datasets>: 137,372 users; 48,756 items; avg. user degree = 18.5 (std. 31.1); avg. prefs per user = 54.8 (std. 218.2).

  34. Measuring Accuracy. • Normalized Discounted Cumulative Gain (NDCG) [Järvelin and Kekäläinen, 2002]. • NDCG at n measures the quality of the private recommendations relative to the non-private recommendations, taking rank and utility into account. • Ranges from 0.0 to 1.0, with 1.0 meaning the private recommender achieves the ideal ranking. • Averaged over all users in the data set.

  35. Experiments: Last.fm. [Figure: average accuracy (NDCG at n = 50) on the y-axis vs. privacy on the x-axis, from high to low privacy.]

  36. Experiments: Flixster. [Figure: average NDCG at 50 for 10,000 random users vs. privacy, from high to low; note the different y-axis scale.]

  37. Experiments: Naïve Approaches. [Figure: the naïve approaches on the Last.fm data set, with panels for Katz, Common Neighbors, Graph Distance, and Adamic-Adar.]

  38. Conclusions • Differential privacy guarantees for item preferences • Use clustering and averaging to trade Laplace noise for some approx. error • Clustering via the community structure of the social graph is a useful heuristic for clustering the edges without violating privacy • Personalized social recommendations can be both private and accurate

  39. Thank you!

  40. Backup slides

  41. Accuracy Metric: NDCG. • Normalized Discounted Cumulative Gain. • Private list: the items recommended to user u by the private recommender, sorted by noisy utility. • Ideal list: the items recommended to user u by the non-private recommender, sorted by true utility. • NDCG ranges from 0 to 1. • Averaged over all users in a data set.
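
A sketch of NDCG at n as described here and on slide 34, scoring the private ranking against the ideal (non-private) ranking with true utilities as gains; the exact gain and discount variant used in the paper may differ slightly from this common formulation.

```python
# NDCG@n of the private ranking relative to the non-private (ideal) ranking.
import math

def dcg(utilities):
    """Discounted cumulative gain of true utilities listed in ranked order."""
    return sum(u / math.log2(rank + 2) for rank, u in enumerate(utilities))

def ndcg_at_n(private_items, ideal_items, true_utility, n):
    gains_private = [true_utility[i] for i in private_items[:n]]
    gains_ideal = [true_utility[i] for i in ideal_items[:n]]
    ideal = dcg(gains_ideal)
    return dcg(gains_private) / ideal if ideal > 0 else 1.0
```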

  42. Social Similarity Measures. • Adamic-Adar: s(u, v) = Σ_{z ∈ Γ(u) ∩ Γ(v)} 1 / log |Γ(z)|, where Γ(x) is the set of neighbors of x. • Graph Distance: based on the length of the shortest path between u and v (shorter distance means higher similarity). • Katz: s(u, v) = Σ_{l ≥ 1} β^l · |paths_{u,v}^{(l)}|, where β is a small damping factor and |paths_{u,v}^{(l)}| is the number of paths of length l between u and v.
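
A hedged sketch of these measures on an unweighted networkx graph; the Katz sum is truncated at a maximum path length for tractability, and the damping factor β, the cutoff, and the inverse-distance convention for Graph Distance are illustrative choices rather than values taken from the paper.

```python
# Structural similarity measures on an unweighted social graph.
import math
import networkx as nx
import numpy as np

def adamic_adar(G, u, v):
    """Sum of 1/log(degree) over the common neighbors of u and v."""
    common = set(G[u]) & set(G[v])
    return sum(1.0 / math.log(G.degree(z)) for z in common if G.degree(z) > 1)

def graph_distance(G, u, v):
    """Inverse shortest-path distance (one common convention); 0 if disconnected."""
    try:
        return 1.0 / nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        return 0.0

def katz(G, u, v, beta=0.005, max_len=4):
    """Truncated Katz score: sum over l of beta^l times the number of length-l paths."""
    nodes = list(G.nodes())
    idx = {n: k for k, n in enumerate(nodes)}
    A = nx.to_numpy_array(G, nodelist=nodes)
    score, power = 0.0, np.eye(len(nodes))
    for l in range(1, max_len + 1):
        power = power @ A            # (A^l)[u, v] counts paths of length l
        score += (beta ** l) * power[idx[u], idx[v]]
    return score
```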

  43. Experiments: Last.fm. [Figures: NDCG at 10 and NDCG at 100.]

  44. Experiments: Flixster. [Figures: NDCG at 10 and NDCG at 100.]

  45. Comparison of approaches on the Last.fm data set: Low Rank Mechanism (LRM) [Yuan et al., PVLDB 2012] and Group and Smooth (GS) [Kellaris & Papadopoulos, PVLDB 2013].

  46. Relationship between user degree and accuracy, due to approx. error (Common Neighbors).
