
A Privacy-Preserving Framework for Personalized Social Recommendations


Presentation Transcript


  1. A Privacy-Preserving Framework for Personalized Social Recommendations. Zach Jorgensen (1) and Ting Yu (1,2). (1) NC State University, Raleigh, NC, USA. (2) Qatar Computing Research Institute, Doha, Qatar. EDBT, March 24-28, 2014, Athens, Greece.

  2. Motivation. Social recommendation task: predict items a user might like based on the items his/her friends like. [Figure: a social recommendation system takes item preferences (items i1-i5) and social relations as input and produces recommendations.]

  3. Motivation. Model: top-n social recommender. Let μ(i, u) denote the utility of recommending item i to user u. Input: items, users, the social graph, the preference graph, and the number of recommendations n. For every item i and every user u, compute μ(i, u); then, for every user u, sort the items by utility and recommend the top n. Output: a personalized list of the top n items (by utility) for each user.
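
To make the recommender's structure concrete, here is a minimal Python sketch of the loop above; the utility function μ is passed in as a callable, and all names are illustrative rather than taken from the authors' implementation.

```python
# Minimal sketch of the top-n social recommender loop from the slide.
import heapq

def top_n_recommendations(items, users, mu, n):
    """Return {user: [(utility, item), ...]} with the n highest-utility items per user."""
    utilities = {u: [] for u in users}
    for i in items:                                   # for every item i
        for u in users:                               # for every user u
            utilities[u].append((mu(i, u), i))        # compute mu(i, u)
    # For every user, sort items by utility and keep the top n.
    return {u: heapq.nlargest(n, scored, key=lambda t: t[0])
            for u, scored in utilities.items()}
```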

  4. Motivation. μ(i, u) = Σ_v x_{v,i} · s(u, v) is the utility of recommending item i to user u, where x_{v,i} = 1 if the preference edge (v, i) exists and 0 otherwise, and s(u, v) is a social similarity measure computed on the social graph (e.g., Common Neighbors).
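
A hedged sketch of this utility computation, using Common Neighbors as the similarity s(u, v); the data structures (prefers: item to set of users, friends: user to set of friends) are assumptions of this illustration, not the authors' code.

```python
# Utility mu(i, u) = sum_v x_{v,i} * s(u, v), with Common Neighbors as s.

def common_neighbors(friends, u, v):
    """s(u, v) = number of friends that u and v share in the social graph."""
    return len(friends[u] & friends[v])

def mu(prefers, friends, i, u):
    """Sum the similarity s(u, v) over all users v who prefer item i."""
    return sum(common_neighbors(friends, u, v) for v in prefers[i] if v != u)
```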

  5. Motivation • Many existing structural similarity measures could be used [Survey: Lu & Zhou, 2011] • We considered • Common Neighbors • Adamic-Adar • Graph Distance • Katz

  6. Motivation Two main privacy problems: • Protect privacy of user data from malicious service provider (i.e., the recommender) • Protect privacy of user data from malicious/curious users • Our focus: preventing disclosure of individual item preferences through the output

  7. Motivation. A simple attack on Common Neighbors. [Figure: example graph with users Alice and Bob in which the recommendations reveal that Bob listens to Bieber.]

  8. Motivation Adversary • Knowledge of all preferences except target edge • Observes all recommendations • Knowledge of the algorithm Goal: to deduce the presence/absence of a single preference edge (the target edge)

  9. Motivation Differential Privacy [Dwork, 2006] • Provides strong, formal privacy guarantees • Informally: guarantees that recommendations will be (almost) the same with/without any one preference edge in the input

  10. Motivation. Related work: Machanavajjhala et al. (VLDB 2011). • Task: for each node, recommend the node with the highest social similarity (Common Neighbors, Katz). • No distinction between users/items or between preference/social edges. • Negative theoretical results.

  11. Motivation. • We assume that the social graph is public. • This is often true in practice.

  12. Motivation. • Main contribution: a framework that enables differential privacy guarantees for preference edges. • We demonstrate on real data sets that accurate, private social recommendation is feasible.

  13. Outline • Motivation • Differential Privacy • Our Approach • Experimental Results • Conclusions

  14. Differential Privacy. A randomized algorithm A gives ε-differential privacy if for any neighboring data sets D, D' and any S ⊆ Range(A): Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S]. Neighboring data sets differ in a single record. [Dwork, 2006]

  15. Achieving Differential Privacy. Laplace mechanism: release A(D) + Lap(Δ_A / ε) instead of A(D). Global sensitivity of A: Δ_A = max over neighboring D, D' of |A(D) - A(D')|; here Δ_A = 1. Theorem: A(D) + Lap(Δ_A / ε) satisfies ε-differential privacy. Smaller ε = more noise / more privacy.
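
A minimal sketch of the Laplace mechanism just described, using numpy; the sensitivity and ε values in the example are illustrative only.

```python
# Laplace mechanism: release the true value plus Lap(delta / eps) noise.
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value, delta, eps):
    """Release true_value + Lap(delta / eps); smaller eps means larger noise."""
    return true_value + rng.laplace(loc=0.0, scale=delta / eps)

# Example: a query with global sensitivity 1 released at eps = 0.1.
print(laplace_mechanism(42.0, delta=1.0, eps=0.1))
```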

  16. Properties of Differential Privacy. • Sequential composition: running algorithms with privacy parameters ε_1, ..., ε_k on the same data set gives (ε_1 + ... + ε_k)-differential privacy. • Parallel composition: running ε-differentially private algorithms on disjoint subsets of the data gives ε-differential privacy.

  17. Outline • Motivation • Differential Privacy • Our Approach • Simplifying observations • Naïve Approaches • Our Approach • Experimental Results • Conclusions

  18. Simplifying Observations. The iterations over items use disjoint inputs, and sorting and selecting the top n items is post-processing. For every item i, for every user u, compute μ(i, u); for every user u, sort items by utility and recommend the top n items. Our focus: an ε-differentially private procedure for computing μ(i, u), for all users u and a given item i.

  19. Naïve Approaches. Approach 1: Noise-on-Utilities. For each item i and every user u, compute a noisy utility, μ(i, u) plus Laplace noise; for each user u, sort items by noisy utility and recommend the top n items. This satisfies ε-differential privacy, but it destroys accuracy!

  20. Naïve Approaches. Approach 2: Noise-on-Edges. • Add Laplace noise independently to each preference edge weight. • Run the non-private algorithm on the resulting sanitized preference graph. (Example figure omitted.) The noise will destroy accuracy!
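
For concreteness, a hedged sketch of both naive baselines, assuming binary preference weights and a per-item privacy budget ε; the exact noise scales used in the paper may differ, so treat this only as an illustration of the idea.

```python
# The two naive baselines: noise on utilities vs. noise on edges.
import numpy as np

rng = np.random.default_rng(0)

def noise_on_utilities(mu_values, sensitivity, eps):
    """Approach 1: add Laplace noise directly to each utility mu(i, u)."""
    return {u: val + rng.laplace(scale=sensitivity / eps)
            for u, val in mu_values.items()}

def noise_on_edges(pref_weights, eps):
    """Approach 2: add Laplace noise (sensitivity 1) to every preference edge weight."""
    return {edge: w + rng.laplace(scale=1.0 / eps)
            for edge, w in pref_weights.items()}
```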

  21. Our Approach. Cluster the preference edges for item i using a strategy S. [Figure: preference edges with weights 0/1 from users u1-u8 to item i, partitioned into clusters c1, c2, c3.] For now, assume S randomly assigns edges to clusters.

  22. Our Approach. For each cluster, compute the noisy average edge weight (the true average plus Laplace noise). [Figure: the clusters c1, c2, c3, each labeled with its average weight plus noise.]

  23. Our Approach. Replace each edge weight with the noisy average of its respective cluster. [Figure: the same clusters, with every edge now carrying its cluster's noisy average weight.]

  24. Our Approach. Run the recommender on the sanitized edge weights as before: for every item i, for each user u, compute μ(i, u); for each user u, sort items by utility and recommend the top n items.
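
Putting slides 21 through 24 together, here is a minimal sketch of the cluster-and-average sanitization for a single item, assuming binary edge weights, disjoint clusters, and a per-item budget ε; the names and data structures are illustrative, not the authors' code.

```python
# Cluster-and-average sanitization of one item's preference edge weights.
import numpy as np

rng = np.random.default_rng(0)

def sanitize_item_weights(weights, clusters, eps):
    """Replace each user's weight for one item with its cluster's noisy average.

    weights:  {user: 0 or 1} preference weights for a single item
    clusters: list of disjoint sets of users covering all users
    eps:      privacy budget for this item
    """
    sanitized = {}
    for cluster in clusters:
        avg = np.mean([weights[u] for u in cluster])
        # One edge changes the average by at most 1/|cluster|, so Laplace noise
        # with scale 1/(eps * |cluster|) suffices; clusters are disjoint, so
        # parallel composition keeps the total cost at eps per item.
        noisy_avg = avg + rng.laplace(scale=1.0 / (eps * len(cluster)))
        for u in cluster:
            sanitized[u] = noisy_avg
    return sanitized
```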

  25. Our Approach: Rationale. • Adding or removing a single preference edge affects one cluster average by at most 1/|c_i|. • The noise added to the average of cluster c_i is Lap(1 / (ε · |c_i|)). • The bigger the cluster, the smaller the noise. Example: let ε = 0.1 and |c| = 50 edges; the noise scale is 1 / (0.1 · 50) = 0.2. Intuition: the bigger the cluster, the less sensitive its average weight is to any one preference edge.
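
A quick numeric check of the example; the comparison with the per-edge noise scale of the naive noise-on-edges baseline is added here for illustration and is not part of the slide.

```python
# With eps = 0.1, averaging over a cluster of 50 edges shrinks the Laplace
# scale from 1/eps = 10 (per-edge noise) to 1/(eps * 50) = 0.2.
eps, cluster_size = 0.1, 50
print("per-edge noise scale:  ", 1 / eps)                   # 10.0
print("cluster-average scale: ", 1 / (eps * cluster_size))  # 0.2
```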

  26. Our Approach: Rationale • The catch – averaging introduces approximation error! • Need a better clustering strategy that will keep approx. error relatively low • Strategy must not leak privacy.

  27. Our Approach: Clustering Strategy. Cluster the users based on the natural community structure of the public social graph. [Figure: community detection on the social graph partitions users u1-u8 into communities c0 and c1.]

  28. Our Approach: Clustering Strategy. For each item, derive clusters for the preference edges from the user clusters. [Figure: the same user communities c0 and c1.]

  29. Our Approach: Clustering Strategy. Note: we only need to cluster the social graph once; the resulting clusters are used for all items.

  30. Our Approach: Clustering Strategy. Key point: clustering based on the public social graph does not leak privacy!

  31. Our Approach: Clustering Strategy • Louvain Method [Blondel et al. 2008] • Greedy modularity maximization • Well-studied and known to produce good communities • Fast enough for graphs with millions of nodes • No parameters to tune
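
A hedged sketch of this clustering step with networkx, which ships a Louvain implementation (louvain_communities, available in networkx 2.8 and later); the paper may have used a different Louvain implementation, and the toy graph below is only for the usage example.

```python
# Community detection on the public social graph via the Louvain method.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def cluster_users(social_graph, seed=0):
    """Return a list of user communities found by the Louvain method."""
    return louvain_communities(social_graph, seed=seed)

# Example usage with a toy graph; the resulting communities would then be
# reused as edge clusters for every item, as noted on slide 29.
G = nx.karate_club_graph()
communities = cluster_users(G)
print(len(communities), "communities found")
```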

  32. Outline • Motivation • Preliminaries • Our Approach • Experimental Results • Conclusions

  33. Data Sets (publicly available). Last.fm <http://ir.ii.uam.es/hetrec2011/datasets>: 1,892 users; 17,632 items; avg. user degree = 13.4 (std. 17.3); avg. prefs per user = 48.7 (std. 6.9). Flixster <http://www.sfu.ca/~sja25/datasets>: 137,372 users; 48,756 items; avg. user degree = 18.5 (std. 31.1); avg. prefs per user = 54.8 (std. 218.2).

  34. Measuring Accuracy. • Normalized Discounted Cumulative Gain (NDCG) [Järvelin and Kekäläinen, 2002]. • NDCG at n measures the quality of the private recommendations relative to the non-private recommendations, taking rank and utility into account. • Ranges from 0.0 to 1.0, with 1.0 meaning the private recommender achieves the ideal ranking. • Averaged over all users in the data set.

  35. Experiments: Last.fm. [Figure: average accuracy (NDCG at n = 50) on the y-axis vs. privacy on the x-axis, from high to low privacy.]

  36. Experiments: Flixster. [Figure: average NDCG at 50 for 10,000 random users vs. privacy, from high to low; note the different y-axis scale.]

  37. Experiments: Naïve Approaches. [Figure: the naïve approaches on the Last.fm data set, with panels for Katz, Common Neighbors, Graph Distance, and Adamic-Adar.]

  38. Conclusions • Differential privacy guarantees for item preferences • Use clustering and averaging to trade Laplace noise for some approx. error • Clustering via the community structure of the social graph is a useful heuristic for clustering the edges without violating privacy • Personalized social recommendations can be both private and accurate

  39. Thank you!

  40. Backup slides

  41. Accuracy Metric: NDCG. • Normalized Discounted Cumulative Gain. • Private list: the items recommended to user u by the private recommender, sorted by noisy utility. • Ideal list: the items recommended to user u by the non-private recommender, sorted by true utility. • NDCG ranges from 0 to 1. • Averaged over all users in a data set.
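
A sketch of NDCG at n as described here and on slide 34, scoring the private ranking against the ideal (non-private) ranking with true utilities as gains; the exact gain and discount variant used in the paper may differ slightly from this common formulation.

```python
# NDCG@n of the private ranking relative to the non-private (ideal) ranking.
import math

def dcg(utilities):
    """Discounted cumulative gain of true utilities listed in ranked order."""
    return sum(u / math.log2(rank + 2) for rank, u in enumerate(utilities))

def ndcg_at_n(private_items, ideal_items, true_utility, n):
    gains_private = [true_utility[i] for i in private_items[:n]]
    gains_ideal = [true_utility[i] for i in ideal_items[:n]]
    ideal = dcg(gains_ideal)
    return dcg(gains_private) / ideal if ideal > 0 else 1.0
```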

  42. Social Similarity Measures. • Adamic-Adar: s(u, v) = Σ_{z ∈ Γ(u) ∩ Γ(v)} 1 / log |Γ(z)|, where Γ(x) is the set of neighbors of x. • Graph Distance: based on the length of the shortest path between u and v (shorter distance means higher similarity). • Katz: s(u, v) = Σ_{l ≥ 1} β^l · |paths_{u,v}^{(l)}|, where β is a small damping factor and |paths_{u,v}^{(l)}| is the number of paths of length l between u and v.
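
A hedged sketch of these measures on an unweighted networkx graph; the Katz sum is truncated at a maximum path length for tractability, and the damping factor β, the cutoff, and the inverse-distance convention for Graph Distance are illustrative choices rather than values taken from the paper.

```python
# Structural similarity measures on an unweighted social graph.
import math
import networkx as nx
import numpy as np

def adamic_adar(G, u, v):
    """Sum of 1/log(degree) over the common neighbors of u and v."""
    common = set(G[u]) & set(G[v])
    return sum(1.0 / math.log(G.degree(z)) for z in common if G.degree(z) > 1)

def graph_distance(G, u, v):
    """Inverse shortest-path distance (one common convention); 0 if disconnected."""
    try:
        return 1.0 / nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        return 0.0

def katz(G, u, v, beta=0.005, max_len=4):
    """Truncated Katz score: sum over l of beta^l times the number of length-l paths."""
    nodes = list(G.nodes())
    idx = {n: k for k, n in enumerate(nodes)}
    A = nx.to_numpy_array(G, nodelist=nodes)
    score, power = 0.0, np.eye(len(nodes))
    for l in range(1, max_len + 1):
        power = power @ A            # (A^l)[u, v] counts paths of length l
        score += (beta ** l) * power[idx[u], idx[v]]
    return score
```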

  43. Experiments: Last.fm. [Figures: NDCG at 10 and NDCG at 100.]

  44. Experiments: Flixster. [Figures: NDCG at 10 and NDCG at 100.]

  45. Comparison of approaches on the Last.fm data set: Low Rank Mechanism (LRM) [Yuan et al., PVLDB 2012] and Group and Smooth (GS) [Kellaris & Papadopoulos, PVLDB 2013].

  46. Relationship between user degree and accuracy, due to approx. error (Common Neighbors).
