
Privacy-Preserving Eigentaste-based Collaborative Filtering

Privacy-Preserving Eigentaste-based Collaborative Filtering. Ibrahim Yakut and Huseyin Polat, {iyakut,polath}@anadolu.edu.tr, Department of Computer Engineering, Anadolu University, Turkey. Collaborative Filtering (CF). Problem: information overload. Solution: collaborative filtering.


Presentation Transcript


  1. Privacy-Preserving Eigentaste-based Collaborative Filtering • Ibrahim Yakut and Huseyin Polat, {iyakut,polath}@anadolu.edu.tr • Department of Computer Engineering, Anadolu University, Turkey

  2. Collaborative Filtering (CF) • Problem: information overload • Solution: collaborative filtering • IWSEC'07

  3. Collaborative Filtering • A recent technique for filtering and recommendation • Applications: e-commerce, search engines, direct recommendations

  4. Collaborative Filtering Process • [Figure: user-item matrix with users u1, u2, ..., ua (the active user), ..., un and items i1, i2, ..., iq, ..., im, where iq is the item for which a prediction is sought] • Paq = prediction on item q for the active user a

  5. Eigentaste • Proposed by Goldberg et al. in 2001 • Main feature: online computation in constant time • Second, it allows flexible use of several clustering algorithms • Based on Principal Component Analysis (PCA) • Applied in Jester, an online joke recommender: http://eigentaste.berkeley.edu/

  6. Eigentaste Algorithm • D: n × m user-item matrix (n users, m items) • A: n × k matrix of ratings on the k gauge items • Step 1: Find the correlation matrix C of A • Step 2: Find the eigenvectors (E) and eigenvalues of C

  7. Eigentaste Algorithm (cont'd) • Step 3: Take the first two eigenvectors and project A: x = A E_2^T • Step 4: Cluster the projected data using Recursive Rectangular Clustering (RRC) • Step 5: Construct a lookup table holding the mean of the non-gauge item ratings for each cluster
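
Steps 1 to 3 can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `eigentaste_projection`, the synthetic ratings, and the choice of a dense gauge matrix with per-item z-score normalization are assumptions made for the example. Clustering (Step 4) and the lookup table (Step 5) are left out.

```python
import numpy as np

def eigentaste_projection(A, num_components=2):
    """Project a dense n-by-k gauge-rating matrix onto its leading principal components."""
    # Normalize each gauge item (column) to zero mean and unit sample variance.
    Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    n = Z.shape[0]
    # Step 1: k-by-k correlation matrix of the gauge ratings.
    C = Z.T @ Z / (n - 1)
    # Step 2: eigen-decomposition; eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]
    # Rows of E are eigenvectors, sorted by decreasing eigenvalue.
    E = eigenvectors[:, order].T
    # Step 3: keep the leading eigenvectors and project: x = Z E_2^T.
    E_top = E[:num_components]
    return Z @ E_top.T, E_top, eigenvalues[order]

# Synthetic ratings in [-10, 10]; these are not Jester data.
rng = np.random.default_rng(0)
A = rng.uniform(-10, 10, size=(200, 10))
x, E2, lam = eigentaste_projection(A)
```

Each row of `x` is one user's position in the two-dimensional eigenplane, which is the input to the clustering step.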

  8. Eigentaste: Online Phase • When the active user a enters: • a rates the items in the gauge set • a is projected using the principal components • The representative cluster is found • Items are recommended based on the preconstructed lookup table • [Figure labels: Disapprove, Approve]

  9. Motivation • The algorithm described above is successful • But due to privacy risks, collecting truthful and trustworthy data is a challenge • Therefore, how can users give data for CF purposes without jeopardizing their privacy? • Is it possible to use perturbed data in Eigentaste-based algorithms?

  10. Modifications to the Original Algorithm • Normalization: user mean and standard deviation instead of item mean and standard deviation • Clustering: k-means clustering instead of RRC • Prediction: instead of using the lookup table value directly, denormalize it and then predict

  11. Masking Data • Randomized Perturbation Technique (RPT), Agrawal & Srikant, 2000 • [Figure: User 1, User 2, ..., User n-1, User n each add random noise R1, R2, ..., Rn-1, Rn to their data before it reaches the central database used in the CF process]

  12. Masking Process • Users and the server agree on γ, θ, and δ • Each user u computes the z-scores of their ratings • u selects σ_u uniformly at random over [0, γ] and uses it as the standard deviation of the masking data • u selects r_u over [0, 1]; if r_u ≤ θ, u uses the uniform distribution, otherwise the Gaussian • u selects xer over [0, δ]; xer% of the unfilled cells are to be filled with noise

  13. Masking Process • u creates m_u random numbers, where m_u = number of rated cells + number of unfilled cells chosen via xer • The noise has mean μ = 0 and standard deviation σ_u; depending on r_u, it is Gaussian or uniform over [-√3·σ_u, √3·σ_u] • u masks their private data by adding this noise; the empty cells to be filled are selected randomly
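
The masking steps on the two slides above can be sketched for a single user as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `mask_ratings` is hypothetical, `None` is used to mark an unrated item, δ is read as a percentage in [0, 100], and r_u and xer are drawn uniformly.

```python
import math
import random
import statistics

def mask_ratings(ratings, gamma, theta, delta, rng):
    """Disguise one user's ratings; None marks an unrated item."""
    rated = [i for i, r in enumerate(ratings) if r is not None]
    empty = [i for i, r in enumerate(ratings) if r is None]
    values = [ratings[i] for i in rated]
    mean, std = statistics.mean(values), statistics.stdev(values)

    sigma_u = rng.uniform(0, gamma)      # std of the masking noise
    use_uniform = rng.random() <= theta  # r_u <= theta selects uniform noise
    x_er = rng.uniform(0, delta)         # percentage of empty cells to fill
    filled = rng.sample(empty, round(len(empty) * x_er / 100))

    def noise():
        if use_uniform:
            # Uniform over [-sqrt(3)*sigma, sqrt(3)*sigma] has std sigma.
            bound = math.sqrt(3) * sigma_u
            return rng.uniform(-bound, bound)
        return rng.gauss(0, sigma_u)

    masked = [None] * len(ratings)
    for i in rated:
        masked[i] = (ratings[i] - mean) / std + noise()  # z-score plus noise
    for i in filled:
        masked[i] = noise()                              # fake rating: noise only
    return masked

# Made-up ratings for one user; half of the items are unrated.
ratings = [4.5, None, -2.0, None, 7.25, None, None, -9.0, 1.0, None]
masked = mask_ratings(ratings, gamma=1.0, theta=0.5, delta=50.0,
                      rng=random.Random(7))
```

Filling some empty cells hides which items the user actually rated, in addition to hiding the rating values themselves.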

  14. Eigentaste-based CF with Privacy • The server now holds the disguised user-item matrix D′ and the disguised user-gauge matrix A′ • In some steps, the effects of perturbation must be considered and handled: • Correlation matrix construction • Projection • The active user's entry of gauge-set ratings

  15. Correlation Matrix Construction • f ≠ g corresponds to the off-diagonal entries of C′ • The expected values of the three noise terms are 0, since μ = 0, so these terms drop out of the estimate

  16. Correlation Matrix Construction • f = g corresponds to the diagonal entries of C′ • The expected value is 0, since μ = 0 • The result is then simplified by assuming n ≈ n − 1
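
The equations on the two correlation slides did not survive in the transcript. The block below is a reconstruction under assumptions, not the authors' derivation: it assumes C = AᵀA/(n − 1), writes the disguised entry as a′_uf = a_uf + r_uf with independent zero-mean noise r_uf, and uses σ̄² for the average noise variance over users. How the server obtains σ̄² is not stated in the transcript.

```latex
\begin{aligned}
C'_{fg} &= \frac{1}{n-1}\sum_{u=1}^{n}(a_{uf}+r_{uf})(a_{ug}+r_{ug}) \\
        &= \frac{1}{n-1}\Big[\sum_{u} a_{uf}a_{ug} + \sum_{u} a_{uf}r_{ug}
           + \sum_{u} r_{uf}a_{ug} + \sum_{u} r_{uf}r_{ug}\Big] \\[4pt]
f \neq g:\quad & \text{the last three sums have expected value } 0
           \;\Rightarrow\; C'_{fg} \approx C_{fg} \\[4pt]
f = g:\quad & C'_{ff} = \frac{1}{n-1}\Big[\sum_{u} a_{uf}^{2}
           + 2\sum_{u} a_{uf}r_{uf} + \sum_{u} r_{uf}^{2}\Big],
           \qquad \text{the cross term has expected value } 0 \\
        & \frac{1}{n-1}\sum_{u} r_{uf}^{2} \approx \frac{1}{n}\sum_{u}\sigma_u^{2}
           = \bar{\sigma}^{2} \quad (n \approx n-1)
           \;\Rightarrow\; C_{ff} \approx C'_{ff} - \bar{\sigma}^{2}
\end{aligned}
```

Under these assumptions only the diagonal needs a correction, because the squared noise term is the only noise contribution with a nonzero expectation.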

  17. Projection • Similarly, the expected values of the noise terms are 0, so an approximation of the projected matrix is obtained
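
The claim that zero-mean noise averages out can be checked numerically. The sketch below uses synthetic data and makes simplifying assumptions that are not in the slides: every user shares one noise standard deviation `sigma`, the noise is Gaussian only, and the server knows `sigma` when it corrects the diagonal.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, sigma = 20000, 4, 1.0

# Synthetic correlated data with unit variance per column; not Jester ratings.
base = rng.standard_normal((n, 1))
A = 0.6 * base + 0.8 * rng.standard_normal((n, k))
R = rng.normal(0.0, sigma, size=(n, k))  # zero-mean masking noise
A_masked = A + R

C = A.T @ A / (n - 1)                        # matrix from the original data
C_masked = A_masked.T @ A_masked / (n - 1)   # matrix the server can compute

# Off-diagonal noise terms average out; only the diagonal is inflated by sigma^2.
C_est = C_masked - sigma**2 * np.eye(k)
max_error = np.abs(C_est - C).max()
print(f"largest absolute error in the estimate: {max_error:.4f}")
```

Rerunning with a smaller `n` gives a larger error, which matches the later observation that accuracy improves as the number of users grows.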

  18. Remaining Parts • Clusters are determined from the estimated data • The z-score means of non-gauge items are stored in the lookup table • When the active user enters disguised gauge ratings, the effect of randomization is removed in the same way • The representative cluster is identified, the corresponding value from the table is denormalized, and the prediction is obtained
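
The final lookup and denormalization can be sketched as follows. The names `nearest_cluster`, `predict_rating`, and `lookup_table`, the centroid coordinates, and the table values are hypothetical. Picking the representative cluster as the nearest k-means centroid is an assumption consistent with the k-means modification, and the removal of noise from the active user's gauge ratings is omitted.

```python
import math
import statistics

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to the projected user."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def predict_rating(user_ratings, cluster_z_mean):
    """Denormalize a cluster's mean z-score with the user's own mean and std."""
    mean = statistics.mean(user_ratings)
    std = statistics.stdev(user_ratings)
    return mean + std * cluster_z_mean

# Hypothetical centroids in the eigenplane and lookup table of mean z-scores.
centroids = [(0.0, 0.0), (1.0, 1.0)]
lookup_table = {0: {"joke_42": 0.5}, 1: {"joke_42": -1.0}}

active_projection = (0.1, -0.2)   # active user's position after projection
active_ratings = [2.0, 4.0, 6.0]  # active user's gauge ratings (mean 4, std 2)

cluster = nearest_cluster(active_projection, centroids)
prediction = predict_rating(active_ratings, lookup_table[cluster]["joke_42"])
```

Because the table stores z-scores, the same cluster entry yields different predictions for users with different rating means and spreads.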

  19. Experiments • Data set: Jester, a web-based joke rating data set • 17,988 users, 100 jokes • Continuous ratings in the range (-10, +10) • 50% of all ratings are present • Evaluation metrics, with notation: p = predicted value, r = original value, d = size of the test set, rmax = maximum rating, rmin = minimum rating
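
The metric formulas themselves are missing from the transcript. The symbol legend is consistent with mean absolute error (MAE) and its normalized form (NMAE), so the sketch below assumes those two definitions; the function names and sample values are made up for illustration.

```python
def mae(predicted, original):
    """Mean absolute error over a test set of size d = len(original)."""
    return sum(abs(p - r) for p, r in zip(predicted, original)) / len(original)

def nmae(predicted, original, r_max=10.0, r_min=-10.0):
    """MAE divided by the rating range; defaults match Jester's (-10, +10) scale."""
    return mae(predicted, original) / (r_max - r_min)

# Made-up predictions and original ratings.
predicted = [1.0, 2.0, 3.0]
original = [2.0, 2.0, 5.0]
print(mae(predicted, original), nmae(predicted, original))
```

Dividing by the rating range makes errors comparable across data sets with different rating scales.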

  20. Eigentaste vs. Modified • 9000 training users, 5000 test users (10 test items)

  21. Protecting Active Users' Privacy • M1: No disguise, but requires additional cost • M2: Considers only the gauge-set mean and standard deviation • M3: Considers the mean and standard deviation of all ratings

  22. Accuracy vs. Varying Numbers of Users • 5000 users fixed, with 10 random test items • Accuracy improves as the number of users increases, since the aggregated random numbers converge to zero • For n ≥ 2000, the results are satisfactory

  23. Accuracy with Varying δ Values • Accuracy improves slightly as δ decreases

  24. Conclusion • We showed how to perform privacy-preserving CF tasks using Eigentaste-based algorithms • Future work: • Whether other clustering algorithms can be employed • How to improve recommendation quality by using correlation-based CF algorithms

  25. Thanks for your interest! Questions?
