
Transfer Learning for Link Prediction


Presentation Transcript


  1. Transfer Learning for Link Prediction. Qiang Yang, 楊強, Hong Kong University of Science and Technology (香港科大), Hong Kong, China. http://www.cse.ust.hk/~qyang

  2.–5. We often find it easier… (a sequence of four image slides sharing this title; the visuals are not captured in the transcript)

  6. Transfer Learning? (迁移学习) • People often transfer knowledge to novel situations • Chess → Checkers • C++ → Java • Physics → Computer Science • Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains)

  7. But direct application will not work. Traditional machine learning assumes that training and future (test) data follow the same distribution and lie in the same feature space.

  8. When distributions are different • Classification accuracy on classes (+, −): Training 90%, Testing 60%

  9. When features and distributions are different • Classification accuracy on classes (+, −): Training 90%, Testing 10%

  10. When features are different • Heterogeneous: different feature spaces • Training: text; Future: images • Example texts: "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ..." (Apples); "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ..." (Bananas)

  11. Transfer Learning: Source Domains (diagram: knowledge from source domains feeds into learning an input → output mapping)

  12. An overview of various settings of transfer learning:
  • Inductive Transfer Learning: labeled data are available in the target domain (目標數據, target data)
    • Case 1: no labeled data in the source domain (源數據, source data) → Self-taught Learning
    • Case 2: labeled data are available in the source domain; source and target tasks are learnt simultaneously → Multi-task Learning
  • Transductive Transfer Learning: labeled data are available only in the source domain
    • Assumption: different domains but a single task → Domain Adaptation
    • Assumption: single domain and single task → Sample Selection Bias / Covariate Shift
  • Unsupervised Transfer Learning: no labeled data in either the source or the target domain

  13. Feature-based Transfer Learning (Dai, Yang et al., ACM KDD 2007) • Source: many labeled instances • Target: all unlabeled instances • Feature spaces can be different but must overlap; the classes are the same, yet the distributions P(X, Y) differ • The overlapping features serve as a bridge • CoCC algorithm (co-clustering based), sketched below
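
To make the bridging idea concrete, here is a minimal Python sketch of using shared word clusters as a bridge between a labeled source corpus and an unlabeled target corpus. This is a simplification for illustration, not the CoCC algorithm from the KDD 2007 paper; the function name and parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def bridge_classify(X_src, y_src, X_tgt, n_word_clusters=50, seed=0):
    """X_src, X_tgt: dense document-word count matrices over a shared vocabulary.

    Cluster words using the labeled source corpus, re-express both corpora
    in the cluster-level feature space, train on source, predict on target.
    """
    # Cluster columns (words) by their usage pattern across source documents.
    km = KMeans(n_clusters=n_word_clusters, n_init=10, random_state=seed)
    word_cluster = km.fit_predict(np.asarray(X_src).T)
    # One-hot indicator matrix mapping words -> word clusters.
    C = np.eye(n_word_clusters)[word_cluster]      # (n_words, n_clusters)
    # Train on cluster-level features; they transfer across the two corpora.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_src @ C, y_src)
    return clf.predict(X_tgt @ C)
```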

  14. Document-word co-occurrence (diagram: knowledge transfer from the source document set Di to the target document set Do through shared words)

  15. Co-Clustering based Classification on the 20 Newsgroups dataset (chart: classification accuracy using transfer learning)

  16. Talk Outline • Transfer learning: a quick introduction • Link prediction and collaborative filtering problems • Transfer learning for the sparsity problem • Codebook method • CST method • Collective link prediction method • Conclusions and future work

  17. Network Data • Social networks: Facebook, Twitter, Live Messenger, etc. • Information networks: the Web (1.0 with hyperlinks, 2.0 with tags), citation networks, Wikipedia, etc. • Biological networks: gene networks, etc. • Other networks

  18. A Real-World Study [Leskovec & Horvitz, WWW '08] • Who talks to whom on MSN Messenger • Network: 240M nodes, 1.3 billion edges • Conclusions: the average path length is 6.6, and 90% of nodes are reachable within 8 steps

  19. Global Network Structures • Small world: six degrees of separation [Milgram, 1960s] • Clustering and communities • Evolutionary patterns • Long-tail distributions

  20. Local Network Structures • Link prediction: a form of statistical relational learning (Taskar and Getoor) • Object classification: predict the category of an object based on its attributes and links (is this person a student?) • Link classification: predict the type of a link (are they co-authors?) • Link existence: predict whether a link exists or not (credit: Jure Leskovec, ICML '09)

  21. Link Prediction • Task: predict missing links in a network • Main challenge: Network Data Sparsity

  22. The Long Tail in the Era of Recommendation • Help users discover novel/rare items • The long tail → recommendation systems

  23. Product Recommendation (Amazon.com)

  24. Collaborative Filtering (CF) • Wisdom of the crowd, word of mouth, … • Your taste in music/products/movies/… is similar to that of your friends (Maltz & Ehrlich, 1995) • Key issue: who are your friends?

  25. Taobao.com

  26. Matrix Factorization Model for Link Prediction • We seek a low-rank approximation X ≈ U Vᵀ of the target matrix, where X is m × n, U is m × k, and V is n × k with k ≪ m, n • An unknown entry (i, j) is then predicted by the inner product x̂_ij = u_i · v_j
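
As a concrete illustration of this model (a minimal sketch, not code from the talk; the function name and hyperparameters are assumptions), here is rank-k factorization fit by stochastic gradient descent on the observed entries only:

```python
import numpy as np

def factorize(X, mask, k=10, lr=0.01, reg=0.02, epochs=50, seed=0):
    """Fit X ~= U @ V.T using only observed entries (mask is True where known)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            ui = U[i].copy()                      # snapshot before updating
            err = X[i, j] - ui @ V[j]             # residual on observed entry
            U[i] += lr * (err * V[j] - reg * ui)  # gradient step for row i
            V[j] += lr * (err * ui - reg * V[j])  # gradient step for column j
    return U, V

# A missing link/rating (i, j) is then scored as U[i] @ V[j].
```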

  27. Data Sparsity in Collaborative Filtering • Realistic setting: the training data is a sparse rating matrix with density ≤ 0.6%; the test data is a held-out rating matrix • A CF model trained on this data achieves RMSE 0.9748

  28. Transfer Learning for Collaborative Filtering? (e.g., from the IMDB database to Amazon.com)

  29. Codebook Transfer. Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI '09), Pasadena, CA, USA, July 11–17, 2009.

  30. Codebook Construction • Definition 2.1 (Codebook): a k × l matrix which compresses the cluster-level rating patterns of k user clusters and l item clusters • Codebook: user prototypes' ratings on item prototypes • Encoding: find prototypes for users and items and get their indices • Decoding: recover the rating matrix from the codebook and the indices

  31. Knowledge Sharing via Cluster-Level Rating Matrix • Source (Dense): Encode cluster-level rating patterns • Target (Sparse): Map users/items to the encoded prototypes

  32. Step 1: Codebook Construction • Co-cluster the rows (users) and columns (items) of Xaux • Get user/item cluster indicators Uaux ∈ {0, 1}^(n×k), Vaux ∈ {0, 1}^(m×l)
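
Given the cluster indicators, the codebook is the cluster-averaged rating matrix. A minimal NumPy sketch of this step (the co-clustering that produces Uaux and Vaux is omitted, and dense arrays are assumed; names are illustrative):

```python
import numpy as np

def build_codebook(X_aux, U_aux, V_aux):
    """Cluster-level rating codebook B (k x l), per Definition 2.1.

    X_aux : (n, m) dense auxiliary rating matrix
    U_aux : (n, k) binary user-cluster indicator matrix
    V_aux : (m, l) binary item-cluster indicator matrix
    B[p, q] is the average rating of user cluster p on item cluster q.
    """
    ones = np.ones_like(X_aux)
    return (U_aux.T @ X_aux @ V_aux) / (U_aux.T @ ones @ V_aux)
```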

  33. Step 2: Codebook Transfer • Objective: expand the target matrix while minimizing the difference between Xtgt and its reconstruction • Learn user/item cluster indicators Utgt and Vtgt for Xtgt • A binary weighting matrix W marks the observed ratings in Xtgt • Alternating greedy searches for Utgt and Vtgt converge to a local minimum
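
A minimal sketch of the alternating greedy search (an illustrative re-implementation assuming dense NumPy arrays, not the authors' code; assignments are stored as index vectors rather than binary indicator matrices):

```python
import numpy as np

def codebook_transfer(X_tgt, W, B, n_iters=20, seed=0):
    """Greedy alternating search minimizing || W * (X_tgt - U B V^T) ||_F^2.

    W is 1 on observed entries of X_tgt and 0 elsewhere; B is the codebook.
    """
    rng = np.random.default_rng(seed)
    n, m = X_tgt.shape
    k, l = B.shape
    u = rng.integers(0, k, size=n)        # user -> user-cluster assignment
    v = rng.integers(0, l, size=m)        # item -> item-cluster assignment
    for _ in range(n_iters):
        BV = B[:, v]                      # (k, m): candidate row per user cluster
        for i in range(n):                # greedily reassign each user
            errs = (((X_tgt[i] - BV) ** 2) * W[i]).sum(axis=1)
            u[i] = int(np.argmin(errs))
        UB = B[u, :]                      # (n, l): candidate column per item cluster
        for j in range(m):                # greedily reassign each item
            errs = (((X_tgt[:, j, None] - UB) ** 2) * W[:, j, None]).sum(axis=0)
            v[j] = int(np.argmin(errs))
    X_filled = B[u][:, v]                 # duplicate codebook rows/columns
    # Keep observed ratings; fill the missing ones from the reconstruction.
    return np.where(W.astype(bool), X_tgt, X_filled)
```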

  34. Codebook Transfer • Each user/item in Xtgt is matched to a prototype in B • Duplicating certain rows & columns of B reconstructs Xtgt • The codebook is indeed a two-sided data representation

  35. Experimental Setup
  • Data sets: EachMovie (auxiliary, 500 users × 500 movies); MovieLens (target, 500 users × 1000 movies); Book-Crossing (target, 500 users × 1000 books)
  • Compared methods: Pearson Correlation Coefficients (PCC); Scalable Cluster-based Smoothing (CBS); Weighted Low-rank Approximation (WLR); Codebook Transfer (CBT)
  • Evaluation protocol: first 100/200/300 users for training, last 200 users for testing; 5/10/15 observable ratings given for each test user

  36. Experimental Results (1): Books → Movies • MAE comparison on MovieLens, averaged over 10 sampled test sets • Lower is better

  37. Limitations of Codebook Transfer • Same rating range: source and target data must share the same range of ratings, e.g., [1, 5] • Homogeneous dimensions: the user and item dimensions must be similar • In reality, rating ranges can be 0/1 or [1, 5], and the user and item dimensions may be very different

  38. Coordinate System Transfer. Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for Sparsity Reduction. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), Atlanta, Georgia, USA, July 11–15, 2010.

  39. Our Solution: Coordinate System Transfer • Step 1: coordinate system construction (obtain the principal coordinates, e.g., the top singular vectors of the auxiliary rating data) • Step 2: coordinate system adaptation

  40. Coordinate System Transfer, Step 2: Coordinate System Adaptation • Adapt the coordinate systems discovered in the auxiliary domain to the target domain • The auxiliary domain contributes in two ways • Initialization: take the auxiliary coordinate systems as the seed model in the target domain • Regularization: penalize deviation of the target model from the auxiliary coordinate systems
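
A simplified sketch of both steps, assuming dense NumPy arrays. The actual AAAI-10 algorithm optimizes over matrix manifolds and includes an inner scaling matrix, which this illustration omits; all names and hyperparameters here are assumptions.

```python
import numpy as np

def coordinate_system(X_aux, k):
    """Step 1 (sketch): top-k left/right singular vectors of auxiliary data."""
    U, s, Vt = np.linalg.svd(X_aux, full_matrices=False)
    return U[:, :k], Vt[:k].T

def cst_adapt(X_tgt, W, U0, V0, rho=0.1, lr=0.01, epochs=200):
    """Step 2 (sketch): start from the auxiliary coordinate systems U0, V0
    (initialization) and fit the observed target ratings, with a biased
    penalty rho pulling the solution back toward U0 / V0 (regularization)."""
    U, V = U0.copy(), V0.copy()
    for _ in range(epochs):
        R = W * (X_tgt - U @ V.T)            # residual on observed entries
        U += lr * (R @ V - rho * (U - U0))   # gradient step with bias to U0
        V += lr * (R.T @ U - rho * (V - V0)) # gradient step with bias to V0
    return U, V
```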

  41. Coordinate System Transfer Algorithm

  42. Experimental Results: Data Sets and Evaluation Metrics • Data sets extracted from MovieLens and Netflix • Metrics: Mean Absolute Error, MAE = (1/|T|) Σ_{(u,i)∈T} |r_ui − r̂_ui|, and Root Mean Square Error, RMSE = √((1/|T|) Σ_{(u,i)∈T} (r_ui − r̂_ui)²), where r_ui and r̂_ui are the true and predicted ratings, respectively, and |T| is the number of test ratings
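
The two metrics in code (straightforward NumPy, no assumptions beyond the formulas above):

```python
import numpy as np

def mae(r_true, r_pred):
    """Mean Absolute Error over test ratings."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return float(np.abs(r_true - r_pred).mean())

def rmse(r_true, r_pred):
    """Root Mean Square Error over test ratings."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return float(np.sqrt(((r_true - r_pred) ** 2).mean()))
```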

  43. Experimental Results: Baselines and Parameter Settings • Baselines: Average Filling; Latent Matrix Factorization (Bell and Koren, ICDM 2007); Collective Matrix Factorization (Singh and Gordon, KDD 2008); OptSpace (Keshavan, Montanari and Oh, NIPS 2010) • Average results over 10 random trials are reported

  44. Results (1/2) • Observations: CST performs significantly better (t-test) than all baselines at all sparsity levels • Transfer learning methods (CST, CMF) beat the two non-transfer-learning methods (AF, LFM)

  45. Limitations of CST and CBT • Different source domains are related to the target domain to different degrees (e.g., books → movies vs. food → movies) • Rating bias: users tend to rate items that they like, so there are more ratings of 5 than ratings of 2

  46. Our Solution: Collective Link Prediction (CLP) • Jointly learn multiple domains together • Learn the similarity of different domains: consistency between domains indicates similarity • Introduce a link function to correct the rating bias
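
To illustrate just these two ingredients, domain-similarity weighting and a link function, here is a hypothetical joint objective. The actual CLP model in the ICML 2010 paper is a nonparametric Gaussian process formulation, not this parametric sketch; all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clp_loss(U, Vs, Xs, Ws, sim, r_max=5.0):
    """Illustrative joint objective over D domains: shared user factors U,
    per-domain item factors Vs[d], observed matrices Xs[d] with masks Ws[d].
    sim[d] weights each domain by its similarity to the target, and a sigmoid
    link squashes latent scores into the rating range to model rating bias."""
    loss = 0.0
    for d in range(len(Xs)):
        pred = r_max * sigmoid(U @ Vs[d].T)   # link function corrects bias
        loss += sim[d] * ((Ws[d] * (Xs[d] - pred)) ** 2).sum()
    return loss
```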

  47. Bin Cao, Nathan Liu and Qiang Yang. Transfer Learning for Collective Link Prediction in Multiple Heterogeneous Domains. In Proceedings of 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel. June 2010.

  48. Example: Collective Link Prediction (diagram) • Training: a sparse movie rating matrix, together with a dense auxiliary book rating matrix (a different rating domain) and a dense auxiliary movie tagging matrix (different user behaviors) • Predicting: the held-out test rating matrix • With transfer learning, RMSE is 0.8855 (vs. 0.9748 without transfer, slide 27)

  49. Inter-task Similarity • Based on Gaussian process models • The key part is the kernel, which models user relations as well as task relations
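
A common way to realize such a kernel (a sketch assuming precomputed user and task similarity matrices, as in multi-task Gaussian process models; not necessarily the exact kernel from the paper) is a product kernel:

```python
import numpy as np

def joint_kernel(K_user, K_task, u1, d1, u2, d2):
    """Similarity between (user u1 in task d1) and (user u2 in task d2)
    factorizes into user similarity times task similarity."""
    return K_user[u1, u2] * K_task[d1, d2]

# The full Gram matrix over all (user, task) pairs is then the Kronecker
# product: K = np.kron(K_task, K_user).
```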
