Transfer Learning for Link Prediction

Transfer Learningfor Link Prediction Qiang Yang, 楊強，香港科大 Hong Kong University of Science and Technology Hong Kong, China http://www.cse.ust.hk/~qyang

We often find it easier …

We often find it easier…

Transfer Learning?迁移学习… • People often transfer knowledge to novel situations • Chess  Checkers • C++  Java • Physics  Computer Science Transfer Learning: The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains)

But, direct application will not work Machine Learning: Training and future (test) data follow the same distribution, and are in same feature space

When distributions are different Classification Accuracy (+, -) • Training: 90% • Testing: 60%

When features and distributions are different Classification Accuracy (+, -) Training: 90% Testing: 10% 9

Cross-Domain Sentiment Classification Training Test Classifier 82.55% DVD DVD Training Test 71.85% Classifier Electronics DVD

Wireless sensor networks Different time periods, devices or space When distributions are different Device, Space, or Time Day time Night time Device 2 Device 1

When Features are different • Heterogeneous: different feature spaces Training: Text Future: Images The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ... Apples Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ... Bananas

Transfer Learning: Source Domains Learning Input Output Source Domains

An overview of various settings of transfer learning Self-taught Learning 源數據 Case 1 No labeled data in a source domain Inductive Transfer Learning 目標數據 Labeled data are available in a source domain Labeled data are available in a target domain Source and target tasks are learnt simultaneously Multi-task Learning Case 2 Transfer Learning Labeled data are available only in a source domain Assumption: different domains but single task Transductive Transfer Learning Domain Adaptation No labeled data in both source and target domain Assumption: single domain and single task Unsupervised Transfer Learning Sample Selection Bias /Covariance Shift

Talk Outline • Transfer Learning: A quick introduction • Link prediction and collaborative filtering problems • Transfer Learning for Sparsity Problem • Codebook Method • CST Method • Collective Link Prediction Method • Conclusions and Future Works 15

Network Data • Social Networks • Facebook, Twitter, Live messenger, etc. • Information Networks • The Web (1.0 with hyperlinks and 2.0 with tags) • Citation network, Wikipedia, etc. • Biological Networks • Gene network, etc. • Other Networks

A Real World Study [Leskovec-Horvitz WWW ‘08] • Who talks to whom on MSN messenger • Network: 240M nodes, 1.3 billion edges • Conclusions: • Average path length is 6.6 • 90% of nodes is reachable <8 steps

Network Structures: Links • Global network structures • Small world • Long tail distributions • Local network structures • Relational learning (relation==link) • Link prediction (e.g. recommendation) • Group (Cluster) structures

Global Network Structures • Small world • Six degrees of separation [Milgram 60s] • Clustering and communities • Evolutionary patterns • Long tail distributions • Personalized recommendation: Amazon • Also brings a challenge in recommendation and relational learning: sparseness

Local Network Structures • Link Prediction • A form of Statistical Relational Learning • Object classification: predict category of an object based on its attributes and links • Is this person a student? • Link classification: predict type of a link • Are they co-authors? • Link existence: predict whether a link exists or not • Does he like this book? • Link cardinality estimation: predict the number of links of a node • An Introduction to statistical relational learning by Taskar and Getoor ? ? ? ? ? ?

Link Prediction • Task: predict missing links in a network • Main challenge: Network Data Sparsity

Product Recommendation (Amazon.com) 22 22

Examples: Collaborative Filtering Ideal Setting Training Data: Dense Rating Matrix Density >=2.0%, Test Data: Rating Matrix CF model RMSE:0.8721

Examples: Collaborative Filtering 10% Realistic Setting Training Data: Sparse Rating Matrix Density <=0.6%, Test Data: Rating Matrix CF model RMSE:0.9748

Transfer Learning for Collaborative Filtering? IMDB Database Amazon.com 26 26

Codebook Transfer Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI '09), Pasadena, CA, USA, July 11-17, 2009.

Limitations of Codebook Transfer • Same rating range • Source and target data must have the same range of ratings [1, 5] • Homogenous dimensions • User and item dimensions must be similar • In reality • Range of ratings can be 0/1 or [1,5] • User and item dimensions may be very different

Coordinate System Transfer Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for Sparsity Reduction. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10). Atlanta, Georgia, USA. July 11-15, 2010.

Types of User Feedback • Explicit feedback: express preferences explicitly • 1-5: Rating • ?: Missing • Implicit feedback: express preferences implicitly • 1: Observed browsing/purchasing • 0: Unobserved browsing/purchasing

Target domain • R: explicit ratings • One auxiliary domain (two or more data sources) • : user u has implicitly browsed/purchased item j

Problem Formulation • One target domain • , n users and m items. • One auxiliary domain (two data sources) • user side: , shares n common users with R. • item side: , shares m common items with R. • Goal: predict the missing values in R.

Our Solution:Coordinate System Transfer • Step 1: Coordinate System Construction ( ) • Step 2: Coordinate System Adaptation

Coordinate System Transfer Step 1: Coordinate System Construction • Sparse SVD on auxiliary data where the matrices give the top d principle coordinates. Definition (Coordinate System): • A coordinate system is a matrix with columns of orthonormal bases (principle coordinates), where the columns are located in descending order according to their corresponding eigenvalues.

Coordinate System Transfer Step 1: Coordinate System Adaptation • Adapt the discovered coordinate systems from the auxiliary domain to the target domain, • The effect from the auxiliary domain • Initialization: take as seed model in target domain, • Regularization:

Coordinate System Transfer Algorithm

Experimental Results Data Sets and Evaluation Metrics • Data sets (extracted from MovieLens and Netflix) • Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), Where and are the true and predicted ratings, respectively, and is the number of test ratings.

Experimental Results Baselines and Parameter Settings • Baselines • Average Filling • Latent Matrix Factorization (Bell and Koren, ICDM07) • Collective Matrix Factorization (Singh and Gordon, KDD08) • OptSpace (Keshavan, Montanari and Oh, NIPS10) • Parameter settings • Tradeoff parameters • LFM: { 0.001, 0.01, 0.1, 1, 10, 100 } • CMF: { 0.1, 0.5, 1, 5, 10 } • CST: • Latent dimensions: { 5, 10, 15 } • Average results over 10 random trials are reported

Experimental Results Results(1/2) • Observations: • CST performs 99% significantly better (t-test) than all baselines in all sparsity levels, • Transfer learning methods (CST, CMF) beats two non-transfer learning methods (AF, LFM).

Experimental Results Results(2/2) • Observations: • CST consistently improves as d increases, demonstrating the usefulness (avoid overfitting) using the auxiliary knowledge, • The improvement (CST vs. OptSpace) is more significant with fewer observed ratings, demonstrating the effectiveness of CST in alleviating data sparsity.

Related Work Related Work (Transfer Learning) • Cross-domain collaborative filtering • Domain adaptation methods • One-sided: • Two-sided:

Transfer Learning for Link Prediction

Transfer Learning for Link Prediction

Presentation Transcript

Transfer of Learning

CoupledLP : Link Prediction in Coupled Networks

Transfer to Learning

TRANSFER OF LEARNING

Efficient Decomposed Learning for Structured Prediction

Nonparametric Link Prediction in Dynamic Graphs

Transfer Learning

Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains

Theoretical Justification for Popular Link Prediction Heuristics

Theoretical Justification for Popular Link Prediction Heuristics

Transfer for Supervised Learning Tasks

Transfer of Learning

My Learning Link

Transfer Learning for Image Classification

Learning to Link

Nonparametric Latent Feature Models for Link Prediction

Assessing For Learning and Transfer

Learning for Transfer

Transfer Learning for Link Prediction