
Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains



Presentation Transcript


  1. Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains. Cao et al., ICML 2010. Presented by Danushka Bollegala.

  2. Link Prediction • Predict links (relations) between entities • Recommend items to users (MovieLens, Amazon) • Recommend users to users (social recommendation) • Similarity search (suggest similar web pages) • Query suggestion (suggest related queries issued by other users) • Collective Link Prediction (CLP) • Perform multiple prediction tasks for the same set of users simultaneously • Predict/recommend multiple item types (books and movies) • Pros • Prediction tasks might not be independent; one task can benefit from another (books vs. movies vs. food) • Less affected by data sparseness (the cold-start problem)

  3. Link prediction = matrix factorization. Building blocks: • Probabilistic Principal Component Analysis (PPCA) (Tipping & Bishop, 1999; PRML Chapter 12) • Probabilistic non-linear matrix factorization (Lawrence & Urtasun, ICML 2009) • Gaussian Process Regression (GPR) (PRML Sec. 6.4) • Task similarity matrix T • Transfer Learning + Collective Link Prediction (this paper)

  4. Link Modeling via NMF • Link matrix X (xi,j is the rating given by user i to item j) • xi,j is modeled by f(ui, vj, ε) • f: link function • ui: latent representation of user i • vj: latent representation of item j • ε: noise term • Generalized matrix approximation • Assumption: E is Gaussian noise N(0, σ²I) • Use Y = f⁻¹(X) • Then Y follows a multivariate Gaussian distribution.
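A minimal NumPy sketch of this generative view (the dimensions and the sigmoid-style link squashing onto a 1-5 scale are illustrative assumptions, not the paper's choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 5

U = rng.normal(size=(n_users, k))                   # latent user factors u_i
V = rng.normal(size=(n_items, k))                   # latent item factors v_j
E = rng.normal(scale=0.1, size=(n_users, n_items))  # Gaussian noise N(0, σ²I)

Y = U @ V.T + E                             # Y is multivariate Gaussian

def f(y):
    # Illustrative monotone link squashing onto a 1-5 rating scale.
    return 1 + 4 / (1 + np.exp(-y))

X = f(Y)                                    # observed ratings; f⁻¹(X) recovers the Gaussian Y
```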

  5. Revision: Gaussian Process Regression (PRML Section 6.4)

  6. Functions as Vectors • We can view a function as an infinite-dimensional vector • f: (f(x1), f(x2), ...)ᵀ • Each point in the domain is mapped by f to one dimension of the vector • In machine learning we must find functions (e.g. linear predictors) that map input values to their corresponding output values • We must also avoid over-fitting • This can be visualized as sampling from a distribution over functions with certain properties (see the sketch below) • Preference bias (cf. restriction bias)
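To make the "distribution over functions" idea concrete, here is a small sketch (grid, lengthscale, and jitter value are illustrative) that draws three sample functions from a zero-mean GP prior with an RBF kernel:

```python
import numpy as np

def rbf_kernel(xs, lengthscale=1.0):
    # Covariance between all pairs of grid points.
    d = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

xs = np.linspace(0, 10, 200)                  # finite grid standing in for the domain
K = rbf_kernel(xs) + 1e-8 * np.eye(len(xs))   # small jitter for numerical stability
samples = np.random.default_rng(1).multivariate_normal(
    np.zeros(len(xs)), K, size=3)             # three "functions" as 200-dim vectors
```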

  7. Gaussian Process (GP) (1/2) • Linear regression model: y(x) = wᵀφ(x) • We get a different output function y for each weight vector w • Impose a Gaussian prior over w: p(w) = N(w | 0, α⁻¹I) • Training dataset: {(x1, y1), ..., (xN, yN)} • Targets: y = (y1, ..., yN)ᵀ • Design matrix Φ with Φnk = φk(xn), so y = Φw

  8. Gaussian Process (2/2) • When we impose a Gaussian prior over the weight vector, the target vector y = Φw is also Gaussian: y ~ N(0, K) • K: kernel (Gram) matrix, K = α⁻¹ΦΦᵀ • k: kernel function, k(xn, xm) = α⁻¹φ(xn)ᵀφ(xm) (a numerical check follows below)
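A quick Monte Carlo sanity check of this weight-space view (all dimensions illustrative): sampling w from its prior and pushing it through the design matrix gives targets whose covariance matches K = α⁻¹ΦΦᵀ.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, alpha = 6, 4, 2.0
Phi = rng.normal(size=(N, M))       # design matrix; row n is φ(x_n)ᵀ

K = (1.0 / alpha) * Phi @ Phi.T     # Gram matrix implied by the prior on w

ws = rng.normal(scale=alpha ** -0.5, size=(100_000, M))     # w ~ N(0, α⁻¹I)
ys = ws @ Phi.T                                             # y = Φw for each sample
print(np.allclose(np.cov(ys, rowvar=False), K, atol=0.05))  # ~True
```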

  9. Gaussian Process: Definition • A Gaussian process is defined as a probability distribution over functions y(x) such that the values of y(x) evaluated at an arbitrary set of points x1, ..., xN jointly have a Gaussian distribution • p(y(x1), ..., y(xN)) is Gaussian • Often the mean is set to zero (a non-informative prior) • Then the kernel function fully defines the GP • Gaussian kernel: k(x, x') = exp(−‖x − x'‖² / 2σ²) • Exponential kernel: k(x, x') = exp(−θ|x − x'|)
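The two kernels named on the slide, in vectorized NumPy form (1-D inputs assumed for brevity):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    # k(x, x') = exp(-(x - x')² / (2σ²))
    return np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2 * sigma ** 2))

def exponential_kernel(x1, x2, theta=1.0):
    # k(x, x') = exp(-θ|x - x'|), the Ornstein-Uhlenbeck covariance
    return np.exp(-theta * np.abs(x1[:, None] - x2[None, :]))
```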

  10. Gaussian Process Regression (GPR) • Predict outputs under observation noise: tn = y(xn) + εn, with εn ~ N(0, β⁻¹) • The joint distribution of the targets is Gaussian with covariance CN = K + β⁻¹I • Predictive mean: kᵀCN⁻¹t • Predictive variance: c − kᵀCN⁻¹k, where c = k(xN+1, xN+1) + β⁻¹
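A compact implementation of these predictive equations, reusing the gaussian_kernel sketch above (the noise precision β = 25 is an illustrative default):

```python
import numpy as np

def gpr_predict(x_train, t_train, x_test, kernel, beta=25.0):
    C = kernel(x_train, x_train) + np.eye(len(x_train)) / beta  # C_N = K + β⁻¹I
    k = kernel(x_train, x_test)                                 # (N_train, N_test)
    c = np.diag(kernel(x_test, x_test)) + 1.0 / beta
    mean = k.T @ np.linalg.solve(C, t_train)                    # kᵀC_N⁻¹t
    var = c - np.sum(k * np.linalg.solve(C, k), axis=0)         # c - kᵀC_N⁻¹k
    return mean, var
```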

  11. Probabilistic Matrix Factorization • PMF can be seen as a Gaussian process with latent variables (GP-LVM) (Lawrence & Urtasun, ICML 2009) • Generalized matrix approximation model: Y = f⁻¹(X) follows a multivariate Gaussian distribution • A Gaussian prior is set on U and marginalized out • Probabilistic PCA model by Tipping & Bishop (1999) • Non-linear version • Mapping back to X through the link function f
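A sketch of the marginalization behind this view (dimensions illustrative): with a standard Gaussian prior on the latent factors that are integrated out, each rating vector becomes Gaussian with a covariance built from the remaining factor matrix, which is exactly a GP with a latent-variable kernel.

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, k, sigma = 40, 5, 0.1

V = rng.normal(size=(n_items, k))   # item factors, kept as parameters
# With u_i ~ N(0, I) and y_i = V u_i + noise, marginalizing u_i gives
# y_i ~ N(0, VVᵀ + σ²I): the covariance (kernel) comes from latent factors.
cov = V @ V.T + sigma ** 2 * np.eye(n_items)
y_user = rng.multivariate_normal(np.zeros(n_items), cov)  # one user's ratings
```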

  12. Ratings are not Gaussian!

  13. Collective Link Prediction • Naive approach: a separate GP model for each task • This paper: a single joint model for all tasks

  14. Tensor Product • Known as the Kronecker product for two matrices (e.g., numpy.kron(a, b)); see the sketch below
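A toy example of how a Kronecker product couples a task-similarity matrix with a per-user kernel in multi-task GPs (sizes and values illustrative):

```python
import numpy as np

# cov((task s, user i), (task t, user j)) = T[s, t] * K[i, j]
T = np.array([[1.0, 0.7],
              [0.7, 1.0]])   # 2 tasks, e.g. books and movies
K = np.eye(3) + 0.5          # toy kernel over 3 users
joint_cov = np.kron(T, K)    # (2*3) x (2*3) joint covariance
```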

  15. Generalized Link Functions • Each task might have a different rating distribution • c, α, b are parameters that must be estimated from the data • We can relax the constraint α > 0 if we have no prior knowledge about whether the rating distribution is negatively skewed
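The slide names the parameters (c, α, b) but does not reproduce the closed form of the link, so the function below is a hypothetical stand-in for illustration only, not the paper's actual transform:

```python
import numpy as np

def link(y, c=1.0, alpha=1.0, b=0.0):
    # Hypothetical monotone parametric link: c scales, b shifts,
    # alpha reshapes. Illustrative only; not the paper's function.
    return c * np.sinh(alpha * y + b)

def link_inv(x, c=1.0, alpha=1.0, b=0.0):
    # Exact inverse of the stand-in link above.
    return (np.arcsinh(x / c) - b) / alpha
```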

  16. Predictive Distribution • Similar to GPR prediction • First predict the Gaussian variable y = g(x), as in GPR • Then map back through the link to predict the rating x

  17. Parameter Estimation • Compute the likelihood of the dataset • Use Stochastic Gradient Descent for optimization • Non-convex optimization • Sensitive to initial conditions
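A generic sketch of such an optimization loop; the nll_grad callback is a hypothetical stand-in for the gradient of the model's negative log-likelihood:

```python
import numpy as np

def sgd(params, data, nll_grad, lr=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            params = params - lr * nll_grad(params, data[idx])  # noisy gradient step
    return params

# Non-convexity means different initial values of `params` can reach
# different local optima, hence the sensitivity to initial conditions.
```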

  18. Experiments • Setting: use each dataset and predict ratings for multiple item types • Datasets • MovieLens: 100,000 ratings on a 1-5 scale, 943 users, 1,682 movies, 5 popular genres • Book-Crossing: 56,148 ratings on a 1-10 scale, 28,503 users, 9,909 books, 4 most general Amazon book categories • Douban: a social network-based recommendation service; 10,000 users, 200,000 items (movies, books, music)

  19. Evaluation • Evaluation measure: Mean Absolute Error (MAE), computed as below • Baselines • I-GP: independent link prediction using GPs • CMF: collective matrix factorization (non-GP, classical NMF) • M-GP: joint link prediction using a multi-relational GP; does not consider the similarity between tasks • Proposed method: CLP-GP
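For reference, the evaluation measure is simple to compute:

```python
import numpy as np

def mae(predicted, actual):
    # Mean Absolute Error: average absolute deviation between predicted
    # and held-out ratings; lower is better.
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(actual))))
```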

  20. Results • Note: (1) smaller values are better; (2) '+'/'−' denote with/without the link function.

  21. Effect of total data sparseness (figure omitted)

  22. Effect of target-task data sparseness (figure omitted)

  23. Task similarity matrix (T) • Romance and Drama are very similar • Action and Comedy are very dissimilar

  24. My Comments • Elegant model and a well-written paper • Only a few parameters (e.g., the latent space dimension k) need to be specified • All other parameters can be learnt from data • Applicable to a wide range of tasks • Cons: • Computational complexity: predictions require kernel matrix inversion • SGD updates might not converge; the problem is non-convex
