
Probabilistic Models for Matrix Completion Problems


Presentation Transcript


  1. Probabilistic Models for Matrix Completion Problems Arindam Banerjee banerjee@cs.umn.edu Dept of Computer Science & Engineering University of Minnesota, Twin Cities March 11, 2011

  2. Recommendation Systems • Movies (example metadata): Title: Gone with the Wind, Release year: 1939, Cast: Vivien Leigh, Clark Gable, Genre: War, Romance, Awards: 8 Oscars, Keywords: Love, Civil war, … • Users (example metadata): Age: 28, Gender: Male, Job: Salesman, Interest: Travel, … • [Figure: partially observed movie-ratings matrix]

  3. Advertisements on the Web • Products (example metadata): Category: Sports shoes, Brand: Nike, Ratings: 4.2/5, … • Webpages (example metadata): Category: Baby, URL: babyearth.com, Content: Webpage text, Hyperlinks: … • [Figure: partially observed Click-Through-Rate matrix, with entries such as 0.01%, 1%, 2%]

  4. Forest Ecology • Traits (columns): Leaf(N), Leaf(P), SLA, Leaf-Size, Wood density, … • Plants (rows) • [Figure: partially observed plant-trait matrix (TRY db), entries on a 1–5 scale] (Jens Kattge, Peter Reich, et al.)

  5. The Main Idea: Probabilistic Matrix Completion

  6. Overview • Graphical Models • Bayesian Networks • Inference • Probabilistic Co-clustering • Structure: Simultaneous Row-Column Clustering • Bayesian models, Inference • Probabilistic Matrix Factorization • Structure: Low Rank Factorization • Bayesian models, Inference

  7. Graphical Models: What and Why • Statistical Machine Learning • Build diagnostic/predictive models from data • Uncertainty quantification based on (minimal) assumptions • The i.i.d. assumption • Data is independently and identically distributed • Example: Words in a document drawn i.i.d. from the dictionary • Graphical models • Assume (graphical) dependencies between (random) variables • Closer to reality; domain knowledge can be captured • Learning/inference is much more difficult

  8. Flavors of Graphical Models • Basic nomenclature • Node = random variable, may be observed/hidden • Edge = statistical dependency • Two popular flavors: 'directed' and 'undirected' • Directed graphs • A directed graph between random variables, capturing causal dependencies • Example: Bayesian networks, Hidden Markov Models • Joint distribution is a product of P(child | parents) • Undirected graphs • An undirected graph between random variables • Example: Markov/conditional random fields • Joint distribution in terms of potential functions • [Figure: example graph over X1–X5]

  9. Bayesian Networks • Joint distribution factorizes in terms of P(X | Parents(X)): P(X1, …, Xn) = ∏_i P(Xi | Parents(Xi)) • [Figure: example Bayesian network over X1–X5]
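As a concrete illustration of this factorization, the minimal sketch below evaluates the joint of a tiny chain network X1 → X2 → X3; the structure and all CPT numbers are hypothetical, chosen only to show the product form (the slide's own figure is not available).

```python
# Minimal sketch: a Bayesian network's joint distribution as a product of
# conditional probability tables (CPTs). The chain X1 -> X2 -> X3 and all
# numbers here are hypothetical, purely to illustrate the factorization.
P_X1 = {True: 0.3, False: 0.7}                   # P(X1)
P_X2 = {True: {True: 0.8, False: 0.2},           # P(X2 | X1): outer key = X1
        False: {True: 0.1, False: 0.9}}
P_X3 = {True: {True: 0.5, False: 0.5},           # P(X3 | X2): outer key = X2
        False: {True: 0.05, False: 0.95}}

def joint(x1, x2, x3):
    """P(X1=x1, X2=x2, X3=x3) = P(x1) * P(x2 | x1) * P(x3 | x2)."""
    return P_X1[x1] * P_X2[x1][x2] * P_X3[x2][x3]

# Sanity check: the joint sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
print(total)  # 1.0
```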

  10. Example I: Burglary Network

  11. Example II: Rain Network

  12. Example III: Car Problem Diagnosis

  13. Latent Variable Models • Bayesian network with hidden variables • Semantically more accurate, fewer parameters • Example: Compute probability of heart disease

  14. Inference • Some variables in the Bayes net are observed • The evidence/data, e.g., John has not called, Mary has called • Inference • How to compute values/probabilities of other variables • Example: What is the probability of burglary, i.e., P(b | ¬j, m)? (see the sketch below)
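A minimal sketch of exact inference by enumeration for this query, assuming the standard textbook parameterization of the burglary/alarm network (the CPT values below follow Russell & Norvig and are not read off this deck's figure):

```python
from itertools import product

# Standard textbook CPTs for the burglary/alarm network (assumed here,
# following Russell & Norvig; the slide's own figure is not available).
P_B = 0.001                                  # P(Burglary)
P_E = 0.002                                  # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}              # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}              # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Joint probability via the Bayes-net factorization."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def posterior_burglary(j=False, m=True):
    """P(b | j, m): enumerate the hidden variables (E, A), then normalize."""
    score = {b: sum(joint(b, e, a, j, m)
                    for e, a in product((True, False), repeat=2))
             for b in (True, False)}
    return score[True] / (score[True] + score[False])

print(posterior_burglary())  # P(b | ¬j, m) ≈ 0.0069 under these CPTs
```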

  15. Inference Algorithms • Graphs without loops • Efficient exact inference algorithms are possible • Sum-product algorithm, and its special cases • Belief propagation in Bayes nets • Forward-Backward algorithm in Hidden Markov Models (HMMs) • Graphs with loops • Junction tree algorithms • Convert into a graph without loops • May lead to an exponentially large graph, hence an inefficient algorithm • Sum-product algorithm, disregarding loops ('loopy belief propagation') • Active research topic; correct convergence is not guaranteed • Works well in practice, e.g., turbo codes • Approximate inference

  16. Approximate Inference • Variational Inference • Deterministic approximation • Approximate a complex true distribution/domain • Replace with a family of simple distributions/domains • Use the best approximation in the family • Example: Mean-field, Expectation Propagation • Stochastic Inference • Simple sampling approaches • Markov Chain Monte Carlo (MCMC) methods • A powerful family of methods • Gibbs sampling • A useful special case of MCMC methods (see the sketch below)
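A minimal Gibbs-sampling sketch on a toy target chosen for illustration: a bivariate Gaussian with correlation ρ, where both full conditionals are Gaussian and can be sampled exactly (the value ρ = 0.8 and the sample counts are assumptions, not from the slides).

```python
import numpy as np

# Gibbs sampling sketch: the target is a bivariate Gaussian with zero means,
# unit variances, and correlation rho. Each full conditional is Gaussian:
# x | y ~ N(rho*y, 1 - rho^2), and symmetrically for y | x.
rng = np.random.default_rng(0)
rho, n_samples = 0.8, 10_000
x, y = 0.0, 0.0
samples = np.empty((n_samples, 2))
for t in range(n_samples):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # sample x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # sample y | x
    samples[t] = (x, y)

# After burn-in, the empirical correlation should approach rho.
print(np.corrcoef(samples[1000:].T)[0, 1])  # ≈ 0.8
```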

  17. Overview • Graphical Models • Bayesian Networks • Inference • Probabilistic Co-clustering • Structure: Simultaneous Row-Column Clustering • Bayesian models, Inference • Probabilistic Matrix Factorization • Structure: Low Rank Factorization • Bayesian models, Inference

  18. Example: Gene Expression Analysis • [Figure: gene-expression matrix, original vs. co-clustered]

  19. Co-clustering and Matrix Approximation

  20. Probabilistic Co-clustering • [Figure: matrix partitioned by row clusters and column clusters]

  21. Generative Process • Assume a mixed membership for each row and column • Assume a Gaussian for each co-cluster • Pick row/column clusters • Generate each entry of the matrix (see the sketch below)
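A minimal sketch of this generative process, assuming k1 row clusters, k2 column clusters, Dirichlet mixed memberships, and one Gaussian per co-cluster; all sizes and hyperparameter values below are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k1, k2 = 20, 30, 3, 4          # rows, columns, row/column clusters (illustrative)
alpha, beta = 0.5, 0.5               # Dirichlet hyperparameters (illustrative)

# One mixed-membership vector per row and per column.
theta_row = rng.dirichlet(alpha * np.ones(k1), size=n)   # n x k1
theta_col = rng.dirichlet(beta * np.ones(k2), size=m)    # m x k2

# One Gaussian per co-cluster.
mu = rng.normal(0, 2, size=(k1, k2))
sigma = np.full((k1, k2), 0.5)

# Generate each matrix entry: pick a row cluster and a column cluster,
# then draw the entry from that co-cluster's Gaussian.
X = np.empty((n, m))
for i in range(n):
    for j in range(m):
        z1 = rng.choice(k1, p=theta_row[i])
        z2 = rng.choice(k2, p=theta_col[j])
        X[i, j] = rng.normal(mu[z1, z2], sigma[z1, z2])
```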

  22. Bayesian Co-clustering (BCC) • A Dirichlet prior over all possible mixed memberships

  23. Background: Plate Diagrams • A compact representation of large Bayesian networks • [Figure: node b repeated 3 times (b1, b2, b3) under parent a, drawn as a plate]

  24. Bayesian Co-clustering (BCC)

  25. Recall: The Inference Problem • What is P(b | ¬j, m)?

  26. Bayesian Co-clustering (BCC)

  27. Learning: Inference and Estimation • Learning • Estimate model parameters • Infer 'mixed memberships' of individual rows and columns • Expectation Maximization (EM) • Issues • The posterior probability cannot be obtained in closed form • Parameter estimation cannot be done directly • Approach: Variational inference

  28. Variational Inference • Introduce a variational distribution q to approximate the true posterior p • Use Jensen's inequality to get a tractable lower bound on the log-likelihood • Maximize the lower bound w.r.t. the variational parameters • Alternatively: minimize the KL divergence between q and p • Then maximize the lower bound w.r.t. the model parameters
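In symbols, with hidden variables Z, model parameters Θ, and a variational distribution q(Z), the bound from Jensen's inequality is (a standard derivation, reconstructed here rather than copied from the slide's missing equations):

```latex
\log p(X \mid \Theta)
  = \log \int q(Z)\,\frac{p(X, Z \mid \Theta)}{q(Z)}\,dZ
  \;\ge\; \mathbb{E}_{q}\!\left[\log p(X, Z \mid \Theta)\right]
        - \mathbb{E}_{q}\!\left[\log q(Z)\right]
  = \log p(X \mid \Theta) - \mathrm{KL}\!\left(q(Z)\,\|\,p(Z \mid X, \Theta)\right)
```

The last equality makes the two views on the slide equivalent: maximizing the lower bound over q is exactly minimizing the KL divergence between q and the true posterior.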

  29. Variational EM for BCC • [Equation: the variational lower bound of the log-likelihood, alternately maximized over variational and model parameters]

  30. Residual Bayesian Co-clustering (RBC) • (m1, m2): row/column means • (bm1, bm2): row/column biases • (z1, z2): determine which co-cluster distribution generates an entry • Motivation: users/movies may have individual biases (a plausible form of the model is sketched below)
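A plausible reading of the residual model, assuming the co-cluster Gaussian explains what remains after row and column biases are removed; this form is a reconstruction from the bullet labels above, not an equation taken from the slide:

```latex
X_{ij} \sim \mathcal{N}\!\left(\mu_{z_1 z_2} + b^{(1)}_{i} + b^{(2)}_{j},\; \sigma^2_{z_1 z_2}\right),
\qquad z_1 \sim \mathrm{Disc}\!\left(\theta^{(1)}_i\right),\quad z_2 \sim \mathrm{Disc}\!\left(\theta^{(2)}_j\right)
```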

  31. Results: Datasets • MovieLens: movie recommendation data • 100,000 ratings (1–5) for 1682 movies by 943 users (6.3% of entries observed) • 1 million ratings for 3900 movies by 6040 users (4.2% observed) • Foodmart: transaction data • 164,558 sales records for 7803 customers and 1559 products (1.35% observed) • Jester: joke rating data • 100,000 ratings (−10.00 to +10.00) for 100 jokes from 1000 users (100% observed)

  32. BCC, RBC vs. Other Co-clustering Algorithms • BCC and RBC have the best performance • RBC and RBC-FF perform better than BCC • [Figure: results on Jester]

  33. RBC vs. Other Co-clustering Algorithms • [Figures: results on MovieLens and Foodmart]

  34. RBC vs. SVD, NNMF, and CORR • RBC and RBC-FF are competitive with the other algorithms • [Figure: results on Jester]

  35. RBC vs. SVD, NNMF, and CORR • [Figures: results on MovieLens and Foodmart]

  36. SVD vs. Parallel RBC • Parallel RBC scales well to large matrices • [Figure: scaling comparison]

  37. Co-embedding: Users

  38. Co-embedding: Movies

  39. Overview • Graphical Models • Bayesian Networks • Inference • Probabilistic Co-clustering • Structure: Simultaneous Row-Column Clustering • Bayesian models, Inference • Probabilistic Matrix Factorization • Structure: Low Rank Factorization • Bayesian models, Inference

  40. Matrix Factorization • Singular value decomposition: X ≈ U Σ V^T • Problems • Large matrices, with millions of rows/columns: SVD can be rather slow • Sparse matrices where most entries are missing: traditional approaches cannot handle missing entries (see the sketch below)
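A minimal sketch of rank-k approximation via truncated SVD on a fully observed matrix (sizes and k are illustrative); note it has no notion of missing entries, which is exactly the limitation the slide points out.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 80))        # a fully observed matrix (illustrative)
k = 10                                # target rank (illustrative)

# Truncated SVD: keep only the top-k singular triplets.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, X_k is the best rank-k approximation
# of X in Frobenius norm.
print(np.linalg.norm(X - X_k))
```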

  41. Matrix Factorization: "Funk SVD" • Model X ∈ R^{n×m} as UV^T, where U ∈ R^{n×k} and V ∈ R^{m×k} • Predicted entry: X̂_ij = u_i^T v_j • Squared error on an observed entry: (X_ij − X̂_ij)^2 = (X_ij − u_i^T v_j)^2 • Alternately optimize U and V over the observed entries (see the sketch below)
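A minimal sketch of this idea as stochastic gradient descent over observed entries only (the matrix shape echoes MovieLens-100K, but the synthetic triples, rank, learning rate, and regularization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 943, 1682, 20               # users, movies, rank (illustrative)
lr, reg, epochs = 0.01, 0.05, 20      # illustrative hyperparameters

# obs: (i, j, rating) triples for observed entries; synthetic here.
obs = [(rng.integers(n), rng.integers(m), rng.integers(1, 6))
       for _ in range(1000)]

U = 0.1 * rng.normal(size=(n, k))
V = 0.1 * rng.normal(size=(m, k))

for _ in range(epochs):
    for i, j, x in obs:
        err = x - U[i] @ V[j]                     # residual on one entry
        U[i] += lr * (err * V[j] - reg * U[i])    # gradient step on u_i
        V[j] += lr * (err * U[i] - reg * V[j])    # gradient step on v_j
```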

  42. Probabilistic Matrix Factorization (PMF) • u_i ~ N(0, σ_u² I), v_j ~ N(0, σ_v² I) • X_ij ~ N(u_i^T v_j, σ²) • Inference using gradient descent (the MAP objective is shown below) • R. Salakhutdinov and A. Mnih, NIPS 2007
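MAP estimation in this model reduces to a regularized squared loss over the observed entries Ω, a standard identity for PMF; the regularization weights come from the noise-to-prior variance ratios:

```latex
\min_{U, V} \;\; \sum_{(i,j) \in \Omega} \left(X_{ij} - u_i^\top v_j\right)^2
  + \lambda_u \sum_i \|u_i\|^2 + \lambda_v \sum_j \|v_j\|^2,
\qquad \lambda_u = \frac{\sigma^2}{\sigma_u^2},\;\; \lambda_v = \frac{\sigma^2}{\sigma_v^2}
```

This is why PMF trained by gradient descent behaves like the regularized "Funk SVD" of the previous slide.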

  43. Bayesian Probabilistic Matrix Factorization (BPMF) • Gaussian-Wishart hyperpriors: µ_u ~ N(µ_0, Λ_u), Λ_u ~ W(ν_0, W_0); µ_v ~ N(µ_0, Λ_v), Λ_v ~ W(ν_0, W_0) • u_i ~ N(µ_u, Λ_u), v_j ~ N(µ_v, Λ_v) • X_ij ~ N(u_i^T v_j, σ²) • Inference using MCMC (Gibbs sampling; a conditional is shown below) • R. Salakhutdinov and A. Mnih, ICML 2008
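Gibbs sampling is practical here because every conditional is available in closed form. For instance, treating Λ as a precision matrix and writing Ω_i for the observed entries in row i, the conditional for a user factor is (a standard Bayesian linear-regression identity, reconstructed rather than copied from the slide):

```latex
u_i \mid X, V, \mu_u, \Lambda_u \;\sim\; \mathcal{N}\!\left(\mu_i^{*}, \left(\Lambda_i^{*}\right)^{-1}\right),
\quad
\Lambda_i^{*} = \Lambda_u + \frac{1}{\sigma^2} \sum_{j \in \Omega_i} v_j v_j^\top,
\quad
\mu_i^{*} = \left(\Lambda_i^{*}\right)^{-1}\!\left(\Lambda_u \mu_u + \frac{1}{\sigma^2} \sum_{j \in \Omega_i} X_{ij}\, v_j\right)
```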

  44. Parametric PMF (PPMF) • Are the priors used in PMF and BPMF suitable? • PMF: u_i ~ N(0, σ_u² I), v_j ~ N(0, σ_v² I); diagonal covariance • BPMF: u_i ~ N(µ_u, Λ_u), v_j ~ N(µ_v, Λ_v); full covariance, with a "hyperprior" • Parametric PMF (PPMF): u_i ~ N(µ_u, Λ_u), v_j ~ N(µ_v, Λ_v); full covariance, but no "hyperprior"

  45. PPMF

  46. PPMF with Mixture Models (MPMF) • What if the row (column) items belong to several groups? • Parametric PMF (PPMF): a single Gaussian generates all u_i (or v_j) • Mixture PMF (MPMF): a mixture of Gaussians, e.g., N_1(µ_1u, Λ_1u), N_2(µ_2u, Λ_2u), N_3(µ_3u, Λ_3u), represents a set of groups; each u_i (or v_j) is generated from one of the Gaussians (see the sketch below)
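A minimal sketch of drawing user factors from such a mixture prior; the three components, their weights, means, and covariances below are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 200                          # factor dimension, number of users (illustrative)
weights = np.array([0.5, 0.3, 0.2])    # mixture weights (illustrative)
means = rng.normal(0, 1, size=(3, k))  # one mean per group
covs = np.stack([np.eye(k) * s for s in (0.2, 0.5, 1.0)])  # one covariance per group

# Each u_i first picks a group, then is drawn from that group's Gaussian.
groups = rng.choice(3, size=n, p=weights)
U = np.stack([rng.multivariate_normal(means[g], covs[g]) for g in groups])
```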

  47. MPMF

  48. PMF with Side Information: LDA-MPMF • Can we use side information to improve accuracy? • LDA-MPMF: u_i and the side information share a membership vector • [Figure: paired components N_1(µ_1u, Λ_1u)/p_1(θ_1u), N_2(µ_2u, Λ_2u)/p_2(θ_2u), N_3(µ_3u, Λ_3u)/p_3(θ_3u) linking users, movies, and side information]

  49. LDA-MPMF

  50. PMF with Side Information: CTM-PPMF • LDA-MPMF: u_i and the side information share a membership vector • CTM-PPMF: u_i is converted into the membership vector used to generate the side information • [Figure: u_i ~ N(µ_u, Λ_u) mapped to p_1(θ_1u), p_2(θ_2u), p_3(θ_3u) over users, movies, and side information]
