
Dimensionality Reduction


Presentation Transcript


  1. Dimensionality Reduction (Slides by Jure Leskovec)

  2. Dimensionality Reduction • High-dimensional == many features • Find concepts/topics/genres: • Documents: features are thousands of words, millions of word pairs • Surveys: Netflix, 480k users x 177k movies

  3. Dimensionality Reduction • Compress / reduce dimensionality: • 10^6 rows; 10^3 columns; no updates • random access to any cell(s); small error: OK

  4. Dimensionality Reduction • Assumption: Data lies on or near a low d-dimensional subspace • Axes of this subspace are an effective representation of the data

  5. Why Reduce Dimensions? • Discover hidden correlations/topics • Words that occur commonly together • Remove redundant and noisy features • Not all words are useful • Interpretation and visualization • Easier storage and processing of the data

  6. SVD - Definition A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T • A: input data matrix • m x n matrix (e.g., m documents, n terms) • U: left singular vectors • m x r matrix (m documents, r concepts) • Σ: singular values • r x r diagonal matrix (strength of each 'concept'; r: rank of the matrix A) • V: right singular vectors • n x r matrix (n terms, r concepts)
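
A minimal numpy sketch of this decomposition and its shapes; the data matrix here is random stand-in data, not the slides' example:

    import numpy as np

    # Stand-in data: m documents x n terms (random values, illustration only)
    m, n = 6, 4
    A = np.random.rand(m, n)

    # Reduced ("economy") SVD: U is m x r, s holds the singular values,
    # Vt is r x n, with r = min(m, n)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(U.shape, s.shape, Vt.shape)   # (6, 4) (4,) (4, 4)

    # A is recovered exactly from U * diag(s) * V^T (up to rounding)
    assert np.allclose(A, U @ np.diag(s) @ Vt)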

  7. SVD [Figure: A (m x n) = U (m x r) x Σ (r x r) x V^T (r x n)]

  8. SVD T n 1u1v1 2u2v2 A  + m σi… scalar ui … vector vi … vector Slides by Jure Leskovec

  9. SVD - Properties It is always possible to decompose a real matrix A into A = U Σ V^T, where • U, Σ, V: unique • U, V: column orthonormal: U^T U = I; V^T V = I (I: identity matrix) (columns are orthogonal unit vectors) • Σ: diagonal • entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
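
These properties are easy to check numerically; a quick sketch, again with random stand-in data:

    import numpy as np

    A = np.random.rand(5, 3)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Column orthonormality: U^T U = I and V^T V = I
    assert np.allclose(U.T @ U, np.eye(3))
    assert np.allclose(Vt @ Vt.T, np.eye(3))

    # Singular values are non-negative and sorted in decreasing order
    assert np.all(s >= 0) and np.all(np.diff(s) <= 0)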

  10. SVD - Example: Users-to-Movies • A = U Σ V^T - example: users-to-movies ratings matrix (columns: Matrix, Alien, Serenity, Casablanca, Amelie; rows: SciFi and Romance users) [Figure: A = U x Σ x V^T]

  11. SVD - Example: Users-to-Movies • A = U Σ V^T - example, with the SciFi-concept and Romance-concept marked [Figure: A = U x Σ x V^T]

  12. SVD - Example • A = U Σ V^T - example: U is the 'user-to-concept' similarity matrix [Figure: A = U x Σ x V^T]

  13. SVD - Example • A = U Σ V^T - example: the diagonal of Σ gives the 'strength' of the SciFi-concept [Figure: A = U x Σ x V^T]

  14. SVD - Example • A = U Σ V^T - example: V is the 'movie-to-concept' similarity matrix [Figure: A = U x Σ x V^T]

  15. SVD - Example • A = U Σ V^T - example: V is the 'movie-to-concept' similarity matrix, with the SciFi-concept marked [Figure: A = U x Σ x V^T]
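
The example matrices did not survive the transcript, but the setup can be reproduced with a hypothetical ratings matrix in the same spirit (the numbers below are made up, not the slides' values): four SciFi fans rate the first three movies, three Romance fans rate the last two.

    import numpy as np

    # Hypothetical ratings: rows = users, columns = Matrix, Alien,
    # Serenity, Casablanca, Amelie
    A = np.array([[1, 1, 1, 0, 0],
                  [3, 3, 3, 0, 0],
                  [4, 4, 4, 0, 0],
                  [5, 5, 5, 0, 0],
                  [0, 0, 0, 4, 4],
                  [0, 0, 0, 5, 5],
                  [0, 0, 0, 2, 2]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    np.set_printoptions(precision=2, suppress=True)
    print(s[:2])     # two dominant 'concept' strengths; the rest are ~0
    print(U[:, :2])  # user-to-concept: SciFi users load on one column
    print(Vt[:2].T)  # movie-to-concept: first 3 movies load on one concept

Because the two rating blocks are independent, the SVD recovers exactly one concept per genre.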

  16. SVD - Interpretation #1 'movies', 'users' and 'concepts': • U: user-to-concept similarity matrix • V: movie-to-concept similarity matrix • Σ: its diagonal elements give the 'strength' of each concept

  17. SVD - Interpretation #2 SVD gives the best axis to project on: 'best' = minimum sum of squares of projection errors, i.e., minimum reconstruction error [Figure: ratings plotted as points in the (Movie 1 rating, Movie 2 rating) plane, with the first right singular vector v1 as the best axis]

  18. SVD - Interpretation #2 • A = U Σ V^T - example: [Figure: the first right singular vector v1 drawn through the scatter of (Movie 1 rating, Movie 2 rating) points; A = U x Σ x V^T]

  19. SVD - Interpretation #2 • A = U Σ V^T - example: variance ('spread') on the v1 axis [Figure: A = U x Σ x V^T]
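
A small sketch of this picture, with made-up 2-D ratings lying near a line through the origin; v1 is the axis that minimizes the squared projection errors:

    import numpy as np

    # Made-up (Movie 1, Movie 2) ratings, roughly along one direction
    X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    v1 = Vt[0]            # first right singular vector = best-fit axis
    coords = X @ v1       # coordinate of each point along v1

    # Rank-1 reconstruction; the residual is the sum of squared
    # projection errors, which equals sigma_2^2
    X1 = np.outer(coords, v1)
    print(np.sum((X - X1) ** 2), s[1] ** 2)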

  20. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? [Figure: A = U x Σ x V^T]

  21. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero [Figure: A = U x Σ x V^T]

  22. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero [Figure: A ≈ U x Σ x V^T with the smallest singular value zeroed]

  23. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero: [Figure: A ≈ truncated U x Σ x V^T]

  24. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero: [Figure: A ≈ truncated U x Σ x V^T]

  25. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero: A ≈ B • Frobenius norm: ǁMǁF = √(Σij Mij²) • ǁA−BǁF = √(Σij (Aij−Bij)²) is 'small'
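
In code, 'setting the smallest singular values to zero' amounts to truncating the factors; a sketch with random stand-in data:

    import numpy as np

    def rank_k_approx(A, k):
        # Keep the k largest singular values, zero out the rest
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

    A = np.random.rand(8, 5)
    B = rank_k_approx(A, k=2)

    # ||A - B||_F equals the energy in the discarded singular values:
    # sqrt(sigma_3^2 + ... + sigma_r^2)
    s = np.linalg.svd(A, compute_uv=False)
    print(np.linalg.norm(A - B, 'fro'), np.sqrt(np.sum(s[2:] ** 2)))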

  26. [Figure: A = U x Σ x V^T (full) and B = U x Σ x V^T (truncated); B is approximately A]

  27. SVD - Best Low Rank Approx. • Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = r) and let B = U S V^T, where S is a diagonal r x r matrix with si = σi (i = 1…k) and si = 0 otherwise. Then B is a best rank-k approximation to A: • B is a solution to minB ǁA−BǁF where rank(B) = k • We will need 2 facts: • ǁMǁF = ǁQǁF, where M = P Q R is the SVD of M • U Σ V^T − U S V^T = U (Σ − S) V^T

  28. SVD - Best Low Rank Approx. • We will need 2 facts: • ǁMǁF = ǁQǁF, where M = P Q R is the SVD of M • U Σ V^T − U S V^T = U (Σ − S) V^T • We apply them with P = U (column orthonormal), R = V^T (row orthonormal), and Q = Σ − S (diagonal)

  29. SVD - Best Low Rank Approx. • A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r) • S = diagonal r x r matrix with si = σi (i = 1…k), else si = 0 • Then B is a solution to minB ǁA−BǁF with rank(B) = k • Why? By the two facts, ǁA−BǁF = ǁU (Σ − S) V^TǁF = ǁΣ − SǁF = √(Σi (σi − si)²) • We want to choose the si to minimize this, so we set si = σi (i = 1…k) and si = 0 otherwise
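
A quick numerical spot check of the theorem (not a proof): no randomly drawn rank-k matrix should beat the truncated SVD in Frobenius norm. Random stand-in data:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((8, 6))
    k = 2

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
    best = np.linalg.norm(A - B, 'fro')

    # Compare against 1000 random rank-k matrices C = (8 x k)(k x 6)
    for _ in range(1000):
        C = rng.random((8, k)) @ rng.random((k, 6))
        assert np.linalg.norm(A - C, 'fro') >= best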

  30. SVD - Interpretation #2 Equivalent: 'spectral decomposition' of the matrix: A = σ1 u1 v1^T + σ2 u2 v2^T [Figure: A = (u1 u2) x diag(σ1, σ2) x (v1 v2)^T]

  31. SVD - Interpretation #2 Equivalent: 'spectral decomposition' of the matrix: A = σ1 u1 v1^T + σ2 u2 v2^T + … (k terms; each term is an m x 1 vector times a 1 x n vector) • Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0 • Why is setting the small σs to zero the thing to do? Vectors ui and vi are unit length, so the σi scale them. Zeroing the small σs therefore introduces less error.

  32. SVD - Interpretation #2 • Q: How many σs to keep? • A: Rule of thumb: keep 80-90% of the 'energy' (= Σi σi²) • A = σ1 u1 v1^T + σ2 u2 v2^T + … • Assume: σ1 ≥ σ2 ≥ σ3 ≥ …
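
The rule of thumb is straightforward to implement; a sketch, with the energy threshold as a free parameter:

    import numpy as np

    def choose_k(singular_values, energy=0.9):
        # Smallest k retaining the given fraction of total 'energy',
        # i.e. of the sum of squared singular values
        sq = np.asarray(singular_values) ** 2
        retained = np.cumsum(sq) / np.sum(sq)
        return int(np.searchsorted(retained, energy)) + 1

    print(choose_k([10.0, 5.0, 1.0, 0.5]))  # -> 2 (~99% of the energy)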

  33. SVD - Complexity • To compute the SVD: O(nm²) or O(n²m) (whichever is less) • But: less work if we just want the singular values • or if we want only the first k singular vectors • or if the matrix is sparse • Implemented in linear algebra packages: LINPACK, Matlab, SPlus, Mathematica, ...
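
The same functionality lives in Python's numpy/scipy (not mentioned on the slide); for sparse matrices, scipy's svds computes only the top-k singular triplets rather than the full decomposition:

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    # Sparse 1000 x 500 matrix with ~1% nonzeros (random stand-in data)
    A = sparse_random(1000, 500, density=0.01, random_state=0)

    # Top-10 singular triplets only; much cheaper than a full SVD
    U, s, Vt = svds(A, k=10)
    print(s[::-1])  # svds returns sigmas in ascending order; reverse them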

  34. SVD - Conclusions so far • SVD: A = U Σ V^T: unique • U: user-to-concept similarities • V: movie-to-concept similarities • Σ: strength of each concept • Dimensionality reduction: keep the few largest singular values (80-90% of the 'energy') • SVD picks up linear correlations

  35. Case study: How to query? • Q: Find users that like 'Matrix' and 'Alien' (movies: Matrix, Alien, Serenity, Casablanca, Amelie) [Figure: A = U x Σ x V^T]

  36. Case study: How to query? • Q: Find users that like 'Matrix' • A: Map the query into 'concept space' - how? [Figure: A = U x Σ x V^T]

  37. Case study: How to query? • Q: Find users that like 'Matrix' • A: Map query vectors into 'concept space' - how? • Project into concept space: inner product with each 'concept' vector vi [Figure: query vector q in the (Matrix, Alien) plane, with concept axes v1 and v2]

  38. Case study: How to query? • Q: Find users that like 'Matrix' • A: Map the vector into 'concept space' - how? • Project into concept space: inner product with each 'concept' vector vi [Figure: the projection q·v1 of q onto v1 in the (Matrix, Alien) plane]

  39. Case study: How to query? Compactly, we have: q_concept = q V • E.g.: [Figure: q times the movie-to-concept similarity matrix V gives q's coordinate on the SciFi-concept]
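
A sketch of the query mapping, reusing the hypothetical ratings matrix from the earlier example (the query's 5-star rating is made up):

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
                  [5, 5, 5, 0, 0], [0, 0, 0, 4, 4], [0, 0, 0, 5, 5],
                  [0, 0, 0, 2, 2]], dtype=float)
    V = np.linalg.svd(A, full_matrices=False)[2].T  # movie-to-concept

    # Query "user" q: rated only 'Matrix'
    q = np.array([5, 0, 0, 0, 0], dtype=float)
    q_concept = q @ V[:, :2]  # project onto the top-2 concepts
    print(q_concept)          # loads on the SciFi concept (up to sign),
                              # ~0 on the Romance concept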

  40. Case study: How to query? • How would a user d who rated ('Alien', 'Serenity') be handled? d_concept = d V • E.g.: [Figure: d times the movie-to-concept similarity matrix V gives d's coordinate on the SciFi-concept]

  41. Case study: How to query? • Observation: user d, who rated ('Alien', 'Serenity'), will be similar to the query 'user' q, who rated ('Matrix'), although d did not rate 'Matrix'! • In raw movie space: similarity = 0 • In concept space (SciFi-concept): similarity ≠ 0 [Figure: the rating vectors q and d]
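
The observation is easy to verify with the same hypothetical data: q and d share no rated movies, yet land on the same concept axis:

    import numpy as np

    A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
                  [5, 5, 5, 0, 0], [0, 0, 0, 4, 4], [0, 0, 0, 5, 5],
                  [0, 0, 0, 2, 2]], dtype=float)
    V = np.linalg.svd(A, full_matrices=False)[2].T

    q = np.array([5, 0, 0, 0, 0], dtype=float)  # rated 'Matrix' only
    d = np.array([0, 4, 5, 0, 0], dtype=float)  # rated 'Alien', 'Serenity'

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(q, d))                        # 0.0 in raw movie space
    print(cosine(q @ V[:, :2], d @ V[:, :2]))  # ~1.0 in concept space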

  42. SVD: Drawbacks • Optimal low-rank approximation: in Frobenius norm • Interpretability problem: a singular vector specifies a linear combination of all input columns or rows • Lack of sparsity: singular vectors are dense! [Figure: U, Σ, and V^T are dense even when A is sparse]
