
Demystifying Dimensionality Reduction


Presentation Transcript


  1. Jeff Hansen Senior Data Engineer April 2013 Demystifying Dimensionality Reduction

  2. Demystifying Dimensionality Reduction A Tribute to Johnson and Lindenstrauss

  3. Who is this?

  4. What is this?

  5. How about this? • Hint: It’s for kids…

  6. Some Perspectives are Better than Others Getting a better look

  7. For kids…

  8. Great, but… • What does this have to do with Machine Learning? • How can this help me visualize my data? • How do I use this to recommend new products to new customers? • Can this help me detect fraud?

  9. Dimensions in Data Going beyond 3-D

  10. Samples and Variables Samples are things. Things have numerous: • Features • Characteristics • Attributes • Variables • aka Dimensions

  11. Examples

  12. Distance and Similarity If we • Treat each feature like a dimension • Treat each item like a point Then • Similar items are closer together • Dissimilar items are further apart

  13. User Group Posts

  14. Measures of Distance Various measures of distance with scary math names: • Euclidean Distance • Maximum Distance • Manhattan Distance • L(n) Norm
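All of these measures reduce to comparing feature vectors coordinate by coordinate. A minimal sketch (NumPy, with made-up feature values) of the measures named on this slide:

```python
import numpy as np

# Two items described by the same three features (hypothetical values).
a = np.array([1.0, 5.0, 2.0])
b = np.array([4.0, 1.0, 2.0])

euclidean = np.linalg.norm(a - b)          # L2: straight-line distance
manhattan = np.abs(a - b).sum()            # L1: sum of per-feature differences
maximum = np.abs(a - b).max()              # largest single-feature difference
l_n = lambda n: (np.abs(a - b) ** n).sum() ** (1.0 / n)   # general L(n) norm

print(euclidean, manhattan, maximum, l_n(3))
```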

  15. Curse of Dimensionality • You think more than 3 dimensions are hard? Try a couple million… • Calculating similarity becomes increasingly difficult as a feature set grows.

  16. What to do??

  17. Reduce the Number of Dimensions Johnson-Lindenstrauss Theorem • The number of dimensions doesn’t matter, the sample size does – approximate item similarity can be preserved by projecting into a number of dimensions on the order of log(n), where n is the number of points. English? • Every time you double the number of points, you only need to add a constant number of additional dimensions.

  18. Huh?

  19. This is worth Repeating The number of dimensions doesn’t matter. If all you care about is item similarity, you can project an INFINITE number of dimensions onto a lower number of dimensions based on the number of points you want to compare.
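A minimal sketch of this idea using a Gaussian random projection, one standard way to realize the Johnson-Lindenstrauss guarantee. The point counts, dimension counts, and data below are made up for illustration:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

n, d, k = 200, 10_000, 300          # points, original dimensions, reduced dimensions
X = rng.normal(size=(n, d))         # high-dimensional points

# Random projection; the 1/sqrt(k) scaling keeps distances roughly unchanged.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R                           # each point now lives in k dimensions

# Pairwise distances before and after projection stay close.
for i, j in combinations(range(4), 2):
    before = np.linalg.norm(X[i] - X[j])
    after = np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): {before:.1f} -> {after:.1f}")
```

The reduced dimension k depends on how many points you have and how much distortion you can tolerate, not on the original dimension d.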

  20. A Graphical Explanation

  21. 3 points in 3 dimensions

  22. Bad Projection

  23. Good Projection

  24. Deeper Meaning

  25. Feature Extraction • What if there were unrecorded variables that explain the variables we can see? • Dimensionality Reduction techniques extract these hidden variables or features. • For example, Topics explain the appearance of words in documents, and Genres explain the movies that people watch.

  26. Sounds Great! But how do I do it?

  27. Singular Value What? Unfortunately, the techniques come with tongue-twisting, unintuitive names: • SVD – Singular Value Decomposition • PCA – Principal Component Analysis • LSA – Latent Semantic Analysis • LDA – Linear Discriminant Analysis • Random Projections • MinHash

  28. A Brief Refresher of Linear Algebra Don’t Panic!

  29. Vectors and Projections *Image courtesy of Wikipedia: http://en.wikipedia.org/wiki/File:3D_Vector.svg

  30. Vector “dot” Products • A . B = (a1 * b1) + (a2 * b2) + (a3 * b3) • A . B = || A || * || B || * cos θ If B is a unit vector (it has a length of 1), then the result is simply the length of A projected onto the line (or dimension) formed by B. Remember that a “good” projection is one where the angle is close to zero, so that cos θ is close to 1 and the dot product of A and B is approximately the length of A. This is like projecting the face of a coin onto a surface that’s parallel to the face of the coin – that would be a good projection.
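A short sketch of the two dot-product identities above (NumPy, with made-up vectors); b is chosen to be a unit vector so the dot product reads off the projected length directly:

```python
import numpy as np

a = np.array([3.0, 4.0, 0.0])
b = np.array([1.0, 0.0, 0.0])        # a unit vector: length 1

dot = a @ b                          # (a1*b1) + (a2*b2) + (a3*b3)

# Same value via ||A|| * ||B|| * cos(theta).
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
same = np.linalg.norm(a) * np.linalg.norm(b) * cos_theta

# Because b has length 1, the dot product is the length of a projected
# onto the line (dimension) defined by b.
print(dot, same)                     # both 3.0
```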

  31. Matrix Multiplication Cell 1,1 = Row 1 times Column 1 = (a1,1 x b1,1) + (a1,2 x b2,1) + (a1,3 x b3,1) Cell 1,2 = Row 1 times Column 2 = … …
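A quick check of the cell-by-cell rule (NumPy, arbitrary small matrices):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[7., 8.],
              [9., 1.],
              [2., 3.]])

# Cell (1,1) = row 1 of A "times" column 1 of B.
cell_11 = A[0, 0]*B[0, 0] + A[0, 1]*B[1, 0] + A[0, 2]*B[2, 0]

C = A @ B                            # full matrix product
print(cell_11, C[0, 0])              # identical: 31.0
```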

  32. An Easier Way to Remember

  33. The “Cubic” View

  34. Distributing the Workload

  35. The “Layered” View

  36. Matrix Division? What if you could factor a matrix?

  37. Matrix Division? What if you could factor a matrix? You Can! Matrix Decompositions: • LU Decomposition • QR Decomposition • Eigen Decomposition • Singular Value Decomposition

  38. Why would you Want to? A 1,000,000 x 1,000,000 matrix has 1,000,000 x 1,000,000 = 1,000,000,000,000 cells. Two rank-100 factors need only (100 x 1,000,000) + (1,000,000 x 100) = 200,000,000 cells. That’s a MUCH smaller representation!
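The same count in code (pure arithmetic; the rank of 100 is just an illustrative choice):

```python
full = 1_000_000 * 1_000_000                   # cells in the original matrix
factored = 100 * 1_000_000 + 1_000_000 * 100   # cells in the two rank-100 factors
print(full, factored, full // factored)        # 1e12 vs 2e8: 5000x smaller
```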

  39. Factors as a Basis for a New Space Suppose C is a matrix of people who have watched movies. Every row represents a person and every column represents a movie. If we can find matrices A and B where A x B approximates C: • Each row of A models a person • The distance between two rows of A models relative similarity • Each column of B models a movie • The distance between two columns of B models relative similarity
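A minimal sketch of this setup using a truncated SVD as the factorization; the tiny watch matrix and the choice of two latent features are made up for illustration:

```python
import numpy as np

# Rows = people, columns = movies; 1 means the person watched the movie.
C = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2                                # keep 2 latent features (think "genres")
A = U[:, :k] * s[:k]                 # each row models a person
B = Vt[:k, :]                        # each column models a movie

print(np.round(A @ B, 2))            # A x B approximates C

# Similar people have nearby rows of A; similar movies have nearby columns of B.
print(np.linalg.norm(A[0] - A[1]))   # people 1 and 2: small distance
print(np.linalg.norm(A[0] - A[2]))   # people 1 and 3: larger distance
```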

  40. Big Data, Smaller Models [Diagram: People x Movies matrix and its smaller factors]

  41. Singular Value Decomposition

  42. A = U Σ V* • U and V are square orthonormal matrices – rows and columns are all unit vectors. • Σ is a rectangular diagonal matrix with values decreasing from left to right. • U and V can be viewed as projection matrices, Σ as a scaling matrix. • Earlier columns of U and V* capture most of the “action” of A. • If Σ “decays” quickly enough, most of U and V* is insignificant and can be thrown away without significantly affecting the model.
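A hedged sketch of throwing away the small singular values (NumPy; the data is a random low-rank matrix plus a little noise, so Σ decays quickly):

```python
import numpy as np

rng = np.random.default_rng(1)

# Low-rank structure plus small noise, so the singular values decay fast.
A = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 40)) + 0.01 * rng.normal(size=(50, 40))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s[:8], 2))            # values drop sharply after the 5th

k = 5                                # keep only the first k components
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Most of U and V* was insignificant: the truncated model is nearly exact.
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```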

  43. Using “Cubic” Visualization [Diagram: A = U Σ V*; dark grey indicates zero or very small values]

  44. [Diagram: A = U Σ V*] As columns of U get multiplied by decreasing singular values, the result is smaller column vectors.

  45. [Diagram: A = U Σ V*]

  46. [Diagram: A = U Σ V*]

  47. [Diagram: U Σ V* = A]

  48. Reconstituting Cells
