
CS246


astin

Presentation Transcript


  1. CS246 Topic-Based Models

  2. Motivation • Q: For query “car”, will a document with the word “automobile” be returned as a result under the TF-IDF vector model? • Q: Is it desirable? • Q: What can we do?

  3. Topic-Based Models • Index documents based on “topics”, not on individual terms • Return a document if it shares a topic with the query • We can then return a document containing “automobile” for the query “car” • There are far fewer “topics” than “terms” • A topic-based index can be more compact than a term-based index

  4. Example (1) • Two topics: “Car”, “Movies” • Four terms: car, automobile, movie, theater • Topic-term matrix • Document-topic matrix

  5. Example (2) • But what we actually have is the document-term matrix! • How are the three matrices related?

  6. Linearity Assumption • A document is generated as a topic-weighted linear combination of topic-term vectors • A simplifying assumption on document generation • $\text{doc}_1 = 0 \cdot (1, 0.9, 0, 0) + 1 \cdot (0, 0, 1, 0.8) = (0, 0, 1, 0.8)$ • $\text{doc}_3 = 0.8 \cdot (1, 0.9, 0, 0) + 0.2 \cdot (0, 0, 1, 0.8) = (0.8, 0.72, 0.2, 0.16)$
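
The linearity assumption can be checked numerically. The sketch below is an illustration, not part of the original slides: it stacks the two topic-term vectors from the slide into a matrix and multiplies by the topic weights of doc1 and doc3. The variable names `topic_term`, `doc_topic`, and `doc_term` are chosen here for illustration.

```python
import numpy as np

# Topic-term matrix from the slide: rows are topics, columns are the
# terms (car, automobile, movie, theater).
topic_term = np.array([
    [1.0, 0.9, 0.0, 0.0],   # topic "Car"
    [0.0, 0.0, 1.0, 0.8],   # topic "Movies"
])

# Topic weights of doc1 and doc3 as given on the slide.
doc_topic = np.array([
    [0.0, 1.0],   # doc1: purely "Movies"
    [0.8, 0.2],   # doc3: mostly "Car"
])

# Each document row is a topic-weighted combination of the topic-term rows.
doc_term = doc_topic @ topic_term
print(doc_term)
```

The product reproduces the two document vectors computed by hand on the slide, which is exactly the relationship (doc-term) = (doc-topic) x (topic-term).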

  7. Topic-Based Index as Matrix Decomposition

  8. Topic-Based Index as Matrix Decomposition • # topics << # terms, # topics << # docs • Decompose the (doc-term) matrix into two matrices of rank K (K: # topics) • Of course, the decomposition will only be approximate for real data • [Figure: (doc x term) matrix = (doc x topic) matrix X (topic x term) matrix]

  9. Topic-Based Index as Rank-K Approximation • Q: How to choose the two decomposed matrices? What is the “best” decomposition? • Latent Semantic Indexing (LSI) • Find the decomposition that is the “closest” to the original matrix • Singular-Value Decomposition (SVD) • A decomposition method that leads to the best rank-K approximation • We will spend the next few hours learning about SVD and its meaning • A basic understanding of linear algebra will be very useful for both IR and data mining

  10. A Brief Review of Linear Algebra • Vector as a list of numbers • Addition • Scalar multiplication • Dot product • Dot product as a projection • Q: (1, 0) vs (0, 1). Are they the same vector? • A: The choice of basis determines the “meaning” of the numbers • Matrix • Matrix multiplication • Four ways to look at matrix multiplication • Matrix as vector transformation

  11. Change of Coordinates (1) • Two coordinate systems • Q: What are the coordinates of (2,0) under the second coordinate system? • Q: What about (1,1)?

  12. Change of Coordinates (2) • In general, we get the new coordinates of a vector under the new basis vectors by multiplying the original coordinates with the following matrix • Verify with previous example • Q: What does the above matrix look like? How can we identify a coordinate-change matrix?

  13. Matrix and Change of Coordinates • The new basis vectors are orthonormal to each other • Orthonormal matrix: $QQ^T = I$ • An orthonormal matrix can be interpreted as a change-of-coordinates transformation • The rows of the matrix Q are the new basis vectors
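
A small sketch of a change of coordinates follows. The second coordinate system used on the slides is not preserved in this transcript, so the basis below (the standard axes rotated by 45 degrees) is an assumption made only for illustration.

```python
import numpy as np

s = 1 / np.sqrt(2)

# ASSUMED new basis (not from the slides): standard axes rotated by 45 degrees.
# Following the slide's convention, the rows of Q are the new basis vectors.
Q = np.array([
    [ s, s],   # first new basis vector
    [-s, s],   # second new basis vector
])

# Orthonormality check: Q Q^T should be the identity.
is_orthonormal = np.allclose(Q @ Q.T, np.eye(2))

# New coordinates are dot products with the new basis vectors, i.e. Q @ v.
new_20 = Q @ np.array([2.0, 0.0])
new_11 = Q @ np.array([1.0, 1.0])
print(is_orthonormal, new_20, new_11)
```

Under this assumed basis, (1, 1) lies entirely along the first new axis, so its second new coordinate is zero.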

  14. Linear Transformation • Linear transformation: $T(ax + by) = aT(x) + bT(y)$ • Every linear transformation can be represented as a matrix • By selecting appropriate basis vectors • The matrix form of a linear transformation can be obtained simply by learning how the basis vectors transform • Verify with a 45-degree rotation • What kinds of transformations can a linear transformation express?
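
The 45-degree verification can be sketched as follows. The helper `rotate45` is a hypothetical stand-in for "the transformation", treated as a black box; the matrix is then built only from what it does to the two standard basis vectors.

```python
import numpy as np

def rotate45(v):
    """Rotate a 2-D vector counterclockwise by 45 degrees."""
    c = s = np.sqrt(0.5)          # cos(45 deg) = sin(45 deg)
    x, y = v
    return np.array([c * x - s * y, s * x + c * y])

# Column j of the matrix is the image of the j-th standard basis vector.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
M = np.column_stack([rotate45(e1), rotate45(e2)])

# The matrix now reproduces the transformation on any other vector.
v = np.array([3.0, -2.0])
same = np.allclose(M @ v, rotate45(v))
print(M, same)
```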

  15. Linear Transformations That We Know • Rotation • Stretching • Anything else? • Claim: Any linear transformation is a stretching followed by a rotation • The “meaning” of singular value decomposition • An important result of linear algebra • Let us learn why this is the case

  16. Rotation • Matrix form of rotation? What property will it have? • Remember: under a rotation, the images of the unit basis vectors are unit basis vectors as well • Rotation matrix R <=> Orthonormal matrix • An orthonormal matrix can be read either as a change of coordinates or as a rotation
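
As a quick numerical check of the rotation property, the sketch below builds a 2-D rotation matrix and confirms that it is orthonormal and preserves vector length. The function name `rotation` and the 30-degree angle are arbitrary choices for illustration.

```python
import numpy as np

def rotation(theta):
    """2-D counterclockwise rotation matrix for an angle in radians."""
    return np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta),  np.cos(theta)],
    ])

R = rotation(np.deg2rad(30))

# Orthonormal: the columns are unit length and mutually perpendicular.
orthonormal = np.allclose(R.T @ R, np.eye(2))

# A rotation does not change the length of a vector.
v = np.array([3.0, 4.0])
length_preserved = np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v))
print(orthonormal, length_preserved)
```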

  17. Stretching (1) • Q: Matrix form of stretching by 3 along x, y, z axes in 3D? • Q: Matrix form of stretching by 3 along x axis and by 2 along y axis in 3D. • Q: Stretching matrix <=> diagonal matrix?
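
One way to answer the first two questions is with diagonal matrices, sketched below. The names `S_uniform` and `S_xy` are illustrative; the factor 1 on the z axis in the second matrix reflects that no stretching along z is requested.

```python
import numpy as np

# Stretch by 3 along each of the x, y, z axes.
S_uniform = np.diag([3.0, 3.0, 3.0])

# Stretch by 3 along x and by 2 along y; z is left unchanged (factor 1).
S_xy = np.diag([3.0, 2.0, 1.0])

v = np.array([1.0, 1.0, 1.0])
print(S_uniform @ v)   # every coordinate is scaled by 3
print(S_xy @ v)        # each coordinate is scaled by its own factor
```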

  18. Stretching (2) • Q: Matrix form of stretching by 3 along one axis and by 2 along a perpendicular axis, when those axes are not x and y? • Verify by transforming (1,1) and (-1, 1) • The decomposition $T = Q T' Q^T$ shows the transformation in a different coordinate system • In plain matrix form, the simplicity of the stretching transformation may not be obvious • Q: What if we chose the stretching axes as the basis?

  19. Stretching (3) • Under a good choice of basis vectors, orthogonal-stretching transformation can always be represented as a diagonal matrix • Q: How can we tell whether a matrix corresponds to an orthogonal-stretching transformation?

  20. Stretching – Orthogonal Stretching (1) • Remember that the previous example is orthogonal stretching along two perpendicular axes • If a transformation is orthogonal stretching, we should always be able to represent it as $QDQ^T$ for some Q, where Q shows the stretching axes • Q: What is the matrix form of the transformation that stretches by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5)?
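
The last question can be worked out numerically. The sketch below assumes the convention that the columns of Q are the stretching axes and that D holds the stretching factors in the same order; under that assumption the matrix is $QDQ^T$.

```python
import numpy as np

# Stretching axes from the slide, placed as the COLUMNS of Q (assumed convention).
u = np.array([4 / 5, 3 / 5])
v = np.array([-3 / 5, 4 / 5])
Q = np.column_stack([u, v])

# Stretching factors, in the same order as the axes.
D = np.diag([5.0, 4.0])

T = Q @ D @ Q.T
print(T)

# Each axis is only scaled by its factor, and T comes out symmetric.
print(T @ u, 5 * u)
print(T @ v, 4 * v)
```

Working the product by hand gives T = [[4.64, 0.48], [0.48, 4.36]], a symmetric matrix whose simple stretching structure is not obvious from its entries.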

  21. Stretching – Orthogonal Stretching (2) • Q: Given a matrix, how do we know whether it is orthogonal-stretching? • A: When it can be decomposed to $T = QDQ^T$ • A: Spectral Theorem • Any symmetric matrix T can always be decomposed into $T = QDQ^T$ • Symmetric matrix <=> orthogonal stretching • Q: How can we decompose T to $QDQ^T$? • A: If T stretches along X, then $TX = \lambda X$ for some $\lambda$ • X: eigenvector of T • $\lambda$: eigenvalue of T • Solve the equation for $\lambda$ and X

  22. Eigenvalues, Eigenvectors and Orthogonal Stretching • Eigenvector: stretching axis • Eigenvalue: stretching factor • All eigenvectors are orthogonal <=> Orthogonal stretching <=> Symmetric matrix (spectral theorem) • Example • Q: What transformation is this?
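
The example matrix from this slide is not preserved in the transcript, so the sketch below uses a different symmetric matrix, chosen here purely for illustration. It recovers the stretching axes and factors with `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

# Illustrative symmetric matrix (NOT the one from the slides).
T = np.array([
    [2.0, 1.0],
    [1.0, 2.0],
])

# eigvals: stretching factors; columns of Q: stretching axes.
eigvals, Q = np.linalg.eigh(T)
D = np.diag(eigvals)

print(eigvals)                               # stretching factors
print(Q)                                     # stretching axes (columns)
print(np.allclose(Q @ D @ Q.T, T))           # T = Q D Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))       # axes are orthonormal
```

This matrix stretches by 3 along the (1, 1) direction and leaves the (1, -1) direction unchanged (factor 1).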

  23. Singular Value Decomposition (SVD) • Any linear transformation T can be decomposed to T = R S (R: rotation, S: orthogonal stretching) • One of the basic results of linear algebra • In matrix form, any matrix T can be decomposed to $T = Q_1 D Q_2^T$ ($Q_1$, $Q_2$: orthonormal matrices, D: diagonal matrix) • Diagonal entries in D: singular values • Example • Q: What transformation is this?

  24. Singular Value Decomposition (2) • Q: For (n x m) matrix T, what will be the dimension of the three matrices after SVD? • Q: What is the meaning of non-square diagonal matrix? • The diagonal matrix is also responsible for projection (or dimension padding).
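
The dimension question can be explored with `numpy.linalg.svd`. The sketch below uses a small 3 x 2 matrix picked for illustration; with `full_matrices=True`, NumPy returns square orthonormal factors and the singular values as a 1-D array, so the non-square diagonal matrix has to be assembled by hand.

```python
import numpy as np

# Illustrative (n x m) = (3 x 2) matrix; the values are arbitrary.
T = np.array([
    [3.0, 0.0],
    [0.0, 2.0],
    [1.0, 1.0],
])
n, m = T.shape

# U is (n x n), s holds min(n, m) singular values, Vt is (m x m).
U, s, Vt = np.linalg.svd(T, full_matrices=True)

# Build the (n x m) "diagonal" matrix; the extra row of zeros is the
# dimension padding mentioned on the slide.
D = np.zeros((n, m))
D[:len(s), :len(s)] = np.diag(s)

print(U.shape, D.shape, Vt.shape)
print(np.allclose(U @ D @ Vt, T))
```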

  25. Singular Values vs Eigenvalues • Q: What is this transformation? • A: $Q_1$ – eigenvectors of $TT^T$; D – square roots of the eigenvalues of $TT^T$. Similarly, $Q_2$ – eigenvectors of $T^TT$; D – square roots of the eigenvalues of $T^TT$. • SVD can be done by computing the eigenvalues and eigenvectors of $TT^T$ and $T^TT$
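
The relationship between singular values and eigenvalues can be verified numerically. The matrix below is an arbitrary example chosen for illustration; `numpy.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order, so they are reversed before comparison.

```python
import numpy as np

# Arbitrary illustrative (3 x 2) matrix.
T = np.array([
    [3.0, 0.0],
    [0.0, 2.0],
    [1.0, 1.0],
])

s = np.linalg.svd(T, compute_uv=False)        # singular values, descending

# Eigenvalues of the two symmetric products, sorted in descending order.
eig_TtT = np.linalg.eigvalsh(T.T @ T)[::-1]   # (2 x 2) product
eig_TTt = np.linalg.eigvalsh(T @ T.T)[::-1]   # (3 x 3) product

print(s ** 2)
print(eig_TtT)    # matches the squared singular values
print(eig_TTt)    # same values, plus one (numerically) zero eigenvalue
```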

  26. SVD as Matrix Approximation • Q: If we want to reduce the rank of T to 2, what will be a good choice? • The best rank-k approximation of any matrix T is obtained by keeping the k largest singular values of its SVD, together with the corresponding singular vectors.
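
A minimal sketch of rank-k truncation follows. The helper name `rank_k_approx` and the random test matrix are assumptions made for illustration; the check at the end uses the known fact that the Frobenius-norm error of the truncated SVD equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def rank_k_approx(T, k):
    """Keep the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
T = rng.random((6, 5))            # arbitrary illustrative matrix

T2 = rank_k_approx(T, 2)
s = np.linalg.svd(T, compute_uv=False)

rank = np.linalg.matrix_rank(T2)
error = np.linalg.norm(T - T2)               # Frobenius norm by default
expected_error = np.sqrt(np.sum(s[2:] ** 2))
print(rank, error, expected_error)
```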

  27. SVD Approximation Example: 1000 x 1000 matrix with values (0…255)

  28. Image of original matrix 1000x1000

  29. SVD. Rank 1 approximation

  30. SVD. Rank 10 approximation

  31. SVD. Rank 100 approximation

  32. Original vs Rank 100 approximation Q: How many numbers do we keep for each?
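
One way to count the numbers, assuming the rank-k approximation is stored as its truncated SVD factors (k columns of the first matrix, k singular values, k rows of the third matrix). The helper name `svd_storage` is hypothetical.

```python
def svd_storage(n_rows, n_cols, k):
    """Numbers stored for a rank-k truncated SVD of an (n_rows x n_cols) matrix."""
    # k columns of length n_rows + k singular values + k rows of length n_cols
    return k * (n_rows + n_cols + 1)

full = 1000 * 1000                       # original matrix
rank100 = svd_storage(1000, 1000, 100)   # rank-100 approximation
print(full, rank100, rank100 / full)
```

Under this storage assumption the rank-100 approximation keeps 200,100 numbers, about one fifth of the 1,000,000 in the original matrix.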

  33. Back to LSI • LSI: decompose the (doc-term) matrix into two matrices of rank K • Our goal is to find the “best” rank-K approximation • Apply SVD and keep the top-K singular values, meaning that we keep the first K columns of the first matrix and the first K rows of the third matrix after SVD. • [Figure: (doc x term) matrix = (doc x topic) matrix X (topic x term) matrix]

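  34. LSI and SVD • [Figure: LSI writes the (doc x term) matrix as a (doc x topic) matrix X a (topic x term) matrix; SVD writes the same (doc x term) matrix as a product of three matrices]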

  35. LSI and SVD • LSI summary • Formulate the topic-based indexing problem as a rank-K matrix approximation problem • Use SVD to find the best rank-K approximation • When applied to real data, a 10-20% improvement has been reported • Using LSI was the road to fame for Excite in its early days
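
The summary can be illustrated end to end on a toy corpus. Everything in the sketch below (the five documents, K = 2, cosine similarity as the ranking score, and the way the query is projected into topic space) is an assumption made for illustration, not the lecture's own data or a reference implementation.

```python
import numpy as np

# Toy doc-term matrix; columns: car, automobile, movie, theater.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],   # doc 0: car, automobile
    [0.0, 1.0, 0.0, 0.0],   # doc 1: automobile only
    [1.0, 0.0, 0.0, 0.0],   # doc 2: car only
    [0.0, 0.0, 1.0, 1.0],   # doc 3: movie, theater
    [0.0, 0.0, 1.0, 0.0],   # doc 4: movie only
])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

K = 2                                    # number of topics to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
topic_term = Vt[:K, :]                   # (topic x term)
doc_topic = U[:, :K] * s[:K]             # (doc x topic), equals A @ topic_term.T

query = np.array([1.0, 0.0, 0.0, 0.0])   # the query "car"
query_topic = query @ topic_term.T       # query projected into topic space

term_score = cosine(query, A[1])                   # term space, doc 1
lsi_score = cosine(query_topic, doc_topic[1])      # topic space, doc 1
lsi_movie = cosine(query_topic, doc_topic[4])      # topic space, doc 4
print(term_score, lsi_score, lsi_movie)
```

In term space the "automobile"-only document has zero similarity to the query "car". In the two-topic space it scores close to 1, because "car" and "automobile" co-occur in doc 0 and end up in the same topic, while the movie document still scores close to 0.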

  36. Limitations of LSI • Q: Any problems with LSI? • Problems with LSI • Scalability • SVD is known to be difficult to compute for large data sets • Interpretability • The extracted document-topic matrix is impossible to interpret • It is difficult to understand why we get good or bad results from LSI for some queries • Q: Is there any way to develop more interpretable topic-based indexing? • Topic for next lecture

  37. Summary • Topic-based indexing • Synonymy and polysemy problem • Index documents by topic, not by terms • Latent Semantic Indexing (LSI) • A document is a linear combination of the topic-term vectors, weighted by its topic vector • Formulate the problem as a rank-K matrix approximation problem • Use SVD to find the best approximation • Basic linear algebra • Linear transformation, matrix, stretching and rotation • Orthogonal stretching, diagonal matrix, symmetric matrix, eigenvalues and eigenvectors • Rotation, change of coordinates, and orthonormal matrix • SVD and its implication as a linear transformation
