CS246

CS246 Topic-Based Models

Motivation • Q: For query “car”, will a document with the word “automobile” be returned as a result under the TF-IDF vector model? • Q: Is it desirable? • Q: What can we do?

Topic-Based Models • Index documents by “topics”, not by individual terms • Return a document if it shares a topic with the query • We can return a document with “automobile” for the query “car” • Far fewer “topics” than “terms” • A topic-based index can be more compact than a term-based index

Example (1) • Two topics: “Car”, “Movies” • Four terms: car, automobile, movie, theater • Topic-term matrix • Document-topic matrix

Example (2) • But what we have is the document-term matrix! • How are the three matrices related?

Linearity Assumption • A document is generated as a topic-weighted linear combination of topic-term vectors • A simplifying assumption on document generation • doc1 = 0 (1, 0.9, 0, 0) + 1 (0, 0, 1, 0.8) = (0, 0, 1, 0.8) • doc3 = 0.8 (1, 0.9, 0, 0) + 0.2 (0, 0, 1, 0.8) = (0.8, 0.72, 0.2, 0.16)
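
The linearity assumption can be checked numerically. The sketch below, assuming NumPy is available, rebuilds doc1 and doc3 from the slide as a product of a document-topic matrix and a topic-term matrix; the variable names are illustrative, not from the lecture.

```python
import numpy as np

# Rows: topics (Car, Movies); columns: terms (car, automobile, movie, theater).
topic_term = np.array([
    [1.0, 0.9, 0.0, 0.0],   # "Car" topic-term vector
    [0.0, 0.0, 1.0, 0.8],   # "Movies" topic-term vector
])

# Rows: documents; columns: topic weights (Car, Movies).
doc_topic = np.array([
    [0.0, 1.0],   # doc1: purely about movies
    [0.8, 0.2],   # doc3: mostly about cars
])

# Each document row is a topic-weighted sum of the topic-term rows.
doc_term = doc_topic @ topic_term
print(doc_term)
```

The product has one row per document and one column per term, which is exactly the shape of the document-term matrix.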

Topic-Based Index as Matrix Decomposition • # topics << # terms, # topics << # docs • Decompose the (doc-term) matrix into two matrices of rank K (K: # topics) • Of course, the decomposition will be approximate for real data • [Figure: (doc x term) matrix ≈ (doc x topic) matrix X (topic x term) matrix]

Topic-Based Index as Rank-K Approximation • Q: How to choose the two decomposed matrices? What is the “best” decomposition? • Latent Semantic Indexing (LSI) • Find the decomposition that is the “closest” to the original matrix • Singular-Value Decomposition (SVD) • A decomposition method that leads to the best rank-K approximation • We will spend the next few hours learning about SVD and its meaning • A basic understanding of linear algebra will be very useful for both IR and data mining

A Brief Review of Linear Algebra • Vector as a list of numbers • Addition • Scalar multiplication • Dot product • Dot product as a projection • Q: (1, 0) vs (0, 1). Are they the same vectors? • A: Choice of basis determines the “meaning” of the numbers • Matrix • Matrix multiplication • Four ways to look at matrix multiplication • Matrix as vector transformation

Change of Coordinates (1) • Two coordinate systems • Q: What are the coordinates of (2,0) under the second coordinate system? • Q: What about (1,1)?

Change of Coordinates (2) • In general, we get the new coordinates of a vector under the new basis vectors by multiplying the original coordinates with the following matrix • Verify with previous example • Q: What does the above matrix look like? How can we identify a coordinate-change matrix?

Matrix and Change of Coordinates • The new basis vectors are orthonormal to each other • Orthonormal matrix: Q Q^T = I • An orthonormal matrix can be interpreted as a change-of-coordinate transformation • The rows of the matrix Q are the new basis vectors
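
A small sketch of a change of coordinates with an orthonormal matrix follows. The slide's second coordinate system is not recoverable from the transcript, so the 45-degree-rotated basis used here is an assumption chosen only for illustration, as is the helper name `to_new_coords`.

```python
import numpy as np

s = 1 / np.sqrt(2)

# Hypothetical new basis (an assumption): the standard axes rotated by 45 degrees.
# Each ROW of Q is one new basis vector.
Q = np.array([
    [ s, s],   # first new basis vector
    [-s, s],   # second new basis vector
])

# The rows are orthonormal, so Q @ Q.T is the identity matrix.
assert np.allclose(Q @ Q.T, np.eye(2))

def to_new_coords(v):
    """Coordinates of v under the basis stored in the rows of Q.

    Each new coordinate is the dot product of v with one new basis vector.
    """
    return Q @ np.asarray(v, dtype=float)

print(to_new_coords([2, 0]))
print(to_new_coords([1, 1]))
```

Because Q is orthonormal, multiplying by Q^T converts the new coordinates back to the original ones.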

Linear Transformation • Linear transformation • Every linear transformation can be represented as a matrix • By selecting appropriate basis vectors • The matrix form of a linear transformation can be obtained simply by learning how the basis vectors transform • Verify with a 45-degree rotation • What transformations are possible for a linear transformation?
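
The 45-degree rotation can be verified with a short sketch, assuming NumPy: the matrix is built column by column from the images of the standard basis vectors, then checked against linearity. Variable names are illustrative.

```python
import numpy as np

theta = np.pi / 4  # 45 degrees

# Where the standard basis vectors land after the rotation.
e1_image = np.array([np.cos(theta), np.sin(theta)])    # image of (1, 0)
e2_image = np.array([-np.sin(theta), np.cos(theta)])   # image of (0, 1)

# The matrix of the transformation has those images as its columns.
R = np.column_stack([e1_image, e2_image])

# Linearity: transforming 2*e1 + 3*e2 equals 2*image(e1) + 3*image(e2).
v = np.array([2.0, 3.0])
assert np.allclose(R @ v, 2 * e1_image + 3 * e2_image)

rotated = R @ np.array([1.0, 1.0])
print(rotated)  # (1, 1) rotated counter-clockwise by 45 degrees
```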

Linear Transformation that We Know • Rotation • Stretching • Anything else? • Claim: Any linear transformation is a stretching followed by a rotation • “Meaning” of singular value decomposition • An important result of linear algebra • Let us learn why this is the case

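Rotation • Matrix form of rotation? What property will it have? • Remember: the matrix form is obtained from how the basis vectors transform • Rotation matrix R <=> Orthonormal matrix • The rotated basis vectors are unit basis vectors as well • Two readings of an orthonormal matrix: change of coordinates, or rotation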

Stretching (1) • Q: Matrix form of stretching by 3 along x, y, z axes in 3D? • Q: Matrix form of stretching by 3 along x axis and by 2 along y axis in 3D. • Q: Stretching matrix <=> diagonal matrix?

Stretching (2) • Q: Matrix form of stretching by 3 along the direction of (1, 1) and by 2 along the direction of (-1, 1)? • Verify by transforming (1, 1) and (-1, 1) • The decomposition T = Q T' Q^T shows the transformation in a different coordinate system • Under the matrix form, the simplicity of the stretching transformation may not be obvious • Q: What if we chose the stretching axes as the basis?

Stretching (3) • Under a good choice of basis vectors, orthogonal-stretching transformation can always be represented as a diagonal matrix • Q: How can we tell whether a matrix corresponds to an orthogonal-stretching transformation?

Stretching – Orthogonal Stretching (1) • Remember that the previous example is orthogonal stretching along two perpendicular axes • If a transformation is orthogonal stretching, we should always be able to represent it as Q D Q^T for some Q, where Q shows the stretching axes • Q: What is the matrix form of the transformation that stretches by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5)?
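
One way to work out the question above is to build the matrix directly from its stretching axes and factors. This is a minimal sketch assuming NumPy; placing the axes in the columns of Q is a layout choice made here for the product Q D Q^T.

```python
import numpy as np

# Columns of Q are the two perpendicular unit stretching axes.
Q = np.array([
    [4/5, -3/5],
    [3/5,  4/5],
])

# Stretching factors along those axes, in the same order.
D = np.diag([5.0, 4.0])

# Read right to left: change to the axis coordinates (Q.T),
# stretch each coordinate (D), change back (Q).
T = Q @ D @ Q.T
print(T)

# Check: a vector on the first axis is scaled by 5, on the second axis by 4.
assert np.allclose(T @ Q[:, 0], 5 * Q[:, 0])
assert np.allclose(T @ Q[:, 1], 4 * Q[:, 1])
```

The resulting matrix is symmetric, although its entries alone do not make the simple stretching structure obvious.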

Stretching – Orthogonal Stretching (2) • Q: Given a matrix, how do we know whether it is orthogonal stretching? • A: When it can be decomposed into T = Q D Q^T • A: Spectral Theorem • Any symmetric matrix T can always be decomposed into T = Q D Q^T • Symmetric matrix <=> orthogonal stretching • Q: How can we decompose T into Q D Q^T? • A: If T stretches along X, then TX = λX for some λ • X: eigenvector of T • λ: eigenvalue of T • Solve the equation for λ and X

Eigenvalues, Eigenvectors and Orthogonal Stretching • Eigenvector: stretching axis • Eigenvalue: stretching factor • All eigenvectors are orthogonal <=> Orthogonal stretching <=> Symmetric matrix (spectral theorem) • Example • Q: What transformation is this?
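
The reverse direction, recovering stretching axes and factors from a symmetric matrix, can be sketched with `numpy.linalg.eigh`. The matrix below is the one that stretches by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5); note that eigenvectors are only determined up to sign.

```python
import numpy as np

# Symmetric matrix: stretch by 5 along (4/5, 3/5) and by 4 along (-3/5, 4/5).
T = np.array([
    [4.64, 0.48],
    [0.48, 4.36],
])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
# and the eigenvectors are the COLUMNS of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(T)
print(eigenvalues)    # stretching factors
print(eigenvectors)   # stretching axes (each column, up to sign)

# Spectral theorem: T = Q D Q^T with Q orthonormal.
Q, D = eigenvectors, np.diag(eigenvalues)
assert np.allclose(Q @ D @ Q.T, T)
assert np.allclose(Q.T @ Q, np.eye(2))
```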

Singular Value Decomposition (SVD) • Any linear transformation T can be decomposed into T = R S (R: rotation, S: orthogonal stretching) • One of the basic results of linear algebra • In matrix form, any matrix T can be decomposed into T = Q1 D Q2^T (Q1, Q2: orthonormal matrices; D: diagonal matrix) • Diagonal entries in D: singular values • Example • Q: What transformation is this?

Singular Value Decomposition (2) • Q: For (n x m) matrix T, what will be the dimension of the three matrices after SVD? • Q: What is the meaning of non-square diagonal matrix? • The diagonal matrix is also responsible for projection (or dimension padding).

Singular Values vs Eigenvalues • Q: What is this transformation? • A: With T = Q1 D Q2^T, Q1: eigenvectors of T T^T; D: square roots of the eigenvalues of T T^T. Similarly, Q2: eigenvectors of T^T T; D: square roots of the eigenvalues of T^T T • SVD can be done by computing the eigenvalues and eigenvectors of T T^T and T^T T
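
The relation between singular values and eigenvalues can be checked numerically. In this sketch (assuming NumPy) the 3 x 2 matrix is an arbitrary example, not one from the slides, and `d`, `Q1`, `Q2t` are illustrative names for the three SVD factors.

```python
import numpy as np

# Arbitrary 3 x 2 example matrix (an assumption, not from the lecture).
T = np.array([
    [3.0, 0.0],
    [4.0, 5.0],
    [0.0, 1.0],
])

# Reduced SVD: T = Q1 @ diag(d) @ Q2t, singular values d in descending order.
Q1, d, Q2t = np.linalg.svd(T, full_matrices=False)
assert np.allclose(Q1 @ np.diag(d) @ Q2t, T)

# Eigenvalues of the symmetric matrices T^T T (2 x 2) and T T^T (3 x 3),
# reversed so the largest comes first.
eig_TtT = np.linalg.eigvalsh(T.T @ T)[::-1]
eig_TTt = np.linalg.eigvalsh(T @ T.T)[::-1]

print(d ** 2)        # squared singular values
print(eig_TtT)       # same values
print(eig_TTt[:2])   # same values; the third eigenvalue of T T^T is (numerically) zero

# First row of Q2t is an eigenvector of T^T T; first column of Q1 is one of T T^T.
v, u = Q2t[0], Q1[:, 0]
assert np.allclose(T.T @ T @ v, d[0] ** 2 * v)
assert np.allclose(T @ T.T @ u, d[0] ** 2 * u)
```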

SVD as Matrix Approximation • Q: If we want to reduce the rank of T to 2, what will be a good choice? • The best rank-k approximation of any matrix T is obtained by keeping the k largest singular values of its SVD, together with the corresponding columns and rows of the other two matrices
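
A rank-k approximation by truncating the SVD might look like the sketch below. It assumes NumPy; the helper name `rank_k_approximation` and the random test matrix are illustrative. The final check uses the known fact that the Frobenius-norm error of the truncated SVD equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def rank_k_approximation(T, k):
    """Keep the k largest singular values and the matching singular vectors."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Arbitrary example data (an assumption): a random 6 x 5 matrix.
rng = np.random.default_rng(0)
T = rng.random((6, 5))

T2 = rank_k_approximation(T, 2)

# The approximation has rank 2 ...
rank = np.linalg.matrix_rank(T2)

# ... and its error is determined by the singular values that were dropped.
s = np.linalg.svd(T, compute_uv=False)
error = np.linalg.norm(T - T2)               # Frobenius norm for 2-D arrays
expected_error = np.sqrt(np.sum(s[2:] ** 2))
print(rank, error, expected_error)
```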

SVD Approximation Example: 1000 x 1000 matrix with values in (0…255)

Image of original matrix 1000x1000

SVD. Rank 1 approximation

Original vs Rank 100 approximation • Q: How many numbers do we keep for each?
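
The storage question can be answered with simple arithmetic, under the assumption that the approximation is stored as its three truncated SVD factors rather than as the multiplied-out matrix. The function name is illustrative.

```python
def numbers_kept(n_rows, n_cols, k):
    """Numbers stored by a rank-k SVD kept as three truncated factors.

    First matrix:  n_rows x k
    Diagonal:      k singular values
    Third matrix:  k x n_cols
    """
    return n_rows * k + k + k * n_cols

original = 1000 * 1000                    # every entry of the full matrix
rank_100 = numbers_kept(1000, 1000, 100)  # truncated factors only

print(original, rank_100, rank_100 / original)
```

Under this assumption the rank-100 factors need roughly a fifth of the numbers in the original matrix.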

Back to LSI • LSI: decompose the (doc-term) matrix into two matrices of rank K • Our goal is to find the “best” rank-K approximation • Apply SVD and keep the top-K singular values, meaning that we keep the first K columns of the first matrix and the first K rows of the third matrix after SVD • [Figure: (doc x term) matrix ≈ (doc x topic) matrix X (topic x term) matrix]

LSI and SVD • [Figure: LSI writes the (doc x term) matrix as (doc x topic) X (topic x term); SVD writes the same (doc x term) matrix as a product of three matrices]

LSI and SVD • LSI summary • Formulate the topic-based indexing problem as a rank-K matrix approximation problem • Use SVD to find the best rank-K approximation • When applied to real data, 10-20% improvement reported • Using LSI was the road to fame for Excite in its early days
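
To connect the summary back to the “car” vs “automobile” question, here is a toy LSI sketch assuming NumPy. The four documents and their term counts are made up for illustration, and comparing the query and documents after projecting both onto the top-K right singular vectors is one simple choice among several ways of matching queries in topic space.

```python
import numpy as np

terms = ["car", "automobile", "movie", "theater"]

# Made-up document-term counts (an assumption, not lecture data).
doc_term = np.array([
    [2, 1, 0, 0],   # doc 0: car car automobile
    [0, 1, 0, 0],   # doc 1: automobile (never mentions "car")
    [0, 0, 1, 1],   # doc 2: movie theater
    [0, 0, 1, 0],   # doc 3: movie
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

K = 2  # number of topics to keep
U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)
term_to_topic = Vt[:K, :].T            # (terms x K): first K rows of the third matrix

query = np.array([1.0, 0.0, 0.0, 0.0])  # the query "car"

# Project documents and the query into the K-dimensional topic space.
docs_in_topic_space = doc_term @ term_to_topic
query_in_topic_space = query @ term_to_topic

term_scores = [cosine(query, d) for d in doc_term]
topic_scores = [cosine(query_in_topic_space, d) for d in docs_in_topic_space]

print(term_scores)    # doc 1 scores 0 in term space: no shared term
print(topic_scores)   # doc 1 scores close to 1 in topic space
```

In this toy corpus “car” and “automobile” co-occur in doc 0, which is what lets the top singular vector group them into one topic.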

Limitations of LSI • Q: Any problems with LSI? • Problems with LSI • Scalability • SVD is known to be difficult to perform on large data • Interpretability • The extracted document-topic matrix is impossible to interpret • Difficult to understand why we get good/bad results from LSI for some queries • Q: Is there any way to develop more interpretable topic-based indexing? • Topic for next lecture

Summary • Topic-based indexing • Synonymy and polysemy problem • Index documents by topic, not by terms • Latent Semantic Indexing (LSI) • A document is a topic-weighted linear combination of the topic-term vectors • Formulate the problem as a rank-K matrix approximation problem • Use SVD to find the best approximation • Basic linear algebra • Linear transformation, matrix, stretching and rotation • Orthogonal stretching, diagonal matrix, symmetric matrix, eigenvalues and eigenvectors • Rotation, change of coordinates, and orthonormal matrix • SVD and its implication as a linear transformation
