
Introducing Latent Semantic Analysis

Introducing Latent Semantic Analysis. Thomas K. Landauer et al., “An introduction to latent semantic analysis,” Discourse Processes, Vol. 25 (2–3), pp. 259–284, 1998.



  1. Introducing Latent Semantic Analysis • Thomas K. Landauer et al., “An introduction to latent semantic analysis,” Discourse Processes, Vol. 25 (2–3), pp. 259–284, 1998. • Scott Deerwester et al., “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, Vol. 41 (6), pp. 391–407, 1990. • Kirk Baker, “Singular Value Decomposition Tutorial,” electronic document, 2005. • Aug 22, 2014, Hee-Gook Jun

  2. Outline • SVD • SVD to LSA • Conclusion

  3. Eigendecomposition vs. Singular Value Decomposition • Eigendecomposition (A = PΛP⁻¹) • Matrix must be diagonalizable • Matrix must be square • An n × n matrix must have n linearly independent eigenvectors • e.g. a symmetric matrix • Singular Value Decomposition (A = UΣVᵀ) • Computable for any m × n matrix A
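The contrast above can be sketched numerically (in NumPy rather than the R used later in the deck, with made-up example matrices): eigendecomposition only accepts a square input, while SVD factors a matrix of any shape.

```python
import numpy as np

# Eigendecomposition: only defined for square matrices, e.g. this 2 x 2.
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, P = np.linalg.eig(B)                    # B = P diag(lam) P^-1

# SVD: computable for any m x n matrix, square or not.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])              # 2 x 3: eig() would reject this shape
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True: A = U Sigma V^T
```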

  4. U: Left Singular Vectors of A • Unitary matrix: columns of U are orthonormal (orthogonal + normal) • Columns are the orthonormal eigenvectors of AAᵀ • e.g. u₁ = [0, 0, 0, 1] and u₂ = [0, 1, 0, 0] are orthogonal: u₁ · u₂ = (0×0) + (0×1) + (0×0) + (1×0) = 0, and u₁ is normal: ||u₁|| = √(0² + 0² + 0² + 1²) = 1

  5. V: Right Singular Vectors of A • Unitary matrix: columns of V are orthonormal (orthogonal + normal) • Columns are the orthonormal eigenvectors of AᵀA
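A quick numeric check of the two slides above (NumPy, with an arbitrary example matrix): the columns of U and V returned by an SVD routine are orthonormal, i.e. UᵀU = I and VᵀV = I.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])             # arbitrary 2 x 3 example
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Orthonormal columns: pairwise dot products are 0, each column has norm 1.
print(np.allclose(U.T @ U, np.eye(2)))       # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))     # True (rows of V^T are columns of V)
```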

  6. ∑ (or S) • Diagonal matrix • Diagonal entries are the singular values of A • The non-zero singular values are the square roots of the non-zero eigenvalues of AAᵀ (equivalently, of AᵀA), listed in descending order

  7. Calculation Procedure (A = U∑Vᵀ) • ① U is built from the eigenvectors of AAᵀ • Compute AAᵀ • Compute its eigenvectors • Orthonormalize them • ② V is built from the eigenvectors of AᵀA • Compute AᵀA • Compute its eigenvectors • Orthonormalize and transpose • ③ ∑ holds the square roots of the shared non-zero eigenvalues (the non-zero eigenvalues of AAᵀ and AᵀA coincide)
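The three steps above can be sketched end-to-end in NumPy (a hypothetical running example matrix, the same 2 × 3 one used in Baker's tutorial; `eigh` handles the symmetric matrix AAᵀ, and V is recovered from Aᵀuᵢ = σᵢvᵢ to keep signs consistent):

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])            # assumed example matrix

# (1) U: orthonormal eigenvectors of A A^T (eigh returns ascending order).
lam_u, U = np.linalg.eigh(A @ A.T)
lam_u, U = lam_u[::-1], U[:, ::-1]          # reorder to descending eigenvalues

# (3) Sigma: square roots of those eigenvalues on the diagonal.
S = np.diag(np.sqrt(lam_u))

# (2) V: each column v_i = A^T u_i / sigma_i (avoids sign mismatches).
V = (A.T @ U) / np.sqrt(lam_u)

print(np.allclose(U @ S @ V.T, A))          # True: A = U Sigma V^T
```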

  8. 1.1 Matrix U – Compute AAᵀ • Start with the matrix A • Compute its transpose Aᵀ • Then form the product AAᵀ

  9. 1.2 Matrix U – Eigenvectors and Eigenvalues [1/2] • An eigenvector is a nonzero vector v satisfying Av = λv, where A is a square matrix and λ is an eigenvalue (a scalar) • Rearrange to (A − λI)v = 0, then set the determinant of the coefficient matrix to zero: det(A − λI) = 0

  10. 1.2 Matrix U – Eigenvectors and Eigenvalues [2/2] • From the calculated eigenvalues, find each eigenvector: • ① for λ₁, solve (AAᵀ − λ₁I)v = 0 • ② for λ₂, solve (AAᵀ − λ₂I)v = 0 • Thus we obtain the set of eigenvectors
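A small worked instance of the two slides above (an assumed example, matching Baker's tutorial matrix): for A = [[3, 1, 1], [−1, 3, 1]], AAᵀ = [[11, 1], [1, 11]], so det(AAᵀ − λI) = (11 − λ)² − 1 = 0 gives λ = 12 and λ = 10, with eigenvectors [1, 1] and [1, −1].

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
AAT = A @ A.T                         # [[11, 1], [1, 11]]

# Characteristic polynomial: (11 - l)^2 - 1 = 0  ->  l = 12 or l = 10.
lam, vecs = np.linalg.eig(AAT)
print(sorted(lam))                    # close to [10.0, 12.0]

# [1, 1] is an eigenvector for lambda = 12: AAT @ [1, 1] = 12 * [1, 1].
print(AAT @ np.array([1.0, 1.0]))     # [12. 12.]
```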

  11. 1.3 Matrix U – Orthonormalization • Gram–Schmidt orthonormalization turns the set of eigenvectors into an orthonormal matrix: • normalize v₁ (→ u₁) • find w₂ orthogonal to u₁ • normalize w₂ (→ u₂)
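The Gram–Schmidt step can be sketched as follows (a minimal NumPy implementation of the modified variant; the two input vectors are the eigenvectors [1, 1] and [1, −1] from the previous slide, which happen to be orthogonal already, so only the normalization does real work here):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for u in basis:
            w -= (w @ u) * u              # remove the component along each earlier u
        basis.append(w / np.linalg.norm(w))  # normalize the remainder
    return np.column_stack(basis)

Q = gram_schmidt([np.array([1.0, 1.0]), np.array([1.0, -1.0])])
print(np.allclose(Q.T @ Q, np.eye(2)))    # True: columns are orthonormal
```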

  12. 2.1 Matrix Vᵀ – Compute AᵀA • Start with the matrix A • Compute its transpose Aᵀ • Then form the product AᵀA

  13. 2.2 Matrix Vᵀ – Eigenvectors and Eigenvalues [1/2] • An eigenvector is a nonzero vector v satisfying Av = λv, where A is a square matrix and λ is an eigenvalue (a scalar) • Rearrange to (A − λI)v = 0, then set the determinant of the coefficient matrix to zero: det(A − λI) = 0, expanding the determinant by cofactor expansion

  14. 2.2 Matrix Vᵀ – Eigenvectors and Eigenvalues [2/2] • For each eigenvalue, solve (AᵀA − λI)v = 0 for its eigenvector: • ① for λ₁ • ② for λ₂ • ③ for λ₃ • Thus we obtain the set of eigenvectors

  15. 2.3 Matrix Vᵀ – Orthonormalization and Transposition • Gram–Schmidt orthonormalization turns the set of eigenvectors into an orthonormal matrix: normalize v₁ (→ u₁); find w₂ orthogonal to u₁ and normalize it (→ u₂); find w₃ orthogonal to u₁ and u₂ and normalize it (→ u₃) • Finally, transpose the result to obtain Vᵀ

  16. 3.1 Matrix ∑ (= S) • Take the square roots of the non-zero eigenvalues • Populate the diagonal with these values in descending order • The diagonal entries of ∑ are the singular values of A

  17. Outline • SVD • SVD to LSA

  18. Latent Semantic Analysis • Use SVD (Singular Value Decomposition) • to simulate human learning of word and passage meaning • Represent word and passage meaning • as high-dimensional vectors in the semantic space

  19. LSA Example • First analysis – Document Similarity • Second analysis – Term Similarity • doc 1: "modem the steering linux. modem, linux the modem. steering the modem. linux" • doc 2: "linux; the linux. the linux modem linux. the modem, clutch the modem. petrol" • doc 3: "petrol! clutch the steering, steering, linux. the steering clutch petrol. clutch the petrol; the clutch" • doc 4: "the the the. clutch clutch clutch! steering petrol; steering petrol petrol; steering petrol"

  20. LSA Example: Build a Term-Frequency Matrix • Let matrix A be the 6 × 4 term-by-document count matrix (rows: terms, columns: documents)
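Building the term-frequency matrix from the four documents on slide 19 can be sketched as follows (NumPy; the six terms and their order follow slide 24, and the counts are my reconstruction from the document texts, since the slide's matrix image is not reproduced here):

```python
import re
import numpy as np

docs = [
    "modem the steering linux. modem, linux the modem. steering the modem. linux",
    "linux; the linux. the linux modem linux. the modem, clutch the modem. petrol",
    "petrol! clutch the steering, steering, linux. the steering clutch petrol. clutch the petrol; the clutch",
    "the the the. clutch clutch clutch! steering petrol; steering petrol petrol; steering petrol",
]
terms = ["linux", "modem", "the", "clutch", "steering", "petrol"]

# Tokenize on letter runs, then count each term per document (terms x docs).
A = np.array([[re.findall(r"[a-z]+", d).count(t) for d in docs] for t in terms])
print(A)                                   # 6 x 4 matrix of raw counts
```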

  21. LSA Example: Compute SVD of Matrix A • R code: result <- svd(A) • Result: A = U S Vᵀ, with dimensions (6 × 4) = (6 × 4)(4 × 4)(4 × 4)

  22. LSA Example: Reduced SVD • Full SVD: (6 × 4) = (6 × 4)(4 × 4)(4 × 4) • Keep only the two largest singular values: (6 × 4) ≈ (6 × 2)(2 × 2)(2 × 4)
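The truncation above can be sketched in NumPy (the slides use R's `svd`; the counts in A are my reconstruction of the term-frequency matrix from slide 19's documents):

```python
import numpy as np

A = np.array([[3, 4, 1, 0], [4, 3, 0, 0], [3, 4, 4, 3],
              [0, 1, 4, 3], [2, 0, 3, 3], [0, 1, 3, 4]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # 6x4, length-4, 4x4

k = 2                                              # keep the 2 largest singular values
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]  # 6x2, 2x2, 2x4
A2 = Uk @ Sk @ Vtk                                 # best rank-2 approximation of A
print(Uk.shape, Sk.shape, Vtk.shape)               # (6, 2) (2, 2) (2, 4)
```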

  23. LSA Example: Document Similarity • Multiply S (2 × 2) by Vᵀ (2 × 4): each column of S Vᵀ is a 2-dimensional vector representing one of the four documents (doc 1 – doc 4 from slide 19)
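Comparing documents in the reduced space can be sketched as follows (NumPy, with cosine similarity as the comparison measure and my reconstructed count matrix; one would expect docs 1 and 2, which share the modem/linux vocabulary, to come out closer than docs 1 and 4):

```python
import numpy as np

A = np.array([[3, 4, 1, 0], [4, 3, 0, 0], [3, 4, 4, 3],
              [0, 1, 4, 3], [2, 0, 3, 3], [0, 1, 3, 4]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(s[:2]) @ Vt[:2, :]        # 2 x 4: one 2-D coordinate per document

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Similarity of doc 1 to doc 2 (shared terms) vs. doc 1 to doc 4 (disjoint terms).
print(cos(D[:, 0], D[:, 1]), cos(D[:, 0], D[:, 3]))
```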

  24. LSA Example: Term Similarity • Multiply U (6 × 2) by S (2 × 2): each row of U S is a 2-dimensional vector representing one of the six terms: linux, modem, the, clutch, steering, petrol
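Term comparison works the same way on rows of U S (NumPy sketch with cosine similarity, using my reconstructed count matrix; "linux" and "modem" co-occur in docs 1–2, so they should land closer together than "linux" and "petrol"):

```python
import numpy as np

A = np.array([[3, 4, 1, 0], [4, 3, 0, 0], [3, 4, 4, 3],
              [0, 1, 4, 3], [2, 0, 3, 3], [0, 1, 3, 4]], dtype=float)
terms = ["linux", "modem", "the", "clutch", "steering", "petrol"]
U, s, Vt = np.linalg.svd(A, full_matrices=False)
T = U[:, :2] * s[:2]                  # 6 x 2: one 2-D coordinate per term (rows of U S)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Co-occurring terms cluster: linux-vs-modem beats linux-vs-petrol.
print(cos(T[0], T[1]) > cos(T[0], T[5]))   # True
```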

  25. Conclusion • Pros • Can compute document similarity even if the documents share no common words • Cons • Lacks a statistical foundation → PLSA • Open question: which dimensions should be chosen to drop in the reduction?
