
Latent Semantic Analysis A Gentle Tutorial Introduction Tutorial Resources cis.paisley.ac.uk/giro-ci0/GU_LSA_TUT



Presentation Transcript


  1. Latent Semantic Analysis: A Gentle Tutorial Introduction. Tutorial Resources: http://cis.paisley.ac.uk/giro-ci0/GU_LSA_TUT. M.A. Girolami. University of Glasgow DCS Tutorial

  2. Contents
     • Latent Semantic Analysis
     • Motivation
     • Singular Value Decomposition
     • Term Document Matrix Structure
     • Query and Document Similarity in Latent Space
     • Probabilistic Views on LSA
     • Factor Analytic Model
     • Generative Model Representation
     • Alternate Basis to the Principal Directions
     • Latent Semantic & Document Clustering (In the Bar later)
     • Principal Direction Clustering
     • Hierarchic Clustering with LSA

  3. Latent Semantic Analysis
     • Motivation
     • Lexical matching at term level is inaccurate (claimed)
     • Polysemy – words with a number of ‘meanings’ – term matching returns irrelevant documents – impacts precision
     • Synonymy – a number of words with the same ‘meaning’ – term matching misses relevant documents – impacts recall
     • LSA assumes that there exists a LATENT structure in word usage – obscured by variability in word choice
     • Analogous to the signal + additive noise model in signal processing

  4. Latent Semantic Analysis
     • Word usage defined by term and document co-occurrence – matrix structure
     • Latent structure / semantics in word usage
     • Clustering documents or words – no shared space
     • Two mode factor analysis – dyadic decomposition into a ‘latent semantic’ factor space, employing the Singular Value Decomposition
     • Cubic Computational Scaling – reasonable!

  5. Singular Value Decomposition
     • $M \times N$ Term × Document matrix ($M \gg N$): $D = [d_1, d_2, \ldots, d_N]$ and $d = [t_1, t_2, \ldots, t_M]^T$
     • Consider the linear combination of terms $u_1 t_1 + u_2 t_2 + \ldots + u_M t_M = u^T d$
     • which maximises $E\{(u^T d)^2\} = E\{u^T d d^T u\} = u^T E\{d d^T\} u \approx u^T D D^T u$
     • subject to $u^T u = 1$

  6. Singular Value Decomposition
     • Maximise $u^T D D^T u$ s.t. $u^T u = 1$
     • Construct the Lagrangian $u^T D D^T u - \lambda u^T u$
     • Set the vector of partial derivatives to zero: $D D^T u - \lambda u = (D D^T - \lambda I) u = 0$
     • As $u \neq 0$, $D D^T - \lambda I$ must be singular, i.e. $|D D^T - \lambda I| = 0$
     • This is a polynomial in $\lambda$ of degree $M$; its characteristic roots are called the eigenvalues (German eigen = own, unique to, particular to)

  7. Singular Value Decomposition
     • The first root is called the principal eigenvalue, which has an associated unit-norm ($u^T u = 1$) eigenvector $u$
     • Subsequent roots are ordered such that $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_M$, with rank($D$) non-zero values
     • Eigenvectors form an orthonormal basis, i.e. $u_i^T u_j = \delta_{ij}$
     • The eigenvalue decomposition of $D D^T = U \Sigma U^T$, where $U = [u_1, u_2, \ldots, u_M]$ and $\Sigma = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_M]$
     • Similarly the eigenvalue decomposition of $D^T D = V \Sigma V^T$ (the same non-zero eigenvalues)
     • The SVD is closely related to the above: $D = U \Sigma^{1/2} V^T$
     • $U$ holds the left eigenvectors, $V$ the right eigenvectors, and the singular values are the square roots of the eigenvalues
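The link between the eigen-decomposition of $D D^T$ and the SVD of $D$ can be checked numerically. The following is a minimal sketch using NumPy on a small random matrix; the matrix size, the seed and all variable names are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 4))          # toy M x N matrix (M=6 terms, N=4 documents)

# Eigen-decomposition of the symmetric matrix D D^T (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(D @ D.T)
order = np.argsort(eigvals)[::-1]        # re-sort into descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD of D itself.
U, s, Vt = np.linalg.svd(D, full_matrices=False)

rank = np.linalg.matrix_rank(D)
# The non-zero eigenvalues of D D^T are the squared singular values of D.
eig_matches_svd = np.allclose(eigvals[:rank], s[:rank] ** 2)

# The principal eigenvector attains the maximum of u^T D D^T u over unit vectors:
# no random unit vector should give a larger value.
u1 = eigvecs[:, 0]
principal_value = float(u1 @ D @ D.T @ u1)
trials = rng.standard_normal((1000, 6))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
best_random = float(np.max(np.einsum("ij,jk,ik->i", trials, D @ D.T, trials)))
```

Here `eig_matches_svd` confirms the square-root relationship, and comparing `best_random` with `principal_value` illustrates the constrained maximisation from the previous two slides.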

  8. SVD Properties
     • $D = U S V^T = \sum_{i=1}^{N} \sigma_i u_i v_i^T$ and $D_K = \sum_{i=1}^{K} \sigma_i u_i v_i^T = U_K S_K V_K^T$, with $K < N$ and $U_K^T U_K = I_K = V_K^T V_K$
     • Then $D_K$ is the best rank $K$ approximation to $D$ in the Frobenius norm sense
     • K-dim orthonormal projections $S_K^{-1} U_K^T D = V_K^T$ preserve the maximum amount of variability
     • Under the assumption that columns of $D$ are multivariate Gaussian, $V$ defines the principal axes of the ellipse of constant variance $\lambda_i$ in the original space
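The truncation properties can be illustrated with a few lines of NumPy. This is a sketch on a random toy matrix (sizes, seed and names are assumptions made for illustration); it checks that the Frobenius error of $D_K$ equals the root of the sum of the discarded squared singular values, and that the projection $S_K^{-1} U_K^T D$ returns $V_K^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((8, 5))          # toy matrix, not real term counts
U, s, Vt = np.linalg.svd(D, full_matrices=False)

K = 2
U_K, S_K, Vt_K = U[:, :K], np.diag(s[:K]), Vt[:K, :]
D_K = U_K @ S_K @ Vt_K                   # rank-K approximation

# Frobenius error of the truncated SVD versus the discarded singular values.
frob_error = np.linalg.norm(D - D_K, "fro")
expected_error = np.sqrt(np.sum(s[K:] ** 2))

# K-dimensional projection of the documents: S_K^{-1} U_K^T D = V_K^T.
projected = np.linalg.inv(S_K) @ U_K.T @ D
```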

  9. SVD Example: D = U S V^T with D (10 x 2), U (10 x 2), S (2 x 2), V^T (2 x 2)

             D                      U
      2.9002   3.6790      -0.2750  -0.1242
      4.0860   5.2366      -0.3896  -0.1846
      1.9954   3.3687      -0.2247  -0.2369
      3.5069   1.6748      -0.2150   0.3514
      4.4620   2.7684      -0.3005   0.3318
     -2.9444  -4.6447       0.3177   0.2906
     -4.1132  -4.7043       0.3682   0.0833
     -3.6208  -5.0181       0.3613   0.2319
     -3.0558  -4.1821       0.3027   0.1861
     -6.1204  -2.4790       0.3563  -0.6935

             S                     V^T
     16.9491   0           -0.6960  -0.7181
      0        3.8491       0.7181  -0.6960
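The numbers on this slide can be checked directly. The sketch below assumes the 20 values of D are listed row by row as ten two-dimensional points (an interpretation of the flattened transcript, consistent with the U and V^T values shown), and recomputes the decomposition with NumPy. The signs of singular vectors are not unique, so only magnitudes are compared.

```python
import numpy as np

# D from the slide, read as 10 rows of 2 values (assumed layout).
D = np.array([
    [ 2.9002,  3.6790],
    [ 4.0860,  5.2366],
    [ 1.9954,  3.3687],
    [ 3.5069,  1.6748],
    [ 4.4620,  2.7684],
    [-2.9444, -4.6447],
    [-4.1132, -4.7043],
    [-3.6208, -5.0181],
    [-3.0558, -4.1821],
    [-6.1204, -2.4790],
])

U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Values printed on the slide, for comparison.
s_slide = np.array([16.9491, 3.8491])
Vt_slide = np.array([[-0.6960, -0.7181],
                     [ 0.7181, -0.6960]])

singular_values_agree = np.allclose(s, s_slide, atol=5e-3)
right_vectors_agree = np.allclose(np.abs(Vt), np.abs(Vt_slide), atol=5e-3)
```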

  10. SVD Properties
     • There is an implicit assumption that the observed data distribution is multivariate Gaussian
     • Can be considered as a probabilistic generative model – latent variables are Gaussian – sub-optimal in likelihood terms for non-Gaussian distributions
     • Employed in signal processing for noise filtering – the dominant subspace contains the majority of the information-bearing part of the signal
     • Similar rationale when applying SVD to LSI

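  11. Computing SVD
     • The power method is one numerical approach
     • Random initialisation of a vector $u^{(0)}$
     • Set $\tilde{u}^{(1)} = D D^T u^{(0)}$ and $u^{(1)} = \tilde{u}^{(1)} / \sqrt{(\tilde{u}^{(1)})^T \tilde{u}^{(1)}}$, then $\tilde{u}^{(2)} = D D^T u^{(1)}$ and $u^{(2)} = \tilde{u}^{(2)} / \sqrt{(\tilde{u}^{(2)})^T \tilde{u}^{(2)}}$
     • In general $\tilde{u}^{(i)} = D D^T u^{(i-1)}$ and $u^{(i)} = \tilde{u}^{(i)} / \sqrt{(\tilde{u}^{(i)})^T \tilde{u}^{(i)}}$
     • As $i \to \infty$, $u^{(i)} \to u_1$ (the principal eigenvector) and $\sqrt{(\tilde{u}^{(i)})^T \tilde{u}^{(i)}} \to \lambda_1$
     • Subsequent eigenvectors use deflation: $\tilde{u}^{(1)} = (D D^T - \lambda_1 u_1 u_1^T) u^{(0)}$
     • Note: for a term document matrix the computation of $u_1$ is inexpensive because $D$ is sparse – subsequent eigenvectors require matrix-vector operations on a dense matrix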
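The iteration and the deflation step can be sketched as follows. This is an illustrative implementation, not the tutorial's own code: the function name `power_method`, the iteration count, and the test matrix (built with known singular values 5, 3, 2, 1 so that convergence is fast) are all assumptions.

```python
import numpy as np

def power_method(A, num_iters=300, seed=0):
    """Dominant (eigenvalue, unit eigenvector) of a symmetric PSD matrix A."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(A.shape[0])      # random initialisation
    u /= np.linalg.norm(u)
    lam = 0.0
    for _ in range(num_iters):
        w = A @ u                            # unnormalised iterate
        lam = np.linalg.norm(w)              # tends to the dominant eigenvalue
        u = w / lam                          # renormalise to unit length
    return lam, u

# Toy D with known singular values (5, 3, 2, 1), so D D^T has eigenvalues 25, 9, 4, 1.
rng = np.random.default_rng(42)
Q1, _ = np.linalg.qr(rng.standard_normal((6, 4)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
D = Q1 @ np.diag([5.0, 3.0, 2.0, 1.0]) @ Q2.T
A = D @ D.T

lam1, u1 = power_method(A)
# Deflation: remove the principal component, then iterate again.
lam2, u2 = power_method(A - lam1 * np.outer(u1, u1))
```

In practice one would apply `D @ (D.T @ u)` rather than forming `D @ D.T` explicitly, which keeps the first eigenvector cheap for a sparse term document matrix; the explicit product is used here only to keep the sketch short.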

  12. Term Document Matrix Structure
     • Create an artificially heterogeneous collection
     • 100 documents from each of 3 distinct newsgroups (300 documents in total)
     • Indexed using a standard stop word list
     • 12418 distinct terms
     • Term × Document Matrix (12418 × 300)
     • 8% fill of sparse matrix
     • Sort terms by rank – structure apparent
     • Matrix of cosine similarity between documents
     • Clear structure apparent
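The document-by-document cosine similarity matrix mentioned above can be computed as in this sketch. The tiny 6 × 4 count matrix is a made-up stand-in for the newsgroup collection (two groups of documents with disjoint vocabularies), and the function name is an assumption.

```python
import numpy as np

def cosine_similarity_matrix(D):
    """Cosine similarity between the columns (documents) of a term x document matrix."""
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0              # guard against empty documents
    unit = D / norms                     # scale every document to unit length
    return unit.T @ unit

# Hypothetical toy collection: 6 terms, 4 documents in two topical groups.
D = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

C = cosine_similarity_matrix(D)          # 4 x 4, block structure along the diagonal
```

Documents within a group share terms and have high similarity, while documents from different groups share none and score zero, which gives the block structure the slide refers to.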

  13. Term Document Matrix Structure

  14. Query and Document Similarity in Latent Space
     • Rank 3: $D_3 = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \sigma_3 u_3 v_3^T$
     • Projection of all documents into the 3-d Latent Semantic Space is achieved by $S_3^{-1} U_3^T D$
     • A query $q$ in the LSA space: $S_3^{-1} U_3^T q$
     • Similarity in LSA space (with $\Sigma_3 = S_3^2$):
       $(S_3^{-1} U_3^T q)^T S_3^{-1} U_3^T D = q^T U_3 S_3^{-1} S_3^{-1} U_3^T D = q^T U_3 \Sigma_3^{-1} U_3^T D = q_{\mathrm{exp}}^T D = q^T \Theta D$
     • LSA similarity metric $\Theta = U_3 \Sigma_3^{-1} U_3^T$ – term expansion
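The chain of equalities above can be verified on a small example. The sketch below uses a made-up 6 × 5 binary term document matrix and a made-up query (both assumptions for illustration), projects them into a rank-3 latent space, and checks that the latent-space inner product equals $q^T \Theta D$.

```python
import numpy as np

# Hypothetical toy term x document matrix: 6 terms, 5 documents.
D = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(D, full_matrices=False)
K = 3
U_K = U[:, :K]
S_K_inv = np.diag(1.0 / s[:K])

docs_latent = S_K_inv @ U_K.T @ D        # K x N projection of all documents
q = np.array([1, 0, 1, 0, 0, 0], dtype=float)   # query containing terms 0 and 2
q_latent = S_K_inv @ U_K.T @ q           # query in the same latent space

# Similarity computed in the latent space ...
sim_latent = q_latent @ docs_latent
# ... and via the term-expansion matrix Theta = U_K Sigma_K^{-1} U_K^T, Sigma_K = S_K^2.
Theta = U_K @ np.diag(1.0 / s[:K] ** 2) @ U_K.T
sim_expanded = q @ Theta @ D
```

`q @ Theta` plays the role of the expanded query: it generally has non-zero weight on terms that were absent from `q` but co-occur with its terms in the collection.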

  15. Query and Document Similarity in Latent Space
     • Project documents into 3-D latent space
     • Project query

  16. Random Projections
     • Important theoretical result
     • Random projection from M-dim to L-dim space, where L << M
     • Euclidean distances and angles (norms and inner products) are approximately preserved with high probability
     • LSA can then be performed using SVD on the reduced-dimensional L × N matrix (less costly)
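A Gaussian random projection is one common way to realise this idea; the slide does not say which construction the tutorial had in mind, so the choice below is an assumption, as are the sizes and names. The sketch projects a dense toy matrix from M = 5000 to L = 400 dimensions and compares pairwise document distances before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, N = 5000, 400, 20
D = rng.random((M, N))                          # toy M x N matrix (not real term counts)

# Gaussian random projection; the 1/sqrt(L) scaling preserves squared norms on average.
R = rng.standard_normal((L, M)) / np.sqrt(L)
D_low = R @ D                                   # reduced L x N matrix

def pairwise_distances(X):
    """Euclidean distances between all pairs of columns of X."""
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.sqrt(np.maximum(d2, 0.0))

dist_orig = pairwise_distances(D)
dist_proj = pairwise_distances(D_low)

off_diagonal = ~np.eye(N, dtype=bool)
ratios = dist_proj[off_diagonal] / dist_orig[off_diagonal]   # values near 1 mean little distortion
```

The SVD would then be taken of `D_low` instead of `D`, trading a small distortion for a much smaller matrix.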


  18. LSA Performance
     • LSA consistently improves recall on standard test collections (precision/recall generally improved)
     • Variable performance on larger TREC collections
     • Dimensionality of Latent Space – a magic number – 300 to 1000 seems to work fine – no satisfactory way of assessing the value
     • Computational cost – at present – prohibitive

  19. Probabilistic Views on LSA
     • Factor Analytic Model
     • Generative Model Representation
     • Alternate Basis to the Principal Directions

  20. Factor Analytic Model
     • $d = A f + n$
     • $p(d) = \sum_f p(d|f)\, p(f)$
     • This probabilistic representation underlies LSA, where prior and likelihood are both multivariate Gaussian
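Under the Gaussian assumptions stated on this slide the marginal has a closed form. A short sketch, assuming (these choices are not in the original) a zero-mean, unit-covariance prior on $f$ and zero-mean noise $n$ with covariance $\Psi$, independent of $f$; for continuous $f$ the sum over $f$ becomes an integral:

```latex
\begin{aligned}
p(f) &= \mathcal{N}(f \mid 0,\, I), \qquad
p(d \mid f) = \mathcal{N}(d \mid A f,\, \Psi), \\
p(d) &= \int p(d \mid f)\, p(f)\, \mathrm{d}f
      = \mathcal{N}\!\left(d \mid 0,\; A A^{T} + \Psi\right).
\end{aligned}
```

The covariance $A A^T + \Psi$ is a low-rank term plus noise, which mirrors the signal-plus-noise reading of the truncated SVD given earlier.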

  21. Generative Model Representation
     • Generate a document d with probability p(d)
     • Having observed d, generate a semantic factor with probability p(f|d)
     • Having observed a semantic factor, generate a word with probability p(w|f)

  22. Generative Model Representation (diagram): Documents with P(d), linked by P(f|d) to Factor 1, Factor 2 and Factor 3, linked by P(w|f) to words. Example document: The cat sat on the mat and the quick brown fox jumped… Example word: spider

  23. Generative Model Representation
     • Model representation as joint probability: $p(d,w) = p(d)\, p(w|d) = p(d) \sum_f p(w|f)\, p(f|d)$, with w and d conditionally independent given f
     • $p(d,w) = \sum_f p(w|f)\, p(f)\, p(d|f)$
     • Note the similarity with $D_K = \sum_{i=1}^{K} \sigma_i u_i v_i^T$
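The two parameterisations of the joint probability are related by Bayes' rule, which can be confirmed numerically. The sketch below draws arbitrary random distributions (sizes, seed and names are assumptions) and shows that the symmetric form is a product of three matrices, which is the sense in which it resembles $U_K S_K V_K^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_docs, n_factors = 6, 4, 3

def random_distribution(shape, axis):
    """Random non-negative array normalised to sum to 1 along `axis`."""
    x = rng.random(shape)
    return x / x.sum(axis=axis, keepdims=True)

p_f = random_distribution((n_factors,), axis=0)                    # p(f)
p_w_given_f = random_distribution((n_words, n_factors), axis=0)    # p(w|f), columns sum to 1
p_d_given_f = random_distribution((n_docs, n_factors), axis=0)     # p(d|f), columns sum to 1

# Symmetric form: p(d,w) = sum_f p(w|f) p(f) p(d|f), as a three-matrix product.
joint_sym = p_w_given_f @ np.diag(p_f) @ p_d_given_f.T             # n_words x n_docs

# Asymmetric form: p(d,w) = p(d) sum_f p(w|f) p(f|d), with p(f|d) from Bayes' rule.
p_d = p_d_given_f @ p_f                                            # p(d)
p_f_given_d = (p_d_given_f * p_f).T / p_d                          # n_factors x n_docs
joint_asym = (p_w_given_f @ p_f_given_d) * p_d
```

Unlike the SVD factors, every entry of these three matrices is non-negative and each factor is a normalised distribution.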

  24. Documents: The cat sat on the mat and the quick brown fox jumped… P(d) = 0.003
     • p(f=1|d) = 0.6, p(f=2|d) = 0.1, p(f=3|d) = 0.25, p(f=4|d) = 0.05
     • P(w=spider|f4) = 0.6, P(w=spider|f4) = 0.02, P(w=spider|f4) = 0.01, P(w=spider|f4) = 0.1
     • $p(d,w) = p(d) \sum_f p(w|f)\, p(f|d) = 0.001$

  25. Generative Model Representation
     • Distributions of p(f|d) and p(w|f) are multinomial – counts in successive trials
     • More appropriate than Gaussian
     • Note that the Term × Document matrix is a sample from the true distribution $p_t(d, w)$
     • $\sum_{ij} D(i,j) \log p(d_j, w_i)$ – cross-entropy between model and realisation – maximise the likelihood that the model $p(d_j, w_i)$ generated the realisation D – subject to conditions on p(f|d) and p(w|f)

  26. Generative Model Representation
     • Estimation of p(f|d) and p(w|f) requires use of a standard EM algorithm
     • Expectation Maximisation
     • General iterative method for ML parameter estimation
     • Ideal for ‘missing variable’ problems
     • Estimate p(f|d,w) using current estimates of p(w|f) and p(f|d)
     • Estimate new values of p(w|f) and p(f|d) using the current estimate of p(f|d,w)
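The two alternating steps can be sketched as below. This is a minimal illustration under stated assumptions rather than the tutorial's implementation: the function name `plsa_em`, the random initialisation, the fixed iteration count and the toy count matrix are all made up, and the sketch assumes no document is empty. The tracked objective is $\sum_{ij} D(i,j) \log p(w_i|d_j)$, which differs from the log-likelihood on the previous slide only by a term in p(d) that does not depend on p(w|f) or p(f|d).

```python
import numpy as np

def plsa_em(D, n_factors, n_iters=50, seed=0):
    """Fit p(w|f) and p(f|d) to a term x document count matrix D by EM.

    Returns (p_w_f, p_f_d, history), where history holds the objective
    sum_ij D(i,j) log p(w_i|d_j) evaluated before each update.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = D.shape
    p_w_f = rng.random((n_words, n_factors))
    p_w_f /= p_w_f.sum(axis=0, keepdims=True)        # columns are p(.|f)
    p_f_d = rng.random((n_factors, n_docs))
    p_f_d /= p_f_d.sum(axis=0, keepdims=True)        # columns are p(.|d)
    history = []
    for _ in range(n_iters):
        # E-step: p(f|d,w) is proportional to p(w|f) p(f|d); shape (words, factors, docs).
        post = p_w_f[:, :, None] * p_f_d[None, :, :]
        p_w_d = post.sum(axis=1)                     # p(w|d) under current parameters
        history.append(float(np.sum(D * np.log(p_w_d + 1e-12))))
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts n(d,w) p(f|d,w).
        counts = D[:, None, :] * post
        p_w_f = counts.sum(axis=2)
        p_w_f /= p_w_f.sum(axis=0, keepdims=True)
        p_f_d = counts.sum(axis=0)
        p_f_d /= p_f_d.sum(axis=0, keepdims=True)
    return p_w_f, p_f_d, history

# Hypothetical toy counts: 6 words, 4 documents, two groups with disjoint vocabularies.
D = np.array([
    [4, 3, 0, 0],
    [3, 4, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 5, 2],
    [0, 0, 2, 4],
    [0, 0, 1, 3],
], dtype=float)

p_w_f, p_f_d, history = plsa_em(D, n_factors=2)
```

EM only guarantees that the objective does not decrease from one iteration to the next; different seeds can reach different local optima, which is the variability noted in the conclusions.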

  27. Generative Model Representation
     • Once parameters are estimated:
     • p(f|d) gives the posterior probability that semantic factor ‘f’ is associated with d
     • p(w|f) gives the probability of word ‘w’ being generated from semantic factor ‘f’
     • Nice clear interpretation, unlike the U and V terms in SVD
     • ‘Sparse’ representation – unlike SVD

  28. Generative Model Representation
     • Take the toy collection generated – estimate p(f|d) and p(w|f)
     • Graphical representation of p(f|d)

  29. Generative Model Representation
     • Ordered representation of p(w|f)

  30. Alternate Basis to the Principal Directions
     • Similarity between query and documents can be assessed in ‘factor’ space, as in LSA
     • $\mathrm{Sim} = \sum_f p(f|q)\, p(f|D)$: averaged product of query and document posterior probabilities over all ‘factors’ – latent space
     • Alternately, note that D and q are sample instances from an unknown distribution
     • All probabilities – word counts – estimated from D are ‘noisy’
     • Employ $p(d_j, w_i)$ as a ‘smoothed’ version of tf and use the ‘cosine’ measure $\sum_i p(D, w_i) \times q_i$ – ‘query expansion’
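Both matching schemes reduce to small matrix products once the distributions are available. The sketch below uses hand-picked, hypothetical values for p(w|f), p(f|D), p(f|q) and p(d) (two factors, three words, two documents); how p(f|q) is obtained for a new query is not covered here.

```python
import numpy as np

# Hypothetical fitted model: 3 words, 2 factors, 2 documents.
p_w_f = np.array([[0.5, 0.0],
                  [0.5, 0.1],
                  [0.0, 0.9]])          # p(w|f), columns sum to 1
p_f_D = np.array([[0.8, 0.2],
                  [0.2, 0.8]])          # p(f|d), one column per document
p_d = np.array([0.5, 0.5])              # p(d)
p_f_q = np.array([0.9, 0.1])            # assumed posterior over factors for the query

# Scheme 1: similarity in factor space, Sim = sum_f p(f|q) p(f|d).
sim_factor = p_f_q @ p_f_D

# Scheme 2: use the smoothed joint p(d_j, w_i) in place of raw term frequencies
# and score a term-space query with the cosine measure.
joint = (p_w_f @ p_f_D) * p_d           # n_words x n_docs, entries p(d_j, w_i)
q = np.array([1.0, 0.0, 0.0])           # query containing only word 0
sim_cosine = (q @ joint) / (np.linalg.norm(q) * np.linalg.norm(joint, axis=0))
```

In the second scheme a document can score above zero for a query word it never contained, because the smoothed joint spreads probability over all words of the factors the document uses.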

  31. Alternate Basis to the Principal Directions
     • Both forms of matching shown to improve on LSA (MED, CRAN, CACM)
     • Elegant, statistically principled approach – can employ (in theory) Bayesian model assessment techniques
     • Likelihood is a nonlinear function of the parameters p(f|d) and p(w|f) – huge parameter space – relatively small number of samples – high bias and variance expected
     • Assessment of the correlation between likelihood and P/R – yet to be studied in depth

  32. Conclusions
     • SVD-defined basis provides P/R improvements over term matching
       – Interpretation difficult
       – Optimal dimension – open question
       – Variable performance on LARGE collections
       – Supercomputing muscle required
     • Probabilistic approaches provide improvements over SVD
       – Clear interpretation of decomposition
       – Optimal dimension – open question
       – High variability of results due to nonlinear optimisation over HUGE parameter space
       – Improvements marginal in relation to cost

  33. Latent Semantic & Hierarchic Document Clustering
     • Had enough? …
     • … To the Bar…
