
Latent Semantic Analysis
A Gentle Tutorial Introduction

Tutorial Resources: http://cis.paisley.ac.uk/giro-ci0/GU_LSA_TUT

M.A. Girolami

University of Glasgow DCS Tutorial

contents
Contents
  • Latent Semantic Analysis
    • Motivation
    • Singular Value Decomposition
    • Term Document Matrix Structure
    • Query and Document Similarity in Latent Space
  • Probabilistic Views on LSA
    • Factor Analytic Model
    • Generative Model Representation
    • Alternate Basis to the Principal Directions
  • Latent Semantic & Document Clustering (In the Bar later)
    • Principal Direction Clustering
    • Hierarchic Clustering with LSA

Latent Semantic Analysis
  • Motivation
    • Lexical matching at the term level is inaccurate (claimed)
    • Polysemy – words with a number of ‘meanings’ – term matching returns irrelevant documents – impacts precision
    • Synonymy – a number of words with the same ‘meaning’ – term matching misses relevant documents – impacts recall
  • LSA assumes that there exists a LATENT structure in word usage – obscured by variability in word choice
  • Analogous to signal + additive noise model in signal processing

Latent Semantic Analysis
  • Word usage defined by term and document co-occurrence – matrix structure
  • Latent structure / semantics in word usage
  • Clustering documents or words – no shared space
  • Two-mode factor analysis – dyadic decomposition into a ‘latent semantic’ factor space, employing Singular Value Decomposition
  • Cubic computational scaling – reasonable!

Singular Value Decomposition
  • $M \times N$ Term × Document matrix ($M \gg N$)

$D = [d_1, d_2, \dots, d_N]$ and $d = [t_1, t_2, \dots, t_M]^T$

Consider a linear combination of terms

$u_1 t_1 + u_2 t_2 + \dots + u_M t_M = u^T d$

which maximises

$E\{(u^T d)^2\} = E\{u^T d\, d^T u\} = u^T E\{d\, d^T\}\, u \approx u^T D D^T u$

subject to $u^T u = 1$.

Singular Value Decomposition

Maximise $u^T D D^T u$ subject to $u^T u = 1$.

Construct the Lagrangian $u^T D D^T u - \lambda\, u^T u$.

Setting the vector of partial derivatives to zero gives

$D D^T u - \lambda u = (D D^T - \lambda I)\, u = 0$

As $u \neq 0$, the matrix $D D^T - \lambda I$ must be singular, i.e.

$|D D^T - \lambda I| = 0$

This is a polynomial in $\lambda$ of degree $M$ whose characteristic roots are called the eigenvalues.

(German eigen = own, unique to, particular to)
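To make the stationarity condition concrete, here is a minimal numpy check (a sketch with illustrative sizes, not part of the original tutorial): the eigenvectors of $D D^T$ satisfy $(D D^T - \lambda I)u = 0$, and the principal one attains the maximised variance $\lambda_1$.

```python
import numpy as np

# Toy check of the derivation above: the stationary points of
# u^T D D^T u subject to u^T u = 1 are the eigenvectors of D D^T.
rng = np.random.default_rng(0)
D = rng.standard_normal((5, 8))        # 5 x 8 stand-in for a term x document matrix

lam, U = np.linalg.eigh(D @ D.T)       # eigh: symmetric input, eigenvalues ascending
u, lam1 = U[:, -1], lam[-1]            # principal eigenpair

print(np.allclose(D @ D.T @ u, lam1 * u))   # (D D^T - lambda I) u = 0
print(np.isclose(u @ D @ D.T @ u, lam1))    # attained maximum equals lambda_1
```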

Singular Value Decomposition

The first root is called the principal eigenvalue, which has an associated orthonormal ($u^T u = 1$) eigenvector $u$.

Subsequent roots are ordered such that $\lambda_1 > \lambda_2 > \dots > \lambda_M$, with rank($D$) non-zero values.

The eigenvectors form an orthonormal basis, i.e. $u_i^T u_j = \delta_{ij}$.

The eigenvalue decomposition of $D D^T = U \Sigma U^T$

where $U = [u_1, u_2, \dots, u_M]$ and $\Sigma = \mathrm{diag}[\lambda_1, \lambda_2, \dots, \lambda_M]$.

Similarly, the eigenvalue decomposition of $D^T D = V \Sigma V^T$.

The SVD is closely related to the above: $D = U \Sigma^{1/2} V^T$

with left eigenvectors $U$, right eigenvectors $V$, and singular values equal to the square roots of the eigenvalues.

SVD Properties
  • $D = U S V^T = \sum_{i=1}^{N} \sigma_i u_i v_i^T$ and $D_K = \sum_{i=1}^{K} \sigma_i u_i v_i^T = U_K S_K V_K^T$ with $K < N$: $U_K^T U_K = I_K = V_K^T V_K$
  • Then $D_K$ is the best rank-$K$ approximation to $D$ in the Frobenius-norm sense (see the sketch after this list)
  • $K$-dimensional orthonormal projections $S_K^{-1} U_K^T D = V_K^T$ preserve the maximum amount of variability
  • Under the assumption that the columns of $D$ are multivariate Gaussian, $V$ defines the principal axes of ellipses of constant variance $\lambda_i$ in the original space
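A minimal numpy sketch of these properties (names and sizes are illustrative): the truncated SVD gives the rank-$K$ approximation, and the projection $S_K^{-1} U_K^T D$ recovers $V_K^T$.

```python
import numpy as np

def rank_k_approx(D, K):
    """Best rank-K approximation of D in the Frobenius-norm sense (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    Uk, Sk, Vtk = U[:, :K], np.diag(s[:K]), Vt[:K, :]
    return Uk @ Sk @ Vtk, Uk, Sk, Vtk

rng = np.random.default_rng(1)
D = rng.standard_normal((12, 6))
Dk, Uk, Sk, Vtk = rank_k_approx(D, K=2)

# K-dim orthonormal projection: S_K^-1 U_K^T D recovers V_K^T
print(np.allclose(np.linalg.inv(Sk) @ Uk.T @ D, Vtk))
print(np.allclose(Uk.T @ Uk, np.eye(2)))   # U_K^T U_K = I_K
```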


SVD Example

$D$ (10 × 2) =
   2.9002   3.6790
   4.0860   5.2366
   1.9954   3.3687
   3.5069   1.6748
   4.4620   2.7684
  -2.9444  -4.6447
  -4.1132  -4.7043
  -3.6208  -5.0181
  -3.0558  -4.1821
  -6.1204  -2.4790

$U$ (10 × 2) =
  -0.2750  -0.1242
  -0.3896  -0.1846
  -0.2247  -0.2369
  -0.2150   0.3514
  -0.3005   0.3318
   0.3177   0.2906
   0.3682   0.0833
   0.3613   0.2319
   0.3027   0.1861
   0.3563  -0.6935

$S$ (2 × 2) =
  16.9491   0
   0        3.8491

$V^T$ (2 × 2) =
  -0.6960  -0.7181
   0.7181  -0.6960
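The example can be reproduced with numpy; a sketch follows, taking the printed $D$ as given. The values are rounded to 4 d.p. and the signs of the singular vectors are only determined up to a joint flip, so only the singular values and the reconstruction are checked.

```python
import numpy as np

D = np.array([[ 2.9002,  3.6790], [ 4.0860,  5.2366], [ 1.9954,  3.3687],
              [ 3.5069,  1.6748], [ 4.4620,  2.7684], [-2.9444, -4.6447],
              [-4.1132, -4.7043], [-3.6208, -5.0181], [-3.0558, -4.1821],
              [-6.1204, -2.4790]])

U, s, Vt = np.linalg.svd(D, full_matrices=False)
print(np.round(s, 4))                       # singular values, cf. 16.9491, 3.8491
print(np.allclose(U @ np.diag(s) @ Vt, D))  # D = U S V^T
```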

SVD Properties
  • There is an implicit assumption that the observed data distribution is multivariate Gaussian
  • Can be viewed as a probabilistic generative model whose latent variables are Gaussian – sub-optimal in likelihood terms for non-Gaussian distributions
  • Employed in signal processing for noise filtering – the dominant subspace contains the information-bearing part of the signal
  • Similar rationale when applying SVD to LSI

Computing SVD
  • The Power Method is one numerical approach (a sketch follows below)

Random initialisation of vector $u_0$.

Set $\tilde{u}_1 = D D^T u_0$ and $u_1 = \tilde{u}_1 / \sqrt{\tilde{u}_1^T \tilde{u}_1}$,

then $\tilde{u}_2 = D D^T u_1$ and $u_2 = \tilde{u}_2 / \sqrt{\tilde{u}_2^T \tilde{u}_2}$,

and in general $\tilde{u}_i = D D^T u_{i-1}$ and $u_i = \tilde{u}_i / \sqrt{\tilde{u}_i^T \tilde{u}_i}$.

As $i \to \infty$, $u_i \to u_1$ and $\sqrt{\tilde{u}_i^T \tilde{u}_i} \to \lambda_1$.

  • Subsequent eigenvectors use deflation:

$\tilde{u} = (D D^T - \lambda_1 u_1 u_1^T)\, u_0$

  • Note that for a term-document matrix the computation of $u_1$ is inexpensive; subsequent eigenvectors require matrix-vector operations on a dense matrix.
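A minimal sketch of the power method as described above (illustrative sizes; the deflation step for subsequent eigenvectors is as noted on the slide and omitted here for brevity):

```python
import numpy as np

def power_method(D, n_iter=200, seed=0):
    """Principal eigenpair of D D^T via the power iteration sketched above."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(D.shape[0])
    for _ in range(n_iter):
        u_new = D @ (D.T @ u)           # D D^T u without forming the dense D D^T
        lam = np.sqrt(u_new @ u_new)    # norm of the iterate converges to lambda_1
        u = u_new / lam                 # renormalise
    return lam, u

rng = np.random.default_rng(2)
D = rng.standard_normal((50, 20))
lam1, u1 = power_method(D)
print(np.isclose(lam1, np.linalg.eigvalsh(D @ D.T).max()))
```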

Term Document Matrix Structure
  • Create an artificially heterogeneous collection (a construction sketch follows below)
  • 100 documents from each of 3 distinct newsgroups
  • Indexed using a standard stop-word list
  • 12418 distinct terms
  • Term × Document matrix (12418 × 300)
  • 8% fill of the sparse matrix
  • Sorting terms by rank makes structure apparent
  • Matrix of cosine similarity between documents
  • Clear structure apparent
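As an illustration of the construction (not the tutorial's actual pipeline), here is a sketch using scikit-learn's CountVectorizer with its stock English stop-word list; the three-line corpus is a stand-in for the 300 newsgroup posts.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",          # stand-in for the newsgroup posts
        "the quick brown fox jumped",
        "fox and cat in the newsgroup"]

# Term x Document matrix, indexed with a standard English stop-word list
vec = CountVectorizer(stop_words="english")
D = vec.fit_transform(docs).T              # sparse matrix, terms as rows

print(D.shape)                             # (n_terms, n_docs)
print(D.nnz / np.prod(D.shape))            # fill ratio, cf. ~8% in the tutorial
```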


Term Document Matrix Structure

[Figure: the rank-sorted term × document matrix and the document cosine-similarity matrix, showing the three-group structure.]


Query and Document Similarity in Latent Space

  • Rank 3: $D_3 = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \sigma_3 u_3 v_3^T$
  • Projection of all documents into 3-d latent semantic space is achieved by $S_3^{-1} U_3^T D$
  • A query $q$ in the LSA space: $S_3^{-1} U_3^T q$
  • Similarity in LSA space (a sketch follows below):
        $(S_3^{-1} U_3^T q)^T S_3^{-1} U_3^T D$
        $= q^T U_3 S_3^{-1} S_3^{-1} U_3^T D$
        $= q^T U_3 \Sigma_3^{-1} U_3^T D$
        $= q^T \Theta D$
  • LSA similarity metric $\Theta = U_3 \Sigma_3^{-1} U_3^T$ – term expansion
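A minimal numpy sketch of this matching scheme (names illustrative): documents and the query are projected by $S_K^{-1} U_K^T$ and compared by inner product in the latent space.

```python
import numpy as np

def lsa_similarity(D, q, K=3):
    """LSA matching score q^T Theta D via the latent-space projections above."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    Uk, sk = U[:, :K], s[:K]
    docs_latent = (Uk / sk).T @ D       # S_K^-1 U_K^T D, one column per document
    q_latent = (Uk / sk).T @ q          # S_K^-1 U_K^T q
    return q_latent @ docs_latent       # similarity of q to every document

rng = np.random.default_rng(3)
D = rng.random((30, 10))                # toy term x document counts
q = np.zeros(30); q[[2, 5]] = 1.0       # query containing terms 2 and 5
print(lsa_similarity(D, q))
```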


Query and Document Similarity in Latent Space

  • Project documents into 3-D latent space
  • Project query

[Figure: documents and the query projected into the 3-D latent semantic space.]

Random Projections
  • Important theoretical result (the Johnson–Lindenstrauss lemma)
    • Random projection from $M$-dim to $L$-dim space,
    • where $L \ll M$:
    • Euclidean distances and angles (norms and inner products) are preserved with high probability
    • LSA can then be performed using SVD on the reduced-dimensional $L \times N$ matrix (less costly; a sketch follows below)
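A quick numerical illustration of the result (a sketch with illustrative sizes): a scaled Gaussian random matrix approximately preserves document inner products while shrinking the term dimension from $M$ to $L$.

```python
import numpy as np

rng = np.random.default_rng(4)
M, L, N = 10_000, 500, 20
D = rng.random((M, N))                         # toy term x document matrix

R = rng.standard_normal((L, M)) / np.sqrt(L)   # random projection, E[R^T R] = I
D_low = R @ D                                  # L x N matrix, SVD now far cheaper

orig = D.T @ D                                 # pairwise inner products of documents
proj = D_low.T @ D_low                         # the same, after projection
print(np.max(np.abs(orig - proj)) / np.max(orig))  # small relative distortion
```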

LSA Performance
  • LSA consistently improves recall on standard test collections (precision/recall generally improved)
  • Variable performance on larger TREC collections
  • Dimensionality of the latent space is a magic number – 300 to 1000 seems to work fine – no satisfactory way of assessing the value
  • Computational cost – at present – prohibitive

Probabilistic Views on LSA
  • Factor Analytic Model
  • Generative Model Representation
  • Alternate Basis to the Principal Directions

Factor Analytic Model
  • $d = Af + n$
  • $p(d) = \sum_f p(d|f)\, p(f)$
  • This probabilistic representation underlies LSA, where prior and likelihood are both multivariate Gaussian (a sampling sketch follows below).
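A minimal sampling sketch of this model (the sizes and the loading matrix are illustrative, not fitted to any data):

```python
import numpy as np

# Generative sketch of d = A f + n with Gaussian prior and noise,
# the probabilistic reading of LSA given above.
rng = np.random.default_rng(5)
M, K = 100, 3                      # terms, latent factors
A = rng.standard_normal((M, K))    # factor loading matrix (illustrative)

f = rng.standard_normal(K)         # latent factors ~ p(f), Gaussian
n = 0.1 * rng.standard_normal(M)   # additive Gaussian noise
d = A @ f + n                      # observed "document" vector
print(d.shape)
```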

Generative Model Representation
  • Generate a document d with probability p(d)
  • Having observed d generate a semantic factor with probability p(f|d)
  • Having observed a semantic factor generate a word with probability p(w|f)


Generative Model Representation

[Figure: diagram of the generative model – documents are drawn with probability P(d), semantic factors (Factor 1, 2, 3) with P(f|d), and words such as "spider" or "The cat sat on the mat and the quick brown fox jumped…" with P(w|f).]

Generative Model Representation
  • Model representation as a joint probability

$p(d,w) = p(d)\, p(w|d) = p(d) \sum_f p(w|f)\, p(f|d)$

$w$ and $d$ are conditionally independent given $f$

  • $p(d,w) = \sum_f p(w|f)\, p(f)\, p(d|f)$
  • Note the similarity with $D_K = \sum_{i=1}^{K} \sigma_i u_i v_i^T$ (made concrete in the sketch below)
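The noted similarity can be made concrete: stacking the conditionals as matrices writes $p(d,w)$ as a low-rank factorisation, directly analogous to $\sum_i \sigma_i u_i v_i^T$. A sketch with made-up probabilities:

```python
import numpy as np

p_w_f = np.array([[0.5, 0.1],            # p(w|f): 4 words, 2 factors, columns sum to 1
                  [0.3, 0.1],
                  [0.1, 0.3],
                  [0.1, 0.5]])
p_f   = np.array([0.6, 0.4])             # p(f)
p_d_f = np.array([[0.5, 0.2],            # p(d|f): 3 documents, columns sum to 1
                  [0.3, 0.3],
                  [0.2, 0.5]])

# p(d,w) = sum_f p(w|f) p(f) p(d|f) -- same shape as a sum of sigma_i u_i v_i^T
P = p_w_f @ np.diag(p_f) @ p_d_f.T       # word x document joint probability table
print(np.isclose(P.sum(), 1.0))          # a proper joint distribution
```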


[Figure: worked example of the generative model – for a document with $p(d) = 0.003$ and factor posteriors $p(f{=}1|d) = 0.6$, $p(f{=}2|d) = 0.1$, $p(f{=}3|d) = 0.25$, $p(f{=}4|d) = 0.05$, word probabilities $p(w{=}\text{spider}|f)$ of 0.6, 0.02, 0.01 and 0.1 across the factors combine as $p(d,w) = p(d) \sum_f p(w|f)\, p(f|d) = 0.001$.]

Generative Model Representation
  • The distributions $p(f|d)$ and $p(w|f)$ are multinomial – counts in successive trials
  • More appropriate than Gaussian
  • Note that the Term × Document matrix is a sample from the true distribution $p_t(d, w)$
  • $\sum_{ij} D(i,j) \log p(d_j, w_i)$ – the cross-entropy between model and realisation – maximise the likelihood that the model $p(d_j, w_i)$ generated the realisation $D$, subject to conditions on $p(f|d)$ and $p(w|f)$

Generative Model Representation
  • Estimation of p(f|d) and p(w|f) uses a standard EM algorithm (a sketch follows below)
  • Expectation Maximisation
    • General iterative method for ML parameter estimation
    • Ideal for ‘missing variable’ problems
  • E-step: estimate p(f|d,w) using the current estimates of p(w|f) and p(f|d)
  • M-step: estimate new values of p(w|f) and p(f|d) using the current estimate of p(f|d,w)
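A compact numpy sketch of these EM updates (a naive dense implementation for illustration, not an efficient one; sizes and the toy count matrix are made up):

```python
import numpy as np

def plsa_em(D, K, n_iter=50, seed=0):
    """EM sketch for p(w|f) and p(f|d) from a term x document count matrix D."""
    rng = np.random.default_rng(seed)
    W, N = D.shape
    p_w_f = rng.random((W, K)); p_w_f /= p_w_f.sum(0)   # p(w|f), columns sum to 1
    p_f_d = rng.random((K, N)); p_f_d /= p_f_d.sum(0)   # p(f|d), columns sum to 1
    for _ in range(n_iter):
        # E-step: p(f|d,w) proportional to p(w|f) p(f|d)
        joint = p_w_f[:, :, None] * p_f_d[None, :, :]   # W x K x N
        p_f_dw = joint / joint.sum(1, keepdims=True)    # normalise over f
        # M-step: re-estimate the conditionals from expected counts
        exp_counts = D[:, None, :] * p_f_dw             # n(w,d) * p(f|d,w)
        p_w_f = exp_counts.sum(2); p_w_f /= p_w_f.sum(0)
        p_f_d = exp_counts.sum(0); p_f_d /= p_f_d.sum(0)
    return p_w_f, p_f_d

rng = np.random.default_rng(6)
D = rng.poisson(1.0, size=(40, 12))      # toy count matrix
p_w_f, p_f_d = plsa_em(D, K=3)
print(p_w_f.shape, p_f_d.shape)          # (40, 3), (3, 12)
```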

Generative Model Representation
  • Once the parameters are estimated:
    • p(f|d) gives the posterior probability that semantic factor ‘f’ is associated with document d
    • p(w|f) gives the probability of word ‘w’ being generated from semantic factor ‘f’
  • Nice clear interpretation, unlike the U and V terms in the SVD
  • ‘Sparse’ representation – unlike the SVD

Generative Model Representation
  • Take the toy collection generated – estimate p(f|d) and p(w|f)
  • Graphical representation of p(f|d)

Generative Model Representation
  • Ordered representation of p(w|f)

Alternate Basis to the Principal Directions
  • Similarity between query and documents can be assessed in ‘factor’ space, as in LSA
  • $\mathrm{Sim} = \sum_f p(f|q)\, p(f|D)$ – the averaged product of query and document posterior probabilities over all ‘factors’ in the latent space
  • Alternatively, note that $D$ and $q$ are sample instances from an unknown distribution
  • All probabilities are estimated from word counts in $D$ and so are ‘noisy’
  • Employ $p(d_j, w_i)$ as a ‘smoothed’ version of tf and use the ‘cosine’ measure $\sum_i p(D, w_i) \times q_i$ – ‘query expansion’ (both schemes sketched below)
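A sketch of the two matching schemes with made-up fitted quantities (the arrays below stand in for outputs of the EM fit; none of the numbers come from the tutorial):

```python
import numpy as np

# (1) Factor-space match: Sim = sum_f p(f|q) p(f|d)
p_f_q = np.array([0.7, 0.2, 0.1])                         # p(f|q), folded-in query
p_f_d = np.array([[0.6, 0.1], [0.3, 0.4], [0.1, 0.5]])    # p(f|d), one column per doc
print(p_f_q @ p_f_d)                 # averaged product over factors, per document

# (2) 'Smoothed tf' match: sum_i p(d, w_i) * q_i
p_dw = np.array([[0.02, 0.01], [0.005, 0.03], [0.01, 0.01]])  # smoothed p(d, w_i)
q = np.array([1.0, 0.0, 1.0])        # raw query term counts
print(q @ p_dw)                      # 'query expansion' score, per document
```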

Alternate Basis to the Principal Directions
  • Both forms of matching have been shown to improve on LSA (MED, CRAN, CACM)
  • Elegant, statistically principled approach – can (in theory) employ Bayesian model assessment techniques
  • The likelihood is a nonlinear function of the parameters p(f|d) and p(w|f) – a huge parameter space and a relatively small number of samples – high bias and variance expected
  • Assessment of the correlation between likelihood and P/R has yet to be studied in depth

Conclusions
  • The SVD-defined basis provides P/R improvements over term matching
    • Interpretation difficult
    • Optimal dimension – open question
    • Variable performance on LARGE collections
    • Supercomputing muscle required
  • Probabilistic approaches provide improvements over SVD
    • Clear interpretation of the decomposition
    • Optimal dimension – open question
    • High variability of results due to nonlinear optimisation over a HUGE parameter space
  • Improvements marginal in relation to cost

Latent Semantic & Hierarchic Document Clustering
  • Had enough ? ….
  • ….. To the Bar…
