
Latent Semantic Analysis A Gentle Tutorial Introduction Tutorial Resources cis.paisley.ac.uk/giro-ci0/GU_LSA_TUT


### Latent Semantic Analysis: A Gentle Tutorial Introduction

Tutorial resources: http://cis.paisley.ac.uk/giro-ci0/GU_LSA_TUT

M.A. Girolami

University of Glasgow DCS Tutorial

Contents

- Latent Semantic Analysis
  - Motivation
  - Singular Value Decomposition
  - Term Document Matrix Structure
  - Query and Document Similarity in Latent Space
- Probabilistic Views on LSA
  - Factor Analytic Model
  - Generative Model Representation
  - Alternate Basis to the Principal Directions
- Latent Semantic & Document Clustering (in the bar later)
  - Principal Direction Clustering
  - Hierarchic Clustering with LSA


Latent Semantic Analysis

- Motivation
  - Lexical matching at the term level is (claimed to be) inaccurate
  - Polysemy: words with a number of 'meanings'; term matching returns irrelevant documents, which hurts precision
  - Synonymy: a number of words with the same 'meaning'; term matching misses relevant documents, which hurts recall
- LSA assumes that there is a LATENT structure in word usage, obscured by variability in word choice
  - Analogous to the signal + additive noise model in signal processing


Latent Semantic Analysis

- Word usage is defined by term and document co-occurrence, giving a matrix structure
- Latent structure / semantics in word usage
- Clustering documents or words separately gives no shared space for both
- Two-mode factor analysis: a dyadic decomposition into a 'latent semantic' factor space, computed with the Singular Value Decomposition
- Computational cost scales cubically, which is reasonable!


Singular Value Decomposition

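- D is an M × N term × document matrix (M ≫ N)

$$D = [d_1, d_2, \ldots, d_N], \qquad d = [t_1, t_2, \ldots, t_M]^\top$$

Consider a linear combination of terms

$$u_1 t_1 + u_2 t_2 + \cdots + u_M t_M = u^\top d$$

which maximises

$$E\{(u^\top d)^2\} = E\{u^\top d d^\top u\} = u^\top E\{d d^\top\}\, u \approx \tfrac{1}{N}\, u^\top D D^\top u$$

subject to $u^\top u = 1$ (the constant 1/N does not change the maximiser).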


Singular Value Decomposition

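Maximise $u^\top D D^\top u$ subject to $u^\top u = 1$.

Construct the Lagrangian $u^\top D D^\top u - \lambda (u^\top u - 1)$.

Setting the vector of partial derivatives to zero (and dropping the common factor of 2):

$$D D^\top u - \lambda u = (D D^\top - \lambda I)\, u = 0$$

As $u \neq 0$, the matrix $D D^\top - \lambda I$ must be singular, i.e.

$$|D D^\top - \lambda I| = 0$$

This is a polynomial of degree M in λ whose roots, the characteristic roots, are called the eigenvalues. Premultiplying the stationarity condition by $u^\top$ gives $u^\top D D^\top u = \lambda$, so the maximum is attained at the largest eigenvalue.

(German *eigen* = own, unique to, particular to.)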
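The derivation can be checked numerically. This is a minimal NumPy sketch, assuming a small random matrix stands in for D (its size and seed are arbitrary): the stationarity condition holds at the principal eigenvector, and no random unit vector achieves a larger value of $u^\top D D^\top u$ than $\lambda_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.random((8, 5))          # hypothetical 8-term x 5-document matrix
C = D @ D.T                     # DD^T, symmetric positive semi-definite

# eigh returns eigenvalues in ascending order for symmetric matrices
lam, U = np.linalg.eigh(C)
lam1, u1 = lam[-1], U[:, -1]

def rayleigh(u):
    """Objective u^T DD^T u for a unit vector u."""
    return u @ C @ u

# Stationarity condition (DD^T - lambda I) u = 0 at the principal eigenvector
residual = np.linalg.norm((C - lam1 * np.eye(8)) @ u1)

# Compare against many random unit vectors
trials = rng.normal(size=(1000, 8))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
best_random = max(rayleigh(u) for u in trials)

print(residual, rayleigh(u1), lam1, best_random)
```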


Singular Value Decomposition

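The first (largest) root is called the principal eigenvalue; it has an associated orthonormal ($u^\top u = 1$) eigenvector $u$.

Subsequent roots are ordered such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M \ge 0$, with rank(D) non-zero values.

The eigenvectors form an orthonormal basis, i.e. $u_i^\top u_j = \delta_{ij}$.

The eigenvalue decomposition of $D D^\top$ is $D D^\top = U \Sigma U^\top$, where $U = [u_1, u_2, \ldots, u_M]$ and $\Sigma = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_M]$.

Similarly, the eigenvalue decomposition of $D^\top D$ is $D^\top D = V \Sigma V^\top$; the two decompositions share the same non-zero eigenvalues.

The SVD is closely related to the above: $D = U \Sigma^{1/2} V^\top$, with left singular vectors U, right singular vectors V, and singular values equal to the square roots of the eigenvalues.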
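A short NumPy sketch of this relationship, using a hypothetical random 10 × 4 matrix for D: the non-zero eigenvalues of $DD^\top$ and $D^\top D$ coincide, and the singular values returned by `np.linalg.svd` are their square roots.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.random((10, 4))                 # hypothetical M=10 terms, N=4 documents

# Eigen-decompositions of DD^T and D^T D (eigh is ascending, so reverse)
lam_left, U_eig = np.linalg.eigh(D @ D.T)
lam_right, V_eig = np.linalg.eigh(D.T @ D)
lam_left, lam_right = lam_left[::-1], lam_right[::-1]

# Thin SVD: D = U diag(s) V^T, singular values in descending order
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Shared non-zero eigenvalues, and s_i = sqrt(lambda_i)
print(lam_left[:4], lam_right, s ** 2)
```

Only the first N = 4 eigenvalues of $DD^\top$ are non-zero here; the remaining six are zero up to rounding, matching the "rank(D) non-zero values" statement.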


SVD Properties

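- $D = U S V^\top = \sum_{i=1}^{N} \sigma_i u_i v_i^\top$ and $D_K = \sum_{i=1}^{K} \sigma_i u_i v_i^\top = U_K S_K V_K^\top$ for $K < N$, where $U_K^\top U_K = I_K = V_K^\top V_K$ and $S = \Sigma^{1/2} = \mathrm{diag}(\sigma_1, \ldots, \sigma_N)$
- Then $D_K$ is the best rank-K approximation to D in the Frobenius-norm sense
- The K-dimensional orthonormal projections $S_K^{-1} U_K^\top D = V_K^\top$ preserve the maximum amount of variability
- Under the assumption that the columns of D are multivariate Gaussian, U defines the principal axes of the ellipsoid of constant density in the original (term) space, with variance proportional to $\lambda_i$ along axis $u_i$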
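These properties can be checked with a small NumPy sketch; the matrix, its size and K = 2 are hypothetical choices, and `truncated_svd` is an illustrative helper, not a library function. The Frobenius error of $D_K$ equals the square root of the sum of the discarded $\sigma_i^2$, and $S_K^{-1} U_K^\top D$ reproduces $V_K^\top$.

```python
import numpy as np

def truncated_svd(D, K):
    """Return U_K, the K leading singular values, and V_K^T."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :K], s[:K], Vt[:K, :]

rng = np.random.default_rng(2)
D = rng.random((12, 6))                   # hypothetical 12-term x 6-document matrix
K = 2
U_K, s_K, Vt_K = truncated_svd(D, K)
D_K = U_K @ np.diag(s_K) @ Vt_K           # sum_{i<=K} sigma_i u_i v_i^T

# Frobenius error of the rank-K approximation vs. the discarded singular values
s_all = np.linalg.svd(D, compute_uv=False)
err = np.linalg.norm(D - D_K, "fro")
expected_err = np.sqrt(np.sum(s_all[K:] ** 2))

# K-dimensional projection of the documents: S_K^{-1} U_K^T D = V_K^T
proj = np.diag(1.0 / s_K) @ U_K.T @ D
print(err, expected_err, np.abs(proj - Vt_K).max())
```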


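SVD Example

A 10 × 2 matrix D and the factors of its SVD, $D = U S V^\top$, with U (10 × 2), S (2 × 2) and $V^\top$ (2 × 2):

$$
D = \begin{bmatrix}
2.9002 & 3.6790 \\
4.0860 & 5.2366 \\
1.9954 & 3.3687 \\
3.5069 & 1.6748 \\
4.4620 & 2.7684 \\
-2.9444 & -4.6447 \\
-4.1132 & -4.7043 \\
-3.6208 & -5.0181 \\
-3.0558 & -4.1821 \\
-6.1204 & -2.4790
\end{bmatrix}
\qquad
U = \begin{bmatrix}
-0.2750 & -0.1242 \\
-0.3896 & -0.1846 \\
-0.2247 & -0.2369 \\
-0.2150 & 0.3514 \\
-0.3005 & 0.3318 \\
0.3177 & 0.2906 \\
0.3682 & 0.0833 \\
0.3613 & 0.2319 \\
0.3027 & 0.1861 \\
0.3563 & -0.6935
\end{bmatrix}
$$

$$
S = \begin{bmatrix} 16.9491 & 0 \\ 0 & 3.8491 \end{bmatrix}
\qquad
V^\top = \begin{bmatrix} -0.6960 & -0.7181 \\ 0.7181 & -0.6960 \end{bmatrix}
$$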
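The example can be verified with NumPy. This sketch transcribes the matrices above (rounded to four decimals, so agreement is only up to rounding error) and checks that $U S V^\top$ reproduces D and that the singular values recomputed from D match the diagonal of S.

```python
import numpy as np

# Matrices transcribed from the SVD example (values rounded to 4 d.p.)
D = np.array([
    [ 2.9002,  3.6790], [ 4.0860,  5.2366], [ 1.9954,  3.3687],
    [ 3.5069,  1.6748], [ 4.4620,  2.7684], [-2.9444, -4.6447],
    [-4.1132, -4.7043], [-3.6208, -5.0181], [-3.0558, -4.1821],
    [-6.1204, -2.4790],
])
U = np.array([
    [-0.2750, -0.1242], [-0.3896, -0.1846], [-0.2247, -0.2369],
    [-0.2150,  0.3514], [-0.3005,  0.3318], [ 0.3177,  0.2906],
    [ 0.3682,  0.0833], [ 0.3613,  0.2319], [ 0.3027,  0.1861],
    [ 0.3563, -0.6935],
])
S = np.diag([16.9491, 3.8491])
Vt = np.array([[-0.6960, -0.7181], [0.7181, -0.6960]])

# Largest absolute reconstruction error of U S V^T against D
print(np.abs(U @ S @ Vt - D).max())
# Singular values recomputed directly from D
print(np.linalg.svd(D, compute_uv=False))
```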

SVD Properties

- There is an implicit assumption that the observed data distribution is multivariate Gaussian
- SVD can be viewed as a probabilistic generative model with Gaussian latent variables, which is sub-optimal in likelihood terms for non-Gaussian distributions
- Employed in signal processing for noise filtering: the dominant subspace contains most of the information-bearing part of the signal
- A similar rationale applies when using SVD for LSI (Latent Semantic Indexing)


Computing SVD

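- The power method is one numerical approach

Randomly initialise a vector $u^{(0)}$.

Set $\tilde{u}^{(1)} = D D^\top u^{(0)}$ and $u^{(1)} = \tilde{u}^{(1)} / \sqrt{(\tilde{u}^{(1)})^\top \tilde{u}^{(1)}}$,

then $\tilde{u}^{(2)} = D D^\top u^{(1)}$ and $u^{(2)} = \tilde{u}^{(2)} / \sqrt{(\tilde{u}^{(2)})^\top \tilde{u}^{(2)}}$.

In general, $\tilde{u}^{(i)} = D D^\top u^{(i-1)}$ and $u^{(i)} = \tilde{u}^{(i)} / \sqrt{(\tilde{u}^{(i)})^\top \tilde{u}^{(i)}}$.

As $i \to \infty$, $u^{(i)} \to u_1$ and $\sqrt{(\tilde{u}^{(i)})^\top \tilde{u}^{(i)}} \to \lambda_1$, provided $\lambda_1 > \lambda_2$ and $u^{(0)}$ is not orthogonal to $u_1$.

- Subsequent eigenvectors use deflation; for the second, iterate with the deflated matrix:

$$\tilde{u}^{(1)} = (D D^\top - \lambda_1 u_1 u_1^\top)\, u^{(0)}$$

- Note: for a term × document matrix, computing $u_1$ is inexpensive because $D D^\top u$ can be evaluated as $D(D^\top u)$ using sparse products; subsequent eigenvectors require matrix–vector operations on a dense matrix if the deflated matrix is formed explicitly.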
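A minimal NumPy sketch of power iteration with deflation, under stated assumptions: the function name `power_method` and its parameters are hypothetical, a fixed iteration count replaces a convergence test, and, as a design choice, the deflation term is applied to the vector rather than by building the dense deflated matrix, so every step only needs products with D and $D^\top$.

```python
import numpy as np

def power_method(D, n_vectors=2, n_iter=500, seed=0):
    """Leading eigenpairs of DD^T by power iteration with deflation.

    DD^T u is evaluated as D @ (D.T @ u), so DD^T is never formed.
    Deflation subtracts lambda_j u_j (u_j^T u) for each pair already found.
    """
    rng = np.random.default_rng(seed)
    M = D.shape[0]
    eigvals, eigvecs = [], []
    for _ in range(n_vectors):
        u = rng.normal(size=M)            # random initialisation u^(0)
        u /= np.linalg.norm(u)
        lam = 0.0
        for _ in range(n_iter):
            w = D @ (D.T @ u)             # DD^T u
            for lj, uj in zip(eigvals, eigvecs):
                w -= lj * uj * (uj @ u)   # (DD^T - lambda_j u_j u_j^T) u
            lam = np.linalg.norm(w)       # sqrt(w^T w) -> eigenvalue
            u = w / lam
        eigvals.append(lam)
        eigvecs.append(u)
    return np.array(eigvals), np.column_stack(eigvecs)

# Hypothetical test matrix with singular values 5, 3, 1, 0.5,
# so the eigenvalues of DD^T are 25, 9, 1, 0.25 (plus zeros)
rng = np.random.default_rng(3)
Q1, _ = np.linalg.qr(rng.normal(size=(20, 4)))   # orthonormal columns
Q2, _ = np.linalg.qr(rng.normal(size=(4, 4)))
D = Q1 @ np.diag([5.0, 3.0, 1.0, 0.5]) @ Q2.T
vals, vecs = power_method(D, n_vectors=2)
print(vals)
```

Convergence speed depends on the eigenvalue gaps ($\lambda_2/\lambda_1$ for the first vector), which is why the test matrix is built with well-separated singular values.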


Term Document Matrix Structure

- Create an artificially heterogeneous collection
  - 100 documents from each of 3 distinct newsgroups
  - Indexed using a standard stop-word list
  - 12418 distinct terms
- Term × document matrix (12418 × 300)
  - 8% fill of the sparse matrix
  - Sort terms by rank: structure apparent
- Matrix of cosine similarity between documents
  - Clear structure apparent
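The document–document cosine similarity matrix can be sketched in a few lines of NumPy. The newsgroup data is not available here, so this assumes a hypothetical toy matrix in which two groups of documents use disjoint blocks of terms; the block structure then shows up directly in the similarity matrix. `cosine_similarity_matrix` is an illustrative helper name.

```python
import numpy as np

def cosine_similarity_matrix(D):
    """N x N cosine similarities between the columns (documents) of D."""
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0            # empty documents stay all-zero
    Dn = D / norms                     # unit-length document vectors
    return Dn.T @ Dn

# Hypothetical toy data: two "topics" using disjoint blocks of terms
rng = np.random.default_rng(4)
topic_a = np.vstack([rng.integers(1, 5, size=(5, 3)), np.zeros((5, 3))])
topic_b = np.vstack([np.zeros((5, 3)), rng.integers(1, 5, size=(5, 3))])
D = np.hstack([topic_a, topic_b]).astype(float)   # 10 terms x 6 documents

sim = cosine_similarity_matrix(D)
print(np.round(sim, 2))
```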


Term Document Matrix Structure


Query and Document Similarity in Latent Space

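- Rank-3 approximation: $D_3 = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$
- Projection of all documents into the 3-d latent semantic space is achieved by $S_3^{-1} U_3^\top D$
- A query q is mapped into the LSA space by $S_3^{-1} U_3^\top q$
- Similarity in LSA space:

$$
(S_3^{-1} U_3^\top q)^\top S_3^{-1} U_3^\top D
= q^\top U_3 S_3^{-1} S_3^{-1} U_3^\top D
= q^\top U_3 \Sigma_3^{-1} U_3^\top D
= q^\top \Theta D
$$

- The LSA similarity metric $\Theta = U_3 \Sigma_3^{-1} U_3^\top$ (with $\Sigma_3 = S_3^2$) acts as a term expansion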
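A NumPy sketch of query matching in a 3-d latent space, assuming a hypothetical random term × document matrix and query vector. It computes the scores twice, once by projecting query and documents with $S_3^{-1} U_3^\top$ and once through the term-expansion matrix Θ, to show that the two routes agree.

```python
import numpy as np

rng = np.random.default_rng(5)
D = rng.random((15, 8))          # hypothetical 15-term x 8-document matrix
q = rng.random(15)               # hypothetical query term vector
K = 3

U, s, Vt = np.linalg.svd(D, full_matrices=False)
U3 = U[:, :K]
S3_inv = np.diag(1.0 / s[:K])

docs_latent = S3_inv @ U3.T @ D          # 3 x N document coordinates (= V_3^T)
query_latent = S3_inv @ U3.T @ q         # 3-d query coordinates
sim_latent = query_latent @ docs_latent  # inner products in latent space

# Same scores via Theta = U3 Sigma3^{-1} U3^T, where Sigma3 = S3^2
Theta = U3 @ np.diag(1.0 / s[:K] ** 2) @ U3.T
sim_theta = q @ Theta @ D

ranking = np.argsort(-sim_latent)        # document indices, best match first
print(ranking)
```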


Query and Document Similarity in Latent Space

- Project documents into 3-D latent space
- Project query


Random Projections

- An important theoretical result (the Johnson–Lindenstrauss lemma):
  - A random projection from an M-dimensional to an L-dimensional space, where L ≪ M, approximately preserves Euclidean distances and angles (norms and inner products) with high probability
- LSA can then be performed using SVD on the reduced L × N matrix, which is less costly
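A sketch of the idea, assuming a dense Gaussian random matrix with N(0, 1/L) entries (one common construction; the slide does not specify one) and hypothetical sizes M = 5000, N = 20, L = 500. The ratios of projected to original pairwise distances cluster around 1, and the reduced L × N matrix is what would then be passed to the SVD.

```python
import numpy as np

def random_projection(D, L, seed=0):
    """Project the M-dim columns of D to L dims with a Gaussian random matrix.

    Entries are N(0, 1/L), so squared norms are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(L), size=(L, D.shape[0]))
    return R @ D

def pairwise_distances(X):
    """Euclidean distances between all pairs of columns of X (upper triangle)."""
    diff = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))
    i, j = np.triu_indices(X.shape[1], k=1)
    return dist[i, j]

rng = np.random.default_rng(6)
D = rng.random((5000, 20))              # hypothetical M=5000 terms, N=20 documents
D_low = random_projection(D, L=500)     # L x N matrix, L << M
ratios = pairwise_distances(D_low) / pairwise_distances(D)
print(ratios.min(), ratios.max())

# SVD on the much smaller L x N matrix
U, s, Vt = np.linalg.svd(D_low, full_matrices=False)
```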


LSA Performance

- LSA consistently improves recall on standard test collections (precision/recall is generally improved)
- Variable performance on larger TREC collections
- The dimensionality of the latent space is a magic number: 300–1000 seems to work fine, but there is no satisfactory way of assessing its value
- Computational cost is, at present, prohibitive


Probabilistic Views on LSA

- Factor Analytic Model
- Generative Model Representation
- Alternate Basis to the Principal Directions


Factor Analytic Model

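- $d = Af + n$, where A is the matrix of factor loadings, f the latent (semantic) factors and n additive noise
- $p(d) = \int p(d \mid f)\, p(f)\, df$
- This probabilistic representation underlies LSA, where the prior p(f) and the likelihood p(d | f) are both multivariate Gaussian.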
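A small simulation of the factor analytic model, assuming a hypothetical loading matrix A, standard Gaussian factors and isotropic Gaussian noise. Marginalising the Gaussian factor gives a Gaussian p(d) with covariance $AA^\top + \sigma^2 I$, which the sample covariance of simulated documents approaches.

```python
import numpy as np

rng = np.random.default_rng(7)
M, K, n_samples = 6, 2, 200_000
A = 0.5 * rng.normal(size=(M, K))    # hypothetical loading matrix
noise_var = 0.1                      # hypothetical noise variance sigma^2

# d = A f + n with f ~ N(0, I_K) and n ~ N(0, noise_var * I_M)
f = rng.normal(size=(K, n_samples))
n = rng.normal(scale=np.sqrt(noise_var), size=(M, n_samples))
d = A @ f + n

# Gaussian prior and likelihood give a Gaussian marginal p(d)
model_cov = A @ A.T + noise_var * np.eye(M)
sample_cov = d @ d.T / n_samples     # zero-mean model, so no centring
print(np.abs(sample_cov - model_cov).max())
```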


Generative Model Representation

- Generate a document d with probability p(d)
- Having observed d generate a semantic factor with probability p(f|d)
- Having observed a semantic factor generate a word with probability p(w|f)
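The three generative steps amount to ancestral sampling. This sketch uses hypothetical parameter values for 3 documents, 2 factors and 4 words; the empirical frequencies of the sampled (d, w) pairs approach $p(d) \sum_f p(w \mid f)\, p(f \mid d)$.

```python
import numpy as np

rng = np.random.default_rng(8)
n_docs, n_factors, n_words = 3, 2, 4
# Hypothetical parameters; each distribution sums to one
p_d = np.array([0.5, 0.3, 0.2])                                 # p(d)
p_f_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])    # rows d, cols f
p_w_given_f = np.array([[0.7, 0.1, 0.1, 0.1],                   # rows f, cols w
                        [0.1, 0.1, 0.4, 0.4]])

def sample_pair():
    """Draw one (d, w) pair: d ~ p(d), then f ~ p(f|d), then w ~ p(w|f)."""
    d = rng.choice(n_docs, p=p_d)
    f = rng.choice(n_factors, p=p_f_given_d[d])
    w = rng.choice(n_words, p=p_w_given_f[f])
    return d, w

counts = np.zeros((n_docs, n_words))
for _ in range(20_000):
    d, w = sample_pair()
    counts[d, w] += 1

exact = p_d[:, None] * (p_f_given_d @ p_w_given_f)   # p(d) sum_f p(w|f) p(f|d)
empirical = counts / counts.sum()
print(np.abs(empirical - exact).max())
```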


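Generative Model Representation

[Figure: documents (e.g. "The cat sat on the mat and the quick brown fox jumped…") are linked to semantic factors (Factor 1, Factor 2, Factor 3) with probabilities P(f|d), and each factor generates words (e.g. "spider") with probabilities P(w|f).]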


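Generative Model Representation

- Model representation as a joint probability:

$$p(d, w) = p(d)\, p(w \mid d) = p(d) \sum_f p(w \mid f)\, p(f \mid d)$$

with w and d conditionally independent given f.

- Using Bayes' rule, $p(d)\, p(f \mid d) = p(f)\, p(d \mid f)$, so $p(d, w) = \sum_f p(w \mid f)\, p(f)\, p(d \mid f)$
- Note the similarity with $D_K = \sum_{i=1}^{K} \sigma_i u_i v_i^\top$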
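The two forms of the joint can be compared numerically. This sketch uses hypothetical values for p(d), p(f|d) and p(w|f), derives p(f) and p(d|f) by Bayes' rule, and writes the symmetric form as a matrix product (documents × factors, diagonal p(f), factors × words), which mirrors $U_K S_K V_K^\top$ with documents as rows rather than columns.

```python
import numpy as np

# Hypothetical asymmetric parameters: p(d), p(f|d) (d x f), p(w|f) (f x w)
p_d = np.array([0.5, 0.3, 0.2])
p_f_given_d = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p_w_given_f = np.array([[0.7, 0.1, 0.1, 0.1],
                        [0.1, 0.1, 0.4, 0.4]])

# Asymmetric form: p(d, w) = p(d) sum_f p(w|f) p(f|d)
joint_asym = p_d[:, None] * (p_f_given_d @ p_w_given_f)

# Bayes' rule: p(f) = sum_d p(f|d) p(d) and p(d|f) = p(f|d) p(d) / p(f)
p_f = p_d @ p_f_given_d
p_d_given_f = (p_f_given_d * p_d[:, None]) / p_f       # d x f, columns sum to 1

# Symmetric form as a matrix product: P(d|f) diag(p(f)) P(w|f)
joint_sym = p_d_given_f @ np.diag(p_f) @ p_w_given_f
print(np.abs(joint_asym - joint_sym).max())
```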


[Figure: worked example for the document "The cat sat on the mat and the quick brown fox jumped…" with P(d) = 0.003; factor posteriors p(f=1|d) = 0.6, p(f=2|d) = 0.1, p(f=3|d) = 0.25, p(f=4|d) = 0.05; word probabilities P(w=spider|f) of 0.02, 0.01 and 0.1 for individual factors; and the resulting p(d,w) = p(d)∑f p(w|f)p(f|d) = 0.001.]


Generative Model Representation

- The distributions p(f|d) and p(w|f) are multinomial (counts in successive trials)
  - More appropriate than Gaussian
- Note that the term × document matrix is a sample from the true distribution $p_t(d, w)$
- $\sum_{i,j} D(i,j) \log p(d_j, w_i)$ is, up to scale, the negative cross-entropy between model and realisation: maximise the likelihood that the model $p(d_j, w_i)$ generated the realisation D, subject to normalisation conditions on p(f|d) and p(w|f)
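The objective is easy to compute for any candidate joint distribution. In this sketch the count matrix is hypothetical and `log_likelihood` is an illustrative helper. Over all joint distributions, the maximum is attained by the empirical relative frequencies (Gibbs' inequality); a factor model restricts p(d, w) to a low-rank form, so in general it cannot reach that value.

```python
import numpy as np

def log_likelihood(D, p_joint):
    """sum_{i,j} D(i,j) log p(d_j, w_i) for a term x document count matrix D.

    Cells with zero count contribute nothing, even if the model gives them
    probability zero.
    """
    mask = D > 0
    return float(np.sum(D[mask] * np.log(p_joint[mask])))

D = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 4.0]])   # hypothetical counts
empirical = D / D.sum()                              # relative frequencies
uniform = np.full(D.shape, 1.0 / D.size)

ll_emp = log_likelihood(D, empirical)
ll_uni = log_likelihood(D, uniform)
print(ll_emp, ll_uni)
```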


Generative Model Representation

- Estimation of p(f|d) and p(w|f) uses the standard EM algorithm
- Expectation Maximisation
  - A general iterative method for ML parameter estimation
  - Ideal for 'missing variable' problems
  - E-step: estimate p(f|d,w) using the current estimates of p(w|f) and p(f|d)
  - M-step: estimate new values of p(w|f) and p(f|d) using the current estimate of p(f|d,w)
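A minimal NumPy sketch of these two steps, assuming the parameterisation $p(d, w) = p(d) \sum_f p(w \mid f)\, p(f \mid d)$ with p(d) fixed at each document's share of the total count. The updates are the standard ones for this model (commonly called probabilistic latent semantic analysis, PLSA), not code from the tutorial: E-step $p(f \mid d, w) = p(w \mid f)\, p(f \mid d) / \sum_{f'} p(w \mid f')\, p(f' \mid d)$; M-step $p(w \mid f) \propto \sum_d n(d, w)\, p(f \mid d, w)$ and $p(f \mid d) \propto \sum_w n(d, w)\, p(f \mid d, w)$, where n(d, w) are the counts. The name `plsa_em` and its arguments are hypothetical.

```python
import numpy as np

def plsa_em(counts, n_factors, n_iter=50, seed=0):
    """Fit p(w|f) (M x K) and p(f|d) (K x N) to an M x N count matrix by EM.

    Returns both parameter arrays and the log-likelihood at each iteration.
    """
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    # Random initialisation, normalised so each column is a distribution
    p_w_f = rng.random((M, n_factors))
    p_w_f /= p_w_f.sum(axis=0, keepdims=True)
    p_f_d = rng.random((n_factors, N))
    p_f_d /= p_f_d.sum(axis=0, keepdims=True)
    p_d = counts.sum(axis=0) / counts.sum()        # p(d), fixed by the data
    trace = []
    for _ in range(n_iter):
        # E-step: p(f|d,w) proportional to p(w|f) p(f|d); shape (M, K, N)
        joint = p_w_f[:, :, None] * p_f_d[None, :, :]
        p_w_d = joint.sum(axis=1)                   # p(w|d), shape (M, N)
        post = joint / np.maximum(p_w_d[:, None, :], 1e-300)
        # Log-likelihood sum n(d,w) log p(d,w) under the current parameters
        p_dw = np.maximum(p_d[None, :] * p_w_d, 1e-300)
        trace.append(float(np.sum(counts * np.log(p_dw))))
        # M-step: re-estimate from expected counts n(d,w) p(f|d,w)
        weighted = counts[:, None, :] * post
        p_w_f = weighted.sum(axis=2)
        p_w_f /= p_w_f.sum(axis=0, keepdims=True)
        p_f_d = weighted.sum(axis=0)
        p_f_d /= p_f_d.sum(axis=0, keepdims=True)
    return p_w_f, p_f_d, trace

# Hypothetical toy counts: 30 terms x 12 documents, 3 semantic factors
counts = np.random.default_rng(1).integers(0, 5, size=(30, 12)).astype(float)
p_w_f, p_f_d, trace = plsa_em(counts, n_factors=3, n_iter=30)
print(trace[0], trace[-1])
```

Tracking the log-likelihood is a cheap sanity check, since EM should never decrease it. The dense M × K × N posterior array is fine for a toy collection but would need restructuring for a matrix of the 12418 × 300 size used earlier.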


Generative Model Representation

- Once the parameters are estimated:
  - p(f|d) gives the posterior probability that semantic factor 'f' is associated with d
  - p(w|f) gives the probability of word 'w' being generated from semantic factor 'f'
- A clear interpretation, unlike the U and V terms in SVD
- A 'sparse' representation, unlike SVD


Generative Model Representation

- Take the toy collection generated earlier and estimate p(f|d) and p(w|f)
- Graphical representation of p(f|d)


Alternate Basis to the Principal Directions

- Similarity between query and documents can be assessed in 'factor' space, as in LSA
  - $\mathrm{Sim} = \sum_f p(f \mid q)\, p(f \mid d)$: the product of the query and document posterior probabilities, summed over all 'factors' (the latent space)
- Alternatively, note that D and q are sample instances from an unknown distribution
  - All probabilities (word counts) estimated from D are 'noisy'
  - Employ $p(d_j, w_i)$ as a 'smoothed' version of tf and use a 'cosine' measure $\sum_i p(d_j, w_i)\, q_i$: 'query expansion'
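Both matching schemes can be sketched with hypothetical fitted parameters for 5 words, 2 factors and 3 documents. The slide does not say how p(f|q) is obtained; here it is assumed to come from "folding in" the query, i.e. running EM on the query alone with p(w|f) held fixed. `fold_in_query` is a hypothetical helper. The second score follows the slide's unnormalised sum $\sum_i p(d_j, w_i)\, q_i$.

```python
import numpy as np

# Hypothetical fitted parameters: p(w|f) (M x K), p(f|d) (K x N), p(d) (N)
p_w_given_f = np.array([[0.4, 0.0], [0.3, 0.1], [0.2, 0.1],
                        [0.1, 0.3], [0.0, 0.5]])
p_f_given_d = np.array([[0.8, 0.5, 0.1], [0.2, 0.5, 0.9]])
p_d = np.array([0.4, 0.3, 0.3])
q = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # hypothetical query term counts

def fold_in_query(q, p_w_given_f, n_iter=50):
    """Assumed approach: estimate p(f|q) by EM with p(w|f) held fixed."""
    K = p_w_given_f.shape[1]
    p_f_q = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        post = p_w_given_f * p_f_q                       # proportional to p(f|q,w)
        post /= np.maximum(post.sum(axis=1, keepdims=True), 1e-300)
        p_f_q = (q[:, None] * post).sum(axis=0)
        p_f_q /= p_f_q.sum()
    return p_f_q

# Factor-space similarity: Sim(q, d) = sum_f p(f|q) p(f|d)
p_f_q = fold_in_query(q, p_w_given_f)
sim_factor = p_f_q @ p_f_given_d

# Smoothed-tf matching: sum_i p(d_j, w_i) q_i, p(d, w) = p(d) sum_f p(w|f) p(f|d)
p_joint = (p_w_given_f @ p_f_given_d) * p_d     # M x N, columns are documents
sim_smoothed = q @ p_joint
print(sim_factor, sim_smoothed)
```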


Alternate Basis to the Principal Directions

- Both forms of matching have been shown to improve on LSA (MED, CRAN, CACM)
- An elegant, statistically principled approach that can (in theory) employ Bayesian model assessment techniques
- The likelihood is a nonlinear function of the parameters p(f|d) and p(w|f); with a huge parameter space and a relatively small number of samples, high bias and variance are expected
- The correlation between likelihood and P/R has yet to be studied in depth


Conclusions

- The SVD-defined basis provides P/R improvements over term matching
  - Interpretation is difficult
  - Optimal dimension is an open question
  - Variable performance on LARGE collections
  - Supercomputing muscle required
- Probabilistic approaches provide improvements over SVD
  - Clear interpretation of the decomposition
  - Optimal dimension is an open question
  - High variability of results due to nonlinear optimisation over a HUGE parameter space
  - Improvements are marginal in relation to cost


Latent Semantic & Hierarchic Document Clustering

- Had enough? …
- … To the bar…

