### Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Ramesh Nallapati

Joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing

Machine Learning Department

Carnegie Mellon University

Introduction

- Statistical topic modeling: an attractive framework for topic discovery
- Completely unsupervised
- Models text very well
- Lower perplexity compared to unigram models

- Reveals meaningful semantic patterns
- Can help summarize and visualize document collections
- e.g.: PLSA, LDA, DPM, DTM, CTM, PA

ICDM’07 HPDM workshop

Introduction

- A common assumption in all the variants:
- Exchangeability: “bag of words” assumption
- Topics represented as a ranked list of words

- Consequences:
- Word Correlation information is lost
- e.g.: “white-house” vs. “white” and “house”
- Long distance correlations

Introduction

- Objective:
- To capture correlations between words within topics

- Motivation:
- More interpretable representation of topics as a network of words rather than a list
- Helps better visualize and summarize document collections
- May reveal unexpected relationships and patterns within topics

Past Work: Topic Models

- Bigram topic models [Wallach, ICML 2006]

- Requires KV(V-1) parameters
- Only captures local dependencies
- Does not model sparsity of correlations
- Does not capture “within-topic” correlations

Past work: Other approaches

- Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., ’96]

- Word pair correlation measured as a weighted count of the number of times the two words occur within a fixed-length window
- Weight of an occurrence ∝ 1/(mutual distance)
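
The windowed count described above can be sketched in a few lines of Python. This is only an illustration of the 1/(mutual distance) weighting stated on the slide; the function name `hal_cooccurrence`, the default window size, and the choice to treat word pairs as unordered are assumptions, not details taken from the HAL paper.

```python
from collections import defaultdict

def hal_cooccurrence(tokens, window=5):
    """Weighted co-occurrence counts (hypothetical helper).

    Every pair of tokens at most `window` positions apart adds
    1/distance to the score of that (unordered) word pair.
    """
    weights = defaultdict(float)
    for i, left in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(tokens):
                break
            pair = tuple(sorted((left, tokens[j])))
            weights[pair] += 1.0 / distance
    return dict(weights)

tokens = "the river bank flooded the bank".split()
scores = hal_cooccurrence(tokens, window=2)
for pair, weight in sorted(scores.items()):
    print(pair, round(weight, 2))
```

Because the counts are pooled over the whole corpus, every sense of "bank" lands in the same row of scores, which is the limitation the next slide points out.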

Past work: Other approaches

- Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., ’96]

- Pluses:
- Sparse solutions, scalability

- Minuses:
- Only unearths global correlations, not semantic correlations
- E.g.: “river – bank”, “bank – check”
- Only local dependencies

Past work: Other approaches

- Query expansion in IR
- Similar in spirit: finds words that highly co-occur with the query words
- However, not a corpus visualization tool: requires a context to operate on

- Wordnet
- Semantic networks
- Human labeled: not directly related to our goal

Our approach

- L1 norm regularization
- Known to enforce sparse solutions
- Sparsity permits scalability

- Convex optimization problem
- Globally optimal solutions

- Recent advances in learning structure of graphical models:
- L1 regularization framework asymptotically leads to true structure

Background: LASSO

- Example: linear regression
- Regularization used to improve generalizability
- E.g.1: Ridge regression: L2 norm regularization
- E.g.2: Lasso: L1 norm regularization
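
To see why the L1 penalty gives sparse solutions while the L2 penalty does not, it may help to look at the one-feature case, where both estimators have a closed form. The sketch below assumes the objectives (1/2)Σ(y − βx)² + λ|β| for the lasso and (1/2)Σ(y − βx)² + (λ/2)β² for ridge; the function names and the toy data are illustrative only.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| for a single feature."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    return soft_threshold(xy, lam) / xx

def ridge_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2 for a single feature."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    return xy / (xx + lam)

x = [1.0, 2.0, 3.0]
weak = [0.1, -0.1, 0.1]    # nearly unrelated to x
strong = [2.0, 4.0, 6.0]   # exactly 2 * x

print("weak signal:  lasso", lasso_1d(x, weak, 0.5), "ridge", ridge_1d(x, weak, 0.5))
print("strong signal: lasso", lasso_1d(x, strong, 0.5), "ridge", ridge_1d(x, strong, 0.5))
```

On the weak signal the lasso coefficient is exactly zero, while ridge only shrinks it to a small nonzero value; on the strong signal both stay close to 2.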

Background: Gaussian Random Fields

- Multivariate Gaussian distribution
- Random field structure: G = (V,E)
- V: set of all variables $\{X_1, \ldots, X_p\}$
- $(s,t) \in E \iff \Sigma^{-1}_{st} \neq 0$
- $X_s \perp X_u \mid X_{N(s)}$ where $u \notin N(s)$

Background: Gaussian Random Fields

- Estimating the graph structure of a GRF from data [Meinshausen and Bühlmann, Annals. Stats., 2006]
- Regress each variable onto the others, imposing an L1 penalty to encourage sparsity
- Estimated neighborhood: the variables whose regression coefficients are nonzero
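
The neighborhood-selection idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain coordinate-descent lasso solver for the objective (1/(2n))‖y − Xβ‖² + λ‖β‖₁, and the helper names and the toy data set are made up for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j (partial residual).
                pred = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta

def estimate_neighborhoods(data, lam):
    """Regress each variable on all others; neighbors are the nonzero coefficients."""
    p = len(data[0])
    neighborhoods = {}
    for s in range(p):
        others = [t for t in range(p) if t != s]
        y = [row[s] for row in data]
        X = [[row[t] for t in others] for row in data]
        beta = lasso_cd(X, y, lam)
        neighborhoods[s] = {others[j] for j, b in enumerate(beta) if b != 0.0}
    return neighborhoods

# Toy data: variable 1 is twice variable 0; variable 2 is orthogonal to both.
data = [[1, 2, 1], [1, 2, -1], [-1, -2, 1], [-1, -2, -1]]
neighborhoods = estimate_neighborhoods(data, lam=0.1)
print(neighborhoods)
```

Each regression is independent of the others, so the per-variable problems can be solved in parallel.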

Background: Gaussian Random Fields

[Figure: true graph and estimated graph. Courtesy: Meinshausen and Bühlmann, Annals. Stats., 2006]

Background: Gaussian Random Fields

- Application to topic models: CTM [Blei and Lafferty, NIPS, 2006]

Background: Gaussian Random Fields

- Application to CTM: [Blei & Lafferty, Annals. Appl. Stats., ’07]

Structure learning of an MRF

- Ising model
- L1 regularized conditional likelihood learns true structure asymptotically [Wainwright, Ravikumar and Lafferty, NIPS’06]

Structure learning of an MRF

[Figure. Courtesy: Wainwright, Ravikumar and Lafferty, NIPS’06]

Sparse Word Graphs

- Algorithm
- Run LDA on the document collection and obtain topic assignments
- Convert topic assignments for each document into K binary vectors X:
- Assume an MRF for each topic with X as underlying data
- Apply structure learning for MRF using regularized conditional likelihood
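
The second step of the algorithm can be sketched as below. The exact definition of X did not survive in this transcript, so the code assumes one plausible reading: for topic k and document d, entry w of the binary vector is 1 if at least one occurrence of word w in d was assigned to topic k. The input layout (a list of `(word_id, topic_id)` pairs per document) and the function name are hypothetical.

```python
def binary_topic_vectors(docs, num_topics, vocab_size):
    """Turn per-token topic assignments into K binary vectors per document.

    docs: list of documents, each a list of (word_id, topic_id) pairs,
          e.g. as produced by an LDA sampler (assumed layout).
    Returns X where X[k][d] is a 0/1 list of length vocab_size.
    """
    X = [[[0] * vocab_size for _ in docs] for _ in range(num_topics)]
    for d, doc in enumerate(docs):
        for word_id, topic_id in doc:
            X[topic_id][d][word_id] = 1
    return X

# Two toy documents over a 3-word vocabulary, with 2 topics.
docs = [
    [(0, 0), (1, 0), (2, 1)],
    [(1, 1), (2, 1), (2, 0)],
]
X = binary_topic_vectors(docs, num_topics=2, vocab_size=3)
print(X[0])  # training data for the topic-0 word graph
print(X[1])  # training data for the topic-1 word graph
```

Each X[k] then serves as the data set for one topic's MRF, with one row per document and one binary variable per vocabulary word.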

Sparse Word Graphs: Scalability

- We still run V logistic regression problems, each of size V, for each topic: O(KV²)!
- However, each example is very sparse
- L1 penalty results in sparse solutions
- Can run each topic in parallel
- Efficient interior point based L1 regularized logistic regression [Koh, Kim & Boyd, JMLR,’07]
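
A single one of these per-word problems can be sketched as follows. Note the swap: the slides use the interior point solver of Koh, Kim & Boyd, whereas this sketch uses plain proximal gradient descent (ISTA), chosen only because it fits in a few lines. The function names, step size, iteration count, and toy data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def l1_logistic_regression(X, y, lam, step=1.0, n_iter=500):
    """Proximal gradient descent on mean logistic loss + lam * ||w||_1.

    The intercept b is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
                for row, yi in zip(X, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        grad_b = sum(errs) / n
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

# Target word co-occurs with word 0 only; word 1 is uninformative.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
w, b = l1_logistic_regression(X, y, lam=0.05)
print(w, b)
```

In this toy problem the weight on the uninformative word stays exactly zero, so only one edge would be added for the target word. Running this for every word in the vocabulary and every topic gives the KV problems counted above.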

Experiments

- Small AP corpus
- 2.2K Docs, 10.5K unique words

- Ran 10 topic LDA model
- Used λ = 0.1 in L1 logistic regression
- Took just 45 min. per topic
- Very sparse solutions
- Computes only under 0.1% of the total number of possible edges
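
To put the 0.1% figure in perspective, here is a quick back-of-the-envelope count. It assumes the vocabulary size is exactly 10,500 (the slide says 10.5K) and that edges are undirected, so the numbers are approximate.

```python
def possible_edges(vocab_size):
    """Number of unordered word pairs in a vocabulary of the given size."""
    return vocab_size * (vocab_size - 1) // 2

V = 10_500                      # assumed exact value for "10.5K unique words"
total = possible_edges(V)       # all candidate edges for one topic
budget = total // 1000          # 0.1% of the candidate edges

print(f"possible edges per topic: {total:,}")
print(f"0.1% of that:             {budget:,}")
```

So a graph with under 0.1% of the possible edges has at most roughly 55 thousand edges per topic, out of about 55 million candidates.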

Topic “Business”: neighborhood of top LDA terms

Topic “Business”: neighborhood of top edges

Topic “War”: neighborhood of top LDA terms

Topic “War”: neighborhood of top edges

Concluding remarks

- Pros
- A highly scalable algorithm for capturing within topic word correlations
- Captures both short distance and long distance correlations
- Makes topics more interpretable

- Cons
- Not a complete probabilistic model
- Significant modeling challenge since the correlations are latent

Concluding remarks

- Applications of Sparse Word Graphs
- Better document summarization and visualization tool
- Word sense disambiguation
- Semantic query expansion

- Future Work
- Evaluation on a “real task”
- Build a unified statistical model
