
Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Ramesh Nallapati

Joint work with

John Lafferty, Amr Ahmed,

William Cohen and Eric Xing

Machine Learning Department

Carnegie Mellon University

ICDM’07 HPDM Workshop



Introduction

  • Statistical topic modeling: an attractive framework for topic discovery

    • Completely unsupervised

    • Models text very well

      • Lower perplexity compared to unigram models

    • Reveals meaningful semantic patterns

    • Can help summarize and visualize document collections

    • e.g.: PLSA, LDA, DPM, DTM, CTM, PA


Introduction

  • A common assumption in all the variants:

    • Exchangeability: “bag of words” assumption

    • Topics represented as a ranked list of words

  • Consequences:

    • Word Correlation information is lost

      • e.g.: “white-house” vs. “white” and “house”

      • Long distance correlations


Introduction

  • Objective:

    • To capture correlations between words within topics

  • Motivation:

    • More interpretable representation of topics as a network of words rather than a list

    • Helps better visualize and summarize document collections

    • May reveal unexpected relationships and patterns within topics


Past Work: Topic Models

  • Bigram topic models [Wallach, ICML 2006]

    • Requires KV(V − 1) parameters

    • Only captures local dependencies

    • Does not model sparsity of correlations

    • Does not capture "within-topic" correlations


Past work: Other approaches

  • Hyperspace Analog to Language (HAL)

    [Lund and Burgess, Cog. Sci., ’96]

    • Word-pair correlation is measured as a weighted count of the number of times the two words occur within a fixed-length window

    • Weight of an occurrence ∝ 1 / (mutual distance) (a minimal sketch follows below)
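A minimal sketch of this weighting scheme, assuming whitespace tokenization; the function name and the window size are illustrative choices, not part of the original HAL implementation:

```python
from collections import defaultdict

def hal_weights(tokens, window=10):
    # Weighted co-occurrence counts: each pair of tokens that co-occurs
    # within `window` positions contributes 1 / (mutual distance).
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            weights[(w, tokens[j])] += 1.0 / (j - i)
    return weights

# Toy usage
print(hal_weights("the white house issued a white paper".split(), window=4))
```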


Past work: Other approaches

  • Hyperspace Analog to Language (HAL)

    [Lund and Burgess, Cog. Sci., ’96]

    • Pluses:

      • Sparse solutions, scalability

    • Minuses:

      • Only unearths global correlations, not semantic correlations

        • E.g.: “river – bank”, “bank – check”

      • Only local dependencies


Past work: Other approaches

  • Query expansion in IR

    • Similar in spirit: finds words that highly co-occur with the query words

    • However, not a corpus visualization tool: requires a context to operate on

  • WordNet

    • Semantic networks

    • Human labeled: not directly related to our goal


Our approach

  • L1 norm regularization

    • Known to enforce sparse solutions

      • Sparsity permits scalability

    • Convex optimization problem

      • Globally optimal solutions

    • Recent advances in learning the structure of graphical models:

      • The L1 regularization framework asymptotically recovers the true structure


Background: LASSO

  • Example: linear regression

  • Regularization is used to improve generalization

    • E.g. 1: Ridge regression: L2 norm regularization

    • E.g. 2: Lasso: L1 norm regularization (both objectives are written out below)
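The two penalized least-squares objectives referred to above, in standard form (not reproduced from the slide):

```latex
\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta}\;
  \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \lVert\beta\rVert_2^2,
\qquad
\hat{\beta}_{\mathrm{lasso}} = \arg\min_{\beta}\;
  \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \lVert\beta\rVert_1 .
```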


Background: LASSO

  • Lasso encourages sparse solutions


Background: Gaussian Random Fields

  • Multivariate Gaussian distribution

  • Random field structure: G = (V,E)

    • V: the set of all variables {X₁, …, X_p}

    • (s, t) ∈ E  ⇔  (Σ⁻¹)_st ≠ 0

    • X_s ⊥ X_u | X_N(s) for every u ∉ N(s), i.e., each variable is conditionally independent of its non-neighbors given its neighbors (both facts are written out below)
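Written out, the multivariate Gaussian density and the edge criterion above are (standard definitions):

```latex
p(x) = \frac{1}{(2\pi)^{p/2}\,\lvert\Sigma\rvert^{1/2}}
  \exp\!\Bigl(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\Bigr),
\qquad
(s,t) \in E \;\Longleftrightarrow\; \bigl(\Sigma^{-1}\bigr)_{st} \neq 0 .
```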


Background: Gaussian Random Fields

  • Estimating the graph structure of a GRF from data [Meinshausen and Buhlmann, Annals. Stats., 2006]

    • Regress each variable onto all the others, imposing an L1 penalty to encourage sparsity

    • Estimated neighborhood of a variable: the set of variables with nonzero regression coefficients (see the estimator below)
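A sketch of the neighborhood-selection estimator described above (notation mine):

```latex
\hat{\beta}^{\,s,\lambda} = \arg\min_{\beta:\,\beta_s = 0}\;
  \frac{1}{n}\,\bigl\lVert X_s - X\beta \bigr\rVert_2^2 + \lambda\,\lVert\beta\rVert_1,
\qquad
\hat{N}(s) = \bigl\{\, t \in V : \hat{\beta}^{\,s,\lambda}_t \neq 0 \,\bigr\}.
```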


Background: Gaussian Random Fields

[Figure: estimated graph vs. true graph. Courtesy: Meinshausen and Buhlmann, Annals. Stats., 2006]


Background: Gaussian Random Fields

  • Application to topic models: CTM

    [Blei and Lafferty, NIPS, 2006]


Background: Gaussian Random Fields

  • Application to CTM: [Blei & Lafferty, Annals. Appl. Stats., ’07]
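For context, the CTM generative step in which a full covariance Σ captures topic correlations (standard formulation, not taken from the slide):

```latex
\eta_d \sim \mathcal{N}(\mu, \Sigma),
\qquad
\theta_{dk} = \frac{\exp(\eta_{dk})}{\sum_{k'} \exp(\eta_{dk'})},
\qquad
z_{dn} \sim \mathrm{Mult}(\theta_d),
\quad
w_{dn} \sim \mathrm{Mult}(\beta_{z_{dn}}).
```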


Structure learning of an MRF

  • Ising model: a pairwise MRF over binary variables

  • L1-regularized conditional likelihood (node-wise logistic regression) learns the true structure asymptotically (see the sketch below)

    [Wainwright, Ravikumar and Lafferty, NIPS’06]
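A sketch of the setup: the standard Ising parameterization, followed by my paraphrase of the cited node-wise estimator:

```latex
p_{\theta}(x) \propto \exp\!\Bigl(\sum_{s \in V}\theta_s x_s
  + \sum_{(s,t) \in E}\theta_{st}\,x_s x_t\Bigr),
\quad x \in \{-1,+1\}^p,
\qquad
\hat{\theta}^{\,s} = \arg\min_{\theta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\log p_{\theta}\bigl(x^{(i)}_s \mid x^{(i)}_{V\setminus s}\bigr)
  + \lambda\,\lVert\theta\rVert_1,
\qquad
\hat{E} = \bigl\{(s,t) : \hat{\theta}^{\,s}_t \neq 0\bigr\}.
```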


Structure learning of an MRF

[Figure. Courtesy: Wainwright, Ravikumar and Lafferty, NIPS’06]


Sparse Word Graphs

  • Algorithm

    • Run LDA on the document collection and obtain topic assignments

    • Convert the topic assignments of each document into K binary vectors X: for topic k, entry v is 1 iff word v occurs in the document with topic assignment k

    • Assume an MRF for each topic, with X as the underlying data

    • Apply structure learning for the MRF using L1-regularized conditional likelihood (a minimal sketch follows after this list)
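A minimal sketch of these steps for one topic, assuming per-token topic assignments from LDA are already available; the function name, the sklearn solver, and the regularization strength C are illustrative choices, not the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_word_graph(assignments, k, C=10.0):
    """Structure learning for topic k.

    assignments: list of documents; each document is a list of
    (word_id, topic_id) pairs, e.g. LDA Gibbs-sampling assignments.
    Returns {word: [words with nonzero L1-logistic-regression weights]}.
    """
    V = 1 + max(w for doc in assignments for w, _ in doc)
    # Binary design matrix: X[d, v] = 1 iff word v is assigned topic k in doc d
    X = np.zeros((len(assignments), V))
    for d, doc in enumerate(assignments):
        for w, t in doc:
            if t == k:
                X[d, w] = 1.0
    edges = {}
    for v in range(V):
        y = X[:, v]
        if y.min() == y.max():          # word never (or always) present: skip
            continue
        others = np.delete(np.arange(V), v)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[:, others], y)        # L1-regularized logistic regression
        edges[v] = others[np.flatnonzero(clf.coef_[0])].tolist()
    return edges
```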


Sparse Word Graphs


Sparse Word Graphs: Scalability

  • We still run V logistic regression problems, each of size V, for each topic: O(KV²)!

    • However, each example is very sparse

    • L1 penalty results in sparse solutions

    • Can run each topic in parallel (see the driver sketch below)

    • Efficient interior-point-based L1-regularized logistic regression [Koh, Kim & Boyd, JMLR ’07]
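Since topics are independent of each other, the per-topic structure learning (e.g., the sparse_word_graph sketch earlier) can be farmed out to separate processes; a hypothetical driver:

```python
from functools import partial
from multiprocessing import Pool

def learn_all_topics(assignments, K, n_workers=4):
    # One independent structure-learning job per topic; assumes
    # sparse_word_graph (defined above) is importable at module level.
    with Pool(n_workers) as pool:
        return pool.map(partial(sparse_word_graph, assignments), range(K))
```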


Experiments

  • Small AP corpus

    • 2.2K Docs, 10.5K unique words

  • Ran a 10-topic LDA model

  • Used λ = 0.1 in the L1-regularized logistic regression

  • Took just 45 min. per topic

  • Very sparse solutions

    • Returns fewer than 0.1% of the total number of possible edges


Topic “Business”: neighborhood of top LDA terms


Topic “Business”: neighborhood of top edges


Topic “War”: neighborhood of top LDA terms


Topic “War”: neighborhood of top edges


Concluding remarks

  • Pros

    • A highly scalable algorithm for capturing within-topic word correlations

    • Captures both short-distance and long-distance correlations

    • Makes topics more interpretable

  • Cons

    • Not a complete probabilistic model

      • Significant modeling challenge since the correlations are latent


Concluding remarks

  • Applications of Sparse Word Graphs

    • Better document summarization and visualization tool

    • Word sense disambiguation

    • Semantic query expansion

  • Future Work

    • Evaluation on a “real task”

    • Build a unified statistical model
