Topic Significance Ranking for LDA Generative Models. Loulwah AlSumait, Daniel Barbará, James Gentle, Carlotta Domeniconi. ECML PKDD, Bled, Slovenia, September 7-11, 2009.

Topic Significance Ranking for LDA Generative Models

Loulwah AlSumait Daniel Barbará

James Gentle Carlotta Domeniconi

ECML PKDD - Bled, Slovenia - September 7-11, 2009


Agenda

  • Introduction

  • Junk/Insignificant topic definitions

  • Distance measures

  • 4-phase Weighted Combination Approach

  • Experimental results

  • Conclusions and future work


Latent Dirichlet Allocation (LDA) Model (Blei, Ng, & Jordan, 2003)

  • Exact inference is intractable

    • Approximation approaches are used instead

  • Input: K (the number of topics)

  • Output: Φ (topic-word distributions), θ (document-topic distributions)

  • Probabilistic generative model

  • Hidden variables (topics) are associated with the observed text

  • Dirichlet priors on document and topic distributions

[Figure: LDA graphical model showing the generative process and the inference process. Labels: d, zi, wi, Nd, D, K.]
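
The generative process summarized on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `sample_dirichlet` and `generate_corpus`, the symmetric hyperparameter values `alpha` and `beta`, and the fixed document length are all assumptions made for the example.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process.

    alpha, beta and the fixed doc_len are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Dirichlet prior on topic-word distributions: phi is K rows of W probabilities.
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(n_topics)]
    # Dirichlet prior on document-topic distributions: theta is D rows of K probabilities.
    theta = [sample_dirichlet(rng, alpha, n_topics) for _ in range(n_docs)]
    docs = []
    for d in range(n_docs):
        # Hidden topic assignment z_i for each word position of document d.
        z = rng.choices(range(n_topics), weights=theta[d], k=doc_len)
        # Observed word w_i drawn from the word distribution of its topic z_i.
        words = [rng.choices(range(vocab_size), weights=phi[k])[0] for k in z]
        docs.append(words)
    return phi, theta, docs
```

Inference runs in the opposite direction: given only the words and K, it estimates Φ and θ, which is why the choice of K matters for everything that follows.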


Topic Significance Ranking

  • The choice of K has a critical effect on the inferred topics

  • Most previous work examines the topics manually

  • Goal: quantify the semantic significance of topics

    • How different is a topic's distribution from junk/insignificant topic distributions?


Topic Significance Ranking

  • Example: 20 Newsgroups

The Volgenau School of Information Technology and Engineering

Department of Computer Science


Junk/Insignificant Topic Definitions

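  • Uniform Distribution Over Words

    • Uniformity of a topic: its distance from the uniform distribution over words

  • Vacuous Semantic Distribution

    • $p(w_i \mid k) = \phi_{ik}$

    • Vacuousness of a topic: its distance from the vacuous semantic distribution

  • Background Distribution

    • Background of a topic: its distance from the background distribution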
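
The three junk/insignificant (J/I) reference distributions can be sketched as below. The formulas on this slide are not available in the transcript, so the concrete definitions are assumptions: the vacuous distribution is taken to be the marginal word distribution sum_k phi[k][i] * p(k), p(k) is estimated as the mean of theta over documents, and the background distribution is taken to be uniform over documents and compared against a topic's normalized column of theta. All function names are hypothetical.

```python
def uniform_word_distribution(vocab_size):
    """Junk topic that gives every word the same probability."""
    return [1.0 / vocab_size] * vocab_size

def topic_marginals(theta):
    """Assumed estimate of p(k): the mean of theta[d][k] over documents."""
    n_docs = len(theta)
    n_topics = len(theta[0])
    return [sum(row[k] for row in theta) / n_docs for k in range(n_topics)]

def vacuous_word_distribution(phi, topic_probs):
    """Assumed vacuous distribution: marginal p(w_i) = sum_k phi[k][i] * p(k)."""
    vocab_size = len(phi[0])
    return [sum(phi[k][i] * topic_probs[k] for k in range(len(phi)))
            for i in range(vocab_size)]

def background_distribution(n_docs):
    """Assumed background topic: spread evenly over all documents."""
    return [1.0 / n_docs] * n_docs

def topic_over_documents(theta, k):
    """Normalize column k of theta into a distribution over documents."""
    column = [row[k] for row in theta]
    total = sum(column)
    return [x / total for x in column]
```

Under these assumptions, a topic's uniformity and vacuousness are distances measured in word space, while its background value is a distance measured in document space.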


Distance Measures

  • Symmetric KL-Divergence

    • Uniformity, W-Vacuous, Background

  • Cosine Dissimilarity

    • Uniformity, W-Vacuous, Background

  • Correlation Coefficient

    • Uniformity, W-Vacuous, Background
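
The three measures named on this slide can be sketched as follows. The exact forms used in the talk are not in the transcript, so the details are assumptions: the symmetric KL-divergence is taken as the average of the two directed divergences with a small smoothing constant `eps`, cosine dissimilarity as one minus cosine similarity, and the correlation coefficient as Pearson's, returning 0.0 when one input is constant (as the uniform distribution is).

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Average of KL(p||q) and KL(q||p); eps smoothing avoids log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

def cosine_dissimilarity(p, q):
    """One minus the cosine of the angle between p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def correlation_coefficient(p, q, tol=1e-20):
    """Pearson correlation; returns 0.0 if either input is (numerically) constant."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p < tol or var_q < tol:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / math.sqrt(var_p * var_q)
```

Note that the first two are distances (larger means further from junk), whereas a correlation is a similarity, so it has to be oriented consistently before the measures are combined.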


Topic Significance Ranking

  • Multi-Criteria Weighted Combination

  • 4 phases

    • Standardization procedure

      • Transform distances into standardized measures

        • Scores

        • Weights
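
As a rough illustration of the standardization phase, the sketch below turns a list of per-topic distances into two standardized forms. The slide's own formulas are not in the transcript, so both transformations are stand-ins chosen for the example rather than the authors' definitions: `to_scores` uses min-max rescaling to [0, 1], and `to_weights` divides by the total so the values sum to 1. Both function names are hypothetical.

```python
def to_scores(distances):
    """Stand-in score: min-max rescale the distances to the range [0, 1]."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All topics are equally distant; no topic stands out.
        return [0.0] * len(distances)
    return [(d - lo) / (hi - lo) for d in distances]

def to_weights(distances):
    """Stand-in weight: normalize the distances so they sum to 1."""
    total = sum(distances)
    if total == 0:
        # Degenerate case: fall back to equal weights.
        return [1.0 / len(distances)] * len(distances)
    return [d / total for d in distances]
```

For example, `to_scores([1, 2, 3])` gives `[0.0, 0.5, 1.0]` and `to_weights([1, 1, 2])` gives `[0.25, 0.25, 0.5]`.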


Topic Significance Ranking

  • 4 phases (Continued)

    • Intra-Criterion Weighted Combination

      • Combine the standardized measures of each J/I definition

    • Inter-Criteria Weighted Combination

      • Combine the J/I scores and weights

    • Topic Rank

[Figure: combination diagram. Labels: Uniformity scores, W-Vacuous scores, and Background scores (symbols S with U, V, B, indices 1 and 2, and topic index k), joined by a multiplication (×) to give the TSR.]
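
The last three phases can be sketched as two rounds of weighted sums followed by a product. This is one reading of the slide, not the authors' stated formula: the mixing coefficients `intra_mix` and `inter_mix`, the data layout, and the choice of a plain product of the combined score and combined weight are all assumptions, and the function names are hypothetical.

```python
def weighted_sum(columns, coefficients):
    """Combine several per-topic lists into one list using the given coefficients."""
    n_topics = len(columns[0])
    return [sum(c * col[k] for c, col in zip(coefficients, columns))
            for k in range(n_topics)]

def rank_topics(scores, weights, intra_mix, inter_mix):
    """Return (tsr, order) for all topics.

    scores / weights: dict mapping a criterion name (e.g. "U", "V", "B") to a
        list of per-topic lists, one per distance measure.
    intra_mix: one coefficient per distance measure.
    inter_mix: dict mapping each criterion name to its coefficient.
    """
    criteria = sorted(scores)
    # Intra-criterion: merge the distance measures within each J/I definition.
    score_per_criterion = [weighted_sum(scores[c], intra_mix) for c in criteria]
    weight_per_criterion = [weighted_sum(weights[c], intra_mix) for c in criteria]
    # Inter-criteria: merge the J/I definitions into one score and one weight.
    mix = [inter_mix[c] for c in criteria]
    final_score = weighted_sum(score_per_criterion, mix)
    final_weight = weighted_sum(weight_per_criterion, mix)
    # Topic rank: assumed here to be the product of score and weight.
    tsr = [s * w for s, w in zip(final_score, final_weight)]
    order = sorted(range(len(tsr)), key=lambda k: tsr[k], reverse=True)
    return tsr, order
```

Topics at the front of `order` are the most significant; topics at the end are the closest to the junk distributions.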


Experimental Results: Simulated Data


20 Newsgroups: Top 10 Significant Topics


20 Newsgroups: 10 Least Significant Topics


NIPS: Top 10 Significant Topics


NIPS: 10 Least Significant Topics


Individual vs. Combined Score: Simulated Data


Individual vs. Combined Score: 20 Newsgroups


Conclusions and Future Work

  • Unsupervised numerical quantification of the topics' semantic significance

  • Novel post-analysis step in LDA modeling

  • Three J/I topic distributions

  • 4-phase weighted combination approach

  • Future directions:

    • Analysis of the sensitivity of TSR to the approach, K, and the weight settings

    • More J/I definitions

    • Tool to visualize topic evolution in an online setting

