Topic Significance Ranking for LDA Generative Models

Loulwah AlSumait, Daniel Barbará, James Gentle, Carlotta Domeniconi
ECML PKDD - Bled, Slovenia - September 7-11, 2009

Agenda
  • Introduction
  • Junk/Insignificant topic definitions
  • Distance measures
  • 4-phase Weighted Combination Approach
  • Experimental results
  • Conclusions and future work
Latent Dirichlet Allocation (LDA) Model (Blei, Ng, & Jordan, 2003)
  • Exact inference is intractable
    • Approximation approaches
  • Input: K, the number of topics
  • Output: Φ (topic-word distributions), θ (document-topic distributions)
  • Probabilistic generative model
  • Hidden variables (topics) are associated with the observed text
  • Dirichlet priors on document and topic distributions
[Figure: LDA plate diagram showing the Generative Process and the Inference Process, with word wi, topic assignment zi, and plates over the Nd words of a document, the D documents, and the K topics]
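The generative process in the diagram can be sketched in a few lines. This is a minimal illustration of standard LDA sampling under assumed symmetric Dirichlet priors `alpha` and `beta`; the function names and default values are hypothetical, and the sketch says nothing about the approximate inference used to recover Φ and θ from real text.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(K, W, doc_lengths, alpha=0.1, beta=0.1, seed=0):
    """Sample a toy corpus with the standard LDA generative process.

    K: number of topics (the model input); W: vocabulary size;
    doc_lengths: N_d for each document. alpha and beta are HYPOTHETICAL
    symmetric Dirichlet priors chosen for illustration.
    Returns (phi, theta, docs): phi[k][i] = p(w_i | k), theta[d][k] = p(k | d),
    and docs[d] is the list of word ids in document d.
    """
    rng = random.Random(seed)
    # One word distribution per topic: phi_k ~ Dirichlet(beta)
    phi = [sample_dirichlet([beta] * W, rng) for _ in range(K)]
    theta, docs = [], []
    for n_d in doc_lengths:
        # Topic mixture of document d: theta_d ~ Dirichlet(alpha)
        theta_d = sample_dirichlet([alpha] * K, rng)
        words = []
        for _ in range(n_d):
            z_i = rng.choices(range(K), weights=theta_d)[0]  # topic assignment z_i
            w_i = rng.choices(range(W), weights=phi[z_i])[0]  # word w_i drawn from topic z_i
            words.append(w_i)
        theta.append(theta_d)
        docs.append(words)
    return phi, theta, docs
```

Inference runs this process in reverse: given only `docs` and K, it estimates `phi` (Φ) and `theta` (θ), which are the inputs to the ranking steps that follow.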

Topic Significance Ranking
  • The setting of K critically affects the inferred topics
  • Most previous work examines the topics manually
  • Quantify the semantic significance of topics
    • How different is a topic's distribution from the junk/insignificant topic distributions?
Topic Significance Ranking
  • Example: 20 Newsgroups

The Volgenau School of Information Technology and Engineering

Department of Computer Science

Junk/Insignificant Topic Definitions
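  • Uniform Distribution Over Words
    • Uniformity of a topic: distance between the topic's word distribution $\phi_k$ and the uniform distribution over the vocabulary
  • Vacuous Semantic Distribution
    • Defined using the topic-word probabilities $p(w_i \mid k) = \phi_{ik}$
    • Vacuousness of a topic: distance between $\phi_k$ and the vacuous semantic distribution
  • Background Distribution
    • Background of a topic: distance between the topic and the background distribution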
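A minimal sketch of the three junk/insignificant reference distributions, assuming Φ is stored as a K×W list (`phi[k][i]` = p(wi|k)) and θ as a D×K list. The exact vacuous and background formulas are not recoverable from this slide, so two choices here are assumptions: the vacuous distribution is taken as the corpus-level marginal p(wi) = Σk p(wi|k) p(k), with p(k) estimated as the mean of θdk over documents, and the background comparison uses a topic's normalized loadings across the D documents against the uniform distribution over documents. All function names are hypothetical.

```python
def uniform_reference(W):
    """Junk reference 1: uniform distribution over the W vocabulary words."""
    return [1.0 / W] * W


def vacuous_reference(phi, theta):
    """Junk reference 2 (ASSUMED form): corpus marginal p(w_i) = sum_k p(w_i|k) p(k).

    phi[k][i] = p(w_i | k); p(k) is estimated here as the mean of theta[d][k]
    over the D documents, which is a hypothetical choice for this sketch.
    """
    K, W, D = len(phi), len(phi[0]), len(theta)
    p_k = [sum(theta[d][k] for d in range(D)) / D for k in range(K)]
    return [sum(phi[k][i] * p_k[k] for k in range(K)) for i in range(W)]


def topic_over_documents(theta, k):
    """Topic k's loadings theta[d][k] across documents, normalized to sum to 1."""
    column = [row[k] for row in theta]
    total = sum(column)
    return [x / total for x in column]


def background_reference(D):
    """Junk reference 3 (ASSUMED form): a topic spread evenly over all D documents."""
    return [1.0 / D] * D
```

Under these assumptions, Uniformity compares `phi[k]` with `uniform_reference(W)`, vacuousness compares `phi[k]` with `vacuous_reference(phi, theta)`, and Background compares `topic_over_documents(theta, k)` with `background_reference(len(theta))`.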
Distance Measures
  • Symmetric KL-Divergence
    • Uniformity, W-Vacuous, Background
  • Cosine Dissimilarity
    • Uniformity, W-Vacuous, Background
  • Correlation Coefficient
    • Uniformity, W-Vacuous, Background
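The three measures can be sketched with their textbook definitions. This is an illustration, not the authors' code: symmetric KL is taken as KL(p‖q) + KL(q‖p) (some authors halve it), the `eps` smoothing that avoids log(0) is an assumption, and cosine dissimilarity is 1 − cos(p, q). Note that the Pearson correlation is undefined when one vector is constant, which is exactly the case for the uniform reference; the sketch returns NaN there, and how the original approach handles it is not recoverable from the slide.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p); eps smoothing is an assumption."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return kl_pq + kl_qp


def cosine_dissimilarity(p, q):
    """1 - cosine similarity; 0 when p and q point in the same direction."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def correlation_coefficient(p, q):
    """Pearson correlation of p and q; NaN when either vector is constant."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if sd_p == 0 or sd_q == 0:
        return float("nan")
    return cov / (sd_p * sd_q)
```

KL and cosine dissimilarity grow as a topic moves away from a junk distribution, while a high correlation means the topic resembles it, so the correlation must be reversed before the measures are combined.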
Topic Significance Ranking
  • Multi-Criteria Weighted Combination
  • 4 phases
    • Standardization procedure
      • Transform distances into standardized measures
        • Scores
        • Weights
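The slide's standardization formulas did not survive extraction, so the sketch below swaps in plain min-max scaling across the K topics as a stand-in. It maps one measure's values to [0, 1] scores oriented so that 1 means "farthest from the junk distribution"; pass `higher_is_farther=False` for measures such as the correlation coefficient, where a larger value means closer to junk. The function name and the 0.5 fallback for a measure with no spread are hypothetical choices, and NaN values (for example, correlations against the uniform reference) should be dropped before calling it.

```python
def minmax_scores(values, higher_is_farther=True):
    """Rescale one distance measure's values across the K topics to [0, 1].

    ASSUMPTION: min-max scaling stands in for the original (unrecoverable)
    standardization; the result is oriented so 1 = farthest from junk.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread across topics: the measure cannot tell topics apart.
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_farther else [1.0 - s for s in scaled]
```

For example, `minmax_scores([0.0, 1.0, 2.0])` gives `[0.0, 0.5, 1.0]`, while `minmax_scores([0.1, 0.3], higher_is_farther=False)` gives `[1.0, 0.0]`.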
Topic Significance Ranking
  • 4 phases (Continued)
    • Intra-Criterion Weighted Combination
      • Combine standardized measures of each J/I definition
    • Inter-Criteria Weighted Combination
      • Combine J/I scores and weights
    • Topic Rank
[Figure: Uniformity scores, W-Vacuous scores, and Background scores (S) for each topic k, combined into the final TSR]
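The last three phases can be sketched as nested weighted averages followed by a sort. This is a minimal illustration under a loud assumption: the original approach derives its weights from the data during standardization, whereas here `measure_weights` and `criterion_weights` are hypothetical fixed inputs, and the nested-dictionary layout of `scores` is likewise a choice made for this sketch.

```python
def weighted_average(values, weights):
    """Weighted mean of values; weights need not sum to 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def topic_significance_ranking(scores, measure_weights, criterion_weights):
    """Combine standardized scores into one significance value per topic.

    scores[criterion][measure] is a list of K standardized scores
    (1 = farthest from that junk/insignificant distribution).
    measure_weights and criterion_weights are HYPOTHETICAL fixed weights
    standing in for the data-derived weights of the original approach.
    Returns (tsr, ranking), where ranking lists topic ids from most to
    least significant.
    """
    criteria = list(scores)
    K = len(next(iter(scores[criteria[0]].values())))
    # Intra-criterion step: merge the distance measures within each J/I definition
    per_criterion = {
        c: [weighted_average([scores[c][m][k] for m in scores[c]],
                             [measure_weights[m] for m in scores[c]])
            for k in range(K)]
        for c in criteria
    }
    # Inter-criteria step: merge the J/I definitions into one value per topic
    tsr = [weighted_average([per_criterion[c][k] for c in criteria],
                            [criterion_weights[c] for c in criteria])
           for k in range(K)]
    # Topic rank: most significant topic first
    ranking = sorted(range(K), key=lambda k: tsr[k], reverse=True)
    return tsr, ranking
```

With two topics, `scores = {"uniformity": {"kl": [1.0, 0.0], "cos": [0.5, 0.5]}, "background": {"kl": [0.0, 1.0], "cos": [1.0, 0.0]}}`, equal measure weights, and criterion weights of 3 for uniformity and 1 for background, the sketch returns TSR values `[0.6875, 0.3125]` and the ranking `[0, 1]`.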

Conclusions and Future Work
  • Unsupervised numerical quantification of the topics’ semantic significance
  • Novel post analysis in LDA modeling
  • Three J/I topic distributions
  • 4-phase weighted combination approach
  • Future directions:
    • Analysis of TSR sensitivity to the approach, K, and weight settings
    • More J/I definitions
    • Tool to visualize topic evolution in an online setting