

Topic Significance Ranking for LDA Generative Models

Loulwah AlSumait, Daniel Barbará, James Gentle, Carlotta Domeniconi

ECML PKDD - Bled, Slovenia - September 7-11, 2009


Agenda

  • Introduction

  • Junk/Insignificant topic definitions

  • Distance measures

  • 4-phase Weighted Combination Approach

  • Experimental results

  • Conclusions and future work


Latent Dirichlet Allocation (LDA) Model (Blei, Ng, & Jordan, 2003)

  • Exact inference is intractable

    • Approximation approaches

  • Input: K (number of topics)

  • Output: Φ (topic-word distributions), θ (document-topic distributions)

  • Probabilistic generative model

  • Hidden variables (topics) are associated with the observed text

  • Dirichlet priors on document and topic distributions

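[Figure: LDA plate diagram with variables zi and wi and plates Nd, D, and K, annotated with the generative process and the inference process]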

Topic Significance Ranking

  • The setting of K has a critical effect on the inferred topics

  • Most previous work examines the topics manually

  • Quantify the semantic significance of topics

    • How different is a topic's distribution from the junk/insignificant topic distributions?


Topic Significance Ranking

  • Example: 20 Newsgroups

The Volgenau School of Information Technology and Engineering

Department of Computer Science


Junk/Insignificant Topic Definitions

  • Uniform Distribution Over Words

    • Uniformity of a topic:

  • Vacuous Semantic Distribution

    • p(wi|k) = φik

    • Vacuousness of a topic:

  • Background Distribution

    • Background of a topic:
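The formulas on this slide did not survive extraction, so the sketch below builds the three reference distributions under stated assumptions rather than from the slide itself. It assumes the uniform junk topic puts probability 1/W on every word, the vacuous junk topic is the marginal word distribution obtained by mixing all topics with topic probabilities p(k) (estimated here as the mean of θ over documents), and the background junk topic is uniform over the D documents. The function name and the p(k) estimate are hypothetical.

```python
import numpy as np

def junk_distributions(phi, theta):
    """Build assumed junk/insignificant reference distributions.

    phi:   (K, W) array, phi[k, i] = p(wi|k)
    theta: (D, K) array, theta[d, k] = p(k|d)
    """
    K, W = phi.shape
    D = theta.shape[0]
    # Assumed uniform junk topic: every word equally likely
    uniform_words = np.full(W, 1.0 / W)
    # Assumed p(k): average topic proportion over documents
    p_k = theta.mean(axis=0)
    # Assumed vacuous junk topic: marginal word distribution sum_k p(k) * phi[k]
    vacuous = p_k @ phi
    # Assumed background junk topic: uniform over documents
    background_docs = np.full(D, 1.0 / D)
    # Each topic's distribution over documents (columns of theta, normalized)
    topic_over_docs = (theta / theta.sum(axis=0, keepdims=True)).T
    return uniform_words, vacuous, background_docs, topic_over_docs
```

Under these assumptions, the uniformity and vacuousness of topic k would be distances from `phi[k]` to `uniform_words` and `vacuous`, and its background score would be a distance from `topic_over_docs[k]` to `background_docs`.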


Distance Measures

  • Symmetric KL-Divergence

    • Uniformity, Background, W-Vacuous

  • Cosine Dissimilarity

    • Uniformity, W-Vacuous, Background

  • Correlation Coefficient

    • Uniformity, W-Vacuous, Background
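The three measures named above can be sketched as follows. The slide gives no formulas, so the definitions are assumptions: symmetric KL is taken as the average of the two directed divergences (some authors use the sum), cosine dissimilarity as one minus cosine similarity, and the correlation coefficient as Pearson's r. The small `eps` smoothing term is also an assumption, added to avoid taking the log of zero.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Average of KL(p||q) and KL(q||p), with assumed eps smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def cosine_dissimilarity(p, q):
    """One minus the cosine of the angle between p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 1.0 - (p @ q) / (np.linalg.norm(p) * np.linalg.norm(q))

def correlation_coefficient(p, q):
    """Pearson correlation; NaN when either vector is constant."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    pc = p - p.mean()
    qc = q - q.mean()
    denom = np.linalg.norm(pc) * np.linalg.norm(qc)
    return float("nan") if denom == 0 else float(pc @ qc / denom)
```

A constant reference vector, such as a uniform distribution, has zero variance, so Pearson's r is undefined against it. This sketch returns NaN in that case and does not guess how the authors handled it.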


Topic Significance Ranking

  • Multi-Criteria Weighted Combination

  • 4 phases

    • Standardization procedure

      • Transform distances into standardized measures

        • Scores

        • Weights
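The standardization phase maps the raw distances of all K topics onto comparable scales. The slide does not give the formulas, so the sketch below is only one plausible choice: min-max scaling for the scores and each topic's share of the total distance for the weights. Both formulas and the function name `standardize` are assumptions, not the authors' definitions.

```python
import numpy as np

def standardize(distances):
    """Return (scores, weights) for one distance measure over K topics.

    scores:  assumed min-max scaling into [0, 1]
    weights: assumed share of the total distance, summing to 1
    """
    d = np.asarray(distances, dtype=float)
    span = d.max() - d.min()
    # If all topics are equally distant, no topic stands out: score 0 for all
    scores = np.zeros_like(d) if span == 0 else (d - d.min()) / span
    total = d.sum()
    # If every distance is zero, fall back to equal weights
    weights = np.full_like(d, 1.0 / d.size) if total == 0 else d / total
    return scores, weights
```

For example, distances `[1, 2, 3]` give scores `[0, 0.5, 1]` and weights `[1/6, 1/3, 1/2]`.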


Topic Significance Ranking

  • 4 phases (Continued)

    • Intra-Criterion Weighted Combination

      • Combine standardized measures of each J/I definition

    • Inter-Criteria Weighted Combination

      • Combine J/I scores and weights

    • Topic Rank

[Figure: combination diagram with labels S1k and S2k for each of U, V, and B; Uniformity scores; W-Vacuous scores; Background scores; TSR]

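The last three phases can be sketched as two nested weighted combinations followed by a sort. The slide names the phases but gives no formulas, so the sketch assumes plain convex combinations: `intra_weight` mixes the two standardized measures of each criterion, and `criterion_weights` mixes the Uniformity (U), W-Vacuous (V), and Background (B) scores. The function name, the parameter names, and the equal default weights are hypothetical, and the authors' combination rule may differ, for instance by multiplying scores with weights.

```python
import numpy as np

def topic_significance_rank(standardized, intra_weight=0.5,
                            criterion_weights=None):
    """Combine standardized measures into one score per topic and rank topics.

    standardized: dict mapping a criterion name (e.g. "U", "V", "B") to a
                  pair (s1, s2) of length-K arrays of standardized measures.
    """
    names = sorted(standardized)
    if criterion_weights is None:
        # Assumed default: all criteria count equally
        criterion_weights = {n: 1.0 / len(names) for n in names}
    tsr = 0.0
    for n in names:
        s1, s2 = (np.asarray(a, dtype=float) for a in standardized[n])
        # Intra-criterion: mix the two standardized measures of one criterion
        combined = intra_weight * s1 + (1.0 - intra_weight) * s2
        # Inter-criteria: accumulate the weighted criterion scores
        tsr = tsr + criterion_weights[n] * combined
    # Topic rank: most significant topic first
    order = np.argsort(-tsr)
    return tsr, order
```

A short usage example with three topics and made-up scores:

```python
example = {
    "U": ([0.1, 0.5, 0.9], [0.2, 0.4, 1.0]),
    "V": ([0.0, 0.6, 0.8], [0.1, 0.7, 0.9]),
    "B": ([0.3, 0.3, 0.6], [0.2, 0.5, 0.7]),
}
tsr, order = topic_significance_rank(example)
print(order)  # topic indices from most to least significant
```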



20 Newsgroups: Top 10 Significant Topics


20 Newsgroups: Lowest 10 Significant Topics


NIPS: Top 10 Significant Topics


NIPS: Lowest 10 Significant Topics




Conclusions and Future Work

  • Unsupervised numerical quantification of the topics' semantic significance

  • Novel post-analysis step in LDA modeling

  • Three J/I topic distributions

  • 4-phase weighted combination approach

  • Future directions:

    • Analysis of TSR sensitivity to the approach, K, and weight settings

    • More J/I definitions

    • Tool to visualize topic evolution in an online setting

