Topic Significance Ranking for LDA Generative Models
Topic Significance Ranking for LDA Generative Models. Loulwah AlSumait Daniel Barbará James Gentle Carlotta Domeniconi. ECML PKDD - Bled, Slovenia - September 7-11, 2009. Agenda. Introduction Junk/Insignificant topic definitions Distance measures


Topic Significance Ranking for LDA Generative Models

Loulwah AlSumait Daniel Barbará

James Gentle Carlotta Domeniconi

ECML PKDD - Bled, Slovenia - September 7-11, 2009


Agenda

  • Introduction

  • Junk/Insignificant topic definitions

  • Distance measures

  • 4-phase Weighted Combination Approach

  • Experimental results

  • Conclusions and future work


Latent Dirichlet Allocation (LDA) Model
Blei, Ng, & Jordan (2003)

  • Probabilistic generative model

  • Hidden variables (topics) are associated with the observed text

  • Dirichlet priors on document and topic distributions

  • Exact inference is intractable

    • Approximation approaches

  • Input: K

  • Output: Φ, θ

[Figure: LDA plate diagram with the generative process and the inference process marked; node and plate labels: d, zi, wi, Nd, D, K]
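The generative process named on this slide can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `generate_corpus`, the symmetric prior values `alpha` and `beta`, and the fixed document length `N_d` are all assumptions made for the example.

```python
import numpy as np

def generate_corpus(D, K, W, N_d, alpha=0.5, beta=0.5, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Phi: one word distribution per topic (K x W), drawn from a Dirichlet prior.
    phi = rng.dirichlet(np.full(W, beta), size=K)
    # theta: one topic distribution per document (D x K), drawn from a Dirichlet prior.
    theta = rng.dirichlet(np.full(K, alpha), size=D)
    docs = []
    for d in range(D):
        # z_i: hidden topic assignment for each of the N_d word positions.
        z = rng.choice(K, size=N_d, p=theta[d])
        # w_i: observed word, drawn from the word distribution of topic z_i.
        w = np.array([rng.choice(W, p=phi[k]) for k in z])
        docs.append(w)
    return phi, theta, docs

phi, theta, docs = generate_corpus(D=5, K=3, W=20, N_d=50)
print(phi.shape, theta.shape, len(docs))
```

Inference runs in the opposite direction: given only the word ids in `docs` and a chosen K, an approximate method estimates Φ and θ.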


Topic Significance Ranking

  • The setting of K has a critical effect on the inferred topics

  • Most previous work examines the topics manually

  • Quantify the semantic significance of topics

    • How different is the topic distribution from junk/insignificant topic distributions?


Topic Significance Ranking

  • Example: 20 NewsGroups

The Volgenau School of Information Technology and Engineering

Department of Computer Science


Junk/Insignificant Topic Definitions

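  • Uniform Distribution Over Words

    • Uniformity of a topic: $U_k = \mathrm{dist}(\phi_k, \Omega^U)$, where $p(w_i \mid \Omega^U) = 1/W$ for every word $w_i$

  • Vacuous Semantic Distribution

    • $p(w_i \mid \Omega^V) = \sum_{k=1}^{K} \phi_{ik}\, p(k)$, $p(w_i \mid k) = \phi_{ik}$, $p(k) = \frac{1}{D} \sum_{d=1}^{D} \theta_{dk}$

    • Vacuousness of a topic: $V_k = \mathrm{dist}(\phi_k, \Omega^V)$

  • Background Distribution

    • Background of a topic: $B_k = \mathrm{dist}(\vartheta_k, \Omega^B)$, where $\vartheta_k$ is topic k's distribution over the D documents and $p(d_m \mid \Omega^B) = 1/D$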
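As a rough sketch, the three junk/insignificant reference distributions can be built from the LDA outputs Φ and θ as follows. Several details are assumptions made for illustration: the name `junk_distributions`, the estimate of the topic marginal as the mean of θ over documents, and the column normalization of θ used to obtain each topic's distribution over documents.

```python
import numpy as np

def junk_distributions(phi, theta):
    """phi: K x W topic-word matrix; theta: D x K document-topic matrix."""
    K, W = phi.shape
    D = theta.shape[0]
    # Uniform distribution over words: every word equally probable.
    uniform = np.full(W, 1.0 / W)
    # Assumed topic marginal p(k): average topic proportion over documents.
    p_k = theta.mean(axis=0)
    # Vacuous semantic distribution: marginal word distribution sum_k phi[k] * p(k).
    vacuous = p_k @ phi
    # Background distribution: every document equally probable.
    background = np.full(D, 1.0 / D)
    # Assumed topic-over-documents distributions: normalize each column of theta.
    topic_over_docs = (theta / theta.sum(axis=0)).T  # K x D
    return uniform, vacuous, background, topic_over_docs

phi = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5]])
theta = np.array([[0.8, 0.2],
                  [0.4, 0.6]])
uniform, vacuous, background, topic_over_docs = junk_distributions(phi, theta)
print(vacuous)
```

Each topic is then compared with the uniform and vacuous distributions through its row of Φ, and with the background distribution through its row of `topic_over_docs`.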


Distance Measures

  • Symmetric KL-Divergence

    • Uniformity, W-Vacuous, Background

  • Cosine Dissimilarity

    • Uniformity, W-Vacuous, Background

  • Correlation Coefficient

    • Uniformity, W-Vacuous, Background
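The three measures listed above have standard textbook forms, sketched here for two discrete distributions. The function names, the small smoothing constant `eps`, and the 1/2 averaging in the symmetric KL are assumptions for the example and may differ from the exact definitions used in the talk.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Average of KL(p||q) and KL(q||p); eps avoids log(0) on zero entries."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))))

def cosine_dissimilarity(p, q):
    """One minus the cosine of the angle between the two vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def correlation_coefficient(p, q):
    """Pearson correlation; NaN when either vector is constant (zero variance)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    pc, qc = p - p.mean(), q - q.mean()
    denom = np.sqrt(np.sum(pc ** 2) * np.sum(qc ** 2))
    if denom == 0:
        return float("nan")
    return float(np.sum(pc * qc) / denom)

topic = np.array([0.7, 0.1, 0.1, 0.1])
reference = np.array([0.4, 0.3, 0.2, 0.1])
print(symmetric_kl(topic, reference),
      cosine_dissimilarity(topic, reference),
      correlation_coefficient(topic, reference))
```

Note that the Pearson correlation is undefined against an exactly constant vector such as a uniform distribution, which is why this sketch returns NaN in that case instead of guessing a value.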


Topic Significance Ranking

  • Multi-Criteria Weighted Combination

  • 4 phases

    • Standardization procedure

      • Transform distances into standardized measures

        • Scores

        • Weights
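To make the standardization phase concrete, the sketch below rescales a vector of per-topic distances in two generic ways: as a share of the total and by min-max scaling. Both rescalings and the name `standardize` are assumptions chosen for illustration; the slide does not state the exact formulas behind the scores and weights.

```python
import numpy as np

def standardize(distances):
    """Return two rescaled versions of a vector of per-topic distances."""
    c = np.asarray(distances, dtype=float)
    # Share of the total: values sum to 1 across topics.
    share = c / c.sum()
    # Min-max scaling: smallest distance maps to 0, largest to 1.
    span = c.max() - c.min()
    minmax = (c - c.min()) / span if span > 0 else np.zeros_like(c)
    return share, minmax

share, minmax = standardize([1.0, 2.0, 5.0])
print(share, minmax)
```

Putting every distance on a common scale is what allows measures with different ranges, such as a KL-divergence and a cosine dissimilarity, to be combined later.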


Topic Significance Ranking

  • 4 phases (Continued)

    • Intra-Criterion Weighted Combination

      • Combine standardized measures of each J/I definition

    • Inter-Criteria Weighted Combination

      • Combine J/I scores and weights

    • Topic Rank

[Figure: combination diagram. For each topic k, two standardized scores (labeled S with indices 1 and 2) per criterion, namely the Uniformity (U), W-Vacuous (V), and Background (B) scores, feed into the TSR.]
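The intra-criterion and inter-criteria phases can be illustrated with plain weighted sums followed by a sort. This is a simplified sketch under stated assumptions: the helper names `combine` and `topic_rank`, the dictionary layout, and the purely linear combination are hypothetical, and the actual TSR formula may combine scores and weights differently.

```python
import numpy as np

def combine(measures, weights):
    """Weighted sum of per-topic arrays; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(measures, dtype=float), axes=1)

def topic_rank(criteria, intra_weights, inter_weights):
    """criteria maps a J/I name to a list of standardized per-topic arrays."""
    names = list(criteria)
    # Intra-criterion: merge the distance measures belonging to one J/I definition.
    per_criterion = [combine(criteria[n], intra_weights[n]) for n in names]
    # Inter-criteria: merge the three J/I scores into one score per topic.
    score = combine(per_criterion, [inter_weights[n] for n in names])
    # Topic rank: most significant (highest score) first.
    order = np.argsort(-score)
    return score, order

criteria = {"U": [[0.2, 0.8], [0.4, 0.6]], "V": [[0.5, 0.5]], "B": [[1.0, 0.0]]}
intra = {"U": [1, 1], "V": [1], "B": [1]}
inter = {"U": 1, "V": 1, "B": 2}
score, order = topic_rank(criteria, intra, inter)
print(score, order)
```

Keeping the two phases separate lets the weighting of distance measures within a definition be tuned independently of the weighting across the three definitions.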


Experimental Results: Simulated Data


20 NewsGroups: Top 10 Significant Topics


20 NewsGroups: Lowest 10 Significant Topics


NIPS: Top 10 Significant Topics


NIPS: Lowest 10 Significant Topics


Individual vs. Combined Score

Simulated Data


Individual vs. Combined Score

20 NewsGroups


Conclusions and Future Work

  • Unsupervised numerical quantification of the topics’ semantic significance

  • Novel post-analysis in LDA modeling

  • Three J/I topic distributions

  • 4-phase weighted combination approach

  • Future directions:

    • Analysis of TSR sensitivity to the approach, K, and weight settings

    • More J/I definitions

    • Tool to visualize topic evolution in online setting

