Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining
KDD’05, August 21–24, 2005, Chicago, Illinois, USA. Qiaozhu Mei and ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign.

Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining

KDD’05, August 21–24, 2005, Chicago, Illinois, USA.

Qiaozhu Mei

Department of Computer Science

University of Illinois at Urbana-Champaign

ChengXiang Zhai

Department of Computer Science

University of Illinois at Urbana-Champaign

Abstract
  • Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information collected over time.
  • discovering latent themes from text
  • constructing an evolution graph of themes
  • analyzing life cycles of themes
Agenda

1. Introduction

2. Problem formulation

3. Evolution graph discovery

4. Analysis of theme life cycle

5. Experiments and result

6. Related work

7. Conclusions

INTRODUCTION
  • Readers are often interested in subtopics characterizing the beginning, progression, and impact of an event, among others.
INTRODUCTION (cont’d)
  • discovering latent themes from text
  • discovering theme evolutionary relations and constructing an evolution graph of themes
  • modeling theme strength over time and analyzing the life cycles of themes
3.1 THEME EXTRACTION
  • Probabilistic mixture model
  • Expectation Maximization algorithm
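The slide only names the model and the algorithm. As a minimal sketch, assuming a PLSA-style mixture in which each word of document d is drawn either from a fixed background model (weight λ_B, taken here as the collection word frequencies) or from one of k theme models with document-specific weights π_{d,j}, the EM updates could look like this. The function name `extract_themes`, the default `lambda_b=0.5`, the random initialisation, and the fixed iteration count are illustrative assumptions, not values from the slides.

```python
import math
import random
from collections import Counter


def extract_themes(docs, k, lambda_b=0.5, iters=50, seed=0):
    """EM for a mixture of k theme models plus a fixed background model.

    docs: non-empty list of non-empty Counter(word -> count).
    Returns (themes, pis, history): k word distributions, per-document theme
    weights, and the log-likelihood computed at the start of each iteration.
    lambda_b (assumed to lie in [0, 1)) is the fixed background weight.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    total = sum(sum(d.values()) for d in docs)
    # Background model (assumption): collection word frequencies, held fixed.
    background = {w: sum(d[w] for d in docs) / total for w in vocab}

    # Random positive initialisation of the k theme word distributions.
    themes = []
    for _ in range(k):
        raw = {w: rng.random() + 1e-3 for w in vocab}
        z = sum(raw.values())
        themes.append({w: v / z for w, v in raw.items()})
    # Uniform initial theme weights pi[d][j] for every document.
    pis = [[1.0 / k] * k for _ in docs]

    history = []
    for _ in range(iters):
        new_counts = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        new_pis = []
        loglik = 0.0
        for d, doc in enumerate(docs):
            pi_acc = [0.0] * k
            for w, c in doc.items():
                mix = [pis[d][j] * themes[j][w] for j in range(k)]
                p_w = lambda_b * background[w] + (1 - lambda_b) * sum(mix)
                loglik += c * math.log(p_w)
                # E-step: posterior that this word came from theme j
                # (the remaining mass belongs to the background model).
                for j in range(k):
                    post = (1 - lambda_b) * mix[j] / p_w
                    pi_acc[j] += c * post
                    new_counts[j][w] += c * post
            s = sum(pi_acc)
            new_pis.append([a / s for a in pi_acc])
        history.append(loglik)
        # M-step: renormalise the expected counts.
        pis = new_pis
        themes = []
        for counts in new_counts:
            z = sum(counts.values())
            themes.append({w: v / z for w, v in counts.items()})
    return themes, pis, history
```

Because λ_B and the background model are held fixed, these are exact EM updates for the remaining parameters, so the log-likelihood in `history` should never decrease from one iteration to the next.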
3.2 EVOLUTION TRANSITION

Kullback-Leibler divergence (used as a distance measure):
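For two theme word distributions the divergence is D(θ₂ ‖ θ₁) = Σ_w p(w|θ₂) log(p(w|θ₂) / p(w|θ₁)). A rough sketch of how it could flag evolution transitions, assuming θ₁ is a theme from an earlier time span, θ₂ a theme from a later one, and that a transition is recorded when the divergence falls below a threshold. The threshold value, the additive smoothing constant `eps` (needed because the divergence is infinite when θ₁ gives zero probability to a word θ₂ uses), and the function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-6):
    """D(p || q) for dicts word -> probability.

    eps is a hypothetical smoothing constant: it is added to q for every word
    in the union vocabulary (then q is renormalised) so the result stays finite.
    """
    vocab = set(p) | set(q)
    q_s = {w: q.get(w, 0.0) + eps for w in vocab}
    z = sum(q_s.values())
    return sum(pw * math.log(pw / (q_s[w] / z)) for w, pw in p.items() if pw > 0)


def evolution_transitions(earlier, later, threshold):
    """Return (i, j, D) for every pair where later theme j is within
    `threshold` (an assumed cut-off) of earlier theme i."""
    edges = []
    for i, t1 in enumerate(earlier):
        for j, t2 in enumerate(later):
            d = kl_divergence(t2, t1)  # later theme measured against earlier
            if d < threshold:
                edges.append((i, j, d))
    return edges
```

KL divergence is asymmetric, so the argument order matters; measuring the later theme against the earlier one is the assumption made here.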

THEME LIFE CYCLE (cont’d)
  • Definition 6 (Theme Life Cycle)

Given a text collection tagged with time stamps and a set of trans-collection themes, we define the Theme Life Cycle of each theme as the strength distribution of the theme over the entire time line. The strength of a theme at each time period is measured by the number of words generated by this theme in the documents corresponding to this time period, normalized by either the number of time points (giving an absolute strength), or the total number of words in the period (giving a relative strength). The absolute strength measures the absolute amount of text which a theme can explain, while the relative strength indicates which theme is relatively stronger in a time period.
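Definition 6 can be computed directly once every word carries a theme label. The sketch below assumes the labels arrive as hypothetical (period, theme) pairs, one per word token (for example from HMM decoding), together with the number of time points each period spans. Any background-labeled words still count toward the relative-strength denominator, since they are part of the period's total words.

```python
from collections import Counter, defaultdict


def theme_strengths(labeled_words, time_points):
    """labeled_words: iterable of (period, theme) pairs, one per word token.
    time_points: dict period -> number of time points the period spans.
    Returns (absolute, relative): dicts period -> {theme: strength}.
    """
    per_period = defaultdict(Counter)
    for period, theme in labeled_words:
        per_period[period][theme] += 1
    absolute, relative = {}, {}
    for period, counts in per_period.items():
        n_words = sum(counts.values())
        # Absolute strength: theme word count normalised by time points.
        absolute[period] = {t: c / time_points[period] for t, c in counts.items()}
        # Relative strength: theme word count normalised by all words in the period.
        relative[period] = {t: c / n_words for t, c in counts.items()}
    return absolute, relative
```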

THEME LIFE CYCLE (cont’d)
  • Hidden Markov Model (HMM).
THEME LIFE CYCLE (cont’d)
  • Four steps:
  • (1) Construct an HMM to model how themes shift between each other in the collection.
  • (2) Estimate the unknown parameters of the HMM using the whole stream collection as the observed example sequence.
  • (3) Decode the collection and label each word with the hidden theme model from which it was generated.
  • (4) For each trans-collection theme, analyze when it starts, when it terminates, and how it varies over time.
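Step (4) largely reduces to scanning the decoded labels. A small sketch, assuming each word's decoded state comes paired with its document's time stamp and that state 0 is a background state to skip; both are assumptions, and the function name `life_spans` is hypothetical.

```python
def life_spans(labels, timestamps, background_state=0):
    """For each theme state, return (first time stamp, last time stamp, word count).

    labels: decoded state per word; timestamps: time stamp per word (same order).
    Words labeled with background_state (an assumed convention) are ignored.
    """
    spans = {}
    for state, t in zip(labels, timestamps):
        if state == background_state:
            continue
        first, last, count = spans.get(state, (t, t, 0))
        spans[state] = (min(first, t), max(last, t), count + 1)
    return spans
```

The first and last time stamps mark when a theme starts and terminates; how it varies in between is what the strength measures of Definition 6 describe.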
THEME LIFE CYCLE (cont’d)
  • We first extract k trans-collection themes from the collection, then construct a fully connected HMM with k + 1 states. The entire vocabulary V is taken as the output symbol set, and the output probability distribution of each state is set to the multinomial distribution of words given by the corresponding theme language model.
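A minimal sketch of this construction and of the decoding in step (3), assuming the extra state (state 0) is a background model and states 1..k are the theme language models, with Viterbi decoding used here for the labeling. Step (2) would normally estimate the transition probabilities (for example with Baum-Welch); in this sketch a hypothetical fixed self-transition probability `stay` stands in for those estimates, and words a state does not list get a small hypothetical `floor` probability.

```python
import math


def build_hmm(themes, background, stay=0.9):
    """Fully connected HMM with k + 1 states (k >= 1 themes assumed).

    State 0 emits from the background model (assumption), states 1..k from
    the theme models. `stay` is a hypothetical self-transition probability;
    the remaining mass is split evenly over all other states.
    """
    states = [background] + list(themes)
    n = len(states)
    move = (1 - stay) / (n - 1)
    trans = [[stay if i == j else move for j in range(n)] for i in range(n)]
    start = [1.0 / n] * n
    return states, trans, start


def viterbi(words, states, trans, start, floor=1e-12):
    """Label each word (non-empty list) with its most likely state, in log space."""
    def emit(s, w):
        # Output distribution of state s; `floor` covers unlisted words.
        return math.log(states[s].get(w, floor))

    n = len(states)
    score = [math.log(start[s]) + emit(s, words[0]) for s in range(n)]
    back = []
    for w in words[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + emit(s, w))
        back.append(ptr)
    # Trace the best path back from the highest-scoring final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With `stay` close to 1 the decoder prefers long runs in a single state, which is what makes contiguous theme segments, and hence their start and end points, visible in the labels.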