Automatic Keyphrase Extraction via Topic Decomposition

Automatic Keyphrase Extraction via Topic Decomposition. Presenter: Wu, Min-Cong. Authors: Zhiyuan Liu, Wenyi Huang, Yabin Zheng and Maosong Sun. 2010, ACM. Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments.


Presentation Transcript


  1. Automatic Keyphrase Extraction via Topic Decomposition Presenter: Wu, Min-Cong Authors: Zhiyuan Liu, Wenyi Huang, Yabin Zheng and Maosong Sun 2010, ACM

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Existing graph-based ranking methods for keyphrase extraction compute only a single importance score for each word, via a single random walk. • However, both documents and words can be represented as mixtures of semantic topics, which a single score cannot reflect.

  4. Objectives • Build a Topical PageRank (TPR) on the word graph to measure word importance with respect to different topics. • Calculate the ranking scores of words and extract the top-ranked ones as keyphrases.

  5. Methodology - Building Topic Interpreters • LDA with hyperparameters α, β, estimated with, e.g., Gibbs sampling. • LDA output, topic-word distributions: Pr(w|z) ∈ ϕ(z) ∈ ϕ • LDA output, document-topic distributions: Pr(z|d) ∈ θ(d) ∈ θ

  6. Methodology - Topical PageRank for Keyphrase Extraction

  7. Methodology - Constructing Word Graph • The document is regarded as a word sequence. • A sliding window (here, size = 3) moves over the sequence to link co-occurring words.
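
The window-based graph construction can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_word_graph`, the choice to point links from the earlier word to the later one, and the use of co-occurrence counts as link weights are all assumptions made for the example.

```python
from collections import defaultdict


def build_word_graph(words, window=3):
    """Return {(wi, wj): weight} for a word sequence.

    Assumption: a directed link goes from each word to every later word
    inside the same window, and the weight counts how often that happens.
    """
    edges = defaultdict(int)
    for i, wi in enumerate(words):
        # Positions i+1 .. i+window-1 share a window that starts at wi.
        for wj in words[i + 1 : i + window]:
            if wi != wj:  # skip self-links
                edges[(wi, wj)] += 1
    return dict(edges)


# Hypothetical toy document, already tokenized.
words = "topical pagerank ranks words on a word graph".split()
graph = build_word_graph(words, window=3)
```

With `window=3`, each word is linked to at most the two words that follow it.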

  8. Methodology - Topical PageRank (PageRank) • Define the weight of link (wi, wj) as e(wi, wj).

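  9. Methodology - Topical PageRank (PageRank) • In standard PageRank, the random jump goes to every vertex with equal probability. • The score passed along each link is normalized by the out-degree of the source vertex.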
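
A weighted PageRank with a uniform random jump can be sketched as below. It is an illustrative implementation under stated assumptions: the function name `pagerank`, the default damping value of 0.85, the fixed iteration count, and the definition of out-degree as the sum of outgoing link weights are choices made for this example, not details taken from the slides.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """edges: {(wi, wj): e(wi, wj)}. Returns {word: score}.

    Note: vertices without outgoing links simply leak score in this
    sketch; no special handling is attempted.
    """
    nodes = sorted({w for pair in edges for w in pair})
    # Out-degree of a vertex, taken here as the sum of its outgoing weights.
    out_weight = {w: 0.0 for w in nodes}
    for (wi, _), weight in edges.items():
        out_weight[wi] += weight

    n = len(nodes)
    scores = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        # Random jump: equal probability 1/n for every vertex.
        new = {w: (1.0 - damping) / n for w in nodes}
        # Random walk: each vertex shares its score along its out-links.
        for (wi, wj), weight in edges.items():
            new[wj] += damping * scores[wi] * weight / out_weight[wi]
        scores = new
    return scores


# Hypothetical toy graph: "b" receives links from both "a" and "c".
toy_edges = {("a", "b"): 1, ("c", "b"): 1, ("b", "a"): 1}
toy_scores = pagerank(toy_edges)
```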

  10. Methodology - Topical PageRank • Preference values from LDA: • pr(w|z) = pr(w)·pr(z|w)/pr(z): focuses on the word • pr(z|w) = pr(z)·pr(w|z)/pr(w): focuses on the topic • pr(w|z) × pr(z|w) (Cohn and Chang, 2000)
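
The topic-specific random walk differs from plain PageRank only in the random jump, which follows the topic's preference values instead of being uniform. The sketch below illustrates that idea; the function name `topical_pagerank`, the normalization of preference values over the graph's words, the default damping value, and the toy numbers are assumptions made for the example.

```python
def topical_pagerank(edges, preference, damping=0.85, iterations=100):
    """edges: {(wi, wj): weight}; preference: {word: preference value}
    for one topic z, e.g. pr(w|z). Returns {word: topic-specific score}.
    """
    nodes = sorted({w for pair in edges for w in pair})
    out_weight = {w: 0.0 for w in nodes}
    for (wi, _), weight in edges.items():
        out_weight[wi] += weight

    # Normalize the preference values over the words in the graph
    # (an assumption of this sketch) so they form a jump distribution.
    total = sum(preference.get(w, 0.0) for w in nodes)
    if total <= 0:
        raise ValueError("preference values must have positive mass on the graph")
    jump = {w: preference.get(w, 0.0) / total for w in nodes}

    scores = dict(jump)
    for _ in range(iterations):
        # Topic-biased random jump replaces the uniform 1/n jump.
        new = {w: (1.0 - damping) * jump[w] for w in nodes}
        for (wi, wj), weight in edges.items():
            new[wj] += damping * scores[wi] * weight / out_weight[wi]
        scores = new
    return scores


# Hypothetical cycle graph and preference values for a single topic.
cycle = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1}
scores_z = topical_pagerank(cycle, {"a": 0.8, "b": 0.1, "c": 0.1})
```

Running the function once per topic yields one ranking of the same word graph for each topic.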

  11. Methodology - Extract Keyphrases Using Ranking Scores • Step 1. Annotate the document with POS tags. • Step 2. Select noun phrases. • Step 3. Compute the ranking scores of candidate keyphrases separately for each topic (formulas shown: Topical PageRank, PageRank). • Step 4. Integrate the topic-specific rankings of candidate keyphrases into a final ranking.
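
Steps 3 and 4 can be sketched as below, skipping POS tagging and noun-phrase selection by taking the candidates as given. This is a hedged illustration: it assumes a phrase's topic-specific score is the sum of its words' scores and that topics are combined with weights Pr(z|d) from the document-topic distribution. The function name `rank_keyphrases` and all numbers are hypothetical.

```python
def rank_keyphrases(candidates, topic_word_scores, doc_topic):
    """candidates: list of phrases, each a tuple of words.
    topic_word_scores: {z: {word: topic-specific score of the word}}.
    doc_topic: {z: Pr(z|d)} for the document being processed.
    Returns [(phrase, final_score), ...] sorted best first.
    """
    final = {}
    for phrase in candidates:
        total = 0.0
        for z, word_scores in topic_word_scores.items():
            # Step 3: score the phrase under topic z (sum over its words).
            phrase_score = sum(word_scores.get(w, 0.0) for w in phrase)
            # Step 4: weight by how much the document is about topic z.
            total += phrase_score * doc_topic.get(z, 0.0)
        final[phrase] = total
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for two topics and a document mostly about topic 0.
word_scores = {
    0: {"topic": 0.5, "model": 0.3, "cat": 0.2},
    1: {"topic": 0.1, "model": 0.1, "cat": 0.8},
}
ranking = rank_keyphrases(
    [("topic", "model"), ("cat",)], word_scores, {0: 0.9, 1: 0.1}
)
```

Because the document weights topic 0 heavily, the phrase that scores well under topic 0 ends up ahead of the one that scores well only under topic 1.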

  12. Experiment - Datasets • Dataset: • Topic model: build topic interpreters with LDA.

  13. Experiment - Evaluation Metrics • Precision, recall and F-measure do not take the order of extracted keyphrases into account. • The order-aware metrics take values between 0 and 1; larger values are better.
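
The difference between order-insensitive and order-aware evaluation can be illustrated with a small sketch. Precision, recall and F-measure follow their standard definitions; reciprocal rank is used here only as one example of an order-aware metric bounded by 0 and 1, and is an assumption of this sketch rather than a formula taken from the slide. Function names and the toy phrases are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Order-insensitive metrics; assumes `extracted` has no duplicates."""
    correct = len(set(extracted) & set(gold))
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def reciprocal_rank(extracted, gold):
    """1 / rank of the first correct keyphrase, or 0.0 if none is correct."""
    gold_set = set(gold)
    for rank, phrase in enumerate(extracted, start=1):
        if phrase in gold_set:
            return 1.0 / rank
    return 0.0


gold = ["b", "d", "e"]
run_1 = ["a", "b", "c", "d"]  # correct phrases ranked 2nd and 4th
run_2 = ["b", "d", "a", "c"]  # same phrases, correct ones ranked first

same_prf = precision_recall_f1(run_1, gold) == precision_recall_f1(run_2, gold)
rr_1, rr_2 = reciprocal_rank(run_1, gold), reciprocal_rank(run_2, gold)
```

Both runs extract the same set of phrases, so precision, recall and F-measure are identical, while the reciprocal rank rewards the run that places a correct keyphrase first.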

  14. Experiment - Influences of Parameters on TPR • Window Size W • The Number of Topics K

  15. Experiment - Influences of Parameters on TPR • Damping Factor λ • Preference Values: pr(w|z) = pr(w)·pr(z|w)/pr(z) focuses on the word; pr(z|w) = pr(z)·pr(w|z)/pr(w) focuses on the topic • Example words: he, she

  16. Experiment - Comparing with Baseline Methods • TFIDF and PageRank do not use topic information. • TPR enjoys the advantages of both LDA and TFIDF/PageRank.

  17. Experiment - Extracting Example

  18. Conclusions • Experiments on two datasets show that TPR achieves better performance than other baseline methods.

  19. Comments • Advantages: TPR incorporates topic information within the random walk for keyphrase extraction. • Applications: automatic keyphrase extraction.
