
Opinion Summarization Using Entity Features and Probabilistic Sentence Coherence Optimization








  1. Opinion Summarization Using Entity Features and Probabilistic Sentence Coherence Optimization (UIUC at TAC 2008 Opinion Summarization Pilot) Nov 19, 2008 Hyun Duk Kim, Dae Hoon Park, V.G.Vinod Vydiswaran, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

  2. Research Questions • Can we improve sentence retrieval by assigning more weight to entity terms? • Can we optimize the coherence of a summary using a statistical coherence model?

  3. General Approach • Step 1, Sentence Retrieval: given a target (Target1001) with its questions Q1, Q2, …, retrieve relevant sentences from the documents • Step 2, Sentence Filtering: keep only the opinionated relevant sentences • Step 3, Sentence Organization: arrange them into the opinion summary

  4. Step 1: Sentence Retrieval • Retrieve relevant sentences for the target (Target1001) and question i from the documents with the Indri toolkit • Uniform term weighting: every query term has weight 1 • Non-uniform weighting: named entity: 10, noun phrase: 2, others: 1
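The 10-2-1 weighting idea can be sketched as follows. This is a minimal illustration, not the actual retrieval: the runs used the Indri toolkit, and the `named_entities` / `noun_phrase_terms` inputs here are hypothetical stand-ins for the output of an NE tagger and NP chunker.

```python
def term_weight(term, named_entities, noun_phrase_terms):
    """Slide's non-uniform scheme: named entities 10, noun-phrase terms 2, others 1."""
    if term in named_entities:
        return 10
    if term in noun_phrase_terms:
        return 2
    return 1

def weighted_overlap_score(query_terms, sentence_terms, named_entities, noun_phrase_terms):
    """Score a sentence by the total weight of the query terms it contains."""
    sent = set(sentence_terms)
    return sum(term_weight(t, named_entities, noun_phrase_terms)
               for t in query_terms if t in sent)
```

With uniform weighting every matched term would contribute 1 instead; the non-uniform scheme makes a matched named entity worth ten ordinary terms.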

  5. Step 2: Sentence Filtering • Keep only opinionated sentences among the relevant sentences • Keep only sentences of the same polarity • Remove redundancy
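The three filters above compose naturally into one pass. In this sketch, `is_opinionated`, `polarity`, and `similarity` are hypothetical stand-ins for the deck's opinion classifier, polarity classifier, and redundancy check; the threshold value is also an assumption.

```python
def filter_sentences(sentences, is_opinionated, polarity, target_polarity,
                     similarity, threshold=0.8):
    """Step 2 sketch: keep opinionated sentences of the queried polarity,
    then drop sentences too similar to one already kept."""
    kept = []
    for s in sentences:
        if not is_opinionated(s):
            continue                     # filter 1: opinionated only
        if polarity(s) != target_polarity:
            continue                     # filter 2: same polarity only
        if any(similarity(s, k) > threshold for k in kept):
            continue                     # filter 3: redundant, skip
        kept.append(s)
    return kept
```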

  6. Step 3: Summary Organization (Method 1: Polarity Ordering) • Paragraph structure by question and polarity • Add guiding phrases: "The first question is …", "Following are positive opinions…", "Following are negative opinions…", "The second question is …", "Following are mixed opinions…", …
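Method 1 amounts to simple template assembly. The sketch below uses the guiding phrases from the slide; the `answers` data structure (a list of per-question dicts from polarity to sentences) is a hypothetical representation, not the paper's.

```python
def polarity_ordered_summary(answers):
    """Method 1 sketch: group sentences by question, then by polarity,
    prefixing each group with a guiding phrase."""
    ordinals = ["first", "second", "third", "fourth"]
    parts = []
    for i, per_question in enumerate(answers):
        parts.append(f"The {ordinals[i]} question is ...")
        for polarity in ("positive", "negative", "mixed"):
            sents = per_question.get(polarity, [])
            if sents:
                parts.append(f"Following are {polarity} opinions...")
                parts.extend(sents)
    return "\n".join(parts)
```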

  7. Step 3: Summary Organization (Method 2: Statistical Coherence Optimization) • Coherence function c(Si, Sj) scores each adjacent sentence pair • Use a greedy algorithm to order the sentences S1’, S2’, …, Sn’ so as to maximize the total score c(S1’, S2’) + c(S2’, S3’) + … + c(Sn-1’, Sn’)
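One plausible greedy scheme for this objective is sketched below: try each sentence as the starting point, repeatedly append the unused sentence with the highest coherence to the last one placed, and keep the best-scoring chain. The paper's exact greedy variant may differ; this only illustrates the objective c(S1’, S2’) + … + c(Sn-1’, Sn’).

```python
def greedy_order(sentences, c):
    """Greedily order sentences to (approximately) maximize the sum of
    c(prev, next) over adjacent pairs."""
    best_order, best_score = list(sentences), float("-inf")
    for start in range(len(sentences)):
        order = [sentences[start]]
        remaining = sentences[:start] + sentences[start + 1:]
        score = 0.0
        while remaining:
            # pick the unused sentence most coherent with the last placed one
            nxt = max(remaining, key=lambda s: c(order[-1], s))
            score += c(order[-1], nxt)
            order.append(nxt)
            remaining.remove(nxt)
        if score > best_score:
            best_order, best_score = order, score
    return best_order
```

Greedy search is not guaranteed to find the global maximum (that problem is a traveling-salesman-style ordering problem), but it is cheap and works well in practice.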

  8. Probabilistic Coherence Function (idea similar to [Lapata 03]) • Coherence of two adjacent sentences = average coherence probability (pointwise mutual information) over all word combinations (u from Sentence 1, v from Sentence 2) • Trained on the original documents’ sentence order • Example from the slide: the pair (u, v) is observed 2 times among 3 × 4 word combinations, so the smoothed estimate is P(u,v) = (2 + 0.001) / (3*4 + 1.0)
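The slide's example suggests an additive-smoothing estimate and an average over all word pairs; the following sketch mirrors that arithmetic. The exact counting scheme in the paper may differ slightly; `pair_counts` and `total_pairs` here are assumed to come from adjacent sentence pairs in the training documents.

```python
from itertools import product

def pair_probability(u, v, pair_counts, total_pairs):
    """Smoothed probability that word u (earlier sentence) is followed by
    word v (next sentence), matching the slide's (2 + 0.001)/(3*4 + 1.0)."""
    return (pair_counts.get((u, v), 0) + 0.001) / (total_pairs + 1.0)

def coherence(sent1, sent2, pair_counts, total_pairs):
    """Average the pair probability over all word combinations (u, v)."""
    pairs = list(product(sent1, sent2))
    return sum(pair_probability(u, v, pair_counts, total_pairs)
               for u, v in pairs) / len(pairs)
```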

  9. General Approach (recap) • Step 1, Sentence Retrieval: given a target (Target1001) with its questions Q1, Q2, …, retrieve relevant sentences from the documents • Step 2, Sentence Filtering: keep only the opinionated relevant sentences • Step 3, Sentence Organization: arrange them into the opinion summary

  10. Submissions: UIUC1, UIUC2 • Step 1, Sentence Retrieval: UIUC1 uses non-uniform weighting; UIUC2 uses uniform weighting • Step 2, Sentence Filtering: UIUC1 uses aggressive polarity filtering; UIUC2 uses conservative filtering • Step 3, Sentence Organization: UIUC1 uses polarity ordering; UIUC2 uses statistical ordering

  11. Evaluation • Rank among runs without answer-snippet (Total: 19 runs)

  12. Evaluation • Rank among runs without answer-snippet (Total: 19 runs), shown per configuration: polarity ordering; NE/NP retrieval with polarity filtering; nothing (baseline); statistical ordering

  13. Evaluation of Named Entity Weighting • Assume a sentence is relevant iff similarity(sentence, nugget description) > threshold • Compare uniform term weighting against non-uniform weighting (named entity: 10, noun phrase: 2, others: 1), both retrieving with the Indri toolkit for the target (Target1001) and question i over the documents

  14. Effectiveness of Entity Weighting (chart comparing 10-2-1 weighting against 1-1-1 weighting)

  15. Polarity Module • Polarity module performance evaluation on the sentiment corpus of [Hu&Liu 04, Hu&Liu 04b] (unit: # of sentences)

  16. Coherence optimization • Evaluation method • Basic assumption: the sentence order of the original document is coherent • Among the given target documents, use 70% as the training set and 30% as the test set • Measurement: strict pair matching = # of correct adjacent sentence pairs / # of total adjacent sentence pairs
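The strict pair-matching measure can be written directly from its definition: a predicted ordering gets credit for each adjacent pair of the original order that it also places adjacently, in the same direction. A minimal sketch:

```python
def strict_pair_match(predicted, original):
    """Fraction of adjacent sentence pairs in the original order that are
    also adjacent (same direction) in the predicted order."""
    gold = set(zip(original, original[1:]))   # adjacent pairs in original
    pred = set(zip(predicted, predicted[1:])) # adjacent pairs in prediction
    return len(gold & pred) / len(gold)
```

For example, reordering A B C D as A B D C preserves only the pair (A, B), giving a score of 1/3.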

  17. Probabilistic Coherence Function • Average coherence probability over all word combinations • Two variants compared: point-wise mutual information with smoothing vs. strict joint probability

  18. Probabilistic Coherence Function • Mutual information: PMI(u, v) = log [ p(u, v) / (p(u) p(v)) ], with probabilities estimated from counts over N, where N = c(u,v) + c(not u, v) + c(u, not v) + c(not u, not v) • For unseen pairs, p(u,v) = 0.5 * MIN(p over seen pairs in training)
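Assuming the four counts on the slide, the PMI computation and the unseen-pair backoff can be sketched as below. The `(bool, bool)`-keyed counts dict and the `seen_p` table are hypothetical representations chosen for the sketch, not the paper's data structures.

```python
import math

def pmi(u_and_v, not_u_and_v, u_and_not_v, not_u_and_not_v):
    """Pointwise mutual information from the slide's four counts:
    N = c(u,v) + c(~u,v) + c(u,~v) + c(~u,~v)."""
    n = u_and_v + not_u_and_v + u_and_not_v + not_u_and_not_v
    p_uv = u_and_v / n
    p_u = (u_and_v + u_and_not_v) / n   # marginal for u
    p_v = (u_and_v + not_u_and_v) / n   # marginal for v
    return math.log(p_uv / (p_u * p_v))

def p_with_fallback(u, v, seen_p, min_seen_p):
    """Slide's backoff: an unseen pair gets half the smallest seen value."""
    return seen_p.get((u, v), 0.5 * min_seen_p)
```

PMI is positive when u and v co-occur more often than independence predicts, which is why it penalizes common words that co-occur with everything (slide 19).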

  19. Coherence optimization test • Pointwise mutual information effectively penalizes common words

  20. Coherence optimization test • Top-ranked p(u,v) pairs under strict joint probability: many stopwords are top-ranked

  21. Coherence optimization test • Pointwise mutual information was better than joint probability and normal mutual information • Eliminating common words and very rare words improved performance

  22. Conclusions • Limited improvement in retrieval performance using named entities and noun phrases • Need for a good polarity classification module • Potential to improve the statistical sentence-ordering module with different coherence functions and word selection

  23. Thank you University of Illinois at Urbana-Champaign Hyun Duk Kim (hkim277@uiuc.edu)
