# Automatic Summary Evaluation - PowerPoint PPT Presentation

##### Presentation Transcript

1. Automatic Summary Evaluation Ross Greenwood

2. Recap • Automatically evaluate summaries of text documents • Evaluate content coverage • Compare against one or more ideal summaries

3. Pyramid Evaluation • Manually annotate texts for phrases expressing similar ideas (summary content units) • Judge content coverage by number of overlapping summary content units

4. ROUGE: Four Summary Evaluation Measures • ROUGE-N: N-gram Co-Occurrence • Number of matching N-word substrings • ROUGE-L: Longest Common Subsequence • Allows for skipped words between matches • Ex. “a b d f” is a subsequence of “a b c d e f” • ROUGE-W: Weighted LCS • Weights consecutive matches higher • ROUGE-S: Skip-bigram • Number of matching in-order word pairs with arbitrary gaps
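The core of ROUGE-L is a longest-common-subsequence computation over the two word sequences. A minimal sketch (not the actual ROUGE package) that verifies the slide's subsequence example:

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

model = "a b c d e f".split()
peer = "a b d f".split()
print(lcs_length(model, peer))  # 4 — every word of the peer appears in order in the model
```

Because the LCS allows gaps, ROUGE-L credits in-order matches without requiring them to be consecutive, which is exactly what ROUGE-W then re-weights.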

6. Precision, Recall, and F-Measure • Precision = matches/num_words_peer • Recall = matches/num_words_model • F = 2/(1/P + 1/R) (the harmonic mean of P and R)
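The slide's three formulas can be sketched for the unigram (ROUGE-1) case. This is an illustrative implementation, assuming matches are counted with clipping (a shared word counts at most as often as it appears in either summary); the example sentences are invented:

```python
from collections import Counter

def rouge_1_prf(peer, model):
    """Unigram precision, recall, and F-measure as defined on the slide."""
    peer_counts, model_counts = Counter(peer), Counter(model)
    # Clipped match count: each word credited at most min(peer, model) times.
    matches = sum(min(c, model_counts[w]) for w, c in peer_counts.items())
    p = matches / len(peer)
    r = matches / len(model)
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean, = 2/(1/P + 1/R)
    return p, r, f

p, r, f = rouge_1_prf("the cat sat on the mat".split(),
                      "the cat lay on a mat".split())
```

With 4 matching unigrams out of 6 words on each side, P = R = F = 2/3 here.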

7. Problems with ROUGE-N: False Positives • Homographs, ex: Model: … robbed the bank … Peer: … sat on the river bank …

8. Problems with ROUGE-N: False Negatives • Synonyms, ex: Model: … held up the financial institution … Peer: … robbed the bank …

9. Solution: WordNet • Lexical database • Synsets: groups of words organized by concept • Method: • Tag words with POS • Tag words with senses (SenseLearner) • Look up the synset in WordNet
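The synset idea can be sketched with a toy lookup table standing in for WordNet. The sense tags and synset contents below are invented for illustration (real tags follow the `word#pos#sense` convention shown on the next slide); two sense-tagged words match when their synsets overlap:

```python
# Toy stand-in for a WordNet lookup: maps a word-sense tag to its synset.
# These tags and synsets are illustrative, not real WordNet data.
TOY_SYNSETS = {
    "bank#n#1": {"bank#n#1", "financial_institution#n#1"},
    "financial_institution#n#1": {"bank#n#1", "financial_institution#n#1"},
    "bank#n#2": {"bank#n#2"},  # river bank: a different sense, no overlap
}

def senses_match(tag_a, tag_b):
    """Two sense-tagged words match if their synsets intersect."""
    syn_a = TOY_SYNSETS.get(tag_a, {tag_a})
    syn_b = TOY_SYNSETS.get(tag_b, {tag_b})
    return bool(syn_a & syn_b)

print(senses_match("bank#n#1", "financial_institution#n#1"))  # True: synonyms match
print(senses_match("bank#n#1", "bank#n#2"))                   # False: homographs do not
```

This is how synset matching repairs both earlier failure modes: synonyms now count as matches (fewer false negatives), while homographs in different senses no longer do (fewer false positives).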

10. Architecture of Solution • Pipeline: Data → POS tagger → SenseLearner → WordNet lookup → ROUGE → Results • Ex. querySense(“run#v#3”, “syns”) returns the synset {go#v#7, pass#v#6, lead#v#6, extend#v#2}

11. Evaluating the Evaluator • Correlation with human evaluation scores (ROUGE, Basic Elements) • Success at reducing errors (i.e. number of false negatives/positives avoided vs. original ROUGE)
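"Correlation with human evaluation scores" typically means computing Pearson's r between the metric's scores and human judgments over a set of summaries. A minimal sketch, with invented score lists (the numbers are made up, not from any real evaluation):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five summaries; higher correlation with the human
# column would suggest the automatic metric tracks human judgment.
human = [0.9, 0.4, 0.7, 0.2, 0.6]
rouge = [0.8, 0.5, 0.6, 0.3, 0.7]
print(pearson(human, rouge))
```

A metric whose scores correlate strongly with the human column (r near 1) would be judged a better evaluator, which is the comparison proposed against ROUGE and Basic Elements.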

12. References • Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. • Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

13. Questions?