
Automatic Summary Evaluation


Presentation Transcript


  1. Automatic Summary Evaluation Ross Greenwood

  2. Recap • Automatically evaluate summaries of text documents • Evaluate content coverage • Compare against one or more ideal summaries

  3. Pyramid Evaluation • Manually annotate texts for phrases expressing similar ideas (summary content units) • Judge content coverage by number of overlapping summary content units
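
A minimal sketch of the scoring idea behind the pyramid method, assuming each summary content unit (SCU) is weighted by the number of model summaries it appears in; the score compares the weight of the SCUs found in the peer summary against the best weight achievable with the same number of SCUs. Function and variable names are illustrative, not the annotation tool's API.

    # Pyramid-style content score (illustrative sketch).
    def pyramid_score(scu_weights, peer_scus):
        # scu_weights: {scu_id: number of model summaries containing the SCU}
        # peer_scus: set of SCU ids expressed in the peer summary
        observed = sum(scu_weights.get(scu, 0) for scu in peer_scus)
        # An ideal summary expressing the same number of SCUs would pick
        # the highest-weighted ones.
        best = sorted(scu_weights.values(), reverse=True)[:len(peer_scus)]
        optimal = sum(best)
        return observed / optimal if optimal else 0.0

    # Example: 3 model summaries; SCU A appears in all 3, B in 2, C in 1.
    print(pyramid_score({"A": 3, "B": 2, "C": 1}, {"A", "C"}))  # 4/5 = 0.8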

  4. ROUGE: Four Summary Evaluation Measures • ROUGE-N: N-gram Co-Occurrence • Number of matching N-word substrings • ROUGE-L: Longest Common Subsequence • Allows for skipping words • Ex. “a b d f” is a subsequence of “a b c d e f” • ROUGE-W: Weighted LCS • Weights consecutive matches higher • ROUGE-S: Skip-bigram • Number of matching in-order word pairs, with arbitrary gaps allowed between the two words
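
Two of these measures are easy to sketch directly: clipped N-gram matching (ROUGE-N) and the longest common subsequence that underlies ROUGE-L. The functions below are illustrative, not the actual ROUGE package API.

    from collections import Counter

    def ngrams(words, n):
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    def ngram_matches(peer, model, n):
        # Count peer n-grams that also occur in the model summary (clipped).
        peer_counts, model_counts = Counter(ngrams(peer, n)), Counter(ngrams(model, n))
        return sum(min(count, model_counts[g]) for g, count in peer_counts.items())

    def lcs_length(a, b):
        # Longest common subsequence allows skipped words, so
        # "a b d f" scores 4 against "a b c d e f".
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
        return dp[len(a)][len(b)]

    print(ngram_matches("the cat sat".split(), "the cat ran".split(), 2))  # 1
    print(lcs_length("a b c d e f".split(), "a b d f".split()))            # 4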


  6. Precision, Recall, and F-Measure • Precision = matches / num_words_peer (peer = the candidate summary being evaluated) • Recall = matches / num_words_models (models = the reference summaries) • F = 2 / (1/P + 1/R), the harmonic mean of precision and recall
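
The same quantities in code, assuming a match count has already been produced (for example by the n-gram counting sketch above); the function name is illustrative.

    # Precision, recall and F-measure from a match count (illustrative sketch).
    def prf(matches, num_words_peer, num_words_models):
        precision = matches / num_words_peer if num_words_peer else 0.0
        recall = matches / num_words_models if num_words_models else 0.0
        # Harmonic mean of precision and recall.
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f

    # Example: 6 matching unigrams, a 10-word peer, 12 words across the models.
    print(prf(6, 10, 12))  # (0.6, 0.5, 0.545...)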

  7. Problems with ROUGE-N: False Positives • Homographs, ex: Model: … robbed the bank … Peer: … sat on the river bank …

  8. Problems with ROUGE-N: False Negatives • Synonyms, ex: Model: … held up the financial institution … Peer: … robbed the bank …

  9. Solution: WordNet • Lexical database • Synsets: organize words by concept • Method: • Tag words with POS • Tag words with word sense (SenseLearner) • Look up the synset in WordNet
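
A minimal sketch of the synset-lookup step using NLTK's WordNet interface; the choice of library is an assumption (the deck only names WordNet), and the POS and sense tags are taken as already produced by the tagger and SenseLearner steps above.

    from nltk.corpus import wordnet as wn

    # Suppose POS tagging and sense tagging produced "run#v#3"
    # (the third verb sense of "run"; numbering follows WordNet senses).
    synset = wn.synsets("run", pos=wn.VERB)[2]
    print(synset.lemma_names())  # the synonyms grouped under that concept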

  10. Architecture of Solution • Pipeline: Data → POS tagger → SenseLearner → WordNet lookup → ROUGE → Results • Example WordNet query: querySense(“run#v#3”, “syns”) returns {go#v#7, pass#v#6, lead#v#6, extend#v#2}
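
A sketch of how the sense-tagged output can replace exact word matching inside ROUGE-style counting: two tagged tokens count as a match when they map to the same synset. The helper below is hypothetical and the sense numbers assume WordNet 3.0 ordering; it is not the system's actual code.

    from nltk.corpus import wordnet as wn

    def same_concept(tagged_a, tagged_b):
        # Tagged tokens look like "bank#n#2"; same synset => treat as a match.
        def to_synset(tagged):
            lemma, pos, sense = tagged.split("#")
            return wn.synsets(lemma, pos=pos)[int(sense) - 1]
        return to_synset(tagged_a) == to_synset(tagged_b)

    # The homograph from slide 7 no longer matches (river bank vs. financial bank)...
    print(same_concept("bank#n#1", "bank#n#2"))  # False
    # ...while a true synonym of the financial sense does:
    print(same_concept("bank#n#2", "depository_financial_institution#n#1"))  # True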

  11. Evaluating the Evaluator • Correlation with human evaluation scores (ROUGE, Basic Elements) • Success at reducing errors (i.e. number of false negatives/positives avoided vs. original ROUGE)
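
A minimal sketch of the correlation check, assuming per-summary human judgments and automatic scores are available as parallel lists; the numbers below are illustrative only.

    from scipy.stats import pearsonr, spearmanr

    human = [0.9, 0.4, 0.7, 0.2, 0.6]      # e.g. manual / pyramid scores
    automatic = [0.8, 0.5, 0.6, 0.3, 0.7]  # e.g. WordNet-augmented ROUGE scores

    print(pearsonr(human, automatic))   # linear correlation
    print(spearmanr(human, automatic))  # rank correlation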

  12. References • Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. • Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

  13. Questions?
