Novelty Detection in Repeated MEAD Summarization

Richard Murphy. EECS 597. 06 December 2002.

Presentation Transcript


  1. Novelty Detection in Repeated MEAD Summarization
     Richard Murphy
     EECS 597
     06 December 2002

  2. The Problem with MEAD
     • Works well for one-time summaries
     • Summaries produced are readable, fairly informative
     • News stories are on-going, not one-time
     • New, relevant articles may appear after cluster is summarized
     • Expanded cluster will include new information
     • Second summary of a cluster will include lots of known information
     • New information often demoted (further from centroid)
     • Repeated summaries lose value
     • Reader can be assumed to remember past summaries
     • Most informative summary will focus on new information with only brief repetition of key points
     • More repetition = Less new information = Less useful summary

  3. [1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
     [2] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
     [3] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
     [4] Several storeys of the building were engulfed in fire, she said.
     [5] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
     [6] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
     [7] The building houses government offices and is next to the city's central train station.

     [1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
     [2] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
     [3] The building houses government offices and is next to the city's central train station.
     [4] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
     [5] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
     [6] The Pirelli Building in Milan, Italy, was hit by a small plane.
     [7] (ABCNEWS.com) - A small plane crashed into a skyscraper in downtown Milan today, setting several floors of the 30-story building on fire.
     [8] The plane crashed into the 25th floor of the Pirelli building in downtown Milan.
     [9] A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported.
     [10] WITNESSES REPORTED hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station.
     [11] Italian state television said the crash put a hole in the 25th floor of the Pirelli building.
     [12] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.

  4. Solution: MEAD with a memory
     • Save summaries with cluster information
     • When summarizing the cluster in future, check for archived summaries
     • During reranking, compare sentences to sentences in old summaries
     • The existing default-reranker.pl module compares sentences in the summary to each other using the cosine similarity metric and eliminates those that are too similar to other sentences in the summary
     • After this process, use cosine similarity to demote sentences in the new summary that are too similar to sentences in an old summary
     • Don't completely eliminate sentences similar to known information: if the user requests a large enough summary, "background" (already seen) information should still appear, lower in the new summary
     • User specific
     • In a MEAD-based system like NewsInEssence, users could log in to get updated summaries of on-going stories
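The demotion step described on this slide can be sketched as follows. This is a minimal illustration in Python, not the modified default-reranker.pl: the function names are hypothetical, the cosine similarity is computed over raw word counts (MEAD's own weighting may differ), and the defaults for `cutoff` and `penalty` are taken from the settings reported on the results slide (0.7 and 0.1).

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased word counts; a stand-in for MEAD's own term weighting."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def demote_known(scored, archived, cutoff=0.7, penalty=0.1):
    """Lower the score of sentences that resemble archived ones.

    scored   -- list of (sentence, score) pairs from the current run
    archived -- sentences from summaries the reader has already seen
    Sentences are demoted, never dropped, so background information can
    still surface when a long summary is requested.
    """
    old_vectors = [bag_of_words(s) for s in archived]
    reranked = []
    for sentence, score in scored:
        vector = bag_of_words(sentence)
        if any(cosine(vector, old) >= cutoff for old in old_vectors):
            score -= penalty
        reranked.append((sentence, score))
    # Highest score first, as a reranker would emit them.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

For example, if the archive holds "The building houses government offices and is next to the city's central train station." and that same sentence scores 1.0 in the new run, it drops to 0.9 and falls below an unseen sentence scored 0.95. The scores in this example are made up for illustration.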

  5. Evaluating Multiple Summaries
     • Evaluation of single (first) summary
     • Create manual extract from current cluster
     • Run meadeval.pl to calculate precision/recall/kappa of automated summary
     • Evaluation of subsequent summaries
     • Create manual extract from current cluster and past automated summaries (not past manual summaries: the reader will have seen the automated output)
     • Run meadeval.pl
     • Always use the cluster which was available to MEAD at time of automated summarization
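As a rough illustration of the three measures named above, the sketch below computes precision, recall, and Cohen's kappa for an automated extract against a manual one. It is not meadeval.pl, whose internals are not described in these slides; the function name, the use of sentence indices, and the two-judge binary (in/out) formulation of kappa are assumptions.

```python
def extract_agreement(auto, manual, n_sentences):
    """Precision, recall and Cohen's kappa for two sentence extracts.

    auto, manual -- iterables of sentence indices chosen by each "judge"
    n_sentences  -- total number of sentences in the cluster
    """
    auto, manual = set(auto), set(manual)
    if not auto or not manual:
        raise ValueError("both extracts must contain at least one sentence")

    both_in = len(auto & manual)
    both_out = n_sentences - len(auto | manual)

    precision = both_in / len(auto)
    recall = both_in / len(manual)

    # Observed agreement: sentences both judges included or both excluded.
    p_observed = (both_in + both_out) / n_sentences
    # Agreement expected by chance from each judge's selection rate.
    p_expected = (
        len(auto) * len(manual)
        + (n_sentences - len(auto)) * (n_sentences - len(manual))
    ) / n_sentences ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return precision, recall, kappa
```

Precision and recall are equal whenever the two extracts have the same length, which is why they coincide in every reported result. As a worked case, two 12-sentence extracts that share 10 sentences, drawn from a hypothetical 100-sentence cluster, give precision and recall of about 0.833 and kappa of about 0.811. The cluster size of 100 is an assumption; the slides do not state it.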

  6. Comparing MEAD to MEAD with memory

     System             Summary   Precision   Recall   Kappa
     Default MEAD       Initial   0.571       0.571     0.539
     Default MEAD       Second    0.250       0.250     0.148
     Default MEAD       Third     0.083       0.083    -0.042
     MEAD with memory   Initial   0.571       0.571     0.539
     MEAD with memory   Second    0.333       0.333     0.242
     MEAD with memory   Third     0.833       0.833     0.811

     Settings: demote on cosine similarity >= 0.7, demote by 0.1 points

  7. Remaining / Future Work
     • More testing
     • More test clusters
     • Different values of demotion increment, demotion similarity cutoff
     • Command-line options for demotion settings
     • Varying levels of demotion based on position in old summary
     • Multiple users
     • Currently assumes cluster belongs to an individual user
     • Add command-line identification of user so that multiple users can summarize a cluster without being affected by each other's archives
     • NewsInEssence interface
     • Remember website visitors, keep unique archives for each
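One way the per-user archives proposed above might be organized is sketched below. This is purely illustrative: the directory layout, the JSON format, and all function names are assumptions, and nothing here reflects how MEAD or NewsInEssence actually stores data.

```python
import json
from pathlib import Path


def archive_path(root, user, cluster):
    """One JSON file per (user, cluster) pair, e.g. root/alice/milan.json."""
    return Path(root) / user / f"{cluster}.json"


def load_archive(root, user, cluster):
    """Return the list of past summaries (each a list of sentences)."""
    path = archive_path(root, user, cluster)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_summary(root, user, cluster, sentences):
    """Append a newly delivered summary to this user's archive."""
    path = archive_path(root, user, cluster)
    path.parent.mkdir(parents=True, exist_ok=True)
    archive = load_archive(root, user, cluster)
    archive.append(list(sentences))
    path.write_text(json.dumps(archive, indent=2), encoding="utf-8")


def seen_sentences(root, user, cluster):
    """Flatten every archived summary into one list for the reranker."""
    return [s for summary in load_archive(root, user, cluster) for s in summary]
```

Keying the archive on both user and cluster means one reader's history never demotes sentences for another reader, which is the isolation the "Multiple users" items ask for.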