

  1. I256 Applied Natural Language Processing Fall 2009 Text Summarization Barbara Rosario

  2. Outline • Introduction and Applications • Types of summarization tasks • Approaches and paradigms • Evaluation methods • Acknowledgments • Slides inspired (and taken) from • Automatic Summarization by Inderjeet Mani • www.isi.edu/~marcu/acl-tutorial.ppt (Hovy and Marcu) • http://summarization.com/ • http://www.summarization.com/sigirtutorial2004.ppt (Radev)

  3. Introduction • The problem – Information overload • 4 billion URLs indexed by Google • 200 TB of data on the Web [Lyman and Varian 03] • Information is created every day in enormous amounts • One solution – summarization • The goal of automatic summarization is to take an information source, extract content from it, and present the most important content to the user in a condensed form and in a manner sensitive to the user’s or application’s needs • Other solutions are QA, information extraction, IR, indexing, document clustering, visualization

  4. Applications • Abstracts for scientific and other articles • News summarization (mostly multiple-document summarization) • Multimedia news summaries (“watch the news and tell me what happened while I was away”) • Web pages for search engines • Hand-held devices • Question answering and data/intelligence gathering • Physicians’ aid • Provide physicians with summaries of on-line medical literature related to a patient’s medical record • Meeting summarization • Aid for the handicapped • Summarization for reading machines for the blind

  5. Current applications • General purpose commercial summarization tools: • AutoSummarize MS Word • InXight Summarizer

  6. MS Word AutoSummarize

  7. Human summarization and abstracting • What professional abstractors do • “To take an original article, understand it and pack it neatly into a nutshell without loss of substance or clarity presents a challenge which many have felt worth taking up for the joys of achievement alone. These are the characteristics of an art form”. (Ashworth)

  8. Human summarization and abstracting • The abstract and its use: • To promote current awareness • To save reading time • To facilitate selection • To facilitate literature searches • To improve indexing efficiency • To aid in the preparation of reviews

  9. American National Standard for Writing Abstracts • State the purpose, methods, results, and conclusions presented in the original document, either in that order or with an initial emphasis on results and conclusions. • Make the abstract as informative as the nature of the document will permit, so that readers may decide, quickly and accurately, whether they need to read the entire document. • Avoid including background information or citing the work of others in the abstract, unless the study is a replication or evaluation of their work. Cremmins 82, 96

  10. American National Standard for Writing Abstracts • Do not include information in the abstract that is not contained in the textual material being abstracted. • Verify that all quantitative and qualitative information used in the abstract agrees with the information contained in the full text of the document. • Use standard English and precise technical terms, and follow conventional grammar and punctuation rules. • Give expanded versions of lesser known abbreviations and acronyms, and verbalize symbols that may be unfamiliar to readers of the abstract. • Omit needless words, phrases, and sentences. Cremmins 82, 96

  11. Types of Summaries • Indicative vs. Informative vs. Critical • Indicative: give an idea of what is there, provides a reference function for selecting documents for more in-depth reading • Informative: a substitute for the entire document, covers all the salient information in the source at some level of detail • Critical: evaluates the subject matter of the source, expressing the abstractor’s view on the quality of the work of the author

  12. Types of Summaries (cont.) • Input: single document vs. multi-document (MDS) • MDS: what’s common across documents or different in a particular one • Input: media types (text, audio, table, pictures, diagrams) • Output: media types (text, audio, table, pictures, diagrams)

  13. Types of Summaries (cont.) • Output: Extract vs. Abstract • Extract: summary consisting entirely of material copied from the input • Abstract: summary where some material is not present in the input • Paraphrase, generation • Research shows that sometimes readers prefer extracts

  14. Types of Summaries (cont.) • Output: User-focused (or topic-focused or query-focused): summaries that are tailored to the requirements of a particular user or group of users • Background • Does the reader have the needed prior knowledge? Expert reader vs. novice reader • General: summaries aimed at a particular (usually broad) readership community

  15. Types of Summaries (cont.) • Output: • language chosen for summarization • format of the resulting summary (table/paragraph/key words/documents with different sections and headings)

  16. Parameters • Compression rate (summary length / source length) • Audience (user-focused vs. generic) • Relation to source (extract vs. abstract) • Function (indicative vs. informative vs. critical) • Coherence: the way the parts of the text fit together to form an integrated whole • Coherent vs. incoherent • Incoherent: unresolved anaphors, gaps in the reasoning, sentences which repeat the same or similar meaning (redundancy), and a lack of organization

  17. Parameters (cont.) • Span (single or MDS) • Language • monolingual, multi-lingual, cross-lingual, sub-languages (technical, tourism) • Media • Genres • Headlines, minutes, biographies, movie summaries, chronologies, etc.

  18. Process • Three phases (typically) • Analysis – content identification • Analyze the input, build an internal representation of it • Can be done at different levels • Morphology, syntax, semantics, discourse • And looking at different elements • Sub-word, word, phrase, sentence, paragraph, document • Transformation (or refinement) -- conceptual organization • Transform the internal representation into a summary representation (mostly for abstracts or MDS) • Synthesis (Realization) • Summary representation is rendered into natural language

  19. Summarization approaches • Shallow approaches • Syntactic level at most • Typically produce extracts • Extract salient parts of the source text and then arrange and present them in some effective manner • Deeper approaches • Sentential semantic level • Produce abstracts; the synthesis phase involves natural language generation • Knowledge-intensive, may require some domain-specific coding • Example: generating summaries of basketball statistics or stock market bulletins

  20. Outline • Introduction and Applications • Types of summarization tasks • Approaches and paradigms • Evaluation methods

  21. Overview of Extraction Methods • General method: • score each entity (sentence, word); combine scores; choose best sentence(s) • Word frequencies throughout the text • Position in the text • lead method; optimal position policy • title/heading method • Cue phrases in sentences • Cohesion: links among words • word co-occurrence • coreference • lexical chains • Discourse structure of the text • Information Extraction: parsing and analysis

  22. Using Word Frequencies • Luhn 58: Very first work in automated summarization • Assumptions: • Frequent words indicate the topic • Frequent means with reference to the corpus frequency • Clusters of frequent words indicate summarizing sentence • Stemming based on similar prefix characters • Very common words and very rare words are ignored • Evaluation: straightforward approach empirically shown to be mostly detrimental in summarization systems.
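
As a concrete illustration, a minimal Luhn-style scorer might look like the sketch below; the stop list, tokenizer, and thresholds are placeholders, not Luhn's originals:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it", "for"}

def luhn_scores(text, top_k=10, max_gap=4):
    """Score sentences by clusters of significant (frequent, non-stop) words."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    significant = {w for w, _ in Counter(words).most_common(top_k)}

    scores = []
    for sent in sentences:
        tokens = re.findall(r"[a-z]+", sent.lower())
        positions = [i for i, t in enumerate(tokens) if t in significant]
        if not positions:
            scores.append((0.0, sent))
            continue
        # Cluster score: (significant words in cluster)^2 / cluster length,
        # where a cluster is a run of tokens whose gaps are at most max_gap.
        best, start, count = 0.0, positions[0], 1
        for prev, cur in zip(positions, positions[1:]):
            if cur - prev <= max_gap:
                count += 1
            else:
                best = max(best, count ** 2 / (prev - start + 1))
                start, count = cur, 1
        best = max(best, count ** 2 / (positions[-1] - start + 1))
        scores.append((best, sent))
    return sorted(scores, reverse=True)
```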

  23. Ranked Word Frequency • (Figure: Zipf’s curve of ranked word frequencies; resolving power of significant words)

  24. Position in the text • Claim: Important sentences occur in specific positions • “lead-based” summary • just take first sentence(s)! • Important information occurs in specific sections of the document (introduction/conclusion) • Experiments: • In 85% of 200 individual paragraphs, the topic sentences occurred in initial position and in 7% in final position
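
The lead method is essentially a one-liner; a sketch, assuming a simple regex-based sentence splitter:

```python
import re

def lead_summary(text, n_sentences=2):
    """Lead method: the summary is simply the first n sentences of the document."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n_sentences])
```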

  25. Title method • Claim: the title of a document indicates its content • Unless editors are being cute • Not true for novels, usually • What about blogs…? • Words in title help find relevant content • Create a list of title words, remove “stop words” • Use those as keywords in order to find important sentences
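
A sketch of the title method as sentence scoring by title-word overlap; the stop list and tokenizer are again placeholders:

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "with"}

def title_scores(title, text):
    """Score each sentence by how many (non-stop) title words it contains."""
    title_words = {w for w in re.findall(r"[a-z]+", title.lower())
                   if w not in STOPWORDS}
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [(len(title_words & set(re.findall(r"[a-z]+", s.lower()))), s)
            for s in sentences]
```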

  26. Optimum Position Policy (OPP) • Claim: Important sentences are located at positions that are genre-dependent; these positions can be determined automatically through training • Corpus: 13,000 newspaper articles (ZIFF corpus) • Step 1: For each article, determine the overlap between sentences and the index terms for the article • Step 2: Determine a partial ordering over the locations where sentences containing important words occur: the Optimum Position Policy (OPP) • (Some recent work looked at the use of citation sentences.)

  27. Cue phrases method • Claim: Important sentences contain cue words/indicative phrases • “The main aim of the present paper is to describe…” • “The purpose of this article is to review…” • “In this report, we outline…” • “Our investigation has shown that…” • Some words are considered bonus, others stigma • bonus: comparatives, superlatives, conclusive expressions, etc. • stigma: negatives, pronouns, etc.; non-important sentences contain ‘stigma phrases’ such as hardly and impossible • These phrases can be detected automatically • Method: Add to the sentence score if it contains a bonus phrase, penalize if it contains a stigma phrase.
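
A sketch of cue-phrase scoring; the bonus and stigma lists below are illustrative only, not the lists used in the original work:

```python
import re

BONUS = ["in conclusion", "in summary", "the main aim", "the purpose of",
         "significant", "greatest", "best"]
STIGMA = ["hardly", "impossible", "perhaps", "he", "she", "they"]

def cue_score(sentence, bonus_weight=1.0, stigma_weight=1.0):
    """Add to the score for each bonus phrase, subtract for each stigma phrase.
    Phrase lists here are placeholders for illustration."""
    s = sentence.lower()
    score = 0.0
    for phrase in BONUS:
        if re.search(r"\b" + re.escape(phrase) + r"\b", s):
            score += bonus_weight
    for phrase in STIGMA:
        if re.search(r"\b" + re.escape(phrase) + r"\b", s):
            score -= stigma_weight
    return score
```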

  28. Bayesian Classifier • Statistical learning method • Corpus • 188 document + summary pairs from scientific journals

  29. Bayesian Classifier • For each sentence s in the documents • extract features • Fixed-phrase feature • Certain phrases indicate summary, e.g. “in summary” • Paragraph feature • Paragraph initial/final more likely to be important. • Thematic word feature • Repetition is an indicator of importance • Uppercase word feature • Uppercase often indicates named entities. (Taylor) • Sentence length cut-off • Summary sentence should be > 5 words. • Calculate probability of the sentence s being in the summary

  30. Bayesian Classifier: Training • Hand-label sentences in training set (good/bad summary sentences) • Train classifier to distinguish good/bad summary sentences • Model used: Naïve Bayes • Can rank sentences according to score and show top n to user.

  31. Details: Bayesian Classifier
  • Probability that sentence s is included in summary S, given that sentence’s feature-value pairs: $P(s \in S \mid F_1, \ldots, F_k)$
  • Assuming statistical independence of the features:
    $P(s \in S \mid F_1, \ldots, F_k) = \dfrac{\prod_{j=1}^{k} P(F_j \mid s \in S)\, P(s \in S)}{\prod_{j=1}^{k} P(F_j)}$
  • $P(F_j \mid s \in S)$: probability of the feature-value pair occurring in a source sentence which is also in the summary

  32. Bayesian Classifier • Each probability is calculated empirically from a corpus • See how often each feature is seen with a sentence selected for a summary, vs. how often that feature is seen in any sentence • Higher-probability sentences are chosen to be in the summary • Performance: • For 25% summaries, 84% precision
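
A minimal sketch of how these probabilities could be estimated from a hand-labeled corpus and used to rank sentences with the formula above (binary feature values assumed; this is not the original Kupiec et al. implementation):

```python
from collections import defaultdict

def train_naive_bayes(examples):
    """examples: list of (feature_dict, in_summary) pairs with binary feature values.
    Returns empirical estimates of P(s in S), P(F_j | s in S), and P(F_j)."""
    n = len(examples)
    n_pos = sum(1 for _, y in examples if y)
    p_summary = n_pos / n
    feat_pos = defaultdict(int)   # feature=value counts among summary sentences
    feat_all = defaultdict(int)   # feature=value counts among all sentences
    for feats, y in examples:
        for fv in feats.items():
            feat_all[fv] += 1
            if y:
                feat_pos[fv] += 1
    # Add-one smoothing so unseen combinations do not zero out the product.
    p_f_given_s = {fv: (c + 1) / (n_pos + 2) for fv, c in feat_pos.items()}
    p_f = {fv: (c + 1) / (n + 2) for fv, c in feat_all.items()}
    return p_summary, p_f_given_s, p_f

def score_sentence(feats, p_summary, p_f_given_s, p_f):
    """P(s in S | F_1..F_k) under the independence assumption on the slide above."""
    score = p_summary
    for fv in feats.items():
        score *= p_f_given_s.get(fv, 1e-6) / p_f.get(fv, 1e-6)
    return score
```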

  33. Maximum Entropy model • MaxEnt model – no independence assumptions • Features: word pairs, sentence length, sentence position, discourse features (e.g., whether the sentence follows the “Introduction”, etc.) • MaxEnt outperforms Naïve Bayes
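
Since a MaxEnt classifier is equivalent to (multinomial) logistic regression, one way to reproduce this kind of setup is with scikit-learn's LogisticRegression; the feature dictionaries below are toy placeholders, not the features from the cited work:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy feature dicts for three sentences (placeholder features and labels).
X_dicts = [
    {"position": 0, "length": 24, "follows_introduction": True},
    {"position": 7, "length": 9,  "follows_introduction": False},
    {"position": 1, "length": 31, "follows_introduction": True},
]
y = [1, 0, 1]  # 1 = sentence belongs in the summary

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000).fit(X, y)  # MaxEnt = logistic regression
probs = clf.predict_proba(X)[:, 1]                 # rank sentences by these scores
```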

  34. Cohesion-based methods • Claim: Important sentences/paragraphs are the most highly connected entities in more or less elaborate semantic structures • Classes of approaches • word co-occurrences • local salience and grammatical relations • co-reference • lexical similarity (WordNet, lexical chains) • combinations of the above

  35. Cohesion: word co-occurrence • (Figure: graph of paragraphs P1–P9 linked by word-overlap similarity) • Apply IR methods at the document level: texts are collections of paragraphs • Use a traditional, IR-based word similarity measure to determine for each paragraph Pi the set Si of paragraphs that Pi is related to • Method: • determine the relatedness score Si for each paragraph • extract paragraphs with the largest Si scores
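
A sketch of this idea: represent each paragraph as a bag of words, link paragraphs whose cosine similarity exceeds a threshold, and extract the best-connected ones (the threshold and tokenizer are arbitrary choices):

```python
import math
import re
from collections import Counter

def bow(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def connectedness(paragraphs, threshold=0.2):
    """For each paragraph, count how many other paragraphs it is related to
    (cosine similarity above a threshold); best-connected paragraphs come first."""
    vecs = [bow(p) for p in paragraphs]
    scores = []
    for i, vi in enumerate(vecs):
        related = sum(1 for j, vj in enumerate(vecs)
                      if j != i and cosine(vi, vj) >= threshold)
        scores.append((related, i, paragraphs[i]))
    return sorted(scores, reverse=True)
```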

  36. Combining the Evidence • Problem: which extraction methods to believe? • Answer: assume they are independent, and combine their evidence: merge individual sentence scores.
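
A sketch of such a merge as a weighted linear combination; the weights are arbitrary placeholders and could instead be tuned on labeled data:

```python
def combined_score(scores, weights=None):
    """Combine per-method sentence scores by a weighted sum (methods treated
    as independent evidence). `scores` maps method name -> score for one sentence."""
    weights = weights or {}
    return sum(weights.get(method, 1.0) * s for method, s in scores.items())

# Example: combined_score({"luhn": 2.5, "position": 1.0, "cue": -0.5, "title": 2.0})
```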

  37. Information extraction methods • Idea: content selection using templates • Predefine a template, whose slots specify what is of interest. • Use a canonical IE system to extract from a (set of) document(s) the relevant information; fill the template. • Generate the content of the template as the summary.

  38. Information Extraction method • Example template:
  MESSAGE:ID          TSL-COL-0001
  SECSOURCE:SOURCE    Reuters
  SECSOURCE:DATE      26 Feb 93 Early afternoon
  INCIDENT:DATE       26 Feb 93
  INCIDENT:LOCATION   World Trade Center
  INCIDENT:TYPE       Bombing
  HUM TGT:NUMBER      AT LEAST 5
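
To make the idea concrete, the slide's template could be represented as a plain dictionary and "generated" with a naive string pattern; this is an illustrative sketch, not the IE or generation machinery of an actual system:

```python
# Hypothetical dict rendering of the bombing template above.
template = {
    "MESSAGE:ID": "TSL-COL-0001",
    "SECSOURCE:SOURCE": "Reuters",
    "SECSOURCE:DATE": "26 Feb 93 Early afternoon",
    "INCIDENT:DATE": "26 Feb 93",
    "INCIDENT:LOCATION": "World Trade Center",
    "INCIDENT:TYPE": "Bombing",
    "HUM TGT:NUMBER": "AT LEAST 5",
}

def render(t):
    # Naive stand-in for a real generation component.
    return (f"On {t['INCIDENT:DATE']}, {t['SECSOURCE:SOURCE']} reported a "
            f"{t['INCIDENT:TYPE'].lower()} at the {t['INCIDENT:LOCATION']} "
            f"with {t['HUM TGT:NUMBER'].lower()} victims.")

print(render(template))
```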

  39. Information Extraction method • Knowledge-rich approaches • Example MUC-4 template:
  MESSAGE: ID                    TST3-MUC4-0010
  MESSAGE: TEMPLATE              2
  INCIDENT: DATE                 30 OCT 89
  INCIDENT: LOCATION             EL SALVADOR
  INCIDENT: TYPE                 ATTACK
  INCIDENT: STAGE OF EXECUTION   ACCOMPLISHED
  INCIDENT: INSTRUMENT ID
  INCIDENT: INSTRUMENT TYPE
  PERP: INCIDENT CATEGORY        TERRORIST ACT
  PERP: INDIVIDUAL ID            "TERRORIST"
  PERP: ORGANIZATION ID          "THE FMLN"
  PERP: ORG. CONFIDENCE          REPORTED: "THE FMLN"
  PHYS TGT: ID
  PHYS TGT: TYPE
  PHYS TGT: NUMBER
  PHYS TGT: FOREIGN NATION
  PHYS TGT: EFFECT OF INCIDENT
  PHYS TGT: TOTAL NUMBER
  HUM TGT: NAME
  HUM TGT: DESCRIPTION           "1 CIVILIAN"
  HUM TGT: TYPE                  CIVILIAN: "1 CIVILIAN"
  HUM TGT: NUMBER                1: "1 CIVILIAN"
  HUM TGT: FOREIGN NATION
  HUM TGT: EFFECT OF INCIDENT    DEATH: "1 CIVILIAN"
  HUM TGT: TOTAL NUMBER

  40. Generation • Generating text from templates • Example output: On October 30, 1989, one civilian was killed in a reported FMLN attack in El Salvador.

  41. (Figure: generation architecture) • Input: cluster of templates T1, T2, …, Tm • Conceptual combiner (uses domain ontology and planning operators) • Paragraph planner • Sentence planner with lexical chooser (lexicon) • Linguistic realizer / sentence generator (SURGE) • Output: base summary

  42. Operators for generation • If there are two templates AND the location is the same AND the time of the second template is after the time of the first template AND the source of the first template is different from the source of the second template AND at least one slot differs THEN combine the templates using the contradiction operator...
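
A rough transcription of this rule into code, using the same dictionary representation of templates as the earlier sketch (and assuming the SECSOURCE:DATE values are directly comparable, which a real system would handle by parsing dates):

```python
def can_combine_with_contradiction(t1, t2):
    """Direct paraphrase of the slide's precondition: two templates, same location,
    t2 reported later than t1, different sources, and at least one differing slot."""
    return (
        t1["INCIDENT:LOCATION"] == t2["INCIDENT:LOCATION"]
        and t2["SECSOURCE:DATE"] > t1["SECSOURCE:DATE"]      # assumes comparable dates
        and t1["SECSOURCE:SOURCE"] != t2["SECSOURCE:SOURCE"]
        and any(t1.get(slot) != t2.get(slot) for slot in set(t1) | set(t2))
    )
```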

  43. Operators: Contradiction • Precondition: Different sources report contradictory values for a small number of slots • Example output: The afternoon of February 26, 1993, Reuters reported that a suspected bomb killed at least six people in the World Trade Center. However, Associated Press announced that exactly five people were killed in the blast. • Other operators are: refinement, agreement…

  44. Excerpts from four articles
  1. JERUSALEM - A Muslim suicide bomber blew apart 18 people on a Jerusalem bus and wounded 10 in a mirror-image of an attack one week ago. The carnage could rob Israel's Prime Minister Shimon Peres of the May 29 election victory he needs to pursue Middle East peacemaking. Peres declared all-out war on Hamas but his tough talk did little to impress stunned residents of Jerusalem who said the election would turn on the issue of personal security.
  2. JERUSALEM - A bomb at a busy Tel Aviv shopping mall killed at least 10 people and wounded 30, Israel radio said quoting police. Army radio said the blast was apparently caused by a suicide bomber. Police said there were many wounded.
  3. A bomb blast ripped through the commercial heart of Tel Aviv Monday, killing at least 13 people and wounding more than 100. Israeli police say an Islamic suicide bomber blew himself up outside a crowded shopping mall. It was the fourth deadly bombing in Israel in nine days. The Islamic fundamentalist group Hamas claimed responsibility for the attacks, which have killed at least 54 people. Hamas is intent on stopping the Middle East peace process. President Clinton joined the voices of international condemnation after the latest attack. He said the ``forces of terror shall not triumph'' over peacemaking efforts.
  4. TEL AVIV (Reuter) - A Muslim suicide bomber killed at least 12 people and wounded 105, including children, outside a crowded Tel Aviv shopping mall Monday, police said. Sunday, a Hamas suicide bomber killed 18 people on a Jerusalem bus. Hamas has now killed at least 56 people in four attacks in nine days. The windows of stores lining both sides of Dizengoff Street were shattered, the charred skeletons of cars lay in the street, the sidewalks were strewn with blood. The last attack on Dizengoff was in October 1994 when a Hamas suicide bomber killed 22 people on a bus.

  45. Four templates
  1. MESSAGE: ID TST-REU-0001 • SECSOURCE: SOURCE Reuters • SECSOURCE: DATE March 3, 1996 11:30 • PRIMSOURCE: SOURCE • INCIDENT: DATE March 3, 1996 • INCIDENT: LOCATION Jerusalem • INCIDENT: TYPE Bombing • HUM TGT: NUMBER "killed: 18" "wounded: 10" • PERP: ORGANIZATION ID
  2. MESSAGE: ID TST-REU-0002 • SECSOURCE: SOURCE Reuters • SECSOURCE: DATE March 4, 1996 07:20 • PRIMSOURCE: SOURCE Israel Radio • INCIDENT: DATE March 4, 1996 • INCIDENT: LOCATION Tel Aviv • INCIDENT: TYPE Bombing • HUM TGT: NUMBER "killed: at least 10" "wounded: 30" • PERP: ORGANIZATION ID
  3. MESSAGE: ID TST-REU-0003 • SECSOURCE: SOURCE Reuters • SECSOURCE: DATE March 4, 1996 14:20 • PRIMSOURCE: SOURCE • INCIDENT: DATE March 4, 1996 • INCIDENT: LOCATION Tel Aviv • INCIDENT: TYPE Bombing • HUM TGT: NUMBER "killed: at least 13" "wounded: more than 100" • PERP: ORGANIZATION ID "Hamas"
  4. MESSAGE: ID TST-REU-0004 • SECSOURCE: SOURCE Reuters • SECSOURCE: DATE March 4, 1996 14:30 • PRIMSOURCE: SOURCE • INCIDENT: DATE March 4, 1996 • INCIDENT: LOCATION Tel Aviv • INCIDENT: TYPE Bombing • HUM TGT: NUMBER "killed: at least 12" "wounded: 105" • PERP: ORGANIZATION ID

  46. Fluent summary with comparisons Reuters reported that 18 people were killed on Sunday in a bombing in Jerusalem. The next day, a bomb in Tel Aviv killed at least 10 people and wounded 30 according to Israel radio. Reuters reported that at least 12 people were killed and 105 wounded in the second incident. Later the same day, Reuters reported that Hamas has claimed responsibility for the act.

  47. Outline • Introduction and Applications • Types of summarization tasks • Approaches and paradigms (for single-document summarization) • Evaluation methods

  48. Evaluation methods • When a manual summary is available: • Choose a granularity (clause; sentence; paragraph) • Create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match) • Measure the similarity of each unit in the new summary to the most similar unit(s) in the manual summary • Measure Recall and Precision
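
A minimal sketch of unit-level precision and recall against a manual summary, using exact match of extracted units as the similarity measure; word-overlap similarity would be a straightforward refinement:

```python
def precision_recall(system_units, reference_units):
    """Unit-level P/R of a system summary against a manual (reference) summary,
    with exact match as the similarity measure."""
    system = set(system_units)
    reference = set(reference_units)
    matched = system & reference
    precision = len(matched) / len(system) if system else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall
```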

  49. Evaluation methods • When a manual summary is NOT available: • Intrinsic – how good is the summary as a summary? • Extrinsic – how well does the summary help the user?

  50. Intrinsic measures • Intrinsic measures: how good is the summary as a summary? • Problem: how do you measure the goodness of a summary? • Studies: compare to an ideal summary or supply criteria (fluency, quality, informativeness, coverage, etc.) • The summary is evaluated on its own or by comparing it with the source • Is the text cohesive and coherent? • Does it contain the main topics of the document? • Are important topics omitted?
