
Automatic Summarization:

A Tutorial Presented at RANLP’2003

Inderjeet Mani

Georgetown University

Tuesday, September 9, 2003

2-5:30 pm

@georgetown.edu

complingone.georgetown.edu/~linguist/inderjeet.html



AGENDA

  • 14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)

  • 14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)

  • 15:30 Break

  • 16:00 III. Abstraction (Template and Concept-Based)

  • 16:30 IV. Evaluation

  • 17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)

  • 17:30 Conclusion



Human Summarization is all around us

  • Headlines: newspapers, Headline News

  • Table of contents: of a book, magazine, etc.

  • Preview: of a movie

  • Digest: TV or cinema guide

  • Highlights: meeting dialogue, email traffic

  • Abstract: summary of a scientific paper

  • Bulletin: weather forecast, stock market, ...

  • Biography: resume, obituary, tombstone

  • Abridgment: Shakespeare for kids

  • Review: of a book, a CD, play, etc.

  • Scale-downs: maps, thumbnails

  • Sound bite/video clip: from speech, conversation, trial



Current Applications

  • Multimedia news summaries: watch the news and tell me what happened while I was away

  • Physicians' aids: summarize and compare the recommended treatments for this patient

  • Meeting summarization: find out what happened at that teleconference I missed

  • Search engine hits: summarize the information in hit lists retrieved by search engines

  • Intelligence gathering: create a 500-word biography of Osama bin Laden

  • Hand-held devices: create a screen-sized summary of a book

  • Aids for the Handicapped: compact the text and read it out for a blind person



Example BIOGEN Biographies

Vernon Jordan is a presidential friend and a Clinton adviser. He is 63 years old. He helped Ms. Lewinsky find a job. He testified that Ms. Monica Lewinsky said that she had conversations with the president, that she talked to the president. He has numerous acquaintances, including Susan Collins, Betty Currie, Pete Domenici, Bob Graham, James Jeffords and Linda Tripp.

Henry Hyde is a Republican chairman of House Judiciary Committee and a prosecutor in Senate impeachment trial. He will lead the Judiciary Committee's impeachment review. Hyde urged his colleagues to heed their consciences, “the voice that whispers in our ear, ‘duty, duty, duty.’”

Victor Polay is the Tupac Amaru rebels' top leader, founder and the organization's commander-and-chief. He was arrested again in 1992 and is serving a life sentence. His associates include Alberto Fujimori, Tupac Amaru Revolutionary, and Nestor Cerpa.



Columbia University’s Newsblaster

www.cs.columbia.edu/nlp/newsblaster/summaries/11_03_02_5.html



Michigan’s MEAD



Terms and Definitions

  • Text Summarization

    • The process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).

  • Extract vs. Abstract

    • An extract is a summary consisting entirely of material copied from the input

    • An abstract is a summary at least some of whose material is not present in the input, e.g., subject categories, paraphrase of content, etc.



Illustration of Extracts and Abstracts

25 Percent Extract of Gettysburg Address (sents 1, 2, 6)

  • Fourscore and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. The brave men, living and dead, who struggled here, have consecrated it far above our poor power to add or detract.

    10 Percent Extract (sent 2)

  • Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure.

    15 Percent Abstract

  • This speech by Abraham Lincoln commemorates soldiers who laid down their lives in the Battle of Gettysburg. It offers an eloquent reminder to the troops that it is the future of freedom in America that they are fighting for.



Illustration of the power of human abstracts

  • Mrs. Coolidge: What did the preacher discuss in his sermon?

  • President Coolidge: Sin.

  • Mrs. Coolidge: What did he say?

  • President Coolidge: He said he was against it.

    • - Bartlett’s Quotations (via Graeme Hirst)

President Calvin Coolidge, Grace Coolidge, and dog, Rob Roy, c.1925. Plymouth Notch, Vermont.


Summary Function

  • Indicative summaries

    • An indicative abstract provides a reference function for selecting documents for more in-depth reading.

  • Informative summaries

    • An informative abstract covers all the salient information in the source at some level of detail.

  • Evaluative summaries

    • A critical abstract evaluates the subject matter of the source, expressing the abstractor's views on the quality of the work of the author

The indicative/informative distinction is a prescriptive distinction, intended to guide professional abstractors (e.g., ANSI 1996).



User-Oriented Summary Types

  • Generic summaries

    • aimed at a particular - usually broad - readership community

  • Tailored summaries (aka user-focused, topic-focused, query-focused summaries)

    • tailored to the requirements of a particular user or group of users.

    • User’s interests:

      • full-blown user models

      • profiles recording subject area terms

      • a specific query.

    • A user-focused summary needs, of course, to take into account the influence of the user as well as the content of the document.

      • A user-focused summarizer usually includes a parameter to influence this weighting.


Summarization Architecture

[Architecture diagram: source characteristics (span, source, genre, media, language) feed an Analysis → Transformation → Synthesis pipeline, producing summaries (extracts or abstracts) characterized by compression, audience, function, coherence, and type.]



Characteristics of Summaries

  • Reduction of information content

    • Compression Rate, also known as condensation rate, reduction rate

      • Measured by summary length / source length (0 < c < 100); see the worked example after this list

    • Target Length

  • Informativeness

    • Fidelity to Source

    • Relevance to User’s Interests

  • Well-formedness/Coherence

    • Syntactic and discourse-level

      • Extracts: need to avoid gaps, dangling anaphors, ravaged tables, lists, etc.

      • Abstracts: need to produce grammatical, plausible output
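Worked example for the compression rate above (illustrative numbers, not from the tutorial): a 12-sentence extract of a 48-sentence article has a compression rate of 12/48 = 25%.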



Relation of Summarization to Other Tasks


One Text, Many Summaries (Evaluation Preview)

25 Percent Leading Text Extract (first 3 sentences) - seems OK, too!

Four score and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met here on a great battlefield of that war.

15 Percent Synopsis by human (critical summary) - seems even better!

This speech by Abraham Lincoln commemorates soldiers who laid down their lives in the Battle of Gettysburg. It offers an eloquent reminder to the troops that it is the future of freedom in America that they are fighting for.

11 Percent Extract (by human, out of context) - is bad! (sents 5, 8)

It is altogether fitting and proper that we should do this. The world will little note, nor long remember, what we say here, but it can never forget what they did here.

We can usually tell when a summary is incoherent, but how do we evaluate summaries in general?



Studies of human summaries

  • Cremmins (1996) prescribed that abstractors

    • use surface features: headings, key phrases, position

    • use discourse features: overall text structure

    • revise and edit abstracts

  • Liddy (1991)

    • studied 276 abstracts structured in terms of background, purpose, methodology, results and conclusions

  • Endres-Niggemeyer et al. (1995, 1998) found abstractors

    • use top-down strategy exploiting discourse structure

    • build topic sentences, use beginning/ends as relevant, prefer top level segments, examine passages/paragraphs before individual sentences, exploit outlines, formatting ...



Endres-Niggemeyer et al. (1995, 1998)

  • Abstractors never attempt to read the document from start to finish.

  • Instead, they use the structural organization of the document, including formatting and layout (the scheme) to skim the document for relevant passages, which are fitted together into a discourse-level representation (the theme).

  • This representation uses discourse-level rhetorical relations to link relevant text elements capturing what the document is about.

  • They use a top-down strategy, exploiting document structure, and examining paragraphs and passages before individual sentences.

  • The skimming for relevant passages exploits specific shallow features such as:

    • cue phrases (especially in-text summaries)

    • location of information in particular structural positions (beginning of the document, beginning and end of paragraphs)

    • information from the title and headings.



Stages of Abstracting: Cremmins (1996)

Cremmins recommends 12-20 mins to abstract an average scientific paper - much less time than it takes to really understand one.


Abstractors’ Editing Operations: Local Revision

[Figure: example local-revision operations – dropping vague or redundant terms, wording prescriptions, contextual lexical choice, reference adjustment.]

  • Cremmins (1996) described two kinds of editing operations that abstractors carry out

    • Local Revision - revises content within a sentence

    • Global Revision - revises content across sentences



AGENDA

  • 14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)

  • 14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)

  • 15:30 Break

  • 16:00 III. Abstraction (Template and Concept-Based)

  • 16:30 IV. Evaluation

  • 17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)

  • 17:30 Conclusion


Summarization Approaches

  • Shallower approaches

    • result in sentence extraction

    • sentences may/will be extracted out of context

    • synthesis here involves smoothing (include a window of previous sentences, adjust references)

    • can be trained using a corpus

  • Deeper approaches

    • result in abstracts

    • synthesis involves NL generation

    • can be partly trained using a corpus

    • requires some coding for a domain



Some Features used in Sentence Extraction Summaries

  • Location: position of term in document, position in paragraph/section, section depth, particular sections (e.g., title, introduction, conclusion)

  • Thematic: presence of statistically salient terms (tf.idf)

    • these are document-specific

  • Fixed phrases: in-text summary cue phrases (“in summary”, “our investigation shows”, “the purpose of this article is”,..), emphasizers (“important”, “in particular”,...)

    • these are genre-specific

  • Cohesion: connectivity of text units based on proximity, repetition and synonymy, coreference, vocabulary overlap

  • Discourse Structure: rhetorical structure, topic structure, document format



Putting it Together: Linear Feature Combination

Weight(U) = α·Location(U) + β·FixedPhrase(U) + γ·ThematicTerm(U) + δ·AddTerm(U), where U is a text unit such as a sentence and the Greek letters denote tuning parameters

  • Location: weight assigned to a text unit based on whether it occurs in initial, medial, or final position in a paragraph or the entire document, or whether it occurs in prominent sections such as the document’s intro or conclusion

  • FixedPhrase: weight assigned to a text unit in case fixed-phrase summary cues occur

  • ThematicTerm: weight assigned to a text unit due to the presence of thematic terms (e.g., tf.idf terms) in that unit

  • AddTerm: weight assigned to a text unit for terms in it that are also present in the title, headline, initial para, or the user’s profile or query
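A minimal Python sketch of this kind of linear combination, assuming per-sentence feature values have already been computed; the dictionary keys, parameter values, and function names are illustrative, not the tutorial's own code:

    # Sketch of Weight(U) = α·Location(U) + β·FixedPhrase(U) + γ·ThematicTerm(U) + δ·AddTerm(U)

    def unit_weight(features, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
        """Combine the four feature scores of one text unit."""
        return (alpha * features["location"]
                + beta * features["fixed_phrase"]
                + gamma * features["thematic_term"]
                + delta * features["add_term"])

    def extract(sentences, feature_table, compression=0.25):
        """Rank sentences by combined weight, keep the top fraction, restore document order."""
        k = max(1, int(len(sentences) * compression))
        ranked = sorted(range(len(sentences)),
                        key=lambda i: unit_weight(feature_table[i]), reverse=True)
        return [sentences[i] for i in sorted(ranked[:k])]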


Shallow Approaches

[Pipeline diagram: Analysis (feature extractors over the source(s)) → Transformation/Selection (a feature combiner αF1 + βF2 + γF3 feeding a sentence selector) → Synthesis/Smoothing (a sentence revisor producing the summary).]


Revision as Repair

  • structured environments (tables, etc.)

    • recognize and exclude

    • **recognize and summarize

  • anaphors

    • exclude sentences (which begin) with anaphors

    • include a window of previous sentences

    • **reference adjustment

  • gaps

    • include low-ranked sentences immediately between two selected sentences

    • add first sentence of para if second or third selected

    • **model rhetorical structure of source



A Simple Text Revision Algorithm

  • Construct initial “sentence-extraction” draft from source by picking highest weighted sentences in source until compression target is reached

  • Revise draft

    • Use syntactic trees (using a statistical parser) augmented with coreference classes

      Procedure Revise(draft, non-draft, rules, target-compression):

        for each rule in rules

          while ((compression(draft) - target-compression) < δ)

            while (<x, y> := next-candidates(draft, non-draft))   # e.g., for a binary rule

              result := apply-rule(rule, x, y)   # returns the first result that succeeds

              draft := draft ∪ result


Example of Sentence Revision

[Figure: an example revision with deleted, salient, and aggregated material marked.]



Informativeness vs. Coherence in Sentence Revision

Mani, Gates, and Bloedorn (ACL’99): 630 summaries from 7 systems (of 90 documents) were revised and evaluated using vocabulary overlap measure against TIPSTER answer keys.

A: Aggregation, E: Elimination, I: initial draft

Informativeness (> is good): A > I, A+E > I; A >* E, A+E >* E

Sentence Complexity (< is good): A+E <* I; A >* I


CORPUS-BASED SENTENCE EXTRACTION



The Need for Corpus-Based Sentence Extraction

  • Importance of particular features can vary with the genre of text

    • e.g., location features:

      • newspaper stories: leading text

      • scientific text: conclusion

      • TV news: previews

  • So, there is a need for summarization techniques that are adaptive, that can be trained for different genres of text



Learning Sentence Extraction Rules

Few corpora available; labeling can be non-trivial, requiring aligning each document unit (e.g., sentence) with abstract.

Learns to extract just individual sentences (though feature vectors can include contextual features).


Example 1: Kupiec et al. (1995)

  • Input

    • Uses a corpus of 188 full-text/abstract pairs drawn from 21 different scientific collections

    • Professionally written abstracts 3 sentences long on the average

    • The algorithm takes each sentence and computes a probability that it should be included in a summary, based on how similar it is to the abstract

      • Uses Bayesian classifier

  • Result

    • About 87% (498) of all abstract sentences (568) could be matched to sentences in the source (79% direct matches, 3% direct joins, 5% incomplete joins)

    • Location was best feature at 163/498 = 33%

    • Para+fixed-phrase+sentence length cutoff gave best sentence recall performance … 217/498=44%

    • At compression rate = 25% (20 sentences), performance peaked at 84% sentence recall



Example 2: Mani & Bloedorn (1998)

  • cmp-lg corpus (xxx.lanl.gov/cmp-lg) of scientific texts, prepared in SGML form by Simone Teufel at U. Edinburgh

  • 198 pairs of full-text sources and author-supplied abstracts

  • Full-text sources vary in size from 4 to 10 pages, dating from 1994-6

  • SGML tags include: paragraph, title, category, summary, headings and heading depth (figures, captions and tables have been removed)

  • Abstract length averages about 5% (avg. 4.7 sentences) of source length

  • Processing

    • Each sentence in full-text source converted to feature vector

    • 27,803 feature-vectors (reduces to 903 unique vectors)

    • Generated generic and user focused summaries


Comparison of Learning Algorithms

[Results chart: generic vs. user-focused summaries, at 20% compression, 10-fold cross-validation.]



Example Rules

  • Generic summary rule, generated by C4.5Rules (20% compression)

    If sentence is in the conclusion and it is a high tf.idf sentence

    Then it is a summary sentence

  • User-focused rules, generated by AQ (20% compression)

    If the sentence includes 15..20 keywords*

    Then it is a summary sentence (163 total, 130 unique)

    If the sentence is in the middle third of the paragraph and the paragraph is in the first third of the section

    Then it is a summary sentence (110 total, 27 unique)

    *keywords - terms occurring in sentences ranked as highly-relevant to query (abstract)



Issues in Learning Sentence Extraction Rules

  • Choice of corpus

    • size of corpus

    • availability of abstracts/extracts/judgments

    • quality of abstracts/extracts/judgments

      • compression, representativeness, coherence, language, etc.

  • Choice of labeler to label a sentence as summary-worthy or not based on a comparison between the source document sentence and the document's summary.

    • Label a source sentence (number) as summary-worthy if it is found in the extract

    • Compare summary sentence content with source sentence content (labeling by content similarity – L/CS)

    • Create an extract from an abstract (e.g., by alignment L/A->E )

  • Feature Representation, Learning Algorithm, Scoring


L/CS in KPC (Kupiec, Pedersen & Chen)

  • To determine if s ∈ E, they use a content-based match (since the summaries don’t always lift sentences from the full-text).

  • They match the source sentence to each sentence in the abstract. Two varieties of matches:

    • Direct sentence match:

      • the summary sentence and source text sentence are identical or can be considered to have the same content. (79% of matches)

    • Direct join:

      • two or more sentences from the source text (called joins) appear to have the same content as a single summary sentence. (3% of matches)



L/CS in MB98: Generic Summaries

  • For each source text

    • Represent abstract (list of sentences)

    • Match source text sentences against abstract, giving a ranking for source sentences (ie, abstract as “query”)

      • combined-match: compare source sentence against entire abstract (similarity based on content-word overlap + weight)

      • individual-match: compare source sentence against each sentence of abstract (similarity based on longest string match to any abstract sentence)

    • Label top C% of the matched source sentences’ vectors as positive

      • C (Compression) = 5,10,15,20,25

        • e.g., C=10 => for a 100-sentence source text, 10 sentences will be labeled positive
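A rough Python sketch of this combined-match labeling, using content-word overlap with the whole abstract as the similarity; the tokenizer and tiny stopword list are simplifying assumptions, not part of the original experiments:

    import re

    STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "that", "for"}  # toy list

    def content_words(text):
        return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

    def label_positives(source_sentences, abstract, compression=0.10):
        """Rank source sentences by content-word overlap with the abstract and
        label the top C% as positive training examples."""
        abstract_words = content_words(abstract)
        overlap = [len(content_words(s) & abstract_words) for s in source_sentences]
        k = max(1, int(len(source_sentences) * compression))
        top = set(sorted(range(len(source_sentences)),
                         key=lambda i: overlap[i], reverse=True)[:k])
        return [i in top for i in range(len(source_sentences))]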


L/A->E in Jing et al. 98

[Alignment diagram: abstract words w1, w2, … are mapped by functions f1, f2, … to positions in the source.]

Find the fr which maximizes P(fr(w1…wn)), i.e., using the Markov assumption:

P(fr(w1…wn)) ≈ ∏i=1..n P(fr(wi) | fr(wi−1))



Sentence Extraction as Bayesian Classification

P(s ∈ E | F1,…, Fn) = ∏j=1..n P(Fj | s ∈ E) · P(s ∈ E) / ∏j=1..n P(Fj)

P(s ∈ E) - compression rate c

P(s ∈ E | F1,…, Fn) - probability that sentence s is included in extract E, given the sentence’s feature-value pairs

P(Fj) - probability of the feature-value pair occurring in a source sentence

P(Fj | s ∈ E) - probability of the feature-value pair occurring in a source sentence which is also in the extract

The features are discretized into Boolean features, to simplify matters
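A compact sketch of scoring one sentence under this formula, assuming the probabilities have already been estimated from a labeled corpus; the dictionary layout is an illustrative choice, not Kupiec et al.'s code:

    def extract_probability(features, prior, p_feature, p_feature_given_extract):
        """Implements P(s ∈ E | F1..Fn) = P(s ∈ E) · Π_j P(Fj | s ∈ E) / Π_j P(Fj).
        `features` maps feature name -> Boolean value; `prior` is the compression rate c;
        the two tables map feature name -> {value: probability}."""
        numerator, denominator = prior, 1.0
        for name, value in features.items():
            numerator *= p_feature_given_extract[name][value]
            denominator *= p_feature[name][value]
        return numerator / denominator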



ADDING DISCOURSE-LEVEL FEATURES TO THE MIX



Cohesion

  • There are links in text, called ties, which express semantic relationships

  • Two classes of relationships:

    • Grammatical cohesion

      • anaphora

      • ellipsis

      • conjunction

    • Lexical cohesion

      • synonymy

      • hypernymy

      • repetition


Martian Weather with Grammatical and Lexical Cohesion Relations

With its distant orbit – 50 percent farther from the sun than Earth – and slim atmospheric blanket, Mars experiences frigid weather conditions. Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles. Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, but any liquid water formed in this way would evaporate almost instantly because of the low atmospheric pressure. Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, most Martian weather involves blowing dust or carbon dioxide. Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap. Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water.

[In the slide, the grammatical and lexical cohesion ties were marked by highlighting, which is lost in this transcript.]



Text Graphs based on Cohesion

  • Represent a text as a graph

  • Nodes: words (or sentences)

  • Links: Cohesion links between nodes

  • Graph Connectivity Assumption:

    • More highly connected nodes are likely to carry salient information.
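A small Python sketch of the connectivity assumption, with sentences as nodes and simple vocabulary-overlap links; the overlap threshold and the degree-based score are illustrative stand-ins for the richer cohesion relations listed above:

    import itertools, re

    def word_types(sentence):
        return set(re.findall(r"[a-z]+", sentence.lower()))

    def salient_by_connectivity(sentences, min_overlap=3, top_k=3):
        """Link two sentences if they share at least `min_overlap` word types,
        then rank sentences by their degree (number of links)."""
        degree = [0] * len(sentences)
        for i, j in itertools.combinations(range(len(sentences)), 2):
            if len(word_types(sentences[i]) & word_types(sentences[j])) >= min_overlap:
                degree[i] += 1
                degree[j] += 1
        ranked = sorted(range(len(sentences)), key=lambda i: degree[i], reverse=True)
        return [sentences[i] for i in sorted(ranked[:top_k])]  # restore document order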


Cohesion-based Graphs

[Figure: three example cohesion graphs (panel annotations include chain, ring, monolith, and piecewise topologies; links between nodes more than 5 apart ignored; best 30 paragraph links at density 2.00, seg_csim 0.26; paragraph clusters labeled “Facts about an issue” and “Legality of an issue”).

  • Skorochodko 1972 – Node: sentence; Link: relatedness; Method: node centrality and topology.

  • Salton et al. 1994 – Node: paragraph; Link: cosine similarity; Method: local segmentation, then node centrality.

  • Mani & Bloedorn 1997 – Node: words/phrases; Link: lexical/grammatical cohesion; Method: node centrality discovered by spreading activation (see also clustering using lexical chains).]



Coherence

  • Coherence is the modeling of discourse relations using different sources of evidence, e.g.,

    • Document format

      • layout in terms of sections, chapters, etc.

      • page layout

    • Topic structure

      • TextTiling (Hearst)

    • Rhetorical structure

      • RST (Mann & Thompson)

      • Text Grammars (van Dijk, Longacre)

      • Genre-specific rhetorical structures (Methodology, Results, Evaluation, etc.) (Liddy, Swales, Teufel & Moens, Saggion & Lapalme, etc.)

    • Narrative structure



Using a Coherence-based Discourse Model in Summarization

  • Choose a theory of discourse structure

  • Parse text into a labeled tree of discourse segments, whose leaves are sentences or clauses

    • Leaves typically need not have associated semantics

  • Weight nodes in tree, based on node promotion and clause prominence

  • Select leaves based on weight

  • Print out selected leaves for summary synthesis
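A toy Python sketch of the promotion-and-weighting step on such a tree; the tree encoding and the exact weighting are simplifications of Marcu's method, shown only to make the idea concrete:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Node:
        """Discourse-tree node: leaves carry a clause id; internal nodes record
        which children are nuclei (True) vs. satellites (False)."""
        clause: Optional[int] = None
        children: List["Node"] = field(default_factory=list)
        nuclear: List[bool] = field(default_factory=list)

    def tree_depth(node):
        if node.clause is not None:
            return 1
        return 1 + max(tree_depth(c) for c in node.children)

    def promote(node, level, weights, depth):
        """Each node promotes the clauses of its nuclear children; a clause's weight
        reflects the highest (shallowest) node that promotes it."""
        if node.clause is not None:
            weights.setdefault(node.clause, depth - level)
            return {node.clause}
        promoted = set()
        for child, is_nucleus in zip(node.children, node.nuclear):
            units = promote(child, level + 1, weights, depth)
            if is_nucleus:
                promoted |= units
        for clause in promoted:
            weights[clause] = max(weights[clause], depth - level)
        return promoted

    def clause_weights(root):
        weights = {}
        promote(root, 0, weights, tree_depth(root))
        return weights  # select the highest-weighted clauses up to the target length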



Martian Weather Summarized Using Marcu’s Algorithm (target length = 4 sentences)

[With its distant orbit {– 50 percent farther from the sun than Earth –} and slim atmospheric blanket,1] [Mars experiences frigid weather conditions.2] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles.3] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,4] [but any liquid water formed that way would evaporate almost instantly5] [because of the low atmospheric pressure.6] [Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,7] [most Martian weather involves blowing dust or carbon dioxide.8] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.9] [Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water.10]

2 > 8 > {3, 10} > {1, 4, 5, 7, 9}


Illustration of Node Promotion (Marcu)

[Figure: a discourse tree. Nodes: relations; leaves: clauses; nucleus: square boxes; satellite: dotted boxes.]


Detailed Evaluation of Marcu’s Method

  • Clause Segmentation: 81.3 recall, 90.3 precision (3 texts, 3 judges)

  • Discourse Marker ID: 80.8 recall, 89.5 precision (3 texts, 3 judges)

  • Salience Weighting (machine-generated trees): 65.0 recall, 67.0 precision (5 texts, 3 judges)

  • Salience Weighting (human-generated trees): 67.0 recall, 78.0 precision (5 texts, 3 judges)

  • Issues

    • How well can humans construct trees?

      • Discourse Segmentation .77 Kappa (30 news, 3 coders)

      • Relations .61 Kappa ditto

    • How well can machines construct trees?

      • Machine trees show poor correlation with human trees, but shape and nucleus/satellite assignment very similar



AGENDA

  • 14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)

  • 14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)

  • 15:30 Break

  • 16:00 III. Abstraction (Template and Concept-Based)

  • 16:30 IV. Evaluation

  • 17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)

  • 17:30 Conclusion



Abstracts Require Deep Methods

  • An abstract is a summary at least some of whose material is not present in the input.

  • Abstracts involve inferences made about the content of the text; they can reference background concepts, i.e., those not mentioned explicitly in the text.

  • Abstracts can result in summarization at a much higher degree of compression than extracts

  • Human abstractors make inferences in producing abstracts, but are instructed “not to invent anything”

So, a “degree of abstraction” knob is important: it could control the extent of generalization, the degree of lexical substitution, aggregation, etc.


Template Extraction

[Pipeline diagram: Source → Analysis → Templates → Transformation → Synthesis.]

Wall Street Journal, 06/15/88

MAXICARE HEALTH PLANS INC and UNIVERSAL HEALTH SERVICES INC have dissolved a joint venture which provided health services.


Template Example (Paice and Jones 1983)

Concept – Definition

  • SPECIES: the crop species concerned

  • CULTIVAR: the varieties used

  • HIGH-LEVEL PROPERTY: the property being investigated, e.g., yield, growth rate

  • PEST: any pest which infests the crop

  • AGENT: chemical or biological agent applied

  • INFLUENCE: e.g., drought, cold, grazing, cultivation system

  • LOCALITY: where the study was performed

  • TIME: years when the study was conducted

  • SOIL: description of soil

Canned Text Patterns

“This paper studies the effect the pest PEST has on the PROPERTY of SPECIES.”

“An experiment in TIME at LOCALITY was undertaken.”

Output: This paper studies the effect the pest G. pallida has on the yield of potato. An experiment in 1985 and 1986 at York, Lincoln and Peterbourgh, England was undertaken.


Templates Can Get Complex! (MUC-5)



Assessment of Template Method

  • Characteristics:

    • Templates can be simple or complex, and there may be multiple templates (e.g., multi-incident document)

    • Templates (and sets of them) benefit from aggregation and elimination operations to pinpoint key summary information

    • Salience is pre-determined based on slots, or computed (e.g., event frequencies)

  • Advantages:

    • Provides a useful capability for abstracting semantic content

    • Steady progress in information extraction, based on machine learning from large corpora

  • Limitations:

    • Requires customization for specific types of input data

    • Only summarizes that type of input data



Concept Abstraction Method

  • Captures the content of a document in terms of abstract categories

  • Abstract categories can be

    • sets of terms from the document

    • topics from labeled collections or background knowledge (e.g., a thesaurus or knowledge base)

  • To leverage background knowledge

    • Obtain an appropriate concept hierarchy

    • Mark concepts in hierarchy with their frequency of reference in the text

      • requires word-sense disambiguation

    • Find the most specific generalizations of concepts referenced in the text

    • Use the generalizations in an abstract
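A rough Python sketch of the frequency-marking and most-specific-generalization steps over a tiny hand-built hierarchy; the taxonomy, the evenness test, and its threshold are illustrative assumptions rather than any particular published method:

    HIERARCHY = {  # child -> parent (toy hand-built taxonomy)
        "Sun Workstation": "workstation", "HP 3690": "workstation",
        "IBM ThinkPad": "laptop", "workstation": "computer", "laptop": "computer",
    }

    def mark_frequencies(mentions):
        """Propagate each concept's mention count up to its ancestors."""
        counts = {}
        for concept, n in mentions.items():
            node = concept
            while node is not None:
                counts[node] = counts.get(node, 0) + n
                node = HIERARCHY.get(node)
        return counts

    def children_of(concept):
        return [c for c, p in HIERARCHY.items() if p == concept]

    def most_specific_generalization(concept, counts, evenness=0.5):
        """Walk down while a single child dominates the concept's weight; stop when
        the children contribute roughly equally (or there is nothing left to split)."""
        while True:
            kids = [c for c in children_of(concept) if counts.get(c, 0) > 0]
            if len(kids) <= 1:
                return kids[0] if kids else concept
            top = max(kids, key=lambda c: counts[c])
            if counts[top] / counts[concept] <= evenness:
                return concept
            concept = top

    counts = mark_frequencies({"Sun Workstation": 1, "HP 3690": 1, "IBM ThinkPad": 1})
    # most_specific_generalization("computer", counts) -> "workstation" for these toy counts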


Concept Abstraction Example

The department is buying a Sun Workstation, a HP 3690, and a Toshiba machine. The IBM ThinkPad will not be bought from next year onwards.

[Figure: a concept hierarchy (e.g., Workstation with instances such as the Sun Workstation and the IBM ThinkPad) annotated with reference counts. Salience of a concept C can be determined by counting concept and instance links (Hahn & Reimer ’99) or concept and subclass links (Lin & Hovy ’99). Most specific generalization: traverse downwards until you find a C whose children contribute equally to its weight.]



Assessment of Concept Abstraction

  • Allows for Generalization based on links (instance, subclass, part-of, etc.)

  • Some efforts at controlling extent of generalization

  • Hierarchy needs to be available, and contain domain (senses of) words

    • Generic hierarchies may contain other senses of word

    • Constructing a hierarchy by hand for each domain is prohibitively expensive

  • Result of generalization needs to be readable by human (e.g., generation, visualization)

    • So, useful mainly in transformation phase


Generation (Statistical) of Headlines

  • Shows how statistical methods can be used to generate abstracts (Banko et al. 2000)

  • [Model sketch from the slide: H = headline, D = document. Select document words that occur frequently in example headlines; order words based on pair co-occurrences; model the length of the headline.]
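One way to write the combination of the three components listed above as a single scoring function (a hedged sketch of the general shape of such models, not necessarily Banko et al.'s exact formulation):

    H* = argmax over candidate headlines w1…wn of
         [ ∏i=1..n P(wi ∈ H | wi ∈ D) ] · [ ∏i=2..n P(wi | wi−1) ] · P(len(H) = n)

where the first factor performs content selection, the second (bigram) factor orders the words, and the third models headline length.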



AGENDA

  • 14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)

  • 14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)

  • 15:30 Break

  • 16:00 III. Abstraction (Template and Concept-Based)

  • 16:30 IV. Evaluation

  • 17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)

  • 17:30 Conclusion



Summarization Evaluation: Intrinsic and Extrinsic Methods

  • Intrinsic methods test the system in itself

    • Criteria

      • Coherence

      • Informativeness

    • Methods

      • Comparison against reference output

      • Comparison against summary input

  • Extrinsic methods test the system in relation to some other task

    • time to perform tasks, accuracy of tasks, ease of use

    • expert assessment of usefulness in task



Coherence: How does a summary read?

  • Humans can judge this by subjective grading (e.g., 1-3 scale) on specific criteria

    • General readability criteria: spelling, grammar, clarity, impersonal style, conciseness, readability and understandability, acronym expansion, etc. (Saggion and Lapalme 2000)

    • Criteria can also be specific to extracts (dangling anaphors, gaps, etc.) or abstracts (ill-formed sentences, inappropriate terms, etc.)

  • When subjects assess summaries for coherence, the scores can be compared against scores for reference summaries, scores for source docs, or against scores for other summarization systems

  • Automatic scoring has a limited role to play here



Informativeness: Is the content preserved?

  • Measure the extent to which summary preserves information from a source or a reference summary

  • Humans can judge this by subjective grading (e.g., 1-3 scale) on specific criteria

  • When subjects assess summaries for informativeness, the scores can be compared against scores for reference summaries, scores for source docs, or against scores for other summarization systems

[Comparison diagram: a machine summary can be compared – manually or automatically – against the source document, a human reference summary, or another machine summary.]



Human Agreement in Reference Extracts

  • Previous studies, most of which have focused on extracts, have shown evidence of low agreement among humans

    Source, #docs, #subjects, % agreement, cite:

    • Scientific American: 10 docs, 6 subjects, 8% agreement (Rath et al. 61)

    • Funk and Wagnall's: 50 docs, 2 subjects, 46% agreement (Mitra et al. 97)

  • However, there is also evidence that judges may agree more on the most important sentences to include (Jing et al. 99), (Marcu 99)

  • When subjects disagree, system can be compared against majority opinion, most similar human summary (‘optimistic’) or least similar human summary (‘pessimistic’) (Mitra et al. 97)



Intrinsic Evaluation: SUMMAC Q&A Results

Highest recall associated with the least reduction of the source

Content-based automatic scoring (vocabulary overlap) correlates very well with human scoring (passage/answer recall)


Intrinsic Evaluation: Japanese Text Summarization Challenge (2000) (Fukusima and Okumura 2001)

  • At each compression rate, systems outperformed Lead and TF baselines in content overlap with human summaries (measured against both human extracts and human abstracts)

  • Subjective grading of coherence and informativeness showed that human abstracts > human extracts > systems and baselines



DUC’2001 Summarization Evaluation http://www-nlpir.nist.gov/projects/duc/

  • Intrinsic evaluation of single and multiple doc English summaries by comparison against reference summaries

  • 60 reference sets: 30 training, 30 test, each with an average of 10 documents

  • a single 100-word summary for each document (sds)

  • four multi-document summaries (400, 200, 100, and 50-word) for each set (mds)

www.isi.edu/~cyl/SEE



DUC’2001 Setup

  • doc sets are on

    • A single event with causes and consequences

    • Multiple distinct events of a single type (e.g., solar eclipses)

    • Subject (discuss a single subject)

    • One of the above in the domain of natural disasters (e.g., Hurricane Andrew)

    • Biographical (discuss a single person)

    • Opinion (different opinions about the same subject, e.g., welfare reform)

  • 400-word mds used to build 50, 100, and 200-word mds

  • Baselines

    • sds - first 100 words

    • mds

      • 1st 50, 100, 200, 400 in most recent

      • 1st sentence in 1st, 2nd, ..nth doc, 2nd sentence, …until 50/100/200/400



Eval Criteria

  • Informativeness (Completeness)

    • Recall of reference summary units

  • Coherence (1-5 scales)

    • Grammar: “Do the sentences, clauses, phrases, etc. follow the basic rules of English?

      • Don’t worry here about style or the ideas.

      • Concentrate on grammar.”

    • Cohesion: “Do the sentences fit in as they should with the surrounding sentences?

      • Don’t worry about the overall structure of the ideas.

      • Concentrate on whether each sentence naturally follows the preceding one and leads into the next.”

    • Organization: “Is the content expressed and arranged in an effective manner?

      • Concentrate here on the high-level arrangement of the ideas.”



Assessment

  • Phase 1: assessor judged system summary against her own reference summary

  • Phase 2: assessor judged system summary against 2 others’ reference summaries

  • System summaries divided into automatically determined sentences (called PUs)

  • Reference summaries divided into automatically determined EDU’s (called MUs), which were then lightly edited by humans


Results: Coherence

  • Grammar

    • Baseline < System < Humans (means 3.23, 3.53, 3.79)

    • Most baselines contained a sentence fragment

    • Grammar (esp. ‘All’) too sensitive to low-level formatting

  • Cohesion

    • Baseline = system = humans = 3 (sds medians)

    • Baseline = 2 = system < humans = 3 (mds medians)

  • Organization

    • Baseline = 3 = system < humans = 4 (sds)

    • Baseline = 2 = system < humans = 3 (mds)

  • Cohesion/Organization

    • Cohesion and Organization didn’t make sense for very short summaries

    • Cohesion hard to distinguish from Organization

  • Overall, except for grammar, system summaries no better than baselines



Informativeness (Completeness) Measure

  • For each MU:

  • “The marked PUs, taken together, express [All, Most, Some, Hardly any, or None] of the meaning expressed by the MU”



Results: Informativeness

  • Average Coverage: Average of the per-MU completeness judgments [0..4] for a peer summary

  • Baselines = .5 <= systems = .6 < humans = 1.3 (overall medians)

  • lots of outliers

  • relatively lower baseline and system performance on mds

  • small improvements in mds as size increases

  • Even for simple sentences/EDU’s, determination of shared meaning was very hard!


DUC’2003 (NIST slide)

[Task diagram:

  • Task 1: very short (10-word) single-doc summaries (TDT docs, 30 clusters).

  • Task 2: short (100-word) multi-doc summary per cluster, given the TDT topic (TDT docs, 30 clusters).

  • Task 3: short (100-word) multi-doc summary per cluster, given a viewpoint (TREC docs, 30 clusters; very short single-doc summaries also produced for these docs).

  • Task 4: short (100-word) multi-doc summary per cluster, given the TREC Novelty topic and relevant/novel sentences (Novelty docs, 30 clusters).]



DUC’2003 Metrics & Results

  • Coherence: Quality (Tasks 2-4):

    • Systems < Baseline <= Manual

  • Informativeness:

    • Coverage (Tasks 1-4) = avg(per-MU completeness judgments for a peer summary) × target length / actual length (a worked example follows this list)

      • Systems < Manual; most systems indistinguishable

    • ‘Usefulness’ (Task 1) Grade each summary according to how useful you think it would be in getting you to choose the document

      • Manual summaries distinct from systems; tracks coverage closely

    • ‘Responsiveness’ (Task 4) Read the topic/question and all the summaries. Consult the relevant sentences as needed. Grade each summary according to how responsive it is in form and content to the question.

      • Manual summaries distinct from systems/baselines; tracks coverage generally
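Worked example for the coverage formula above (illustrative numbers, not DUC data): if the per-MU completeness judgments for a peer summary average 0.6 and the summary runs 120 words against a 100-word target, its length-adjusted coverage is 0.6 × 100 / 120 = 0.5.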



Baseline summaries etc. (NIST slide)

  • NIST (Nega Alemayehu) created baseline summaries

    • Baselines 2-5: automatic

    • based roughly on algorithms suggested by Daniel Marcu

    • no truncation of sentences, so some baseline summaries went over the limit (by up to 15 words) and some were shorter than required

  • Baseline 1 (task 1): original author’s headline

    • Use the document’s own “headline” element

  • Baseline 2 (tasks 2, 3)

    • Take the 1st 100 words in the most recent document.

  • Baseline 3 (tasks 2, 3)

    • Take the 1st sentence in the 1st, 2nd, 3rd,… document in chronological sequence until you have 100 words.

  • Baseline 4 (task 4)

    • Take the 1st 100 words from the 1st n relevant sentences in the 1st document in the set. ( Documents ordered by relevance ranking given with the topic.)

  • Baseline 5 (task 4)

    • Take the 1st relevant sentence from the 1st, 2nd, 3rd,… document until you have 100 words. (Documents ordered by relevance ranking given with the topic.)



Extrinsic Methods: Usefulness of Summary in Task

  • If the summary involves instructions of some kind, it is possible to measure the efficiency in executing the instructions.

  • measure the summary's usefulness with respect to some information need or goal, such as

    • finding documents relevant to one's need from a large collection, routing documents

    • extracting facts from sources

    • producing an effective report or presentation using a summary

    • etc.

  • assess the impact of a summarizer on the system in which it is embedded, e.g., how much does summarization help the question answering system?

  • measure the amount of effort required to post-edit the summary output to bring it to some acceptable, task-dependent state

  • …. (unlimited number of tasks to which summarization could be applied)



SUMMAC Time and Accuracy (adhoc task, 21 subjects)

All time differences are significant except between B & S1

S2’s (23% of source on avg.) roughly halved decision time rel. to F (full-text)!

All F-score and Recall differences are significant except between F & S2

Conclusion - Adhoc

S2’s save time by 50% without impairing accuracy!



AGENDA

  • 14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)

  • 14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)

  • 15:30 Break

  • 16:00 III. Abstraction (Template and Concept-Based)

  • 16:30 IV. Evaluation

  • 17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)

  • 17:30 Conclusion



Multi-Document Summarization

  • Extension of single-document summarization to collections of related documents

    • but naïve “concatenate each summary” extension is faced with repetition of information across documents

  • Requires fusion of information across documents

    • Elimination, aggregation, and generalization operations carried out on collection instead of individual documents

  • Collections can vary considerably in size

    • different methods for different ranges (e.g., cluster first if > n)

  • Higher compression rates usually needed

    • perhaps where abstraction is really critical

  • NL Generation and Visualization have an obvious role to play here


Example MDS Problems

  • Eighteen decapitated bodies have been found in a mass grave in northern Algeria, press reports said Thursday.

  • Algerian newspapers have reported on Thursday that 18 decapitated bodies have been found by the authorities.



Multi-Document Summarization Methods

  • Shallow Approaches

    • passage extraction and comparison

      • removes redundancy by vocabulary overlap comparisons

  • Deep Approaches

    • template extraction and comparison

      • removes redundancy by aggregation and generalization operators

    • syntactic and semantic passage comparison



Passage Extraction and Summarization

  • Maximal Marginal Relevance

  • Example: 100 hits - 1st 20 same event, but 36, 41, 68 very different, although marginally less relevant

  • As a post-retrieval filter to retrieval of relevance-ranked hits, offers a reranking parameter which allows you to slide between relevance to query and diversity from hits you have seen so far.

    MMR(Q, R, S) = Argmax over Di in R\S of [ λ·sim1(Di, Q) − (1−λ)·max over Dj in S of sim2(Di, Dj) ]

    where Q is the query, R is the retrieved set, and S is the scanned (already selected) subset of R

    Example:

    R = {D1, D2, D3}; S = {D1}; λ = 0

    Di = D2 ⇒ −(1−λ)·sim2(D2, D1) = −.4

    Di = D3 ⇒ −(1−λ)·sim2(D3, D1) = −.2, so pick D3

    (A code sketch follows at the end of this slide.)

  • Cohesion-Based Approaches Across Documents

    • Salton’s Text Maps

    • User-Focused Passage Alignment

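A minimal Python sketch of greedy MMR reranking over the formula above; the similarity function (in practice something like cosine over tf.idf vectors) is assumed to be supplied, and the parameter values are illustrative:

    def mmr_rerank(query, candidates, similarity, lam=0.7, k=5):
        """Repeatedly pick the candidate that best trades off relevance to the query
        against maximum similarity to the items already selected."""
        selected, remaining = [], list(candidates)
        while remaining and len(selected) < k:
            def mmr_score(d):
                redundancy = max((similarity(d, s) for s in selected), default=0.0)
                return lam * similarity(d, query) - (1 - lam) * redundancy
            best = max(remaining, key=mmr_score)
            selected.append(best)
            remaining.remove(best)
        return selected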



User-Focused Passage Alignment



Template Comparison Method (McKeown and Radev 1995)

  • Contradiction operator: applies to template pairs which have same incident location but which originate from different sources (provided at least one other slot differs in value)

    • If value of number of victims is lowered across two reports from the same source, this suggests the old information is incorrect; if it goes up, the first report had incomplete information

      The afternoon of Feb 26, 1993, Reuters reported that a suspected bomb killed at least five people in the World Trade Center. However, Associated Press announced that exactly five people were killed in the blast.

  • Refinement operator: applies to template pairs where the second’s slot value is a specialization of the first’s for a particular slot (e.g., terrorist group identified by country in first template, and by name in later template)

  • Other operators: perspective change, agreement, addition, superset, trend, etc.


Syntactic Passage Comparison (MultiGen)

[Figure: example theme for syntactic comparison.]

Assumes very tight clustering of documents.

Similar to revision-based methods.



Lexical Semantic Merging: BIOGEN

  • Given 1,300 news docs

  • 707,000 words in collection

  • 607 sentences which mention “Jordan” by name

  • 78 appositive phrases which fall (using WordNet) into 2 semantic groups: “friend”, “adviser”;

  • 65 sentences with “Jordan” as logical subject, filtered based on verbs which are strongly associated in a background corpus with “friend” or “adviser”, e.g., “testify”, “plead”, “greet”

  • 3 sentence summary

Vernon Jordan is a presidential friend and a Clinton adviser. He helped Ms. Lewinsky find a job. He testified that Ms. Monica Lewinsky said that she had conversations with the president, that she talked to the president.

Henry Hyde is a Republican chairman of House Judiciary Committee and a prosecutor in Senate impeachment trial. He will lead the Judiciary Committee's impeachment review. Hyde urged his colleagues to heed their consciences, “the voice that whispers in our ear, ‘duty, duty, duty.’”

For details, see Mani et al. ACL’2001


Appositive Merging Examples

[Figure: examples of merging appositive descriptions of a person (mf = more frequent head/modifier for the name in the collection):

  • Senator (mf) + Democrat – merged under a common ancestor such as politician / leader / person (A, B < X < Person)

  • lawyer (mf) + attorney – synonym merge (A = B), under person

  • Wisconsin (mf) Democrat + senior Democrat

  • a lawyer for the defendant + an attorney for Paula Jones

  • Chairman of the Budget Committee + Budget Committee Chairman]


Verb-subject associations for appositive head nouns

    executive            police               politician
    reprimand 16.36      shoot 17.37          clamor 16.94
    conceal 17.46        raid 17.65           jockey 17.53
    bank 18.27           arrest 17.96         wrangle 17.59
    foresee 18.85        detain 18.04         woo 18.92
    conspire 18.91       disperse 18.14       exploit 19.57
    convene 19.69        interrogate 18.36    brand 19.65
    plead 19.83          swoop 18.44          behave 19.72
    sue 19.85            evict 18.46          dare 19.73
    answer 20.02         bundle 18.50         sway 19.77
    commit 20.04         manhandle 18.59      criticize 19.78
    worry 20.04          search 18.60         flank 19.87
    accompany 20.11      confiscate 18.63     proclaim 19.91
    own 20.22            apprehend 18.71      annul 19.91
    witness 20.28        round 18.78          favor 19.92



MULTIMEDIA SUMMARIZATION


Broadcast News Navigator Example

[Screenshot: sentence extraction from closed captions (cc), plus a list of named entities (NEs); related stories are retrieved with Internet query terms constructed from the NEs, and the hits are then summarized.]


BNN Summary: Story Skim*

BNN Story Details*

[Screenshot annotations: text, topics, summary, named entities.]

Identification: Precision vs. Time (with Recall Comparison)

E.g., What stories are about Sonny Bono?

[Chart: average precision (0.7–1.0) vs. average time in minutes (0–8) for different presentation conditions – topic, key frame, 3 named entities, all named entities, summary, text, video skim, story details, full details. The ideal is high precision (and high recall) in minimal time; some conditions trade lower recall for high precision.]

  • Results

  • Less is better (in time and precision)

  • Mixed media summaries better than single media


CMU Meeting Summarization (Zechner 2001)

Summarizes audio transcriptions from multi-party dialogs

Integrated with meeting browser

Detects disfluencies: filled pauses, repairs, restarts, false starts

Identifies sentence boundaries

Identifies question-answer pairs

Then does sentence ranking using MMR

When run on automatically transcribed audio, biases summary towards words the recognizer is confident of

Example dialog:

S1: well um I think we should discuss this you know with her

S1: That’s true I suggest

S1: you talk to him

S1: yeah well now get this we might go to live in switzerland

S2: oh really

S1: yeah because they’ve made him a job offer there and at first thinking nah he wasn’t going to take it but now he’s like

S1: when are we meeting?

S2: you mean tomorrow?

S1: yes

S2: at 4 pm


Event Visualization and Summarization: Geospatial News on Demand Env. (GeoNODE)

  • VCR-like controls support exploration of the corpus

  • Geospatial and temporal display of events extracted from the corpus

  • Automated cross-document, multilingual topic cluster detection and tracking

  • Event frequency by source


Multilingual Summarization (ISI)

[Screenshot: Indonesian hits, summary, machine translation.]



Conclusion

  • Automatic Summarization is alive and well!

  • As we interact with the massive information universes of today and tomorrow, summarization in some form is indispensable

  • Areas for the future

    • multidocument summarization

    • multimedia summarization

    • summarization for hand-held displays

    • temporal summarization

    • etc.



Resources

  • Books

    • Mani, I. and Maybury, M. (eds.) 1999. Advances in Automatic Text Summarization. MIT Press, Cambridge.

    • Mani, I. 2001. Automated Text Summarization. John Benjamins, Amsterdam.

  • Journals

    • Mani, I. and Hahn, U. Nov 2000. Summarization Tutorial. IEEE Computer.

  • Conferences/Workshops

    • Dagstuhl Seminar, 1993 (Karen Spärck Jones, Brigitte Endres-Niggemeyer) www.ik.fh-hannover.de/ik/projekte/Dagstuhl/Abstract

    • ACL/EACL Workshop on Intelligent Scalable Text Summarization, Madrid, 1997 (Inderjeet Mani, Mark Maybury) (www.cs.columbia.edu/~radev/ists97/program.html)

    • AAAI Spring Symposium on Intelligent Text Summarization, Stanford, 1998 (Dragomir Radev, Eduard Hovy) (www.cs.columbia.edu/~radev/aaai-sss98-its)

    • ANLP/NAACL Summarization Workshop, Seattle, 2000 (Udo Hahn, Chin-Yew Lin, Inderjeet Mani, Dragomir Radev) www.isi.edu/~cyl/was-anlp2000.html

    • NAACL Summarization Workshop, Pittsburgh, 2001



Web References

  • On-line Summarization Tutorials

    • www.si.umich.edu/~radev/summarization/radev-summtutorial00.ppt

    • www.isi.edu/~marcu/coling-acl98-tutorial.html

  • Bibliographies

    • www.si.umich.edu/~radev/summarization/

    • www.cs.columbia.edu/~jing/summarization.html

    • www.dcs.shef.af.uk/~gael/alphalist.html

    • www.csi.uottawa.ca/tanka/ts.html

  • Survey: “State of the Art in Human Language Technology” (cslu.cse.ogi.edu/HLTsurvey)

  • Government initiatives

    • DUC Multi-document Summarization Evaluation (www-nlpir.nist.gov/projects/duc)

    • DARPA’s Translingual Information Detection Extraction and Summarization (TIDES) Program (tides.nist.gov, www.darpa.mil/ito/research/tides/projlist.html)

    • European Intelligent Information Interfaces program (www.i3net.org)


Thank You!