
Predicting sentence specificity, with applications to news summarization

Ani Nenkova, joint work with Annie Louis

University of Pennsylvania

Motivation
  • A well-written text is a mix of general statements and sentences providing details
  • In information retrieval: find relevant and well-written documents
  • Writing support: visualize general and specific areas
Supervised sentence-level classifier for general/specific
  • Training data
    • Used existing annotations for discourse relations from PDTB
  • Features
    • Lexical, language model, syntax, etc
  • Testing data
    • Annotators judged more sentences
  • Applications to analysis of summarization output
    • Automatic summaries are too specific, and are worse for it
Training data
  • Penn Discourse Treebank (PDTB)
Penn Discourse Treebank (PDTB)
  • Largest annotated corpus of explicit and implicit discourse relations
  • 1 million words of Wall Street Journal
  • Arguments – spans linked by a relation (Arg1, Arg2)
  • Sense – semantics of the relation (3 level hierarchy)

I love ice-cream but I hate chocolates. (explicit relation, signaled by a discourse connective)

I came late. I missed the train. (implicit relation between adjacent sentences in the same paragraph)

distribution of relations between adjacent sentences
Distribution of relations between adjacent sentences

(Adjacent sentences linked only by a shared entity are not considered to hold a true discourse relation.)

Training data from PDTB Expansions

[Hierarchy of PDTB Expansion relations, with example connectives:]

  • Conjunction [Also, Further]
  • Restatement [Specifically, Overall], with subtypes Specification, Generalization, Equivalence
  • Instantiation [For example]
  • List [And]
  • Alternative [Or, Instead], with subtypes Conjunctive, Disjunctive, Chosen alternative
  • Exception [except]

Instantiation example

The 40-year-old Mr. Murakami is a publishing sensation in Japan.

A more recent novel, “Norwegian wood”, has sold more than forty million copies since Kodansha published it in 1987.

Examples of general/specific sentences
  • Despite recent declines in yields, investors continue to pour cash into money funds.

Assets of the 400 taxable funds grew by $1.5 billion during the latest week, to $352 billion. [Instantiation]

  • By most measures, the nation’s industrial sector is now growing very slowly—if at all.

Factory payrolls fell in September. [Specification]

Experimental setup—Two classifiers
  • Instantiations-based
    • Arg1: General, Arg2: specific
    • 1403 examples
  • Specification-based (Restatement subtype)
    • Arg1: General, Arg2: specific
    • 2370 examples
  • Implicit relations only
  • 50% baseline accuracy; 10-fold cross-validation; logistic regression
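
A minimal sketch of this setup, assuming scikit-learn: logistic regression evaluated with 10-fold cross-validation against the 50% majority baseline. The feature extractor and the data loading are placeholders for illustration, not the authors' exact pipeline.

```python
# Minimal sketch of the setup above: logistic regression with 10-fold
# cross-validation on a balanced general/specific training set.
# extract_features is a placeholder for the features described on the
# following slides.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evaluate_classifier(sentences, labels, extract_features):
    """sentences: list of str; labels: 0 = general, 1 = specific."""
    X = np.array([extract_features(s) for s in sentences])
    y = np.array(labels)
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    return scores.mean()  # compare against the 50% majority baseline
```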
Features
  • Developed from a small development set
    • 10 pairs of specification
    • 10 pairs of instantiation
Features for general vs specific
  • Sentence length: no. of tokens, no. of nouns
    • Expected general sentences to be shorter
  • Polarity: no. of positive/negative/polarity words, also normalized by length
    • General Inquirer
    • MPQA subjectivity lexicon
    • In dev set, sentences with strong opinion are general
  • Language models: unigram/bigram/trigram probability & perplexity
    • Trained on one year of New York Times news
    • In dev set, general sentences contained unexpected, catchy phrases
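
A rough sketch of the length and polarity features listed above (token count, noun count, counts of polarity words, and a length-normalized variant), assuming NLTK for tokenization and POS tagging and plain word sets standing in for the General Inquirer / MPQA lexicons.

```python
# Sketch of the surface and polarity features above. NLTK tokenization/POS
# tagging and simple word-set lexicons are assumptions for illustration.
import nltk

def surface_polarity_features(sentence, positive_words, negative_words):
    tokens = nltk.word_tokenize(sentence)
    tags = nltk.pos_tag(tokens)
    n_tokens = len(tokens)
    n_nouns = sum(1 for _, tag in tags if tag.startswith("NN"))
    n_pos = sum(1 for w in tokens if w.lower() in positive_words)
    n_neg = sum(1 for w in tokens if w.lower() in negative_words)
    return {
        "n_tokens": n_tokens,                                 # sentence length
        "n_nouns": n_nouns,                                   # no. of nouns
        "n_polarity": n_pos + n_neg,                          # polarity word count
        "polarity_norm": (n_pos + n_neg) / max(n_tokens, 1),  # normalized by length
    }
```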
Features for general vs specific (continued)
  • Specificity
    • min/ max/ avg IDF
    • WordNet: hypernym distance to root for nouns and verbs—min/ max/ avg
  • Syntax: no. of adjectives, adverbs, ADJPs, ADVPs, verb phrases; avg VP length

  • Entities: Numbers, proper names, $ sign, plural nouns
  • Words: count of each word in the sentence
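
A sketch of the word-specificity features above (min/max/avg IDF and WordNet hypernym distance to the root), using NLTK's WordNet interface. Taking the first noun synset and its minimum depth is a simplifying assumption.

```python
# Sketch of the IDF and WordNet hypernym-distance features above.
# Using the first noun synset and its minimum depth to the root is a
# simplifying assumption, not necessarily the authors' exact procedure.
import math
from nltk.corpus import wordnet as wn

def idf_stats(tokens, doc_freq, n_docs):
    """min/max/avg IDF of the tokens, given corpus document frequencies."""
    idfs = [math.log(n_docs / (1 + doc_freq.get(t.lower(), 0))) for t in tokens]
    if not idfs:
        return 0.0, 0.0, 0.0
    return min(idfs), max(idfs), sum(idfs) / len(idfs)

def hypernym_depths(nouns):
    """Hypernym distance to the WordNet root for each noun."""
    depths = []
    for noun in nouns:
        synsets = wn.synsets(noun, pos=wn.NOUN)
        if synsets:
            depths.append(synsets[0].min_depth())
    return depths
```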
Instantiation-based classifier gave better performance
  • Best individual feature set: words (74.8%)
  • Non-lexical features are equally good: 74.1%
  • Little improvement from combining all features: 75.8%
Feature analysis
  • Words with highest weight [Instantiation-based]
    • General: number, but, also, however, officials, some, what, lot, prices, business, were…
    • Specific: one, a, to, co, I, called, we, could, get…
  • General sentences are characterized by
    • Plural nouns
    • Dollar sign
    • Lower probability
    • More polarity words and more adjectives and adverbs
  • Specific sentences are characterized by
    • Numbers and names
More testing data
  • Direct judgments of WSJ and AP sentences on Amazon Mechanical Turk
  • ~ 600 sentences
  • 5 judgments per sentence

In WSJ, more sentences are general (55%)

In AP, more sentences are specific (60%)

Why the difference between Instantiation and Specification?
  • Some of the annotations were on sentences from our initial training data
  • Instantiation has more detectable properties associated with Arg1 and Arg2

Accuracy of classifier on new data

Non-lexical features work better on this data

Performance is almost the same as in cross validation

Classifier is more accurate on examples where people agree

Classifier confidence correlates with annotator agreement

Application of our classifier to full articles


  • Distribution of general/specific sentences in news documents
  • Can the classifier detect differences between general and specific summaries written by people?
  • Do summaries have more general/specific content compared to input? How does it impact summary quality?
  • Compare different types of summaries
    • Human abstracts: written from scratch
    • Human extracts: select sentences as a whole from inputs
    • System summaries: all extracts
Example general and specific predictions
  • Seismologists said the volcano had plenty of built-up magma and even more severe eruptions could come later. [general]
  • The volcano's activity -- measured by seismometers detecting slight earthquakes in its molten rock plumbing system -- is increasing in a way that suggests a large eruption is imminent, Lipman said. [specific]

Example predictions

The novel, a story of a Scottish low-life narrated largely in Glaswegian dialect, is unlikely to prove a popular choice with booksellers, who have damned all six books shortlisted for the prize as boring, elitist and, worst of all, unsaleable. [specific]

The Booker prize has, in its 26-year history, always provoked controversy. [general]

Computing specificity for a text
  • Sentences in summary are of varying length, so we compute a score on word level
  • “Average specificity of words in the text”

[Figure: each sentence's classifier confidence for the specific class is assigned to all of its tokens, e.g. S1 (w11 w12 w13) = 0.68, S2 (w21 w22 w23) = 0.23, S3 (w31 w32 w33) = 0.81; the specificity score of the text is the average score over all tokens.]
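
A small sketch of this word-level score: each token inherits its sentence's confidence for the specific class, and the text score is the average over all tokens, i.e. a length-weighted average of sentence confidences. The sentence/token values are the hypothetical ones from the figure.

```python
# Word-level specificity score from the figure above: every token gets its
# sentence's confidence for the "specific" class, and the text score is the
# average over all tokens (a length-weighted average of sentence confidences).
def text_specificity(sentences, specific_confidences):
    """sentences: list of token lists; specific_confidences: one value per sentence."""
    token_scores = [conf
                    for tokens, conf in zip(sentences, specific_confidences)
                    for _ in tokens]
    return sum(token_scores) / max(len(token_scores), 1)

# With the values from the figure (three 3-token sentences scored
# 0.68, 0.23 and 0.81), the text-level score is about 0.57.
score = text_specificity([["w11", "w12", "w13"],
                          ["w21", "w22", "w23"],
                          ["w31", "w32", "w33"]],
                         [0.68, 0.23, 0.81])
```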

50 specific and general human summaries

No significant differences in specificity of the input

Significant differences in specificity of summaries in the two categories

Our classifier is able to detect the differences

Data: DUC 2002
  • Generic multidocument summarization task
  • 59 input sets
    • 5 to 15 news documents
  • 3 types of summaries
    • 200 words
    • Manually assigned content and linguistic quality scores

    • 1. Human abstracts (2 assessors × 59 input sets)
    • 2. Human extracts (2 assessors × 59 input sets)
    • 3. System extracts (9 systems × 59 input sets)

Specificity analysis of summaries
  • More general content is preferred in abstracts
  • Simply the process of extraction makes summaries more specific
  • System summaries are overly specific

[Figure: average specificity. Human abstracts (0.62) < inputs (0.65) < human extracts (0.72) < system extracts (0.74)]

Histogram of specificity scores
  • Human summaries are more general
  • Is this aspect related to summary quality?
Analysis of ‘system summaries’: specificity and quality
  • Content quality
    • Importance of content included in the summary
  • Linguistic quality
    • How well-written the summary is perceived to be
  • Quality of general/specific summaries
    • When a summary is intended to be general or specific
Relationship to content selection scores
  • Coverage score: closeness to human summary
    • Clause level comparison
  • For system summaries
    • Correlation between coverage score and average specificity
      • -0.16*, p-value = 0.0006
    • Less specific ~ better content
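
How such a correlation could be checked, as a sketch: SciPy's Pearson correlation is an assumed choice of test, and the input lists are hypothetical per-summary scores.

```python
# Sketch of checking the coverage-vs-specificity relationship reported above;
# scipy's Pearson correlation is an assumed choice of test.
from scipy.stats import pearsonr

def coverage_specificity_correlation(coverage_scores, avg_specificities):
    r, p = pearsonr(coverage_scores, avg_specificities)
    return r, p  # the slides report r = -0.16, p = 0.0006
```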
But the correlation is not very high
  • Specificity is related to realization of content
    • Different from importance of the content
  • Content quality = content importance + appropriate specificity level
  • Content importance: ROUGE scores
    • N-gram overlap of system summary and human summary
    • Standard evaluation of automatic summaries
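
A simplified sketch of ROUGE-2-style bigram recall, for intuition only; the official ROUGE toolkit additionally handles stemming, multiple references and jackknifing.

```python
# Simplified bigram-recall sketch of ROUGE-2; the official toolkit also
# handles stemming, multiple references and jackknifing.
from collections import Counter

def bigrams(tokens):
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(system_tokens, reference_tokens):
    sys_bg, ref_bg = bigrams(system_tokens), bigrams(reference_tokens)
    overlap = sum(min(count, sys_bg[bg]) for bg, count in ref_bg.items())
    return overlap / max(sum(ref_bg.values()), 1)
```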
Specificity as one of the predictors
  • Coverage score ~ ROUGE-2 (bigrams) + specificity
  • Linear regression
  • Weights for predictors in the regression model

Is the combination a better predictor than ROUGE alone?
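
A sketch of the regression above (coverage ~ ROUGE-2 + specificity), assuming scikit-learn; standardizing the predictors before comparing their weights is an assumption, not something stated on the slide.

```python
# Sketch of the linear regression above: coverage ~ ROUGE-2 + specificity.
# Standardizing the predictors before comparing weights is an assumption.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def fit_coverage_model(rouge2, specificity, coverage):
    X = StandardScaler().fit_transform(np.column_stack([rouge2, specificity]))
    model = LinearRegression().fit(X, coverage)
    return model.coef_, model.intercept_  # weights for ROUGE-2 and specificity
```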

2. Specificity and linguistic quality
  • Used different data: TAC 2009
    • DUC 2002 only reported the number of errors
    • These were also specified as a range: 1-5 errors
  • TAC 2009 linguistic quality score
    • Manually judged: scale 1 – 10
    • Combines different aspects
      • coherence, referential clarity, grammaticality, redundancy
What is the avg specificity in different score categories?
  • More general ~ lower score!
    • General content is useful but need proper context!
  • If a summary starts as follows:
    • "We are quite a ways from that, actually."
    • As ice and snow at the poles melt, …
  • Specificity = low, but linguistic quality = 1

Data for analysing generalization operation
  • Aligned pairs of abstract and source sentences conveying the same content
    • Traditional data used for compression experiments
  • Ziff-Davis tree alignment corpus
    • 15964 sentence pairs
    • Any number of deletions, up to 7 substitutions
  • Only 25% of abstract sentences are mapped
    • But beneficial to observe the trends

[Galley & McKeown (2007)]

Generalization operation in human abstracts

One-third of all transformations are specific to general

  • Human abstracts involve a lot of generalization
How do specific sentences get converted to general ones?

Choose long sentences and compress heavily!

  • A measure of generality would be useful to guide compression
    • Currently only importance and grammaticality are used
Use of general sentences in human extracts
  • Details of Maxwell’s death were sketchy.
  • Folksy was an understatement.
  • “Long live democracy!”
  • Instead it sank like the Bismarck.
  • Example use of a general sentence in a summary

With Tower’s qualifications for the job, the nominations should have sailed through with flying colors. [Specific]

Instead it sank like the Bismarck. [General]

Future: can we learn to generate and select general sentences to include in automatic summaries?

Conclusions
  • Built a classifier for general and specific sentences
  • Used existing annotations to do that
  • But tested on new data and task-based evaluation
  • The confidence of the classifier is highly correlated with human agreement
  • Analyzed human and machine summaries
    • Machine summaries are too specific
    • But adding general sentences is difficult because the context has to be right
Further details in
  • Annie Louis and Ani Nenkova. Automatic identification of general and specific sentences by leveraging discourse annotations. Proceedings of IJCNLP, 2011 (to appear).
  • Annie Louis and Ani Nenkova. Text specificity and impact on quality of news summaries. Proceedings of the ACL-HLT Workshop on Monolingual Text-to-Text Generation, 2011.
  • Annie Louis and Ani Nenkova. Creating Local Coherence: An Empirical Assessment. Proceedings of NAACL-HLT, 2010.
Two types of local coherence—Entity & Rhetorical
  • Local coherence: Adjacent sentences in a text flow from one to another
  • Entity – same topic
    • John was hungry. He went to a restaurant.
  • But only 42% of sentence pairs are entity-linked [previous corpus studies]
  • Will core discourse relations connect the non-entity-sharing sentence pairs?
    • Popular hypothesis in prior work
Investigations into text quality
  • The mix of discourse relations in a text is highly predictive of the perceived quality of the text
  • Both implicit and explicit relations are needed to predict text quality
  • Predicting the sense of implicit discourse relations is a very difficult task; most predicted to be “expansion”
  • How is local coherence created?
Joint analysis by combining PDTB and OntoNotes annotations
  • 590 articles
  • Noun phrase coreference from OntoNotes
  • 40 to 50% of sentence pairs do not share entities in articles of different lengths
Example instantiations and list relations
  • Instantiation

The economy is showing signs of weakness, particularly among manufacturers.

Exports, which played a key role in fueling growth over the last two years, seem to have stalled.

  • List

Many of Nasdaq's biggest technology stocks were in the forefront of the rally.

- Microsoft added 2 1/8 to 81 3/4 and Oracle Systems rose 1 1/2 to 23 1/4.

- Intel was up 1 3/8 to 33 3/4.

Overall distribution of sentence pairs among the two coherence devices
  • 30% sentence pairs have no coreference and are in a weak discourse relation (expansion/entrel)
  • We must explore elaboration relations more closely to identify how they create coherence