The Pythy Summarization System: Microsoft Research at DUC 2007



The Pythy Summarization System: Microsoft Research at DUC 2007

Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki, and Lucy Vanderwende

Microsoft Research

April 26, 2007



DUC Main Task Results

  • Automatic Evaluations (30 participants)

  • Human Evaluations

  • Did pretty well on both measures



Overview of Pythy

  • Linear sentence ranking model

  • Learns to rank sentences based on:

    • ROUGE scores against model summaries

    • Semantic Content Unit (SCU) weights of sentences selected by past peers

  • Considers simplified sentences alongside original sentences
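The linear ranking model can be pictured as a dot product between a sentence's feature vector and a learned weight vector. A minimal sketch, with illustrative feature names and weights (not Pythy's actual inventory):

```python
# Minimal sketch of a linear sentence-ranking model. Feature names and
# weights are illustrative, not Pythy's actual feature inventory.

def score(features, weights):
    """Dot product of a sparse feature dict with a weight dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

weights = {"cluster_freq": 2.0, "position": -0.5}

s1 = {"cluster_freq": 0.8, "position": 0.0}  # early sentence, frequent words
s2 = {"cluster_freq": 0.2, "position": 0.9}  # late sentence, rare words

# Rank candidate sentences by their linear score, best first.
ranked = sorted([("s1", s1), ("s2", s2)],
                key=lambda pair: score(pair[1], weights), reverse=True)
```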


[Diagram: PYTHY training pipeline — docs yield sentences and simplified sentences, scored against a feature inventory; ranking training uses targets from the ROUGE oracle, Pyramid/SCU weights, ROUGE ×2, and the model docs to produce the ranking model.]


[Diagram: PYTHY testing pipeline — docs yield sentences and simplified sentences, scored against the feature inventory; search with dynamic scoring under the trained model produces the summary.]



Sentence Simplification


  • Extension of the simplification method used for DUC 2006

    • Provides sentence alternatives rather than deterministically simplifying a sentence

    • Uses syntax-based heuristic rules

    • Simplified sentences are evaluated alongside the originals

  • In DUC 2007:

    • Average new candidates generated: 1.38 per sentence

    • Simplified sentences generated for 61% of all sentences

    • Simplified sentences in final output: 60%
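Pythy's rules operate over syntactic parses; as a rough illustration of the "alternatives, not a single deterministic rewrite" idea, here is a toy surface-pattern version (the rules and example sentence are invented for illustration):

```python
import re

# Toy stand-in for Pythy's syntax-based simplification rules: the real system
# works on parse trees; this surface version only illustrates that each rule
# adds a candidate rather than replacing the original sentence.

def simplify_candidates(sentence):
    """Return the original sentence plus heuristic simplifications."""
    candidates = [sentence]
    # Rule 1: drop parenthetical material.
    no_parens = re.sub(r"\s*\([^)]*\)", "", sentence)
    if no_parens != sentence:
        candidates.append(no_parens)
    # Rule 2: drop a trailing comma-separated clause (crude appositive cut).
    no_tail = re.sub(r",[^,]*$", ".", sentence) if sentence.count(",") else sentence
    if no_tail != sentence:
        candidates.append(no_tail)
    return candidates
```

All candidates, including the untouched original, would then compete in the ranker.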




Sentence-Level Features


  • SumFocus features: SumBasic (Nenkova et al., 2006) + task focus

    • cluster frequency and topic frequency

    • only these were used in MSR's DUC 2006 system

  • Other content word unigrams: headline frequency

  • Sentence length features (binary features)

  • Sentence position features (real-valued and binary)

  • N-grams (bigrams, skip bigrams, multiword phrases)

  • All tokens (topic and cluster frequency)

  • Simplified Sentences (binary and ratio of relative length)

  • Inverse document frequency (idf)
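Two of the features above can be sketched concretely: a SumBasic-style cluster-frequency score and a binary length feature. Feature names and thresholds below are hypothetical:

```python
from collections import Counter

# Illustrative versions of two features from the inventory (names and the
# length threshold are hypothetical): SumBasic-style cluster frequency and a
# binary sentence-length indicator.

def cluster_word_probs(cluster_sentences):
    """Unigram probabilities over all tokens in the document cluster."""
    counts = Counter(w for s in cluster_sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_features(sentence, probs):
    words = sentence.lower().split()
    return {
        # average cluster probability of the sentence's words
        "cluster_freq": sum(probs.get(w, 0.0) for w in words) / len(words),
        # binary feature: short sentence
        "len_under_10": 1.0 if len(words) < 10 else 0.0,
    }
```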




Pairwise Ranking


  • Define preferences for sentence pairs

    • Defined using human summaries and SCU weights

  • Log-linear ranking objective used in training

  • Maximize the probability of choosing the better sentence from each pair of comparable sentences


[Dekel et al. 03], [Burges et al. 05]
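The log-linear pairwise objective can be sketched as a logistic loss over score margins: for each (better, worse) pair, minimizing the negative log probability of preferring the better sentence. Feature names below are illustrative:

```python
import math

# Sketch of the pairwise log-linear ranking objective: maximize the log
# probability of choosing the better sentence in each preference pair.
# Feature names are illustrative.

def dot(features, weights):
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

def pairwise_log_loss(pairs, weights):
    """Negative log-likelihood over (better, worse) feature-dict pairs."""
    loss = 0.0
    for better, worse in pairs:
        margin = dot(better, weights) - dot(worse, weights)
        loss += math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)
    return loss
```

Weights that separate the better sentence from the worse one by a larger margin yield a lower loss, which is what gradient-based training would push toward.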




ROUGE Oracle Metric


  • Find an oracle extractive summary

    • the summary with the highest average ROUGE-2 and ROUGE-SU4 scores

  • All sentences in the oracle are considered “better” than any sentence not in the oracle

  • Approximate greedy search used for finding the oracle summary
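The greedy approximation can be sketched as follows: repeatedly add the sentence that most improves the summary's score against the model summaries. Bigram recall stands in here for the averaged ROUGE-2/ROUGE-SU4 of the slide:

```python
# Greedy approximation of the ROUGE-oracle search. Bigram recall is used as a
# stand-in scoring function for the averaged ROUGE-2 / ROUGE-SU4 scores.

def bigrams(text):
    toks = text.lower().split()
    return set(zip(toks, toks[1:]))

def recall(summary_sents, model_bigrams):
    got = set().union(*(bigrams(s) for s in summary_sents)) if summary_sents else set()
    return len(got & model_bigrams) / max(len(model_bigrams), 1)

def greedy_oracle(sentences, model_bigrams, max_sents):
    """Greedily build the extractive summary with the best recall."""
    chosen = []
    while len(chosen) < max_sents:
        best = max((s for s in sentences if s not in chosen),
                   key=lambda s: recall(chosen + [s], model_bigrams),
                   default=None)
        if best is None:
            break
        chosen.append(best)
    return chosen
```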




Pyramid-Derived Metric


  • University of Ottawa SCU-annotated corpus (Copeck et al., 2006)

  • Some sentences in the 2005 & 2006 document collections are:

    • known to contain certain SCUs

    • known not to contain any SCUs

  • A sentence's score is the sum of the weights of all its SCUs

    • for un-annotated sentences, the score is undefined

  • A training pair s₁ ≻ s₂ is constructed iff w(s₁) > w(s₂)
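The pair-construction rule can be sketched directly: only annotated sentences participate, and a pair is emitted exactly when one sentence's SCU-weight sum exceeds the other's. The weights below are invented for illustration:

```python
# Building training preference pairs from SCU annotations: un-annotated
# sentences (weight None) are skipped, and (s1, s2) is emitted iff
# w(s1) > w(s2). The example weights are invented.

def scu_pairs(weights):
    """weights: dict mapping sentence -> SCU-weight sum, or None if un-annotated."""
    annotated = [(s, w) for s, w in weights.items() if w is not None]
    return [(s1, s2)
            for s1, w1 in annotated
            for s2, w2 in annotated
            if w1 > w2]
```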




Model Frequency Metrics


  • Based on unigram and skip bigram frequency

  • Computed for content words only

  • Sentence sᵢ is “better” than sⱼ if its model-frequency score is higher
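A rough sketch of this preference (the slide's defining formula is an image in the original deck, so the exact normalization here is an assumption): score each sentence by the frequency of its content words in the human model summaries, and prefer the higher-scoring sentence.

```python
from collections import Counter

# Sketch of the model-frequency preference: a sentence's score is the average
# model-summary frequency of its content words. The stopword list, the
# averaging, and the example counts are assumptions for illustration.

STOPWORDS = {"the", "a", "an", "of", "in", "and"}

def model_freq_score(sentence, model_counts):
    words = [w for w in sentence.lower().split() if w not in STOPWORDS]
    return sum(model_counts[w] for w in words) / max(len(words), 1)

# Toy unigram counts over the human model summaries.
model_counts = Counter("coast storm damage storm flooding".split())
```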




Combining Multiple Metrics


  • From the ROUGE oracle:

    all sentences in the oracle summary are better than the other sentences

  • From SCU annotations:

    sentences with higher avg. SCU weights are better

  • From model frequency:

    sentences with words occurring in the model summaries are better

  • Combined loss: the sum of the losses according to all metrics
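The combination itself is simple to sketch: each metric contributes a set of preference pairs, and the total training loss is the sum of the per-pair losses across all metrics. The metric names and margins below are placeholders:

```python
import math

# Combined training loss: sum the pairwise logistic losses induced by every
# metric's preference pairs. Metric names and margin values are placeholders.

def pair_loss(margin):
    """Loss for one preference pair given its score margin (better - worse)."""
    return math.log(1.0 + math.exp(-margin))  # -log sigmoid(margin)

def combined_loss(margins_by_metric):
    """margins_by_metric: dict metric name -> list of score margins for its pairs."""
    return sum(pair_loss(m)
               for margins in margins_by_metric.values()
               for m in margins)
```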






Dynamic Sentence Scoring


  • Eliminate redundancy by re-weighting

  • Similar to SumBasic (Nenkova et al., 2006): features are re-weighted given previously selected sentences

  • Discounts for features that decompose into word frequency estimates
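The SumBasic-style discount can be sketched as follows: once a word appears in the summary, its probability is squared, so sentences repeating already-covered content score lower on the next round. The example probabilities are invented:

```python
# SumBasic-style dynamic re-weighting: square the probability of every word
# already used in the growing summary, discounting redundant sentences.
# Example probabilities are invented for illustration.

def reweight(probs, selected_sentence):
    """Return updated word probabilities after selecting a sentence."""
    used = set(selected_sentence.lower().split())
    return {w: (p * p if w in used else p) for w, p in probs.items()}
```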


Search

  • The search constructs partial summaries and scores them

  • The score of a summary does not decompose into an independent sum of sentence scores

    • Global dependencies make exact search hard

  • Used multiple beams for each length of partial summaries

    • [McDonald 2007]
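The multiple-beams idea can be sketched as one beam per summary length: partial summaries with the same sentence count compete, and only the top few survive to be extended. The coverage-style scoring function in the test is a toy stand-in for the summary score:

```python
# Sketch of beam search with one beam per partial-summary length, in the
# spirit of [McDonald 2007]. The beam width and scoring function are
# placeholders, not the system's actual settings.

def beam_search(sentences, score, max_len, beam_width=2):
    """Return the best summary of max_len sentences under the given score."""
    beams = {0: [()]}  # summary length -> surviving partial summaries (tuples)
    for length in range(max_len):
        # Extend every surviving partial summary by one unused sentence.
        extensions = [partial + (s,)
                      for partial in beams.get(length, [])
                      for s in sentences if s not in partial]
        # Keep only the top beam_width partial summaries at the new length.
        beams[length + 1] = sorted(extensions, key=score, reverse=True)[:beam_width]
    return max(beams[max_len], key=score) if beams.get(max_len) else ()
```

Keeping separate beams per length matters because summary scores are not comparable across lengths: a longer partial summary almost always covers more content.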



Impact of Sentence Simplification

  • Trained on 2005 data, tested on 2006 data





Evaluating the Metrics

Trained on 2005 data, tested on 2006 data

Includes simplified sentences





Update Summarization Pilot

  • SVM novelty classifier trained on TREC 2002 & 2003 novelty track data



Summary and Future Work

  • Summary

    • Combination of different target metrics for training

    • Many sentence features

    • Pair-wise ranking function

    • Dynamic scoring

  • Future work

    • Boost robustness

      • Sensitive to cluster properties (e.g., size)

    • Improve grammatical quality of simplified sentences

    • Reconcile novelty and (ir)relevance

    • Learn features over whole summaries rather than individual sentences



Thank You

