Speech Summarization

Julia Hirschberg (thanks to Sameer Maskey for some slides)

CS4706


Summarization Distillation
  • ‘…the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks)’ [Mani and Maybury, 1999]
  • Why summarize? Too much data!
Types of Summarization
  • Indicative
    • Describes the document and its contents
  • Informative
    • ‘Replaces’ the document
  • Extractive
    • Concatenates pieces of the existing document
  • Generative
    • Creates a new document
  • Document compression

Text Summarization Approaches
  • Sentence extraction with similarity measures [Salton, et al., 1995]
  • Extraction training with manual summaries [McKeown, et al., 2001]
  • Concept-level extraction of concept units [Hovy & Lin, 1999]
  • Generating words/phrases [Witbrock & Mittal, 1999]
  • Use of structured data [Maybury, 1995]
Sentence Extraction/Similarity measures [Salton, et al. 1995]
  • Extract sentences by their similarity to a topic sentence and their dissimilarity to sentences already in the summary (Maximal Marginal Relevance)
  • Similarity measures
    • Cosine Measure
    • Vocabulary Overlap
    • Topic word overlap
    • Content Signatures Overlap
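The selection criterion above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it uses a bag-of-words cosine similarity (one of the measures listed) and a standard MMR trade-off weight `lam` between relevance to the topic and redundancy with already-chosen sentences; the function names and the example sentences are invented for illustration.

```python
# Sketch of MMR-style sentence extraction with cosine similarity.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_extract(sentences, topic, k=2, lam=0.7):
    """Pick k sentences similar to the topic sentence but
    dissimilar to sentences already in the summary."""
    bows = [Counter(s.lower().split()) for s in sentences]
    topic_bow = Counter(topic.lower().split())
    chosen = []
    while len(chosen) < k:
        best, best_score = None, float("-inf")
        for i, bow in enumerate(bows):
            if i in chosen:
                continue
            # Redundancy: closest match among already-selected sentences.
            redundancy = max((cosine(bow, bows[j]) for j in chosen), default=0.0)
            score = lam * cosine(bow, topic_bow) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return [sentences[i] for i in chosen]
```

Lowering `lam` penalizes redundancy more heavily, so a near-duplicate of an already-selected sentence loses to a less relevant but novel one.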
Concept/content level extraction [Hovy & Lin, 1999]
  • Present key-words as summary
  • Builds concept signatures by finding relevant words in 30,000 WSJ documents, each categorized into different topics
  • Phrase concatenation of relevant concepts/content
  • Sentence planning for generation
Feature-based statistical models [Kupiec, et al., 1995]
  • Create manual summaries
  • Extract features
  • Train statistical model using various ML techniques
  • Use the trained model to score each sentence in the test data
  • Extract N highest-scoring sentences
      • Score: P(s ∈ S | F1, …, Fk) ≈ P(s ∈ S) · ∏j P(Fj | s ∈ S) / ∏j P(Fj), where S is the summary, the Fj are the k features, and P(Fj) and P(Fj | s ∈ S) can be computed by counting occurrences
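The counting estimate above can be sketched as a tiny naive-Bayes scorer. This is an illustrative sketch, not Kupiec et al.'s implementation: it assumes binary features, and the add-one smoothing and function names are my own additions to keep the toy example from dividing by zero.

```python
# Sketch of feature-based sentence scoring by counting occurrences.
from math import prod  # Python 3.8+

def train(features, in_summary):
    """features: list of k-long binary feature vectors, one per sentence;
    in_summary: parallel booleans taken from the manual summaries."""
    n, k = len(features), len(features[0])
    n_sum = sum(in_summary)
    p_s = n_sum / n                                # P(s in S)
    # P(F_j) and P(F_j | s in S), add-one smoothed (an assumption).
    p_f = [(sum(f[j] for f in features) + 1) / (n + 2) for j in range(k)]
    p_f_given_s = [
        (sum(f[j] for f, y in zip(features, in_summary) if y) + 1) / (n_sum + 2)
        for j in range(k)
    ]
    return p_s, p_f, p_f_given_s

def score(fvec, model):
    """P(s in S | F_1..F_k) ~ P(s in S) * prod_j P(F_j|s in S) / P(F_j)."""
    p_s, p_f, p_f_given_s = model
    num = p_s * prod(p_f_given_s[j] if fvec[j] else 1 - p_f_given_s[j]
                     for j in range(len(fvec)))
    den = prod(p_f[j] if fvec[j] else 1 - p_f[j] for j in range(len(fvec)))
    return num / den
```

Ranking test sentences by this score and keeping the top N gives the extractive summary.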
Structured Database [Maybury, 1995]
  • Summarize text represented in structured form: database, templates
    • E.g. generation of a medical history from a database of medical ‘events’
  • Link analysis (semantic relations within the structure)
  • Domain dependent importance of events
Comparing Speech and Text Summarization
  • Alike
    • Identifying important information
    • Some lexical, discourse features
    • Extraction or generation or compression
  • Different
    • Speech Signal
    • Prosodic features
    • NLP tools?
    • Segments – sentences?
    • Generation?
    • Errors
    • Data size
Text vs. Speech Summarization (NEWS)

  • Input: error-free text vs. a speech signal over varied channels (phone, remote satellite, station)
  • Transcript: manual vs. ASR output or closed captions
  • Speech has many speakers, with different speaking styles
  • Lexical features: all available for text, only some for speech
  • Speech adds structural cues: anchor/reporter interaction, story presentation
  • Speech adds prosodic features: pitch, energy, duration
  • NLP tools: mature for text, limited for speech
  • Broadcast speech includes extraneous segments: commercials, weather reports
Speech Summarization Today
  • Mostly extractive:
    • Words, sentences, content units
  • Some compression methods
  • Generation-based summarization difficult
    • Text or synthesized speech?
Generation or Extraction?
  • SENT27 a trial that pits the cattle industry against tv talk show host oprah winfrey is under way in amarillo , texas.
  • SENT28 jury selection began in the defamation lawsuit began this morning .
  • SENT29 winfrey and a vegetarian activist are being sued over an exchange on her April 16, 1996 show .
  • SENT30 texas cattle producers claim the activists suggested americans could get mad cow disease from eating beef .
  • SENT31 and winfrey quipped , this has stopped me cold from eating another burger
  • SENT32 the plaintiffs say that hurt beef prices and they sued under a law banning false and disparaging statements about agricultural products
  • SENT33 what oprah has done is extremely smart and there's nothing wrong with it she has moved her show to amarillo texas , for a while
  • SENT34 people are lined up , trying to get tickets to her show so i'm not sure this hurts oprah .
  • SENT35 incidentally oprah tried to move it out of amarillo . she's failed and now she has brought her show to amarillo .
  • SENT36 the key is , can the jurors be fair
  • SENT37 when they're questioned by both sides, by the judge , they will be asked, can you be fair to both sides
  • SENT38 if they say , there's your jury panel
  • SENT39 oprah winfrey's lawyers had tried to move the case from amarillo , saying they couldn't get an impartial jury
  • SENT40 however, the judge moved against them in that matter …




Speech Summarization Approaches
  • Sentence extraction with similarity measures [Christensen et al., 2004]
  • Word scoring with dependency structure [Hori C. et al., 1999, 2002; Hori T. et al., 2003]
  • [Koumpis & Renals, 2004]
  • User access information [He et al., 1999]
  • Removing disfluencies [Zechner, 2001]
  • Weighted finite state transducers [Hori T. et al., 2003]
Content/Context sentence level extraction for speech summary [Christensen et al., 2004]
  • Find sentences similar to the lead topic sentences
  • Use position features to find the relevant nearby sentences after detecting a topic sentence
    • Select the next sentence s* = argmax_{s ∉ E} [λ · Sim(s, D) − (1 − λ) · max_{e ∈ E} Sim(s, e)], where Sim is a similarity measure between two sentences or between a sentence and the document D, and E is the set of sentences already in the summary
    • i.e., choose a new sentence which is most like D and most different from E
Weighted finite state transducers for speech summarization [Hori T. et al., 2003]
  • Summarization includes speech recognition, paraphrasing, sentence compaction integrated into single Weighted Finite State Transducer
  • Decoder can use all knowledge sources in one-pass strategy
  • Speech recognition using WFST
    • Speech recognition as the WFST composition H ∘ C ∘ L ∘ G, where H is the state network of triphone HMMs, C is the triphone connection rules, L is the pronunciation lexicon, and G is a trigram language model
  • Paraphrasing can be looked at as a kind of machine translation with translation probability P(W|T), where W is the source language and T is the target language
  • If S is the WFST representing translation rules and D is the language model of the target language, speech summarization can be looked at as the single composition H ∘ C ∘ L ∘ G ∘ S ∘ D
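The key operation in the cascade above is transducer composition. The following is a minimal sketch, not Hori et al.'s system: it composes two hypothetical single-state weighted transducers (a toy paraphrasing table standing in for S and a toy target scorer standing in for D), with weights treated as negative log probabilities, so composition adds them and the best path minimizes total weight. All rules and weights here are invented.

```python
# Sketch of weighted transducer composition: T1's output feeds T2's input.
def compose(t1, t2):
    """t1, t2: dicts mapping input symbol -> list of (output, weight)."""
    out = {}
    for x, arcs1 in t1.items():
        for (y, w1) in arcs1:
            for (z, w2) in t2.get(y, []):
                out.setdefault(x, []).append((z, w1 + w2))  # weights add
    return out

def best_path(t, seq):
    """Cheapest output string for an input sequence through transducer t."""
    result, cost = [], 0.0
    for x in seq:
        z, w = min(t[x], key=lambda arc: arc[1])
        result.append(z)
        cost += w
    return result, cost

# Hypothetical toy rules: S paraphrases source words, D scores target words.
S = {"began": [("started", 0.5), ("began", 0.2)],
     "lawsuit": [("suit", 0.7), ("lawsuit", 0.1)]}
D = {"started": [("started", 0.3)], "began": [("began", 0.1)],
     "suit": [("suit", 0.4)], "lawsuit": [("lawsuit", 0.2)]}
SD = compose(S, D)  # one transducer doing both steps at once
```

Real systems (e.g. OpenFst) handle multi-state machines, epsilon arcs, and general semirings, but the composition-then-best-path decoding pattern is the same one-pass strategy described above.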

(Figure: a speech recognizer and a speech translator composed into one WFST.)


User Access Identifies What to Include [He et al., 1999]
  • Summarize lectures or shows by extracting parts that have been viewed the longest
  • Needs multiple users of the same show, meeting or lecture for training
  • E.g. To summarize lectures compute the time spent on each slide
  • Summarizer based on user access logs did as well as summarizers that used linguistic and acoustic features
    • Average score of 4.5 on a scale of 1 to 8 for the summarizer (subjective evaluation)
Word level extraction by scoring/classifying words [Hori C. et al., 1999, 2002]
  • Score each word in the sentence and extract a set of words to form a sentence whose total score is the product/sum of the scores of each word
  • Example:
    • Word Significance score (topic words)
    • Linguistic Score (bigram probability)
    • Confidence Score (from ASR)
    • Word Concatenation Score (dependency structure grammar)

Score of an M-word extract V = v1 … vM: S(V) = Σ_{m=1..M} [ λ_I·I(v_m) + λ_L·L(v_m) + λ_C·C(v_m) + λ_T·T(v_{m−1}, v_m) ], where M is the number of words to be extracted and λ_I, λ_L, λ_C, λ_T are weighting factors for balancing among the significance (I), linguistic (L), confidence (C), and concatenation (T) scores
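Maximizing that summed score over M-word, order-preserving extracts can be done with dynamic programming. This is an illustrative sketch, not Hori C. et al.'s decoder: it assumes the per-word scores have already been combined into one number per word (`wscore`), and the function names and toy scores below are invented.

```python
# Sketch: pick M in-order words maximizing word scores + concatenation scores.
def compress(words, wscore, tscore, M):
    """words: sentence tokens; wscore[i]: combined score for words[i];
    tscore(a, b): concatenation score for b directly following a."""
    n = len(words)
    NEG = float("-inf")
    # best[m][i]: best score of an m-word extract ending at position i.
    best = [[NEG] * n for _ in range(M + 1)]
    back = [[None] * n for _ in range(M + 1)]
    for i in range(n):
        best[1][i] = wscore[i]
    for m in range(2, M + 1):
        for i in range(n):
            for j in range(i):                      # previous kept word
                if best[m - 1][j] == NEG:
                    continue
                s = best[m - 1][j] + wscore[i] + tscore(words[j], words[i])
                if s > best[m][i]:
                    best[m][i], back[m][i] = s, j
    end = max(range(n), key=lambda i: best[M][i])
    out, m, i = [], M, end
    while i is not None:                            # trace the best path back
        out.append(words[i])
        i, m = back[m][i], m - 1
    return list(reversed(out))
```

With all concatenation scores zero this reduces to keeping the M highest-scoring words in their original order; a real T score would favor pairs of words that are grammatically dependent.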

Segmentation Using Discourse Cues [Maybury, 1998]
  • Discourse Cue-Based Story Segmentation
  • Discourse Cues in CNN
    • Start and end of broadcast
    • Anchor/Reporter handoff, Reporter/Anchor handoff
    • Cataphoric Segment (“still ahead …”)
  • Time Enhanced Finite State Machine representing discourse states such as anchor segment, reporter segment, advertisement
  • Other features: named entities, part of speech, discourse shifts “>>” speaker change, “>>>” subject change
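A minimal sketch of the cue-based pass, not Maybury's actual system: a single-state machine that splits a closed-caption transcript into stories at the “>>>” subject-change cue (a real implementation would also track anchor/reporter handoff states and time stamps). The example lines are invented.

```python
# Sketch: segment a closed-caption transcript on discourse cue markers,
# where ">>>" marks a subject change and ">>" a speaker change.
def segment(lines):
    """Split transcript lines into stories at subject-change cues."""
    stories, current = [], []
    for line in lines:
        if line.startswith(">>>") and current:   # subject change: new story
            stories.append(current)
            current = []
        current.append(line.lstrip("> ").strip())
    if current:
        stories.append(current)
    return stories
```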
CU: Summarization without Words: Does the Importance of ‘What’ Is Said Correlate with ‘How’ It Is Said?
  • Hypothesis: “Speakers change their amplitude, pitch, speaking rate to signify importance of words, phrases, sentences.”
    • If so, then the prediction labels for sentences predicted using acoustic features (A) should correlate with labels predicted using lexical features (L)
    • In fact, this seems to be true (correlation of .74 between the predictions of A and L)
Is It Possible to Build ‘good’ Automatic Speech Summarization Without Any Transcripts?
  • Just using acoustic and structural (A+S) features, without any lexical features, we get a 6% higher F-measure and 18% higher ROUGE-avg than the baseline
Evaluation using ROUGE
  • F-measure too strict
    • Predicted summary sentences must match summary sentences exactly
    • What if content is similar but not identical?
  • ROUGE(s)…
ROUGE metric
  • Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
  • ROUGE-N (where N=1,2,3,4 grams)
  • ROUGE-L (longest common subsequence)
  • ROUGE-S (skip bigram)
  • ROUGE-SU (skip bigram counting unigrams as well)
  • Does ROUGE solve the problem?
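ROUGE-N is easy to state concretely. This is an illustrative sketch, not the official ROUGE toolkit: it computes the recall-oriented score (clipped n-gram matches divided by the reference's n-gram count) for a single reference, whereas the full metric also handles multiple references, stemming, and stopword options.

```python
# Sketch of ROUGE-N: n-gram recall of a candidate against a reference.
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Clipped n-gram matches divided by reference n-gram count (recall)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    matches = sum(min(cand[g], ref[g]) for g in ref)
    return matches / sum(ref.values())
```

Because matching is at the n-gram level rather than whole sentences, a predicted summary that paraphrases or partially overlaps the reference still earns credit, which is exactly the looseness the F-measure critique above asks for.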
Next Class
  • Emotional speech
  • HW 4 assigned