Automatic summarization of voicemail messages using lexical and prosodic features

Automatic summarization of voicemail messages using lexical and prosodic features • Koumpis and Renals • Presented by Daniel Vassilev

The Domain • Voicemail is a special case of spontaneous speech • Goal: Enable the voicemail user to receive his/her messages anywhere and any time, in particular on mobile devices • Key components: caller identification, reason for call, information that the caller requires and a return phone number

The Task • Summarization: obtain the most important information from the voicemail • A complete system requires spoken language understanding and language generation; current technology is not adequate • Solution: simplify the task by deciding for each word whether it will be in the summary

The Task • Voicemail is short, 40s on average • Summaries must fit into 140 characters (mobile devices) • Content is more important than coherence and document flow • ASR is used on the voicemail; a significant word error rate must be assumed (30%-40% error!)
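To make the per-word decision and the 140-character limit concrete, here is a minimal sketch. It assumes each word already has an importance score from some classifier; the function name `build_summary`, the greedy selection strategy and the example scores are illustrative assumptions, not the method used by Koumpis and Renals.

```python
def build_summary(words, scores, max_chars=140):
    """Keep the highest-scoring words that fit in max_chars, in original order.

    Hypothetical helper: the greedy strategy is an assumption for illustration.
    """
    # Rank word positions by descending score (ties keep earlier words first).
    ranked = sorted(range(len(words)), key=lambda i: (-scores[i], i))
    kept = set()
    length = 0
    for i in ranked:
        # Every word after the first also costs one joining space.
        extra = len(words[i]) + (1 if kept else 0)
        if length + extra <= max_chars:
            kept.add(i)
            length += extra
    # Emit the kept words in their original spoken order.
    return " ".join(words[i] for i in sorted(kept))


words = ["hi", "john", "calling", "about", "the", "meeting"]
scores = [0.1, 0.9, 0.3, 0.2, 0.05, 0.8]
print(build_summary(words, scores, max_chars=12))
```

Keeping the selected words in spoken order reflects the slide's point that content matters more than coherence: nothing is rephrased, words are only dropped.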

Voicemail Corpus • IBM Voicemail Corpus-Part I • 1801 messages (1601 for training, 200 for testing) • 14.6 h • On average 90 words / message • Message topics: 27% business related, 25% personal, 17% work-related, 13% technical and 18% in other categories.

The classification problem • Classifier decides if a word is included in the summary • Using Parcel (a feature selection algorithm) and Receiver Operating Characteristic (ROC) graphs for feature selection • Hybrid multi-layer perceptron (MLP) / hidden Markov model (HMM) classifier

Receiver Operating Characteristic • Graph plots the true positive rate (sensitivity) vs. the false positive rate (1 - specificity, where specificity is the true negative rate) • We can trade false positives against false negatives by choosing different acceptance thresholds (this moves us along the ROC curve) • Different classifiers will have different ROC curves
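The threshold sweep behind a ROC curve can be sketched as follows. This is a generic illustration, not code from the paper: `roc_points` is a hypothetical helper, and it assumes the labels contain at least one positive and one negative example.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores: classifier outputs, higher means "more likely in the summary".
    labels: 1 for words that belong in the summary, 0 otherwise.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    # Lower the acceptance threshold step by step and recount.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


print(roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0]))
```

Each threshold gives one operating point; lowering it accepts more words, which can only raise both rates.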

Sample ROC graph

System setup • The team built a sophisticated, multi-component system that can capture the different types of information occurring in voicemail • Initial trigram language model, augmented with sentences from the Hub-4 Broadcast News and Switchboard language model training corpora

System Setup • Pronunciation dictionary with 10,000 words from the training data • + pronunciations obtained from the SPRACH broadcast news system • Annotated summary words in 1,000 messages

System overview

Entities in summaries

Annotation procedures • 1. Pre-annotated NEs were marked as targets, unless unmarked by later rules; • 2. The first occurrences of the names of the speaker and recipient were always marked as targets; later repetitions were unmarked unless they resolved ambiguities; • 3. Any words that explicitly determined the reason for calling, including important dates/times and action items, were marked; • 4. Words in a stopword list with 54 entries were unmarked.

Annotation procedures • Labeled only on transcribed messages (no audio) • Annotators tended to eliminate irrelevant words (as opposed to mark content words) • Produced summaries about 30% shorter than original message • Relatively good level of inter-annotator agreement

Lexical features • Lexical features from ASR output • Collection frequency (less frequent words are more informative) • Acoustic confidence (ASR confidence) • All other features considered before and after stemming: NEs, proper names, tel. numbers, dates and times, word position
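As an illustration of the collection frequency idea (rarer words are more informative), the sketch below scores each word by the log of its inverse relative frequency over a toy corpus. The exact formula used in the paper is not given on the slide, so the log-inverse weighting and the name `collection_frequency_scores` are assumptions.

```python
import math
from collections import Counter


def collection_frequency_scores(messages):
    """Map each word to log(total_tokens / word_count) over the whole corpus.

    messages: list of tokenized messages (lists of words).
    The log-inverse weighting is an assumed, illustrative choice.
    """
    counts = Counter(word for message in messages for word in message)
    total = sum(counts.values())
    # Rare words get high scores, frequent words get low scores.
    return {word: math.log(total / count) for word, count in counts.items()}


corpus = [["call", "me", "back"], ["call", "john"]]
scores = collection_frequency_scores(corpus)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy corpus "call" appears twice and so scores lower than "john", which appears once.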

Prosodic features • Prosodic features from audio using signal processing algorithms • Duration normalization over the corpus • Pauses (preceding and succeeding) • Mean energy • F0 range, average, onset and offset
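The slide does not say how duration was normalized over the corpus. One plausible reading, sketched here purely as an assumption, is to divide each spoken word's duration by the mean duration of that same word across the corpus, so that values above 1.0 indicate a lengthened (possibly emphasized) word. The helper name `normalize_durations` is hypothetical.

```python
from collections import defaultdict


def normalize_durations(tokens):
    """Divide each duration by the corpus mean duration of the same word.

    tokens: list of (word, duration_in_seconds) pairs, durations > 0.
    This ratio-to-mean scheme is an assumption, not the paper's definition.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, duration in tokens:
        totals[word] += duration
        counts[word] += 1
    means = {word: totals[word] / counts[word] for word in totals}
    # A ratio above 1.0 means this instance was longer than usual.
    return [(word, duration / means[word]) for word, duration in tokens]


tokens = [("call", 0.2), ("call", 0.4), ("john", 0.5)]
for word, ratio in normalize_durations(tokens):
    print(word, round(ratio, 2))
```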

Results • Named entities identified very accurately (without stemming) • Telephone numbers also recognized well by specific named entity lists; word position is also a good predictor, as numbers appear towards the end of the message • All prosodic features but duration had no predictive power

ROC curves for different tasks / features

Results • Dates / times: best matched by specific named entity list and collection frequency • Prosodic features (duration, energy, F0 range) more important but still not the best predictors

The Parcel bootstrapping algorithm

Conclusions • Trade-off between length of summary and retaining essential content words • 70%-80% agreement with human summary for hand-annotated messages • 50% agreement when using ASR
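The slide does not state how agreement with the human summary was measured. One simple way to quantify it, shown here only as an assumed illustration, is the fraction of words for which the automatic and human include/exclude decisions match. The function name `word_agreement` and the example labels are hypothetical.

```python
def word_agreement(auto_labels, human_labels):
    """Fraction of words where both summaries make the same decision.

    Each list holds one 0/1 decision per word of the same message.
    This per-word match rate is an assumed metric for illustration.
    """
    if len(auto_labels) != len(human_labels):
        raise ValueError("label sequences must cover the same words")
    if not auto_labels:
        raise ValueError("cannot compute agreement on an empty message")
    matches = sum(1 for a, h in zip(auto_labels, human_labels) if a == h)
    return matches / len(auto_labels)


auto = [1, 0, 1, 1, 0]
human = [1, 0, 0, 1, 0]
print(word_agreement(auto, human))
```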

Conclusions • Automatic summaries perceived as worse than human summaries (duh!) • However, if the summarizer used human annotated data (as opposed to ASR output), the perceived quality improved significantly
