User Benefits of Non-Linear Time Compression

Liwei He and Anoop Gupta, Microsoft Research

Introduction • Time compression: key to browsing AV content • We focus on informational content • Audio time compression algorithms • Linear: speed up audio uniformly • Non-linear: exploit the fine-grained structure of human speech (e.g., pauses, phonemes) • How much more do users gain from more complex algorithms?

Methodology • Conduct user listening test • One Linear TC algorithm • Two Non-linear TC algorithms • Simple: Pause-removal followed by Linear TC • Sophisticated: Adaptive TC • Compare objective and subjective measurements

Time Compression Algorithms

Linear Time Compression • Classic algorithms • Overlap Add (OLA) and Synchronized OLA (SOLA) • We use SOLA
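The SOLA idea named on this slide can be sketched as follows. This is a minimal, unoptimized illustration in pure Python, not the implementation used in the study: the function name `sola_compress` and the parameters `frame`, `hop_out`, and `search` (all in samples) are assumptions chosen for the example. Each input frame is taken at a fixed analysis hop, shifted by a few samples around its nominal output position to maximize normalized cross-correlation with the audio already written, and then cross-faded in. Plain OLA would skip the shift search.

```python
import math

def sola_compress(x, rate, frame=400, hop_out=200, search=40):
    """Time-compress samples `x` by `rate` (> 1 plays faster) with a basic SOLA."""
    # Keep every overlap strictly between 0 and `frame` samples.
    if 2 * search >= min(hop_out, frame - hop_out):
        raise ValueError("search range too large for this frame/hop")
    hop_in = int(round(hop_out * rate))   # analysis hop: step through the input
    out = list(x[:frame])                 # first frame is copied unchanged
    m = 1
    while m * hop_in + frame <= len(x):
        seg = x[m * hop_in : m * hop_in + frame]
        nominal = m * hop_out             # where plain OLA would place this frame
        best_k, best_score = 0, -math.inf
        for k in range(-search, search + 1):
            start = nominal + k
            ov = len(out) - start         # overlap length for this shift
            a, b = out[start:], seg[:ov]
            num = sum(p * q for p, q in zip(a, b))
            den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        ov = len(out) - start
        for i in range(ov):               # linear cross-fade over the overlap
            w = (i + 1) / (ov + 1)
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[ov:])              # append the non-overlapping remainder
        m += 1
    return out

# Usage: compress half a second of a 100 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(4000)]
fast = sola_compress(tone, 1.5)
print(len(tone), len(fast))
```

On short clips the achieved ratio falls a little below the requested rate because the first frame is copied uncompressed; the effect shrinks as the clip gets longer.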

Non-Linear Time Compression • Algorithm 1: Pause removal plus TC • Detect pauses using energy and zero-crossing-rate analysis • Leave pauses of 150 ms or shorter untouched • Shorten pauses longer than 150 ms to 150 ms • Apply the SOLA algorithm to the result • Pause removal (PR) alone shortens speech by 10-25%
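The pause-shortening rule and its interaction with the linear stage can be illustrated with a little arithmetic. The sketch below assumes pause detection has already been done and that the audio is described as a list of `(kind, duration_ms)` segments; that representation and the names `shorten_pauses` and `residual_linear_factor` are hypothetical, introduced only for this example. It shows that once pauses are capped at 150 ms, the linear stage needs a smaller speed-up to reach the same overall rate.

```python
def shorten_pauses(segments, max_pause_ms=150):
    """Cap every pause at `max_pause_ms`; speech segments are left untouched."""
    out = []
    for kind, dur in segments:
        if kind == "pause" and dur > max_pause_ms:
            dur = max_pause_ms
        out.append((kind, dur))
    return out

def residual_linear_factor(original_ms, after_pr_ms, target_speedup):
    """Speed-up the linear stage must still supply to hit the overall target."""
    target_ms = original_ms / target_speedup
    return after_pr_ms / target_ms

# Made-up example clip: 4 seconds with one long and one short pause.
segments = [("speech", 1200), ("pause", 400), ("speech", 900),
            ("pause", 100), ("speech", 1400)]
shortened = shorten_pauses(segments)

original_ms = sum(d for _, d in segments)
after_pr_ms = sum(d for _, d in shortened)
factor = residual_linear_factor(original_ms, after_pr_ms, 1.5)
print(original_ms, after_pr_ms, round(factor, 3))
```

In this made-up clip, pause removal trims 250 ms of the 4000 ms, so the linear stage only has to run at about 1.41x to deliver 1.5x overall.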

Non-Linear Time Compression (cont.) • Algorithm 2: Adaptive TC • Mimics people when talking fast • Pauses and silences are compressed the most • Stressed vowels are compressed the least • Consonants are compressed more than vowels • Consonants are compressed based on neighboring vowels
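The ordering described on this slide (pauses compressed most, stressed vowels least, consonants more than vowels) can be illustrated with a toy rate allocator. This is not the Adaptive TC algorithm itself: the `WEIGHTS` table, the `local_rates` function, and the rule "local rate = 1 + c × weight" are all assumptions made up for illustration, and the dependence of consonant compression on neighboring vowels is not modelled. The sketch only shows how non-uniform local rates can be scaled so that the clip as a whole still meets a target speed-up.

```python
# Hypothetical relative compressibility per segment class (illustrative values).
WEIGHTS = {"pause": 3.0, "consonant": 1.5, "vowel": 1.0, "stressed_vowel": 0.5}

def local_rates(segments, target_rate):
    """Return one local speed-up per (kind, duration_ms) segment.

    Rates follow 1 + c * weight; c is found by bisection so the compressed
    durations sum to total / target_rate. Assumes target_rate >= 1.
    """
    total = sum(d for _, d in segments)
    goal = total / target_rate

    def compressed(c):
        return sum(d / (1.0 + c * WEIGHTS[kind]) for kind, d in segments)

    lo, hi = 0.0, 1.0
    while compressed(hi) > goal:      # grow the bracket until it contains c
        hi *= 2.0
    for _ in range(100):              # bisection on the monotone function
        mid = (lo + hi) / 2.0
        if compressed(mid) > goal:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2.0
    return [(kind, 1.0 + c * WEIGHTS[kind]) for kind, _ in segments]

# Made-up segment labels and durations in milliseconds.
segments = [("consonant", 80), ("stressed_vowel", 160), ("consonant", 60),
            ("vowel", 100), ("pause", 300)]
rates = local_rates(segments, 2.0)
new_total = sum(d / r for (_, d), (_, r) in zip(segments, rates))
print([(kind, round(r, 2)) for kind, r in rates], round(new_total, 1))
```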

System Implications • Computational complexity • Adaptive TC 10x more costly than Linear TC • Complexity in client-server implementation • Buffer management required for non-linear TC • Audio-video synchronization quality

User Study Method

User Study Goals • Highest intelligible speed • Comprehension • Subjective preference • Sustainable speed

Experiment Method • 24 subjects • 4 tasks for each subject • 3 time compression algorithms • Linear TC using SOLA (Linear) • Pause removal plus Linear TC (PR-Lin) • Adaptive TC (Adapt) • Each test takes approximately 30 minutes

Highest Intelligible Speed Task • 3 clips from technical talks • Find the highest speed at which most of the words are understandable

Comprehension Task • 3 clips at 1.5x and 3 clips at 2.5x • Clips from TOEFL listening test • Answer 4 multiple choice questions

Subjective Preference Task • 3 pairs of clips at 1.5x • 3 pairs of clips at 2.5x • Each pair contains the same clip compressed with 2 of the 3 TC algorithms • Indicate preference on 3-point scale
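Three algorithms give exactly three unordered pairs, which is consistent with the three pairs of clips per speed on this slide. The short enumeration below assumes each possible pairing was used once per speed; the slide does not state this explicitly.

```python
from itertools import combinations

ALGORITHMS = ["Linear", "PR-Lin", "Adapt"]

# Every unordered pairing of two of the three algorithms.
pairs = list(combinations(ALGORITHMS, 2))
for first, second in pairs:
    print(f"{first} vs. {second}")
```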

Sustainable Speed Task • 3 clips, each 8 minutes long • Clips from a CD audio book • Find the maximum comfortable speed • Write a 4-5 sentence summary at the end

User Study Results

Highest Intelligible Speed Task • PR-Lin is significantly better than Adapt (p<.01)

Comprehension Task • Adapt is better than PR-Lin (p=.083) at 2.5x

Preference Task at 1.5x • Slight preference for PR-Lin (p=.093)

Preference Task at 2.5x • PR-Lin and Adapt do significantly better than Linear

Sustainable Speed Task

Conclusions

Previous Work • Mach1 (Covell et al., ICASSP 98) • Comprehension and preference tasks • Comparing Linear and Mach1 (Adapt) at 2.6-4.2x • Comprehension scores 17% better with Mach1 • 95% prefer Mach1 to Linear • No data below 2.0x • Other work (Harrigan, Omoigui, Li, Foulke) • 1.2-1.7x is the sustainable listening speed

Conclusions • The trade-off among TC algorithms is task-related • Listening: Linear TC is sufficient • Fast forwarding: Non-linear TC is more suitable • Adaptive TC is close to the way people talk fast • The limit lies in human listening and comprehension
