
User Benefits of Non-Linear Time Compression



  1. User Benefits of Non-Linear Time Compression Liwei He and Anoop Gupta, Microsoft Research

  2. Introduction • Time compression (TC) is key to browsing audio-video (AV) content • We focus on informational content • Audio time compression algorithms • Linear: speed up audio uniformly • Non-linear: exploit the fine-grained structure of human speech (e.g., pauses, phonemes) • How much more do users gain from more complex algorithms?

  3. Methodology • Conduct a user listening test • One linear TC algorithm • Two non-linear TC algorithms • Simple: pause removal followed by linear TC • Sophisticated: Adaptive TC • Compare objective and subjective measurements

  4. Time Compression Algorithms

  5. Linear Time Compression • Classic algorithms • Overlap Add (OLA) and Synchronized OLA (SOLA) • We use SOLA
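For readers unfamiliar with SOLA, the following is a minimal pure-Python sketch of the classic technique, not the implementation used in this study. It takes frames from the input at an analysis hop of hop_out * rate, places each one near its nominal output position (hop_out samples after the previous one), shifts that position by up to max_shift samples to maximize the normalized cross-correlation with the audio already written, and cross-fades across the overlap. The function name, the default sizes (in samples), and the linear cross-fade are illustrative assumptions.

```python
import math


def sola(x, rate, frame=400, hop_out=200, max_shift=100):
    """Time-compress mono samples x by `rate` (> 1 plays faster) with
    Synchronized Overlap-Add. Sizes are in samples and are illustrative."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not 0 < max_shift < min(hop_out, frame - hop_out):
        raise ValueError("need 0 < max_shift < min(hop_out, frame - hop_out)")
    hop_in = max(1, round(hop_out * rate))  # analysis hop: input advance per frame
    out = list(x[:frame])
    m = 1
    while m * hop_in + frame <= len(x):
        seg = x[m * hop_in:m * hop_in + frame]
        nominal = m * hop_out  # where plain OLA would place this frame
        # Pick the output offset k that best aligns the new frame with the
        # tail of the output (normalized cross-correlation over the overlap).
        best_k, best_score = 0, -math.inf
        for k in range(max_shift):
            n = len(out) - (nominal + k)  # overlap length for this offset
            a, b = out[nominal + k:], seg[:n]
            norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b)) or 1.0
            score = sum(p * q for p, q in zip(a, b)) / norm
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        n = len(out) - start
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[n:])
        m += 1
    return out
```

For example, compressing 2 s of a 220 Hz tone sampled at 8 kHz with sola(x, 2.0) yields roughly half as many samples. The per-frame search for the best offset is what distinguishes SOLA from plain OLA: it keeps successive frames roughly in phase, which avoids much of the reverberant, choppy quality of unsynchronized overlap-add.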

  6. Non-Linear Time Compression • Algorithm 1: Pause removal (PR) plus linear TC • Detect pauses with energy and zero-crossing rate analysis • Leave pauses of 150 ms or less untouched • Shorten pauses longer than 150 ms to 150 ms • Apply the SOLA algorithm • PR shortens speech by 10-25%
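The pause-removal step can be sketched as three small functions. This is a simplified illustration under stated assumptions, not the authors' detector: frames are labeled as speech or pause from short-time energy and zero-crossing rate with hypothetical thresholds (energy_thresh, zcr_thresh), every pause is capped at 150 ms, and a helper computes how much linear SOLA compression is still needed afterwards. All function names are hypothetical.

```python
def classify_frames(samples, frame_len, energy_thresh, zcr_thresh):
    """Label each frame True (speech) or False (pause) using short-time
    energy and zero-crossing rate. Both thresholds are hypothetical."""
    labels = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        f = samples[i:i + frame_len]
        energy = sum(v * v for v in f) / frame_len
        crossings = sum(1 for a, b in zip(f, f[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / frame_len
        # Quiet but rapidly oscillating frames may be weak fricatives such
        # as /s/, so a high ZCR also counts as speech in this sketch.
        labels.append(energy >= energy_thresh or zcr >= zcr_thresh)
    return labels


def cap_pauses(labels, frame_ms, max_pause_ms=150):
    """Return indices of frames to keep: every speech frame, plus at most
    max_pause_ms of each pause (pauses at or under the cap are untouched)."""
    keep = max_pause_ms // frame_ms
    kept, run = [], 0
    for i, is_speech in enumerate(labels):
        run = 0 if is_speech else run + 1
        if is_speech or run <= keep:
            kept.append(i)
    return kept


def residual_linear_rate(target_rate, pr_savings):
    """Linear TC rate still needed after pause removal has shortened the
    speech by the fraction pr_savings (e.g. 0.2 for 20%)."""
    return target_rate * (1 - pr_savings)
```

The last helper follows from the durations: if pause removal shortens a clip of length D to D(1 - s), a further linear factor r gives D(1 - s) / r, and setting that equal to D / target yields r = target × (1 - s). For instance, if pause removal saves 20%, an overall 2.0x needs only 2.0 × 0.8 = 1.6x of linear SOLA compression; a result below 1 would mean pause removal alone already exceeds the target.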

  7. Non-Linear Time Compression (cont.) • Algorithm 2: Adaptive TC • Mimics how people speak when talking fast • Pauses and silences are compressed the most • Stressed vowels are compressed the least • Consonants are compressed more than vowels • How much a consonant is compressed depends on its neighboring vowels
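The slide describes Adaptive TC only qualitatively. One way to turn those rules into per-segment speed-up factors is sketched below: each phonetic class gets a relative compressibility weight w_i, segment i is sped up by r_i = 1 + a * w_i, and a single scale a is found by bisection so the whole utterance reaches the requested average rate. The class labels, the WEIGHTS values, and the neighbor rule for consonants are hypothetical choices for illustration, not the parameters of Mach1 or of the algorithm tested here; a real system would also need a phoneme and stress classifier, which this sketch assumes already exists.

```python
# Hypothetical compressibility weights (higher = compressed more).
WEIGHTS = {"pause": 4.0, "consonant": 1.5, "vowel": 1.0, "stressed_vowel": 0.5}


def segment_weights(classes):
    """Weight each segment by class; consonants next to a stressed vowel are
    protected by giving them the (lower) plain-vowel weight."""
    ws = []
    for i, c in enumerate(classes):
        w = WEIGHTS[c]
        if c == "consonant":
            neighbors = list(classes[max(i - 1, 0):i]) + list(classes[i + 1:i + 2])
            if "stressed_vowel" in neighbors:
                w = WEIGHTS["vowel"]
        ws.append(w)
    return ws


def adaptive_rates(durations, classes, target_rate, iters=60):
    """Return per-segment speed-up factors r_i = 1 + a * w_i, with a chosen
    so that sum(d_i / r_i) equals sum(d_i) / target_rate."""
    if target_rate < 1:
        raise ValueError("this sketch only speeds speech up")
    ws = segment_weights(classes)
    goal = sum(durations) / target_rate

    def out_len(a):
        return sum(d / (1 + a * w) for d, w in zip(durations, ws))

    lo, hi = 0.0, 1.0
    while out_len(hi) > goal:  # grow the bracket until it contains the answer
        hi *= 2
    for _ in range(iters):  # out_len decreases as a grows, so bisect on a
        mid = (lo + hi) / 2
        if out_len(mid) > goal:
            lo = mid
        else:
            hi = mid
    return [1 + hi * w for w in ws]
```

With every weight positive, pauses always receive the highest factor and stressed vowels the lowest, which matches the ordering on the slide, while the bisection guarantees the overall duration still hits the target (for example, 560 ms of segments at 2.0x come out at 280 ms).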

  8. System Implications • Computational complexity • Adaptive TC is 10x more costly than linear TC • Complexity in client-server implementation • Buffer management is required for non-linear TC • Audio-video synchronization quality
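Because non-linear TC shrinks some stretches of audio much more than others, the video can no longer be sped up by a single factor. A minimal sketch of one approach, assuming the compressor records a piecewise-linear time map of (output time, source time) breakpoints: the player converts each playback instant back to a source time and shows the video frame for that time. The time-map representation, the function names, and the 30 fps default are assumptions for illustration, not details from the talk.

```python
import bisect


def source_time(time_map, t_out):
    """Map playback time t_out to source time using a piecewise-linear map
    [(out_time, src_time), ...] with strictly increasing out_time
    (at least two points)."""
    outs = [o for o, _ in time_map]
    t = min(max(t_out, outs[0]), outs[-1])  # clamp to the mapped range
    i = min(bisect.bisect_right(outs, t) - 1, len(time_map) - 2)
    (o0, s0), (o1, s1) = time_map[i], time_map[i + 1]
    return s0 + (t - o0) * (s1 - s0) / (o1 - o0)


def video_frame_for(time_map, t_out, fps=30.0):
    """Index of the source video frame to display at playback time t_out."""
    return int(source_time(time_map, t_out) * fps)
```

For example, the hypothetical map [(0.0, 0.0), (1.6, 2.0), (1.75, 3.0), (3.35, 5.0)] describes 2 s of speech played at 1.25x, a 1 s pause shortened to 0.15 s, and 2 more seconds of speech at 1.25x. Playback time 0.5 s then maps to source time 0.625 s, i.e. frame 18 at 30 fps. During the shortened pause the video advances quickly, which is one reason A/V synchronization quality becomes a concern for non-linear TC.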

  9. User Study Method

  10. User Study Goals • Highest intelligible speed • Comprehension • Subjective preference • Sustainable speed

  11. Experiment Method • 24 subjects • 4 tasks for each subject • 3 time compression algorithms • Linear TC using SOLA (Linear) • Pause removal plus Linear TC (PR-Lin) • Adaptive TC (Adapt) • Each test takes approximately 30 minutes

  12. Highest Intelligible Speed Task • 3 clips from technical talks • Find the highest speed at which most of the words are understandable

  13. Comprehension Task • 3 clips at 1.5x and 3 clips at 2.5x • Clips from the TOEFL listening test • Answer 4 multiple-choice questions

  14. Subjective Preference Task • 3 pairs of clips at 1.5x • 3 pairs of clips at 2.5x • Each pair contains the same clip compressed with 2 of the 3 TC algorithms • Indicate preference on a 3-point scale

  15. Sustainable Speed Task • 3 clips, each 8 minutes long • Clips from an audiobook on CD • Find the maximum comfortable speed • Write a 4-5 sentence summary at the end

  16. User Study Results

  17. Highest Intelligible Speed Task • PR-Lin is significantly better than Adapt (p<.01)

  18. Comprehension Task • At 2.5x, Adapt is marginally better than PR-Lin (p=.083)

  19. Preference Task at 1.5x • Slight preference for PR-Lin (p=.093)

  20. Preference Task at 2.5x • PR-Lin and Adapt do significantly better than Linear

  21. Sustainable Speed Task

  22. Conclusions

  23. Previous Work • Mach1 (Covell et al., ICASSP '98) • Comprehension and preference tasks • Comparing Linear and Mach1 (Adapt) at 2.6-4.2x • Comprehension scores 17% better with Mach1 • 95% prefer Mach1 to Linear • No data below 2.0x • Other work (Harrigan, Omoigui, Li, Foulke) • 1.2-1.7x is the sustainable listening speed

  24. Conclusions • The trade-off among TC algorithms depends on the task • Listening: linear TC is sufficient • Fast-forwarding: non-linear TC is more suitable • Adaptive TC is close to the way people talk fast • The limit lies in human listening and comprehension
