[Advanced] Speech & Audio Signal Processing

ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS
02 February 2006

State of the Art in Speech/Audio
• Speech and audio processing may be divided into “low-level” and “high-level” inference
• Speech enhancement, compression, and coding are all widely used technologies
• This low-level work is the most mature
• High-level tasks will drive future advances
• Speech/music database information retrieval
• Automatic speaker and speech recognition
• But low-level issues also remain…

Fundamental Questions
• How to obtain highly structured representations of speech and audio signals?
• Time-frequency “atoms” as building blocks
• How can statistical inference enable advances in speech signal processing?
• A means to obtain an “atomic decomposition”
• Statistical modeling of time-frequency coefficients provides a principled solution
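To make the idea of a time-frequency “atom” concrete, here is a minimal sketch, assuming a Gabor atom (a Gaussian window multiplied by a complex exponential) as the building block. The slides do not say which atoms or dictionary the course uses, so the function names `gabor_atom` and `coefficient` and all parameter values are illustrative assumptions. A coefficient is the inner product of the signal with an atom; the statistical modeling of those coefficients is not shown.

```python
import cmath
import math

def gabor_atom(length, center, freq, width):
    """Unit-norm complex Gabor atom (hypothetical helper).

    center and width are in samples, freq is in cycles per sample.
    """
    atom = [
        math.exp(-0.5 * ((n - center) / width) ** 2)
        * cmath.exp(2j * math.pi * freq * n)
        for n in range(length)
    ]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]

def coefficient(signal, atom):
    """Inner product of the signal with one atom."""
    return sum(x * a.conjugate() for x, a in zip(signal, atom))

# Test signal: a cosine at 0.1 cycles per sample.
N = 256
signal = [math.cos(2 * math.pi * 0.1 * n) for n in range(N)]

# An atom tuned to the signal frequency versus one tuned far away from it.
matched = abs(coefficient(signal, gabor_atom(N, 128, 0.1, 20.0)))
mismatched = abs(coefficient(signal, gabor_atom(N, 128, 0.3, 20.0)))
```

The matched atom picks up most of the local signal energy, while the mismatched atom yields a coefficient near zero. A representation built from a few large coefficients of this kind is one sense in which a signal can be called “highly structured”.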

Representative Applications
• Missing data in the context of VoIP: Original, Missing, Restored
• Source / Speaker Separation: Source 1, Source 2; Mixture 1, Mixture 2; Recovery 1, Recovery 2

Digital Speech/Audio Processing

Speech Production

Time-Scale Modification

Time-Scale Modification
• Speech and Quasi-Periodic Audio
• Sinewave-based Modification
• Voicing-dependent Rate Factor
Examples:
• Male & Female Speaker: Original, Fast, Faster, Slower
• Trumpet: Original, Fast, Slow
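The core idea of sinewave-based time-scale modification can be sketched as follows: if a signal is described frame by frame with sinusoid amplitudes and frequencies, it can be resynthesized with more or fewer samples per frame while the frequencies stay the same, so duration changes but pitch does not. This is a minimal sketch for a single sinusoid under assumed values (8 kHz sampling rate, 80-sample hop). The helper `synthesize` is hypothetical, the rate is fixed rather than voicing-dependent, and the analysis stage is omitted.

```python
import math

def synthesize(frames, hop, rate, fs):
    """Resynthesize one sinusoidal track at a modified time scale.

    frames: list of (amplitude, frequency_hz), one entry per analysis frame.
    rate > 1 speeds playback up, rate < 1 slows it down.
    """
    out = []
    phase = 0.0
    samples_per_frame = max(1, round(hop / rate))
    for amp, freq in frames:
        step = 2 * math.pi * freq / fs  # per-sample phase advance, unchanged by rate
        for _ in range(samples_per_frame):
            out.append(amp * math.sin(phase))
            phase += step  # running phase keeps frame boundaries continuous
    return out

fs = 8000                       # assumed sampling rate in Hz
frames = [(1.0, 200.0)] * 50    # a steady 200 Hz tone over 50 frames
normal = synthesize(frames, hop=80, rate=1.0, fs=fs)
slow = synthesize(frames, hop=80, rate=0.5, fs=fs)
fast = synthesize(frames, hop=80, rate=2.0, fs=fs)
```

Here `slow` is twice as long as `normal` and `fast` is half as long, yet all three oscillate at 200 Hz. A voicing-dependent rate factor would replace the single `rate` with a per-frame value.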

Time-Scale Modification
• Complex Non-Speech Signals
• Phase-Vocoder-based Modification
• Event-Dependent Phase Coherence
Examples:
• Falling Can, Bongo Drums, Loon: Original, Slow, More

Pitch and Vocal Tract Change
• Sinewave-based Modification
Examples:
• Male & Female Speaker: Original, Low pitch/Long vocal tract, High pitch/Short vocal tract
• Male Speaker: Original and Monotone

Speech Coding
• Sinewave-based
• Code-Excited Linear Prediction (CELP)
Examples:
• Female Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
• Male Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps

Noise Reduction
• Adaptive Wiener Filter
• Adaptation Based on Spectral Change
Examples:
• Cell Phone Noise, Cocktail Party, Automobile Noise: Original, Enhanced
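As a reference point for the Wiener filter named on this slide, here is a minimal sketch of the textbook per-frequency Wiener gain, G = P_s / (P_s + P_n), with the speech power P_s estimated by subtracting a noise power estimate from the noisy power. The function name `wiener_gain` and the `floor` parameter are assumptions made for illustration. The adaptation based on spectral change mentioned above is not implemented here.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain from power-spectrum estimates (hypothetical helper).

    noisy_power: power of the noisy signal in each frequency bin.
    noise_power: estimated noise power in each frequency bin.
    floor: minimum gain, sometimes used to limit audible artifacts.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = max(p_y - p_n, 0.0)              # speech power estimate, clipped at zero
        gain = p_s / p_y if p_y > 0 else 0.0   # equals P_s / (P_s + P_n) when p_y >= p_n
        gains.append(max(gain, floor))
    return gains

# Three bins: high SNR, noise only, and below the noise estimate.
gains = wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Each gain multiplies the corresponding short-time spectral coefficient of the noisy signal. Bins dominated by speech pass nearly unchanged, while bins at or below the noise estimate are suppressed.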

Compression
• Reduction of Peak-to-RMS amplitude ratio
• Based on Sinewave Analysis/Synthesis
Examples:
• Low-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
• High-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
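The peak-to-RMS amplitude ratio (also called the crest factor) can be computed directly from the samples. The sketch below also illustrates one way a sinusoidal representation can lower it: changing the phases of the components while leaving their amplitudes alone. The slides do not state which method produced the examples, so the phase rule used here (a quadratic, Schroeder-style phase) and all signal parameters are assumptions for illustration only.

```python
import math

def peak_to_rms_db(samples):
    """Peak-to-RMS amplitude ratio (crest factor) in dB."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

fs = 8000   # assumed sampling rate in Hz
f0 = 100    # assumed fundamental frequency in Hz
K = 5       # number of harmonics, all with unit amplitude

# Harmonics with identical (zero) phase add up to a sharp peak.
aligned = [
    sum(math.cos(2 * math.pi * f0 * k * n / fs) for k in range(1, K + 1))
    for n in range(fs)
]

# Same amplitudes, but with an assumed quadratic phase offset per harmonic.
dispersed = [
    sum(
        math.cos(2 * math.pi * f0 * k * n / fs - math.pi * k * (k - 1) / K)
        for k in range(1, K + 1)
    )
    for n in range(fs)
]

crest_aligned = peak_to_rms_db(aligned)
crest_dispersed = peak_to_rms_db(dispersed)
```

Both signals have the same RMS level because only the phases differ, so the drop from `crest_aligned` to `crest_dispersed` comes entirely from a lower peak. A lower peak at the same RMS level leaves room to raise the average level before clipping.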
