[Advanced] Speech & Audio Signal Processing

[Advanced] Speech & Audio Signal Processing. ES 157/257: Speech and Audio Processing. Prof. Patrick Wolfe, Harvard DEAS, 02 February 2006. State of the Art in Speech/Audio: speech and audio processing may be divided into “low-level” and “high-level” inference.



  1. [Advanced] Speech & Audio Signal Processing
     ES 157/257: Speech and Audio Processing
     Prof. Patrick Wolfe, Harvard DEAS
     02 February 2006

  2. State of the Art in Speech/Audio
     • Speech and audio processing may be divided into “low-level” and “high-level” inference
     • Speech enhancement, compression, and coding are all widely used technologies
     • This low-level work is the most mature
     • High-level tasks will drive future advances
     • Speech/music database information retrieval
     • Automatic speaker and speech recognition
     • But low-level issues also remain…

  3. Fundamental Questions
     • How to obtain highly structured representations of speech and audio signals?
       • Time-frequency “atoms” as building blocks
     • How can statistical inference enable advances in speech signal processing?
       • A means to obtain an “atomic decomposition”
       • Statistical modeling of time-frequency coefficients provides a principled solution
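The slide does not say which time-frequency atoms are meant. As a minimal sketch, assume a Gabor atom (a Gaussian-windowed complex sinusoid), one common choice. The coefficient of a signal on an atom is their inner product, and it is large only when the atom matches the signal in both time and frequency. The names `gabor_atom` and `coefficient` are illustrative and not from the lecture, and the statistical modeling of the coefficients mentioned on the slide is not attempted here.

```python
import cmath
import math

def gabor_atom(length, center, freq, width):
    """Gaussian-windowed complex sinusoid (assumed atom shape).

    center and width are in samples; freq is in cycles per sample.
    """
    return [
        math.exp(-0.5 * ((n - center) / width) ** 2)
        * cmath.exp(2j * math.pi * freq * n)
        for n in range(length)
    ]

def coefficient(signal, atom):
    """Inner product of a real signal with a complex atom."""
    return sum(x * a.conjugate() for x, a in zip(signal, atom))

# Test signal: a pure cosine at 0.1 cycles per sample.
N = 256
signal = [math.cos(2 * math.pi * 0.1 * n) for n in range(N)]

# An atom at the signal's frequency responds strongly;
# an atom at a distant frequency responds with almost nothing.
matched = coefficient(signal, gabor_atom(N, center=128, freq=0.1, width=16.0))
mismatched = coefficient(signal, gabor_atom(N, center=128, freq=0.3, width=16.0))
```

Computing such coefficients over a grid of centers and frequencies gives a short-time Fourier style representation; an atomic decomposition in the sense of the slide would then keep only a small set of atoms that explain the signal well.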

  4. Representative Applications
     • Missing data in the context of VoIP: Original, Missing, Restored
     • Source / Speaker Separation: Source 1, Source 2; Mixture 1, Mixture 2; Recovery 1, Recovery 2

  5. Digital Speech/Audio Processing

  6. Speech Production

  7. Time-Scale Modification

  8. Time-Scale Modification
     • Speech and Quasi-Periodic Audio
     • Sinewave-based Modification
     • Voicing-dependent Rate Factor
     • Male & Female Speaker: Original, Fast, Faster, Slower
     • Trumpet: Original, Fast, Slow

  9. Time-Scale Modification
     • Complex Non-Speech Signals
     • Phase-Vocoder-based Modification
     • Event-Dependent Phase Coherence
     • Falling Can, Bongo Drums, Loon: Original, Slow, More

  10. Pitch and Vocal Tract Change
     • Sinewave-based Modification
     • Male & Female Speaker: Original, Low pitch/Long vocal tract, High pitch/Short vocal tract
     • Male Speaker: Original and Monotone

  11. Speech Coding
     • Sinewave-based
     • Code-Excited Linear Prediction
     • Female Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
     • Male Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps

  12. Noise Reduction
     • Adaptive Wiener Filter
     • Adaptation Based on Spectral Change
     • Cell Phone Noise, Cocktail Party, Automobile Noise: Original, Enhanced
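The slide names an adaptive Wiener filter without giving its form. As a rough sketch, the textbook per-frequency Wiener gain is G = S / (S + N), where S is the estimated clean-speech power and N the noise power in a given frequency bin. The version below is a static gain and does not implement the adaptation based on spectral change that the slide mentions. The function name `wiener_gains` and the power-subtraction estimate of S are assumptions made for illustration.

```python
def wiener_gains(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gains from noisy-signal and noise power spectra.

    Assumption: clean-speech power is estimated by power subtraction,
    clipped at zero. `floor` sets a minimum gain.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = max(p_y - p_n, 0.0)          # estimated clean-speech power
        total = p_s + p_n
        gain = p_s / total if total > 0.0 else 0.0
        gains.append(max(gain, floor))
    return gains

# Three bins with unit noise power: one dominated by speech, two by noise.
gains = wiener_gains([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Each gain multiplies the corresponding bin of the noisy spectrum, so bins dominated by noise are attenuated toward zero while bins dominated by speech pass nearly unchanged. A nonzero `floor` is often used in practice to limit audible artifacts from bins that would otherwise be zeroed.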

  13. Compression
     • Reduction of Peak-to-RMS amplitude ratio
     • Based on Sinewave Analysis/Synthesis
     • Low-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
     • High-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
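The slide does not define the peak-to-RMS ratio. Assuming the standard definition (peak absolute amplitude divided by root-mean-square amplitude, expressed in decibels), it can be computed as sketched below. The helper name `peak_to_rms_db` is illustrative, and the sinewave-based method that achieves the reduction is not shown.

```python
import math

def peak_to_rms_db(samples):
    """Peak-to-RMS amplitude ratio (crest factor) in decibels."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

# One full period of a unit-amplitude sine wave: peak 1, RMS 1/sqrt(2).
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
sine_crest_db = peak_to_rms_db(sine)

# A square wave has equal peak and RMS amplitude.
square = [1.0, -1.0] * 500
square_crest_db = peak_to_rms_db(square)
```

Under this definition a sine wave has a ratio of about 3.01 dB and a square wave 0 dB. A "3.0 dB Reduction" would then mean the processed signal's ratio is 3.0 dB lower than the original's, which allows a higher average level for the same peak amplitude.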
