Multiresolution STFT for Analysis and Processing of Audio

Talk at B.U., Sept. 2010. Multiresolution STFT for Analysis and Processing of Audio. Alexey Lukin, Moscow State University, Russia; iZotope Inc., Cambridge, MA.

Presentation Transcript


  1. Talk at B.U., Sept. 2010 • Multiresolution STFT for Analysis and Processing of Audio • Alexey Lukin, Moscow State University, Russia; iZotope Inc., Cambridge, MA

  2. Short-Time Fourier Transform • Most commonly used transform for audio: spectral analysis; noise reduction (spectral subtraction algorithm); time-variable filters and other effects • Advantages (+): very fast implementation for a large number of bands via FFT; good energy compaction for many musical signals • Drawbacks (–): many oscillations in basis functions → ringing (Gibbs phenomenon); uniform frequency resolution → inadequate resolution at low frequencies • A. Lukin, J. Todd, “Adaptive Time-Frequency Resolution”

  3. Short-Time Fourier Transform • Spectrogram: displays the evolution of the spectrum over time
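
As an illustration of slide 3, a magnitude spectrogram can be sketched with NumPy as below. This is a minimal sketch, not the speaker's implementation: the helper name `stft_magnitude`, the Hann window, the 8 kHz sample rate, and the 1 kHz test tone are all assumptions made for the example.

```python
import numpy as np

def stft_magnitude(x, win_len, hop, n_fft=None):
    """Return |STFT| of x with shape (n_frames, n_fft // 2 + 1)."""
    n_fft = n_fft or win_len
    window = np.hanning(win_len)                 # assumed window shape
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    # One FFT per frame; rfft keeps only the non-negative frequencies.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

fs = 8000                                        # assumed sample rate, Hz
t = np.arange(fs) / fs                           # one second of samples
x = np.sin(2 * np.pi * 1000 * t)                 # hypothetical 1 kHz test tone
spec = stft_magnitude(x, win_len=256, hop=128)   # rows = time, columns = frequency
peak_bin = int(spec.mean(axis=0).argmax())       # strongest frequency bin
peak_hz = peak_bin * fs / 256                    # convert bin index to Hz
```

Plotting `spec.T` on a dB scale (time on the horizontal axis, frequency on the vertical axis) gives the usual spectrogram picture.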

  4. Spectrograms • Problems: most perceptually meaningful energy is concentrated in a narrow band below 4 kHz → can’t see enough detail; time/frequency resolution trade-off • [Figure: conventional STFT spectrogram (linear frequency scale)]

  5. Spectrograms • Problems: poor frequency resolution at low frequencies → can’t separate bass harmonics from the bass drum; time/frequency resolution trade-off • [Figure: mel-scale STFT spectrogram (window size = 12 ms)]

  6. Spectrograms • Problems: poor time resolution at transients → time-smearing of drums and other percussive sounds • [Figure: mel-scale STFT spectrogram (window size = 93 ms)]
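
The trade-off on slides 4 to 6 can be made concrete with a little arithmetic. The sketch below assumes a 44.1 kHz sample rate (the slides do not state one), under which the 12 ms and 93 ms windows would correspond to 512 and 4096 samples. The function name `stft_resolution` is made up for this example.

```python
def stft_resolution(win_len, fs):
    """Return (window duration in ms, FFT bin spacing in Hz)."""
    return 1000.0 * win_len / fs, fs / win_len

fs = 44100  # assumed sample rate; not given in the slides
for win_len in (512, 4096):
    dur_ms, bin_hz = stft_resolution(win_len, fs)
    print(f"{win_len:5d} samples: {dur_ms:5.1f} ms window, "
          f"{bin_hz:5.1f} Hz between bins")
```

Under that assumption the short window spans about 11.6 ms with bins roughly 86 Hz apart, which is too coarse to separate bass harmonics, while the long window spans about 92.9 ms with bins roughly 10.8 Hz apart, at the cost of smearing transients over its whole length. Bin spacing is only a lower bound on the usable frequency resolution, since the window's main lobe is wider than one bin.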

  7. Filter banks • Idea: decompositions of the time-frequency plane • Uncertainty principle • [Diagram: x[n] → decomposition → processing of subband signals → synthesis → y[n]; time-frequency tilings (f vs. t) for STFT and DWT]

  8. Filter banks • Perceptual coding of audio • [Diagram of an mp3 encoder, with blocks: x[n], filter bank, FFT, psychoacoustic model, Q, Huffman, mp3 file]

  9. Filter banks • Window size switching (guided by transient detection) • [Figure panels: transient; pre-echo; reduced pre-echo]

  10. Proposed approach • Imitation of the time-frequency resolution of human hearing • Adaptation of resolution to local signal features • Transforms should vary their time-frequency resolution in a perceptually motivated way

  11. Spectrograms • Simple solution: combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution • [Figure: combined resolution spectrogram (window sizes from 12 to 93 ms)]

  12. Spectrograms • Simple solution: combine spectrograms with different resolutions • Each spectrogram is computed on the same grid of time-frequency points (using zero padding)
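
The simple combination from slides 11 and 12 might look roughly like the sketch below. It is a simplification under stated assumptions: only two window sizes are used, with a hard crossover at a hypothetical 1 kHz, whereas the slides mention window sizes from 12 to 93 ms. The function names, the Hann window, the hop size, and the 44.1 kHz sample rate are also assumptions.

```python
import numpy as np

def stft_mag_on_grid(x, win_len, hop, n_fft):
    """|STFT| with a win_len-sample Hann window, zero-padded to n_fft points.
    Frames are centered every `hop` samples, so every window size lands on
    the same time-frequency grid."""
    window = np.hanning(win_len)
    padded = np.pad(x, n_fft // 2)               # zeros before and after
    n_frames = 1 + len(x) // hop
    frames = np.empty((n_frames, win_len))
    for i in range(n_frames):
        start = i * hop + n_fft // 2 - win_len // 2
        frames[i] = padded[start:start + win_len] * window
    # Divide by the window sum so magnitudes are comparable across sizes.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) / window.sum()

def combined_spectrogram(x, fs, long_win=4096, short_win=512,
                         crossover_hz=1000.0, hop=256):
    """Bass from the long-window spectrogram, treble from the short one."""
    long_spec = stft_mag_on_grid(x, long_win, hop, long_win)
    short_spec = stft_mag_on_grid(x, short_win, hop, long_win)
    freqs = np.fft.rfftfreq(long_win, d=1.0 / fs)
    return np.where(freqs[None, :] < crossover_hz, long_spec, short_spec)

fs = 44100                                       # assumed sample rate
x = np.random.default_rng(0).standard_normal(8192)   # placeholder signal
combo = combined_spectrogram(x, fs)
```

A smoother result would cross-fade between several window sizes across frequency instead of switching at a single point.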

  13. Spectrograms • Better approach: select the best resolution for each time-frequency neighborhood • Criteria? Better frequency resolution at bass (reflects a priori psychoacoustic knowledge); maximal energy compaction (to minimize spectral smearing in both time and frequency, i.e. maximize sparsity) • [Figure: the same block at STFT window sizes of 6, 12, 24, 48, and 96 ms, with the best one marked]

  14. Spectrograms • Calculation of sparsity (in a given block, for all T/F resolutions r) • Here a_{i,r} are the STFT magnitudes in the block, S_r is the spectrum sparsity for the given resolution r, and r_0 is the resolution with the best sparsity • [Figure: the same block at STFT window sizes of 6, 12, 24, 48, and 96 ms, with the best one marked]
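
The per-block selection on slides 13 and 14 can be sketched as follows. The slide's formula for S_r did not survive in this transcript, so the measure below (ratio of the L2 norm to the L1 norm of the block magnitudes) is a stand-in chosen for the example, not the authors' definition. The function names and block sizes are likewise hypothetical, and the a priori preference for frequency resolution at bass is left out.

```python
import numpy as np

def sparsity(a):
    """Stand-in sparsity measure: L2 norm / L1 norm of the magnitudes.
    It is scale-invariant and grows as energy concentrates in fewer cells."""
    a = np.abs(a).ravel()
    l1 = a.sum()
    return float(np.sqrt((a ** 2).sum()) / l1) if l1 > 0 else 0.0

def select_best_resolution(specs, block_t, block_f):
    """specs has shape (R, T, F): R spectrograms on one time-frequency grid.
    Returns (best, out): best[bi, bj] is the index r0 of the sparsest
    resolution in each block, and out is the assembled (T, F) spectrogram."""
    _, n_t, n_f = specs.shape
    t_starts = range(0, n_t, block_t)
    f_starts = range(0, n_f, block_f)
    best = np.zeros((len(t_starts), len(f_starts)), dtype=int)
    out = np.empty((n_t, n_f))
    for bi, t0 in enumerate(t_starts):
        for bj, f0 in enumerate(f_starts):
            blocks = specs[:, t0:t0 + block_t, f0:f0 + block_f]
            r0 = int(np.argmax([sparsity(b) for b in blocks]))
            best[bi, bj] = r0
            out[t0:t0 + block_t, f0:f0 + block_f] = blocks[r0]
    return best, out

# Toy example: a smeared block versus a block with one sharp peak.
smeared = np.full((4, 4), 1.0 / 16.0)
peaky = np.zeros((4, 4))
peaky[1, 2] = 1.0
best, out = select_best_resolution(np.stack([smeared, peaky]), 4, 4)
```

In the toy example the peaky block wins, because all of its energy sits in one cell.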

  15. Spectrograms • Benefits: sharper bass drum hits and other transients, even in the mid-frequency range; sharper guitar harmonics at high frequencies • [Figure: adaptive resolution spectrogram (window sizes from 12 to 93 ms)]

  16. Spectrograms • Simple solution: combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution • [Figure: combined resolution spectrogram (window sizes from 12 to 93 ms)]

  17. Spectrograms • More examples • [Figures: conventional STFT spectrogram; tone onset waveform]

  18. Spectrograms • More examples • [Figures: adaptive resolution spectrogram; combined resolution spectrogram]

  19. Processing framework • General framework for multi-resolution processing: perform processing with several different resolutions; adaptively combine (mix) the results in time-frequency space • Mixing is controlled by a priori knowledge of psychoacoustics and by analysis of local signal features (e.g. transience or sparsity)

  20. Noise reduction • Spectral subtraction algorithm: (1) STFT of a noisy signal; (2) estimate the power spectrum of the noise (manually or automatically); (3) subtract the noise power spectrum from the signal power spectrum; (4) inverse STFT • [Diagram: x[t] → STFT → X[f,t] → subtraction (–) of the estimated noise spectrum W[f] → S[f,t] → inverse STFT → s[t]]
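
The four steps on slide 20 can be sketched in NumPy as below. This is a bare-bones illustration under assumptions, not the speaker's algorithm: the square-root Hann windows at 50% overlap, the clipping of negative power to zero, the function names, and the synthetic 440 Hz test signal are all choices made for the example.

```python
import numpy as np

def sqrt_hann(win_len):
    """Square root of a periodic Hann window. Used for both analysis and
    synthesis, the product overlap-adds to 1 at 50% overlap."""
    n = np.arange(win_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / win_len))

def estimate_noise_power(noise, win_len=512):
    """Step 2: average power spectrum over frames of a noise-only excerpt."""
    hop, window = win_len // 2, sqrt_hann(win_len)
    starts = range(0, len(noise) - win_len + 1, hop)
    frames = np.stack([noise[s:s + win_len] * window for s in starts])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).mean(axis=0)

def spectral_subtraction(x, noise_power, win_len=512):
    hop, window = win_len // 2, sqrt_hann(win_len)
    padded = np.pad(x, win_len)      # every sample is covered by two frames
    y = np.zeros(len(padded))
    for s in range(0, len(padded) - win_len + 1, hop):
        X = np.fft.rfft(padded[s:s + win_len] * window)     # step 1: STFT
        power = np.abs(X) ** 2
        clean = np.maximum(power - noise_power, 0.0)        # step 3: subtract
        gain = np.sqrt(clean / np.maximum(power, 1e-20))    # keep noisy phase
        frame = np.fft.irfft(gain * X, n=win_len)           # step 4: inverse
        y[s:s + win_len] += frame * window                  # overlap-add
    return y[win_len:win_len + len(x)]

rng = np.random.default_rng(0)
fs = 8000                                        # assumed sample rate
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 440 * t)              # hypothetical clean signal
noisy = clean + 0.1 * rng.standard_normal(len(t))
noise_power = estimate_noise_power(0.1 * rng.standard_normal(fs))
denoised = spectral_subtraction(noisy, noise_power)
```

With a single fixed `win_len`, this is the single-resolution case: a long window resolves low-frequency content well but spreads processing artifacts ahead of transients.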

  21. Noise reduction • Example of adaptive resolution: better frequency resolution at low frequencies (according to the resolution of human hearing); better temporal resolution near signal transients (to reduce the Gibbs phenomenon) • [Diagram, with blocks: spectral subtraction (short windows), x1[t], STFT; spectral subtraction (long windows), x2[t], STFT; transience analysis (control); mixer of coefficients; x3[t]; synthesis; y[t]]
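
The "mixer of coefficients" block on slide 21 could be approximated as in the sketch below. Everything specific here is an assumption: the slides do not say how transience is measured or how the weights are formed, so the spectral-flux detector, the linear cross-fade, the 200 Hz bass limit, and the function names are hypothetical. The inputs are taken to be STFT coefficients of the two denoised signals, already on a common time-frequency grid.

```python
import numpy as np

def transience(mag):
    """Per-frame onset strength in [0, 1], computed as the normalized
    positive spectral flux (an assumed detector)."""
    rise = np.maximum(np.diff(mag, axis=0, prepend=mag[:1]), 0.0)
    flux = rise.sum(axis=1)
    peak = flux.max()
    return flux / peak if peak > 0 else flux

def mix_coefficients(short_coefs, long_coefs, trans, freqs, bass_hz=200.0):
    """Cross-fade toward the short-window result near transients, but always
    keep the long-window result below bass_hz."""
    w = np.broadcast_to(trans[:, None], short_coefs.shape).copy()
    w[:, freqs < bass_hz] = 0.0
    return w * short_coefs + (1.0 - w) * long_coefs

# Toy example: 3 frames by 4 bins, with an onset at the second frame.
mag = np.array([[0.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 1.0, 1.0],
                [1.0, 1.0, 1.0, 1.0]])
trans = transience(mag)
freqs = np.array([0.0, 100.0, 300.0, 400.0])
short_coefs = np.ones((3, 4), dtype=complex)
long_coefs = np.zeros((3, 4), dtype=complex)
mixed = mix_coefficients(short_coefs, long_coefs, trans, freqs)
```

In the toy example only the onset frame, and only its bins above 200 Hz, takes the short-window coefficients.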

  22. Noise reduction • Results of single-resolution and multi-resolution algorithms • [Figure: noisy recording (guitar + castanets)]

  23. Noise reduction • Results of single-resolution and multi-resolution algorithms • [Figures: single resolution; multi-resolution (notice less pre-ringing on transients)]

  24. Conclusion • When using the STFT, do care about the window size! Choose the size wisely: maximize sparsity (spectrogram sharpness); account for human perception

  25. Your questions? • Demo web page: http://www.izotope.com/tech/aes_adapt/
