Multiresolution STFT for Analysis and Processing of Audio

Talk at B.U., Sept. 2010
Alexey Lukin
Moscow State University, Russia; iZotope Inc., Cambridge, MA

Short-Time Fourier Transform
• Most commonly used transform for audio:
  • Spectral analysis
  • Noise reduction (spectral subtraction algorithm)
  • Time-variable filters and other effects
+ Very fast implementation for a large number of bands via FFT
+ Good energy compaction for many musical signals
– Many oscillations in basis functions → ringing (Gibbs phenomenon)
– Uniform frequency resolution → inadequate resolution at low frequencies
A. Lukin, J. Todd “Adaptive Time-Frequency Resolution”

Short-Time Fourier Transform
• Spectrogram: displays the evolution of the spectrum over time
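As an illustration of what a spectrogram is, here is a minimal sketch in Python with NumPy. The function name `stft_magnitude`, the Hann window, and the test-tone parameters are assumptions made for this example, not details from the talk.

```python
import numpy as np

def stft_magnitude(x, window_size, hop, fft_size=None):
    """Magnitude STFT: rows are frames (time), columns are frequency bins."""
    fft_size = fft_size or window_size
    window = np.hanning(window_size)  # assumed window shape
    n_frames = 1 + (len(x) - window_size) // hop
    frames = np.stack([x[k * hop : k * hop + window_size] * window
                       for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=fft_size, axis=1))

# Hypothetical test signal: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 1000 * t)

spec = stft_magnitude(x, window_size=256, hop=128)
spec_db = 20 * np.log10(spec + 1e-12)  # dB scale, as usually displayed
```

Plotting `spec_db` with time on one axis and frequency on the other gives the familiar spectrogram image.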

Spectrograms
• Problems:
  • Most perceptually meaningful energy is concentrated in a narrow band below 4 kHz → can’t see enough detail
  • Time/frequency resolution trade-off
[Figure: Conventional STFT spectrogram (linear frequency scale)]

Spectrograms
• Problems:
  • Poor frequency resolution at low frequencies → can’t separate bass harmonics from the bass drum
  • Time/frequency resolution trade-off
[Figure: Mel-scale STFT spectrogram (window size = 12 ms)]

Spectrograms
• Problems:
  • Poor time resolution at transients → time-smearing of drums and other percussive sounds
[Figure: Mel-scale STFT spectrogram (window size = 93 ms)]
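The trade-off can be put into numbers with a short sketch. It assumes a 44.1 kHz sampling rate and power-of-two window lengths of 512 and 4096 samples; the slides give only the durations (12 ms and 93 ms), so both values are assumptions chosen to match them. The helper name `stft_resolution` is hypothetical.

```python
def stft_resolution(window_samples, sample_rate=44100):
    """Return (window duration in ms, FFT bin spacing in Hz) for a window length."""
    duration_ms = 1000.0 * window_samples / sample_rate
    bin_spacing_hz = sample_rate / window_samples
    return duration_ms, bin_spacing_hz

# Assumed window lengths corresponding to the 12 ms and 93 ms spectrograms.
for n in (512, 4096):
    ms, hz = stft_resolution(n)
    print(f"{n} samples: {ms:.1f} ms window, {hz:.1f} Hz between bins")
```

Under these assumptions the short window lasts about 11.6 ms with bins about 86 Hz apart, too coarse to separate bass harmonics, while the long window lasts about 92.9 ms with bins about 10.8 Hz apart, long enough to smear a drum hit in time.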

Filter banks
• Idea:
  • Decompositions of a time-frequency plane
  • Uncertainty principle
[Figure: x[n] → Decomposition → Processing of subband signals → Synthesis → y[n]]
[Figure: time-frequency plane tilings (axes t, f) for STFT and DWT]

Filter banks
• Perceptual coding of audio
[Figure: Diagram of an mp3 encoder. Blocks: x[n], Filter bank, Q, Huffman, mp3 file; FFT, Psychoacoustic model]

Filter banks
• Window size switching (guided by transient detection)
[Figure: Transient; Pre-echo; Reduced pre-echo]

Proposed approach
• Imitation of the time-frequency resolution of human hearing
• Adaptation of resolution to local signal features
Transforms should vary their time-frequency resolution in a perceptually motivated way

Spectrograms
• Simple solution:
  • Combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution
[Figure: Combined resolution spectrogram (window sizes from 12 to 93 ms)]

Spectrograms
• Simple solution: combine spectrograms with different resolutions
• Each spectrogram is computed on the same grid of time-frequency points (using zero padding)
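The simple combination can be sketched as follows. This is one possible reading, not the authors' implementation: the window lengths (512 to 4096 samples, roughly 12 to 93 ms at an assumed 44.1 kHz), the crossover frequencies, the hard band switching, the Hann window, and the names `stft_mag_on_grid` and `combined_spectrogram` are all assumptions. Every window is zero-padded to the same FFT size and centred on the same hop grid, so all spectrograms share one set of time-frequency points.

```python
import numpy as np

def stft_mag_on_grid(x, window_size, hop, fft_size):
    """Magnitude STFT with frames centred every `hop` samples, zero-padded to fft_size."""
    half = fft_size // 2
    padded = np.pad(x, half)               # zeros on both sides
    window = np.hanning(window_size)
    window /= window.sum()                 # comparable gain for all window sizes
    n_frames = 1 + len(x) // hop
    first = half - window_size // 2        # start of the frame centred on sample 0
    frames = np.stack([padded[first + k * hop : first + k * hop + window_size] * window
                       for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=fft_size, axis=1))

def combined_spectrogram(x, sample_rate,
                         window_sizes=(4096, 2048, 1024, 512),   # assumed, longest first
                         crossovers_hz=(500.0, 1500.0, 4000.0),  # assumed band edges
                         hop=256):
    """Take each frequency band from the spectrogram with the matching window size."""
    if len(window_sizes) != len(crossovers_hz) + 1:
        raise ValueError("need one more window size than crossover frequencies")
    fft_size = max(window_sizes)
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / sample_rate)
    band = np.searchsorted(crossovers_hz, freqs, side="right")  # band index per bin
    specs = [stft_mag_on_grid(x, w, hop, fft_size) for w in window_sizes]
    combined = np.empty_like(specs[0])
    for i, spec in enumerate(specs):
        combined[:, band == i] = spec[:, band == i]
    return combined

# Hypothetical input: a short burst of white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
combined = combined_spectrogram(x, sample_rate=44100)
```

Hard switching between bands keeps the sketch short; a smooth cross-fade around each crossover would avoid visible seams.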

Spectrograms
• Better approach: select the best resolution for each time-frequency neighborhood
• Criteria?
  • Better frequency resolution at bass (reflects a priori psychoacoustic knowledge)
  • Maximal energy compaction (to minimize spectral smearing in both time and frequency, i.e. maximize sparsity)
[Figure: STFT window sizes 6 ms, 12 ms, 24 ms, 48 ms, 96 ms, and the best selection]

Spectrograms
• Calculation of sparsity (in a given block, for all T/F resolutions r)
Here a_{i,r} are the STFT magnitudes in the block, S_r is the spectrum sparsity for the given resolution r, and r_0 is the resolution with the best sparsity.
[Figure: STFT window sizes 6 ms, 12 ms, 24 ms, 48 ms, 96 ms, and the best selection]
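The per-block selection can be sketched as below. The sparsity formula from the slide is not reproduced in this text, so the code uses a stand-in measure (the ratio of the L2 norm to the L1 norm of the magnitudes, larger meaning sparser); that measure, the block dimensions, and the function names are assumptions. The bass-resolution criterion from the previous slide is not modelled here.

```python
import numpy as np

def sparsity(a, eps=1e-12):
    """Stand-in measure: L2/L1 norm ratio (larger = energy in fewer coefficients)."""
    a = np.abs(a).ravel()
    return np.sqrt(np.sum(a ** 2)) / (np.sum(a) + eps)

def select_best_resolution(specs, block_frames=8, block_bins=16):
    """specs: array (R, T, F) of magnitude spectrograms on one common grid.

    Returns the adaptive spectrogram (T, F) and the chosen index r0 per block."""
    specs = np.asarray(specs, dtype=float)
    n_res, n_frames, n_bins = specs.shape
    out = np.empty((n_frames, n_bins))
    rows = range(0, n_frames, block_frames)
    cols = range(0, n_bins, block_bins)
    choice = np.empty((len(rows), len(cols)), dtype=int)
    for bi, t0 in enumerate(rows):
        for bj, f0 in enumerate(cols):
            tile = (slice(t0, t0 + block_frames), slice(f0, f0 + block_bins))
            scores = [sparsity(specs[r][tile]) for r in range(n_res)]
            r0 = int(np.argmax(scores))    # resolution with the best sparsity
            choice[bi, bj] = r0
            out[tile] = specs[r0][tile]
    return out, choice

# Toy data: resolution 0 shows a sharp harmonic in the low bins,
# resolution 1 shows a sharp transient in the high bins.
long_win = np.ones((16, 32))
long_win[:, 4] = 20.0
short_win = np.ones((16, 32))
short_win[2, 16:] = 20.0
adaptive, choice = select_best_resolution([long_win, short_win])
```

In the toy data the block containing the harmonic is taken from the first spectrogram and the block containing the transient from the second.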

Spectrograms
• Benefits:
  • Sharper bass drum hits and other transients, even in the mid-frequency range
  • Sharper guitar harmonics at high frequencies
[Figure: Adaptive resolution spectrogram (window sizes from 12 to 93 ms)]

Spectrograms
More examples
[Figure: Tone onset waveform; Conventional STFT spectrogram]

Spectrograms
More examples
[Figure: Combined resolution spectrogram; Adaptive resolution spectrogram]

Processing framework
• General framework for multi-resolution processing
  • Perform processing with several different resolutions
  • Adaptively combine (mix) results in a time-frequency space
  • Mixing is controlled by a priori knowledge of psychoacoustics and analysis of local signal features (e.g. transience or sparsity)

Noise reduction
• Spectral subtraction algorithm
  • STFT of a noisy signal
  • Estimate the power spectrum of the noise (manually or automatically)
  • Subtract the noise power spectrum from the signal power spectrum
  • Inverse STFT
[Figure: x[t] → STFT → X[f,t] → subtraction of the estimated noise spectrum W[f] → S[f,t] → Inverse STFT → s[t]]
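The four steps can be sketched as follows. This is a basic textbook form of power spectral subtraction under stated assumptions, not the implementation behind the talk: the Hann window with 50% overlap, the normalised overlap-add synthesis, the `floor` parameter, the test signal, and the names `estimate_noise_power` and `spectral_subtraction` are all assumptions.

```python
import numpy as np

def estimate_noise_power(noise, window_size=1024):
    """Average power spectrum W[f] of a noise-only excerpt."""
    hop = window_size // 2
    window = np.hanning(window_size + 1)[:-1]          # periodic Hann
    frames = [noise[s:s + window_size] * window
              for s in range(0, len(noise) - window_size + 1, hop)]
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

def spectral_subtraction(x, noise_power, window_size=1024, floor=0.0):
    """Subtract W[f] from the power of each STFT frame, then resynthesise."""
    hop = window_size // 2
    window = np.hanning(window_size + 1)[:-1]
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - window_size + 1, hop):
        seg = slice(start, start + window_size)
        X = np.fft.rfft(x[seg] * window)                # STFT frame X[f,t]
        power = np.abs(X) ** 2
        clean_power = np.maximum(power - noise_power, floor * power)
        gain = np.sqrt(clean_power / np.maximum(power, 1e-20))
        y[seg] += np.fft.irfft(gain * X, n=window_size) * window
        norm[seg] += window ** 2                        # overlap-add normalisation
    return y / np.maximum(norm, 1e-12)

# Hypothetical test: a 440 Hz tone in white noise, sampled at 16 kHz.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
noisy = tone + 0.1 * rng.standard_normal(len(t))
noise_power = estimate_noise_power(0.1 * rng.standard_normal(len(t)))
denoised = spectral_subtraction(noisy, noise_power)
```

The phase of each frame is left unchanged; only the magnitudes are reduced. The first and last few samples are not fully covered by overlapping frames and should be ignored when judging the result.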

Noise reduction
• Example of adaptive resolution
  • Better frequency resolution at low frequencies (according to the resolution of human hearing)
  • Better temporal resolution near signal transients (to reduce the Gibbs phenomenon)
[Figure: block diagram. Spectral subtraction (short windows) → x1[t] → STFT; Spectral subtraction (long windows) → x2[t] → STFT; Mixer of coefficients, under transience analysis control; Synthesis; signals x3[t], y[t]]
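The idea of transience-controlled mixing can be illustrated with a deliberately simplified sketch. The slide mixes STFT coefficients; the code below instead cross-fades two already-processed signals in the time domain and ignores the frequency dependence, which is a swapped-in simplification. The onset measure (relative energy rise between consecutive frames), the frame length, and the function names are assumptions.

```python
import numpy as np

def transience(x, frame=256):
    """Per-frame onset strength in [0, 1): relative energy rise over the previous frame."""
    n = len(x) // frame
    energy = np.sum(x[:n * frame].reshape(n, frame) ** 2, axis=1) + 1e-12
    rise = np.maximum(energy[1:] / energy[:-1] - 1.0, 0.0)
    rise = np.concatenate(([0.0], rise))   # no onset information for the first frame
    return rise / (1.0 + rise)

def mix_by_transience(y_short, y_long, x, frame=256):
    """Lean toward the short-window result where the input x is transient.

    All three signals must have the same length."""
    alpha_frames = transience(x, frame)
    centres = (np.arange(len(alpha_frames)) + 0.5) * frame
    alpha = np.interp(np.arange(len(x)), centres, alpha_frames)
    return alpha * y_short + (1.0 - alpha) * y_long

# Toy input: silence with a burst starting at sample 1024.
x = np.zeros(2048)
x[1024:1280] = 1.0
# Constant signals stand in for the two processed outputs so the weights are visible.
mixed = mix_by_transience(np.ones(2048), np.zeros(2048), x)
```

Because the weight is interpolated between frame centres, it starts rising shortly before the burst, which is where pre-ringing from long windows would otherwise appear.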

Noise reduction
• Results of single-resolution and multi-resolution algorithms
[Figure: Noisy recording (guitar + castanets)]

Noise reduction
• Results of single-resolution and multi-resolution algorithms
[Figure: Single resolution; Multi-resolution (notice less pre-ringing on transients)]

Conclusion
When using STFT, do care about the window size! Choose the size wisely:
• Maximize sparsity (spectrogram sharpness)
• Account for human perception

Your questions?
Demo web page: http://www.izotope.com/tech/aes_adapt/
