Convex Optimization in Sinusoidal Modeling for Audio Signal Processing

Convex Optimization in Sinusoidal Modeling for Audio Signal Processing • Michelle Daniels, PhD Student, University of California, San Diego

Outline • Introduction to sinusoidal modeling • Existing approach • Proposed optimization post-processing • Testing and results • Conclusions • Future work

Analysis of Audio Signals • Audio signals have rapid variations • Speech • Music • Environmental sounds • Assume minimal change over short segments (frames) • Analyze on a frame-by-frame basis • Constant-length frames (46 ms) • Frames typically overlap • Any audio signal can be represented as a sum of sinusoids (deterministic components) and noise (stochastic components)
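The frame-by-frame analysis above can be sketched in a few lines. This is only an illustration: the function name `split_into_frames`, the 44.1 kHz sample rate, the 2048-sample frame (about 46.4 ms at that rate), and the 50% overlap are assumptions, not values taken from the slides beyond the 46 ms frame length.

```python
import numpy as np

def split_into_frames(signal, frame_length, hop_length):
    """Cut a 1-D signal into constant-length, possibly overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_length:
        raise ValueError("signal is shorter than one frame")
    count = 1 + (len(signal) - frame_length) // hop_length
    return np.stack([signal[i * hop_length : i * hop_length + frame_length]
                     for i in range(count)])

# Assumed settings: 2048 samples is roughly 46 ms at 44.1 kHz, hop of half a frame.
sample_rate = 44100
audio = np.zeros(sample_rate)  # one second of placeholder audio
frames = split_into_frames(audio, frame_length=2048, hop_length=1024)
```

Each row of `frames` is then analyzed independently, on the assumption that the signal changes little within a frame.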

Sinusoidal Modeling of Audio Signals • Given a signal y of length N, represent it as a sum of K sinusoidal components plus noise e • y and e are N-dimensional vectors • Each sinusoid has frequency (ω), magnitude (a), and phase (φ) parameters • K is determined during the analysis process • Frequencies have higher resolution than DFT bins, and no harmonic relationship is required • Model, encode, and/or process these components independently • Applications: • Effects processing (time-scale modification, pitch shifting) • Audio compression • Feature extraction for machine listening • Auditory scene analysis
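The model equation itself did not survive in this copy of the slides. A conventional way to write it, consistent with the later description of ŷ as a sum of cosine functions of the frequencies and phases, is sketched below. The sample index n and the use of radians per sample for the frequencies are assumptions.

```latex
y[n] = \sum_{k=1}^{K} a_k \cos(\omega_k n + \phi_k) + e[n],
\qquad n = 0, 1, \dots, N-1
```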

Estimation Algorithm • Using frequency-domain analysis (e.g., an FFT), iterate up to K times, until the residual signal is small and/or has a flat spectrum: • Identify the highest-magnitude sinusoid in the signal • Estimate its frequency ω • Given ω, estimate its magnitude a and phase φ • Reconstruct the sinusoid • Subtract the reconstructed sinusoid to produce a residual signal • After all sinusoids have been removed, the final residual contains only noise
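The loop above might look roughly like the following sketch. It is not the analysis used in the talk: the Hann window, the zero-padding factor, taking the frequency directly from the peak bin of the zero-padded spectrum, estimating magnitude and phase by projecting onto a complex exponential, and the energy-only stopping rule are all simplifying assumptions, and `analyze_frame` is a hypothetical name.

```python
import numpy as np

def analyze_frame(y, max_sinusoids, stop_energy=1e-6, pad=8):
    """Greedy peak-picking: estimate one sinusoid at a time and subtract it."""
    residual = np.asarray(y, dtype=float).copy()
    N = len(residual)
    n = np.arange(N)
    window = np.hanning(N)
    params = []
    for _ in range(max_sinusoids):
        # Stop when the residual is small (the flat-spectrum test is omitted).
        if np.mean(residual ** 2) < stop_energy:
            break
        # Highest-magnitude peak of the zero-padded spectrum, excluding DC and Nyquist.
        spectrum = np.fft.rfft(residual * window, pad * N)
        peak = int(np.argmax(np.abs(spectrum[1:-1]))) + 1
        omega = 2 * np.pi * peak / (pad * N)  # radians per sample
        # Given omega, project onto a complex exponential for magnitude and phase.
        c = 2.0 / N * np.sum(residual * np.exp(-1j * omega * n))
        a, phi = np.abs(c), np.angle(c)
        # Reconstruct the sinusoid and subtract it to form the new residual.
        residual -= a * np.cos(omega * n + phi)
        params.append((omega, a, phi))
    return params, residual
```

Because each estimate is made on a residual that already contains the errors of earlier estimates, a single poor estimate propagates to every later one.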

Sinusoidal Analysis Example

Estimation Challenges • Energy in any DFT bin can come from: • Multiple sinusoids with similar frequency • Both sinusoids and noise • Interference from other sinusoids and/or noise results in inaccurate estimates • Incorrect estimation of a single sinusoid corrupts the residual signal and affects all subsequent estimates

Possible Solution • Optimize frequency, magnitude, and phase to minimize the energy in the residual signal • The original parameter estimates are initial estimates for the optimization • Sinusoidal approximation: • Residual: • Optimization problem:
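The three expressions on this slide were lost in extraction. A plausible reconstruction, assuming the usual sum-of-cosines model with sample index n and taking "energy in the residual" to mean the squared 2-norm, is:

```latex
\begin{align}
\hat{y}[n] &= \sum_{k=1}^{K} a_k \cos(\omega_k n + \phi_k)
  && \text{(sinusoidal approximation)} \\
r &= y - \hat{y}
  && \text{(residual)} \\
\min_{\omega,\, a,\, \phi} \;& \lVert y - \hat{y}(\omega, a, \phi) \rVert_2^2
  && \text{(optimization problem)}
\end{align}
```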

Is it Convex? • Convexity is desirable because it makes the problem practical to solve • The problem as posed is not convex, because each element of ŷ is a sum of cosine functions of ω and φ • Want a convex function inside the 2-norm instead • With the frequencies fixed, the optimization of magnitudes and phases can be reformulated as a convex problem • Fix the frequencies to their initial estimates

Convex Optimization Problem Classic least-squares problem: Magnitude and phase recovered as:
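A minimal sketch of the fixed-frequency least-squares step, assuming the standard expansion a·cos(ωn + φ) = α·cos(ωn) − β·sin(ωn) with α = a·cos φ and β = a·sin φ. The function name and the ordering of the unknowns are illustrative choices, not taken from the slides.

```python
import numpy as np

def fit_magnitudes_and_phases(y, omegas):
    """Least-squares magnitudes and phases for sinusoids at fixed frequencies.

    omegas are in radians per sample. Returns (magnitudes, phases).
    """
    y = np.asarray(y, dtype=float)
    n = np.arange(len(y))[:, None]                 # shape (N, 1)
    w = np.asarray(omegas, dtype=float)[None, :]   # shape (1, K)
    # Columns: cos(w_k n) for the alpha_k, then -sin(w_k n) for the beta_k.
    A = np.hstack([np.cos(n * w), -np.sin(n * w)])
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    K = w.shape[1]
    alpha, beta = x[:K], x[K:]
    return np.hypot(alpha, beta), np.arctan2(beta, alpha)

# Two sinusoids close in frequency, no noise.
n = np.arange(1024)
y = 1.0 * np.cos(0.30 * n + 0.2) + 0.5 * np.cos(0.31 * n - 1.0)
magnitudes, phases = fit_magnitudes_and_phases(y, [0.30, 0.31])
```

With exact frequencies and no noise the fit recovers the true parameters; with frequency errors the unconstrained solution can drift to implausible magnitudes.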

Related Work • Petre Stoica, Hongbin Li, and Jian Li. “Amplitude estimation of sinusoidal signals: Survey, new results, and an application”, 2000. • Mentions least squares as one approach to estimating the amplitude of complex exponentials • No discussion of phase estimation • Hing-Cheung So. “On linear least squares approach for phase estimation of real sinusoidal signals”, 2005. • Focuses on phase estimation • Theoretical analysis • Not applied specifically to audio signals

Constraints • Analytic least-squares solution frequently results in unrealistic magnitude values • This is possibly the result of errors in frequency estimates • Constraints on magnitudes were required • Ideal constraint: • Relaxed constraint: • Result is a constrained least squares problem that can be solved using a generic quadratic program (QP) solver

Final Formulation • Quadratic Program: • Magnitude and phase recovered from x as:
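The exact constraints and the quadratic program are missing from this copy of the slides, so the following is only a stand-in. It assumes a simple box constraint on every component of x (which bounds each magnitude by √2 times that limit) and solves the resulting problem, minimize ‖Ax − y‖² subject to −bound ≤ x ≤ bound, with projected gradient descent rather than a generic QP solver. The function name and the iteration count are hypothetical.

```python
import numpy as np

def box_constrained_least_squares(A, y, bound, iterations=5000):
    """Minimize ||A x - y||^2 subject to -bound <= x_i <= bound.

    Projected gradient descent; the box constraint is an assumed relaxation.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        gradient = A.T @ (A @ x - y)
        # Gradient step followed by projection back onto the box.
        x = np.clip(x - step * gradient, -bound, bound)
    return x
```

The same problem can be handed to a QP solver in the form minimize ½·xᵀ(AᵀA)x − (Aᵀy)ᵀx subject to the same bounds; the objective is convex because AᵀA is positive semidefinite.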

Test Signals • Model test signals that reproduce challenging aspects of real-world signals • Reconstruct signal based on original model parameters and optimized parameters • Compare both reconstructions to original test signal and to each other

Test Signal 1: Overlapping Sinusoids • Signal consists of two sinusoids close in frequency • There is no additive noise, so the residual (the noise component of the model) should be zero

Results 1: Overlapping Sinusoids • Without optimization, significant energy is left in the residual (clearly audible) • With optimization, the residual power at individual frequencies is reduced by as much as 50 dB (now barely audible) • The improvement from optimization generally decreases as the frequency separation increases

Test Signal 2: Sudden Onset • A single sinusoid starts half-way through an analysis frame (the first half is silence)

Results 2: Sudden Onset • Original: MSE = 2.76 × 10⁻⁵ • Optimized: MSE = 4.13 × 10⁻⁶ • (MSE = mean squared error)
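For reference, a mean squared error between a test signal and its reconstruction can be computed as in the sketch below. Whether the figures on the slide use exactly this normalization is not stated, so treat it as an assumption; the function name is illustrative.

```python
import numpy as np

def mean_squared_error(original, reconstruction):
    """Average of the squared sample-by-sample differences."""
    original = np.asarray(original, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    if original.shape != reconstruction.shape:
        raise ValueError("signals must have the same shape")
    return float(np.mean((original - reconstruction) ** 2))
```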

Test Signal 3: Chirp • A single sinusoid with constant magnitude and continuously-increasing frequency
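A chirp of this kind could be generated as follows. The slides only say the frequency increases continuously; the linear sweep, the function name, and the parameter values here are assumptions.

```python
import numpy as np

def linear_chirp(f_start, f_end, duration, sample_rate, magnitude=1.0):
    """Constant-magnitude sinusoid whose frequency rises linearly in time."""
    t = np.arange(int(round(duration * sample_rate))) / sample_rate
    rate = (f_end - f_start) / duration  # sweep rate in Hz per second
    # Phase is the integral of the instantaneous frequency f_start + rate * t.
    phase = 2 * np.pi * (f_start * t + 0.5 * rate * t ** 2)
    return magnitude * np.cos(phase)

chirp = linear_chirp(f_start=440.0, f_end=880.0, duration=1.0, sample_rate=44100)
```

Within any single frame the frequency is not constant, which violates the stationarity assumption behind the per-frame model.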

Results 3: Chirp • Non-optimized peak magnitudes are close to constant between consecutive frames • Optimized peak magnitudes vary significantly from frame to frame • The optimization produces peak parameters that do not reflect the underlying real-world phenomenon.

Conclusions • The problem can be formulated using convex programming • For several classic challenging signals, optimization produces a more accurate model • Constraints are necessary to ensure parameter estimates reflect possible real-world phenomena • The final formulation is a quadratic program • Parameters obtained via optimization may still not represent the underlying real-world phenomenon as well as the original analysis (e.g., the chirp)

Future Work • Explore robust optimization techniques to compensate for errors in frequency estimates • Integrate optimization into original analysis instead of a post-processing stage • Experiment with more real-world signals • Further investigate constraints • The ultimate goal: three-way joint optimization of frequency, magnitude, and phase

References • M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, May 2010. • R. McAulay and T. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4):744-754, Aug 1986. • Xavier Serra. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. PhD thesis, Stanford University, 1989. • Kevin M. Short and Ricardo A. Garcia. Accurate low-frequency magnitude and phase estimation in the presence of DC and near-DC aliasing. In Proceedings of the 121st Convention of the Audio Engineering Society, 2006. • Kevin M. Short and Ricardo A. Garcia. Signal analysis using the complex spectral phase evolution (CSPE) method. In Proceedings of the 120th Convention of the Audio Engineering Society, 2006. • Hing-Cheung So. On linear least squares approach for phase estimation of real sinusoidal signals. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E88-A(12):3654-3657, December 2005. • Petre Stoica, Hongbin Li, and Jian Li. Amplitude estimation of sinusoidal signals: Survey, new results, and an application. IEEE Transactions on Signal Processing, 48(2):338-352, 2000.

Thanks for your attention! For further information: http://ccrma.stanford.edu/~danielsm/ifors2011.html

THE END

Convex Reformulation Define: Change of variables: Define:
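The definitions on this slide were lost in extraction. One standard way to carry out the reformulation, assuming fixed frequencies ω_k and sample index n, is sketched below; the symbols α, β, C, S, and the sign convention are assumptions rather than the notation of the talk.

```latex
\begin{align}
a_k \cos(\omega_k n + \phi_k)
  &= \alpha_k \cos(\omega_k n) - \beta_k \sin(\omega_k n),
  \qquad \alpha_k = a_k \cos\phi_k, \quad \beta_k = a_k \sin\phi_k \\
C_{nk} &= \cos(\omega_k n), \qquad S_{nk} = \sin(\omega_k n), \qquad
A = \begin{bmatrix} C & -S \end{bmatrix}, \qquad
x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \\
\hat{y} &= A x, \qquad \min_{x} \; \lVert A x - y \rVert_2^2 \\
a_k &= \sqrt{\alpha_k^2 + \beta_k^2}, \qquad
\phi_k = \operatorname{atan2}(\beta_k, \alpha_k)
\end{align}
```

The approximation is linear in x once the frequencies are fixed, so the objective is a convex quadratic.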

Test Signal: Sinusoid in noise • A single sinusoid with stationary frequency and corrupted by additive white Gaussian noise • Noise is present at all frequencies, including that of the sinusoid, corrupting magnitude and phase estimates • Test repeated using different variances for the noise (varying signal-to-noise ratios)
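One way to build such a test signal at a chosen signal-to-noise ratio is sketched here. The function name, the seed handling, and the specific SNR value are illustrative assumptions.

```python
import numpy as np

def add_white_noise(x, snr_db, seed=0):
    """Add white Gaussian noise scaled to a target SNR in dB.

    Returns the noisy signal and the noise that was added.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    signal_power = np.mean(x ** 2)
    # SNR (dB) = 10 log10(signal_power / noise_power)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise, noise

clean = np.cos(0.1 * np.arange(100000))
noisy, noise = add_white_noise(clean, snr_db=10.0, seed=1)
```

Keeping the added noise lets the residual energy from the analysis be compared directly with the true noise energy.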

Results: Sinusoid in noise • Without optimization, the sinusoid’s magnitude is over-estimated and the noise’s energy is under-estimated • The optimization gives residual energy slightly closer to the true noise energy.

Results: Overlapping Sinusoids • The optimization compensates for some of the errors in the initial magnitude and phase estimates, resulting in a lower MSE.
