Speech Enhancement

komala

Presentation Transcript


  1. Speech Enhancement Using a Minimum Mean Square Error Short-Time Spectral Amplitude Estimation method

  2. Process Flow

  3. Segmenting of Signal • The signal is divided into frames of 25 ms, with each frame shifted from the previous one by 40% of the frame length (10 ms). • The Window Length in samples is equal to 25 ms times the Sampling Frequency. • Example • Sampling Frequency is equal to 8000 samples/s • Window Length = 0.025 s × 8000 samples/s = 200 samples • Each frame is then windowed using a Hamming window.
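The segmentation step can be sketched in Python with NumPy. This is a minimal illustration, not the presenter's code: the function name `segment`, its parameter names, and the synthetic test signal are assumptions made for the example.

```python
import numpy as np

def segment(signal, fs, frame_sec=0.025, shift_frac=0.4):
    """Split a 1-D signal into overlapping, Hamming-windowed frames."""
    win_len = int(round(frame_sec * fs))        # 0.025 s * 8000 samples/s = 200
    shift = int(round(shift_frac * win_len))    # 40% of 200 samples = 80 (10 ms)
    n_frames = 1 + (len(signal) - win_len) // shift
    window = np.hamming(win_len)
    frames = np.stack([signal[i * shift:i * shift + win_len] * window
                       for i in range(n_frames)])
    return frames, win_len, shift

fs = 8000
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of synthetic noise (assumed input)
frames, win_len, shift = segment(x, fs)
print(frames.shape, win_len, shift)
```

Samples at the end of the signal that do not fill a whole frame are dropped in this sketch.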

  4. Initial Silence Segments • The initial silence or speech inactivity period is assumed to be 250 ms. • This allows a sufficient amount of data to be analyzed for the Noise Spectrum before attempting Voice Activity Detection (VAD). • The Number of Initial Silence Segments (NISS) = (Initial Silence × Sampling Frequency − Window Length)/(Shift Percentage × Window Length). • Example • Using our previous values. • NISS = (0.25 s × 8000 samples/s − 200 samples)/(0.4 × 200 samples) = 1800/80 = 22.5. • The value is rounded down to the nearest whole number, giving NISS = 22.
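The NISS arithmetic can be checked with a few lines of Python. The function and parameter names are hypothetical; the defaults are the values given on the slides.

```python
def initial_silence_segments(fs, silence_sec=0.25, frame_sec=0.025, shift_frac=0.4):
    """Number of initial silence segments, rounded down to a whole number."""
    win_len = round(frame_sec * fs)          # 200 samples at 8000 samples/s
    shift = round(shift_frac * win_len)      # 80 samples
    silence_len = round(silence_sec * fs)    # 2000 samples
    # Integer division performs the rounding down: (2000 - 200) // 80
    return (silence_len - win_len) // shift

niss = initial_silence_segments(8000)
print(niss)
```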

  5. Phase Calculation using FFT • The Fast Fourier Transform of each frame is calculated. • The phase component of the FFT is calculated for use in reconstruction of the enhanced signal.
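A short sketch of this step, assuming NumPy's real-input FFT is an acceptable stand-in for whatever FFT routine the presenter used. The helper name `magnitude_and_phase` and the 1 kHz test tone are illustrative assumptions.

```python
import numpy as np

def magnitude_and_phase(frames):
    """FFT each windowed frame; return magnitude and phase separately."""
    spectrum = np.fft.rfft(frames, axis=-1)
    return np.abs(spectrum), np.angle(spectrum)

# One 200-sample frame of a 1 kHz tone sampled at 8000 samples/s (assumed input)
fs, win_len = 8000, 200
t = np.arange(win_len) / fs
frame = np.cos(2 * np.pi * 1000 * t) * np.hamming(win_len)
mag, phase = magnitude_and_phase(frame[np.newaxis, :])
print(mag.shape, int(np.argmax(mag[0])))
```

The magnitude is what the later steps modify; the phase is stored unchanged so it can be reattached at reconstruction.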

  6. Noise Power Spectrum • An initial Noise Power Spectrum and the Noise Power Spectrum Variance (λd) are calculated using the mean values of the FFT over the initial silence segments. • For each frame in the initial silence segments, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated. • The frames after the initial silence segments are evaluated using a Voice Activity Detector (VAD) which utilizes the Noise Power Spectrum. • If a frame is determined to contain only noise, then the Noise Power Spectrum and the Noise Power Spectrum Variance are updated.
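The slides do not give the update rule, so the following is only a sketch under stated assumptions: the initial estimates are plain means over the first NISS frames, and later noise-only frames are folded in with a running average whose `memory` parameter is hypothetical. The VAD decision itself is not shown.

```python
import numpy as np

def initial_noise_estimate(mag, niss):
    """Mean magnitude and mean power per frequency bin over the first niss frames."""
    noise_mean = mag[:niss].mean(axis=0)
    lambda_d = (mag[:niss] ** 2).mean(axis=0)
    return noise_mean, lambda_d

def update_noise_estimate(noise_mean, lambda_d, frame_mag, memory=9):
    """Running-average update for a frame judged to be noise only.

    `memory` is an assumed smoothing length, not a value from the slides.
    """
    noise_mean = (memory * noise_mean + frame_mag) / (memory + 1)
    lambda_d = (memory * lambda_d + frame_mag ** 2) / (memory + 1)
    return noise_mean, lambda_d

mag = np.full((22, 101), 2.0)                 # synthetic constant-magnitude frames
noise_mean, lambda_d = initial_noise_estimate(mag, niss=22)
noise_mean2, lambda_d2 = update_noise_estimate(noise_mean, lambda_d, mag[0])
```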

  7. Signal to Noise Ratio • Using the Noise Power Spectrum, the a priori SNR (ξk) and the a posteriori SNR (γk) are calculated. • a posteriori SNR: • γk = Rk²/λd(k) • where Rk is the modulus of the noisy (signal plus noise) spectral component • a priori SNR (decision-directed estimate): • ξk(n) = α·G²·γk(n−1) + (1−α)·P[γk(n)−1] • where α = 0.99 is a smoothing factor, • G is the Gain Function from the MMSE estimator, evaluated for the previous frame, • and P[x] is defined as x if x > 0 and 0 otherwise
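The two SNR quantities can be sketched as follows, with γk computed from Rk² / λd(k) and ξk from the decision-directed recursion. The function names are hypothetical, and initialising the previous gain and previous γk to 1 for the first frame is an assumption, not something the slides state.

```python
import numpy as np

def a_posteriori_snr(frame_mag, lambda_d):
    """gamma_k = R_k^2 / lambda_d(k)."""
    return frame_mag ** 2 / lambda_d

def a_priori_snr(prev_gain, prev_gamma, gamma, alpha=0.99):
    """xi_k(n) = alpha * G^2 * gamma_k(n-1) + (1 - alpha) * P[gamma_k(n) - 1]."""
    return alpha * prev_gain ** 2 * prev_gamma + (1 - alpha) * np.maximum(gamma - 1, 0)

lambda_d = np.array([1.0, 1.0])
frame_mag = np.array([2.0, 0.5])          # R_k for two frequency bins
gamma = a_posteriori_snr(frame_mag, lambda_d)
# Assumed start-up values for the first frame: G = 1, gamma(n-1) = 1
xi = a_priori_snr(np.ones(2), np.ones(2), gamma)
print(gamma, xi)
```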

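  8. Gain Calculation • The gain (G) of the signal is then updated using the Signal to Noise Ratios. • G = (√π/2)·(√η/γk)·exp(−η/2)·[(1+η)·I0(η/2) + η·I1(η/2)] • where η = γk·ξk/(1+ξk), and I0 and I1 are the modified Bessel functions of the first kind of order zero and one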
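As a sketch of the gain step, the code below evaluates the gain of the MMSE short-time spectral amplitude estimator in the cited Ephraim and Malah paper, G = (√π/2)·(√η/γ)·exp(−η/2)·[(1+η)·I0(η/2) + η·I1(η/2)] with η = γ·ξ/(1+ξ). It is written for a single frequency bin using only the standard library; the helper `bessel_i` is a hypothetical power-series implementation added for the example, not part of the slides.

```python
import math

def bessel_i(order, x, tol=1e-16, max_terms=500):
    """Modified Bessel function of the first kind, I_order(x), by power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)   # k = 0 term
    total = term
    for k in range(1, max_terms):
        term *= half * half / (k * (k + order))    # ratio of consecutive terms
        total += term
        if term < tol * total:
            break
    return total

def mmse_stsa_gain(xi, gamma):
    """Gain for one bin from a priori SNR xi and a posteriori SNR gamma."""
    eta = gamma * xi / (1.0 + xi)
    bracket = (1.0 + eta) * bessel_i(0, eta / 2.0) + eta * bessel_i(1, eta / 2.0)
    # exp(-eta/2) times the Bessel terms overflows for very large eta;
    # a production version would use exponentially scaled Bessel functions.
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(eta) / gamma) * math.exp(-eta / 2.0) * bracket

g_high = mmse_stsa_gain(100.0, 101.0)   # high SNR: close to xi / (1 + xi)
g_low = mmse_stsa_gain(0.01, 1.0)       # low SNR: strong attenuation
print(g_high, g_low)
```

At high SNR the gain approaches the Wiener-style value ξ/(1+ξ), which gives a quick sanity check on any implementation.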

  9. Signal Enhancement and Reconstruction • The signal is then cleaned by multiplying the FFT magnitude of each frame by the gain. • The signal is reconstructed using the overlap-add method, recombining the enhanced magnitude with the phase of the noisy FFT.
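A minimal sketch of enhancement and overlap-add reconstruction, assuming NumPy and a real-input FFT. The function name `reconstruct` and the toy frames are assumptions for illustration; any amplitude normalisation for the overlapping Hamming windows is left out.

```python
import numpy as np

def reconstruct(noisy_mag, phase, gain, win_len, shift):
    """Apply the gain to the magnitudes, reattach the noisy phase, overlap-add."""
    enhanced = gain * noisy_mag * np.exp(1j * phase)
    frames = np.fft.irfft(enhanced, n=win_len, axis=-1)
    out = np.zeros(shift * (len(frames) - 1) + win_len)
    for i, frame in enumerate(frames):
        out[i * shift:i * shift + win_len] += frame
    return out

# Toy check: three 4-sample frames of ones with a shift of 2 and unit gain
toy = np.ones((3, 4))
spec = np.fft.rfft(toy, axis=-1)
y = reconstruct(np.abs(spec), np.angle(spec), gain=1.0, win_len=4, shift=2)
print(y)
```

Where two frames overlap, their samples add, which is why the middle of the toy output is twice the value at the edges.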

  10. Sample – Hair Dryer Background

  11. Sample – Jack Hammer Background

  12. Sample – Air Conditioner Background

  13. Sample – Cafeteria Background

  14. Sample – Automobile Background

  15. Sample – Coffee Grinder Background

  16. Sample – Fan Background

  17. Sample – Feedback Background

  18. Sample – White Noise Background

  19. Sample – Static Background

  20. References • Ephraim, Yariv, and David Malah. “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator”. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, December 1984.
