ICA 2010 : 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia

ICA 2010 : 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia [Wed, 25th Aug, R. 201 Speech processing & communication systems 2, 15.40] Enhancement of Electrolaryngeal Speech by Spectral Subtraction, Spectral Compensation, and Introduction of Jitter and Shimmer Prem C. Pandey S. Khadar Basha {pcpandey, basha}ee.iitb.ac.in http://www.ee.iitb.ac.in/~spilab IIT Bombay, India

OVERVIEW1. Introduction2. Spectral subtraction3. Estimation of noise spectrum4. Jitter, shimmer, & spectral compensation5. Results 6. Conclusion

1 INTRODUCTION Glottal excitation to vocal tract Intro. 1/4 Natural speech

Intro. 2/4 Electrolaryngeal speech Excitation to vocal tract from external vibrator

Intro. 3/4 Problems with electrolarynx • Dynamic control of level, voicing, & pitch not feasible • Background noise due to leakage of acoustic energy, affecting the intelligibility • Unnatural quality due to ▪ Low frequency spectral deficit ▪ Constant pitch & level

Intro. 4/4 Methods of noise reduction • Acoustic shielding of vibrator(Epsy-Wilson et al1996) • 2-input noise cancellation based on LMS algorithm ( Epsy-Wilson et al1996) • Single input noise cancellation using spectral subtraction ▪ Averaging based noise est. & pitch-synch. generalized spectral subtraction (Pandey et al 2002) ▪ Quantile based noise estimation (Pandey et al 2004) ▪ Parameter adaptation using freq.domain auditory masking (Liu et al 2006) ▪ Min.statistics based noise estimation (Mitra & Pandey 2006, Kabir et al 2008)

Spec. sub 1/5 2 SPECTRAL SUBTRACION Noise generation (Pandey et al 2002) • Leakage of vibrations produced by vibrator membrane • Improper coupling of vibrations to the neck tissue

Spec. sub 2/5 s(n) = e(n)*hv(n), l(n) = e(n)*hl(n), x(n) = s(n) + l(n) Xn(ej) = En(ej) [Hvn(ej) + Hln(ej)] • Assuming hv(n) & hl(n) to be uncorrelated Xn(ej)2 = En(ej)2[Hvn(ej)2 + Hln(ej)2] • For short-time spectra calculated using pitch-synchronous window, En(ej)2 may be considered as constant E(ej)2 • During non-speech intervals, s(n) will be negligible, Xn(ej)2 =|Ln(ej)|2 = |E(ej)|2 |Hln(ej)|2

Spec. sub 3/5 Generalized spectral subtraction(Berouti et al 1979) using FFT E(k)= | Xn(k)|γ - α|Ln(k)|γ Clean mag. spectrum |Y n(k)|= [E(k)](1/ γ), if E(k) > [β|Ln(k)|]γ β|Ln(k)|, otherwise ( : subtraction,  : spectral floor,  : subtraction power) yn(m) = IDFT [ Yn(k) ejθn(k)]

Spec. sub 4/5 Phase estimation ▪ Noisy phase : θn(k) =Xn(k) ▪ Zero Phase : θn(k) = 0 ▪ Random phase: θn(k) = r (uniformly distr over [0, 2π] ▪ Min. phase calculation iterative tech. (Quatieri and Oppenheim 1981), cepstrum based non-iterative calculation (Oppenheim & Schafer 1975, Rabiner & Schafer 1978, Yegnanarayana & Dhayalan 1981) ▪ Phase set for continuity across the frames θn(k) = θn-1(k) + (2πndk)/N where nd = window shift, N = FFT size ▪ Noisy phase resulted in better quality than others

Spec. sub 5/5 Block diagram of spectral subtraction

Est. noise 1/1 3 ESTIMATION OF NOISE SPECTRUM • Variation in noise due to change in the electrolarynx orientation • Voice activity detection difficult in electrolaryngeal speech • Averaging based noise est. (Pandey et al 2002) unsuitable for long term use • Quantile based noise est. (Stahl et al 2000) used for electrol. speech (Pandey et al 2004) difficult to implement for real-time processing • Minimum statistics based method (Martin 1994) used for elec. lary. speech (Mitra & Pandey 2006, Kabir et al 2008) not effective with fixed subtraction parameters.

Intro. J & S 1/4 4 INTRO. OF JITTER & SHIMMER & SPECTRAL COMPENSATION • Introduction of jitter and shimmer, using LPC based analysis synthesis, after spectral subtraction for reducing unnaturalness • Spectral compensation for low-frequency spectral deficit

Intro. J & S 3/4 Implementation of shimmer Impulse amplitude = a(1+sr1) a = mean amplitude s = peak-to-peak shimmer r1= random number uniformly distributed over +0.5 Implementation of jitter Impulse repetition period = N(1+jr2) N = mean pitch period in number of samples j = peak-to-peak jitter r2 = random number uniformly distributed over +0.5

Intro. J & S 4/4 Spectral compensation • Low frequency spectral deficit in electrolaryngeal speech • High frequency spectral emphasis in resynthesized speech due to impulse train excitation in LPC analysis-synthesis, • Spectral compensation filter designed by comparing LPC smoothened spectra of natural and resynthesized /a/, /i/, /u/. Inserted in the excitation path for spectral compensation.

Results 1/2 Materal: “….Where were you a year ago? 1 2 3 4 5 6 7 8 9 10” Electrolarynx Solatone. • RESULTS γ = 1 Averaging: α=10, β=0.001, Min.: α = 25, β=0.005, Median: α = 1.5, β = 0.001

Results 2/2 Electrolaryngeal speech Enhan. electrolar. speech after spec. sub. with MBNE (α = 1.2, β = 0.001, γ = 1), Material: “…Where were you a year ago? “, Electrolarynx: Solatone

CONCLUSION • ▪ Median based noise estimation could be used for noise suppression without varying the oversubtraction factor. • ▪Phase estimation based on minimum phase and phase continuity did not imrove the quality above that of the noisy speech. ▪Introduction of shimmer did not improve speech quality. ▪Introduction of peak-to-peak jitter of up to 6 % and spectral compensation helped in improving the quality.

Thank You

P. C. Pandey, S. K. Basha, “Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer”, Proc. 20th International Congress on Acoustics ( ICA 2010), 23-27 August 2010, Sydney, Australia. Abstract -- An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract, as a substitute to that provided by the glottal vibrations. Although the user can set the vibration level and pitch, a dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limitation, the electrolaryngeal speech suffers from (i) presence of background noise caused by leakage of acoustic energy from the vibrator and vibrator-tissue interface, (ii) low-frequency spectral deficiency, and (iii) unnatural quality due to constant pitch and level. Background noise decreases the intelligibility, while the other two factors affect the speech quality. Present study involved investigations for improving the intelligibility and quality of electrolaryngeal speech. Pitch-synchronous application of generalized spectral subtraction was used for reducing the background noise. In order to track the variation in the spectrum of the leakage noise due to changes in vibrator orientation and pressure during speech production, a dynamic estimation of noise was carried out from a set of past frames. The estimated noise spectrum was subtracted from that of the noisy speech and the resulting magnitude spectrum was combined with the original phase spectrum. The speech signal was resynthesized using overlap-add method, with two-pitch period analysis frames and one period overlap. Estimation of phase spectrum by minimum-phase assumption and the assumption of phase continuity did not improve the speech quality. An introduction of jitter and shimmer in the speech signal, using LPC based analysis-synthesis, was investigated for improving its naturalness. The excitation for synthesis was an impulse train with the frequency equal to that of the vibrator, with random frequency and amplitude modulations for providing the jitter and the shimmer, respectively. An FIR filtering of the excitation was used to match the long-term average spectral envelope of the processed electrolaryngeal speech to that of the normal speech. A peak-to-peak jitter of up to 6 % increased the naturalness, while introduction of shimmer decreased the quality.

REFERENCES 1 M. Weiss, G. Y. Komshian, and J. Heinz, “Acoustic and perceptual characteristics of speech produced with an electronic artificial larynx,” J. Acoust. Soc. Am., 65, 1298-1308 (1979). 2 H. L. Barney, F. E. Haworth, and H. K. Dunn, “An experimental transistorized artificial larynx,” Bell Systems Tech. J., 38, 1337-1356 (1959). 3 Q. Yingyong and B. Weinberg, “Low frequency energy deficit in electrolaryngeal speech,” J. Speech Hearing Res., 34, 1250-1256 (1991). 4 C. Y. Espy-Wilson, V. R. Chari, and C. B. Haung, “Enhancement of alaryngeal speech by adaptive filtering,” Proc. ICSLP, 764-771 (1996). 5 P. C. Pandey, S. M. Bhandarkar, G. K. Baccher, and P. K. Lehena, “Enhancement of alaryngeal speech using spectral subtraction,” Proc. 14th Int. Conf. Digital Signal Prcessing (DSP 2002), Santorini, Greece, 591-594 (2002). 6 P. C. Pandey, S. S. Pratapwar, and P. K. Lehana, “Enhancement of electrolaryngeal speech by reducing leakage noise using spectral subtraction with quantile based dynamic estimation of noise,” Proc. 18th Int. Congress Acoustics (ICA 2004), Kyoto, Japan, 3029-3032 (2004). 7 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Application of spectral subtraction method on enhancement of electrolaryngeal speech,” J. Acoust. Soc. Am., 120, 398-406 (2006). 8 H. Liu, Q. Zhao, M. Wan and S. Wang, “Enhancement of electrolarynx speech based on auditory masking,” IEEE Trans. Biomed. Eng.,53, 865-874 (2006). • P. Mitra and P.C. Pandey, “Enhancement of electrolaryngeal speech by spectral subtraction with minimum statistics-based noise estimation,” J. Acoust. Soc. Amer., 120, 3039 (2006). 10 R. Kabir, A. Greenblatt, K. Panetta, and S. Agaian, “Enhancement of alaryngeal speech utilizing spectral subtract ion and minimum statistics,” Proc. 7th International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July (2008). 11 S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process, 27, 113-120 (1979). • M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc.IEEE ICASSP’79, 208-211 (1979). 13 V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and wiener filtering,” Proc. IEEE ICASSP’00, 3, 1875-1878 (2000).

14 R. Martin, “Spectral subtraction based on minimum statisic” Proc. 7th European Signal Processing Conf. (EUSIPCO–94), Edinburgh, Scoltland, 1182-1185 (1994). 15 T. F. Quatieri and A. V. Oppenheim,“Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., 29, 1187-1193 (1981). 16 B. Yegnanarayana and A. Dhayalan, “Noniterative techniques for minimum phase signal reconstruction from phase or magnitude,” Proc. IEEE ICASSP, 639-642, (1983). 17 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. (Prentice-Hall, Englewood Cliffs, New Jersey, 1975). 18 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, (Prentice Hall, Englewood Cliffs, New Jersey, 1978).

ICA 2010 : 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia

ICA 2010 : 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia

Presentation Transcript

International City Marketing Congress (CMDC) 04.28/29.2010 Moscow

Encouraging Self Disclosure in National Service August 26, 2010 We will begin shortly! Please feel free to say hello by

August 2010 Interim Report Subgroup 2: Issues Related to Co-Product Credit s Presented to the LCFS Expert Workgroup Aug

Structure of Congress

Industrial Security Field Operations (ISFO) Office of the Designated Approving Authority (ODAA) August 2010

Welcome to the 20th Annual Awards Ceremony April 27, 2010

Acoustics and Noise Control

NWS / SPoRT Coordination Call

Special Education Leadership Conference 2010

Special Education Leadership Conference 2010

The outline of the talk :

Board of Representatives Meeting

2010 Teacher’s Alumni Workshop Series

RP Chair Centralized Training Region II CRC August 28, 2010

Article I: Legislative Branch

Andrea Love, AIA, LEED A.P. Libby Hsu, MEng

SIAS Fund Performance Review Q2 2010 August 13, 2010

ENERGY EFFICIENCY AWARENESS SEMINAR 2010 Building Code of Australia Regulatory Measures