
Detection of Burst Onset Landmarks in Speech Using Rate of Change of Spectral Moments



Presentation Transcript


  1. Detection of Burst Onset Landmarks in Speech Using Rate of Change of Spectral Moments
     A. R. Jayan, P. S. Rajath Bhat, P. C. Pandey
     {arjayan, rajathbhat, pcpandey}@ee.iitb.ac.in
     EE Dept, IIT Bombay
     30th January, 2011

  2. PRESENTATION OUTLINE
     1. Introduction: speech landmarks; landmark detection; clear speech; automated speech intelligibility enhancement
     2. Methodology: band energy parameters; spectral moments; rate of change function
     3. Evaluation and results: VCV utterances; sentences
     4. Conclusion

  3. 1. INTRODUCTION
     Speech landmarks: regions associated with spectral transitions that contain important information for speech perception.
     [Figure: Landmarks and related events [Park, 2008]]

  4. Landmark detection
     • Processing
       ▪ Extraction of parameters characterizing the landmark
       ▪ Computation of the rate of change (ROC) of the parameters
       ▪ Locating the landmark using the ROC(s)
     • Applications
       ▪ Intelligibility enhancement
       ▪ Speech recognition
       ▪ Vocal tract shape estimation

  5. Clear speech  Speech produced with clear articulation when talking to a hearing-impaired listener, or in a noisy environment More intelligible for ▪ Hearing impaired listeners (~17% higher, Picheny et al.,1985) ▪ Listeners in noisy environments (Payton et al., 1994) ▪ Non-native listeners (Bradlow and Bent, 2002) ▪ Children with learning disabilities (Bradlow et al., 2003)  Pronounced acoustic landmarks

  6. Example: ‘The book tells a story’ (recordings from http://www.acoustics.org/press/145th/clr-spch-tab.htm)
     [Figure: conversational (Conv.) and clear speech]

  7. Automated speech intelligibility enhancement
     • Automated detection of landmarks
       ▪ High detection rate with few false detections
       ▪ Good temporal accuracy (5-10 ms)
       ▪ Computational efficiency
     • Modification of speech characteristics
       ▪ Intensity, duration, or spectral modifications around landmarks, with minimal perceptual distortion of the acoustic cues in the speech signal

  8. Problems in stop consonant perception
     • Transient sound with low intensity
     • Severely affected by noise and hearing impairment
     • Stop landmarks: closure, burst onset, onset of voicing
     Example: /apa/

  9. Some earlier landmark detection techniques
     • Liu (1996): rate-of-rise measures of parameters from a set of fixed spectral bands (speech recognition; g, s, b landmarks; 80 TIMIT sentences; detection rate: 84% at 20-30 ms, 50% at 5-10 ms)
     • Salomon et al. (2002): temporal parameters related to periodicity, envelope, and spectral fine structure (speech recognition; onsets and offsets of vowels, sonorants, and consonants; 120 TIMIT sentences; detection rate: 90% at 20 ms)
     • Sainath and Hazen (2006): sinusoidal model parameters (speech segmentation; 453 TIMIT sentences; word error rate: 20%)
     • Niyogi and Sondhi (2002): stop landmark detection using total energy, energy above 3 kHz, and Wiener entropy (speech recognition; stop consonants; 320 TIMIT sentences; detection rate: 90% at 20 ms)
     • Jayan and Pandey (2009): stop landmark detection using GMM parameters (speech enhancement; 50 TIMIT sentences; detection rate: 73% at 5 ms)

  10. Improving landmark detection Parameters ▪ Capturing spectral transitions ▪ Adaptation to speech variability Rate of change measure ▪ Range of parameter variations ▪ Correlation among parameters  Adaptive time steps ▪ Small time step for abrupt variations ▪ Large time step for slow variations Objective of the present investigation Detection of burst landmarks for automated intelligibility enhancement

  11. 2. METHODOLOGY
      • Band energy parameters: log of the spectral peak in three bands
        ▪ b1: 1.2-2.0 kHz
        ▪ b2: 2.0-3.5 kHz
        ▪ b3: 3.5-5.0 kHz
      • Magnitude spectrum (10 kHz sampling) computed using a 512-point DFT, a 6 ms Hanning window, and 1 frame per ms, then smoothed by a 20-point moving average
      • The smoothed magnitude spectrum X(n, k) is used to calculate the log of the spectral peak in band i, $E_{b_i}(n) = \log \max_{k \in b_i} X(n, k)$, where n = time index and k = frequency index
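The band energy computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 20-point moving average is applied across frequency bins of each frame, and it expresses the log peak in dB (20 log10) because the later voicing thresholds are stated in dB. The function names `smoothed_spectrum` and `band_energies` are hypothetical.

```python
import numpy as np

FS = 10_000        # sampling rate (Hz), as on the slide
NFFT = 512         # DFT size
WIN_MS = 6         # Hanning window length (ms)
HOP_MS = 1         # one frame per ms
SMOOTH = 20        # moving-average length in DFT bins (assumed: smoothing across frequency)
BANDS = {"b1": (1200, 2000), "b2": (2000, 3500), "b3": (3500, 5000)}

def smoothed_spectrum(x):
    """Return X(n, k): smoothed magnitude spectra, one row per 1 ms frame, bins 0..NFFT/2."""
    win_len = int(FS * WIN_MS / 1000)   # 60 samples
    hop = int(FS * HOP_MS / 1000)       # 10 samples
    window = np.hanning(win_len)
    kernel = np.ones(SMOOTH) / SMOOTH
    n_frames = 1 + (len(x) - win_len) // hop
    X = np.empty((n_frames, NFFT // 2 + 1))
    for n in range(n_frames):
        frame = x[n * hop : n * hop + win_len] * window
        mag = np.abs(np.fft.rfft(frame, NFFT))            # zero-padded 512-point DFT
        X[n] = np.convolve(mag, kernel, mode="same")       # moving average over frequency
    return X

def band_energies(X, eps=1e-12):
    """E_bi(n) = 20 log10 of the largest X(n, k) with bin frequency inside band b_i (dB)."""
    freqs = np.fft.rfftfreq(NFFT, d=1 / FS)
    out = {}
    for name, (lo, hi) in BANDS.items():
        sel = (freqs >= lo) & (freqs <= hi)
        out[name] = 20 * np.log10(X[:, sel].max(axis=1) + eps)
    return out

# Usage: a 100 ms, 1.5 kHz tone should give a much larger peak in b1 than in b3.
t = np.arange(int(0.1 * FS)) / FS
X = smoothed_spectrum(np.sin(2 * np.pi * 1500 * t))
E = band_energies(X)
```

Smoothing across frequency keeps the peak picker from locking onto individual harmonics, so each E_bi(n) tracks the band's overall spectral peak rather than one narrow line.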

  12. Example: band energy parameters for /aga/
      [Figure: (a) speech waveform, (b) band energies, vs. time (ms)]

  13. Spectral moments
      • Computed from the normalized spectrum (n = time index, k = frequency index, N = DFT size)
        ▪ Centroid: frequency of energy concentration
        ▪ Variance: spread of energy around the centroid
        ▪ Skewness: measure of spectral symmetry
        ▪ Kurtosis: measure of spectral peakiness
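The slide's moment equations did not survive extraction, so the sketch below assumes the standard definitions: the magnitude spectrum of each frame is normalized to sum to one, P(n, k) = X(n, k) / Σ_k X(n, k), and with bin frequency f_k the centroid is F_c = Σ_k f_k P(n, k), the variance is Σ_k (f_k − F_c)² P(n, k), and skewness and kurtosis are the third and fourth central moments divided by σ³ and σ⁴. Whether the original normalized the magnitude or the power spectrum, and whether it used variance or standard deviation, is not stated; the function name `spectral_moments` is hypothetical.

```python
import numpy as np

def spectral_moments(X, fs=10_000):
    """Per-frame spectral moments of X(n, k), shape (frames, bins 0..N/2)."""
    n_bins = X.shape[1]
    f = np.linspace(0, fs / 2, n_bins)            # bin frequencies f_k (Hz), step fs/N
    P = X / X.sum(axis=1, keepdims=True)          # normalized spectrum: each row sums to 1
    Fc = P @ f                                    # centroid: frequency of energy concentration
    d = f[None, :] - Fc[:, None]                  # deviation of each bin from the centroid
    var = (P * d**2).sum(axis=1)                  # variance: spread around the centroid
    sd = np.sqrt(var)
    Fs = (P * d**3).sum(axis=1) / sd**3           # skewness: 0 for a symmetric spectrum
    Fk = (P * d**4).sum(axis=1) / sd**4           # kurtosis (not excess): larger when peakier
    return {"Fc": Fc, "Fvar": var, "Fs": Fs, "Fk": Fk}
```

For a flat spectrum the centroid sits at fs/4 (2.5 kHz), the skewness is zero, and the kurtosis is close to 1.8 (the uniform-distribution value). A spectrum tilted toward high frequencies moves the centroid up and makes the skewness negative.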

  14. Example: band energy parameters and spectral moments for /aga/
      [Figure: (a) waveform; (b), (c), (d) band energy parameters and spectral moments, vs. time (ms)]

  15. Measures of rate of change
      • First-difference-based rate of change (ROC), with K = time step
      • Mahalanobis-distance-based rate of change (ROC-MD): a single measure indicative of the overall variation that accounts for parameter ranges and correlations
        ▪ y(n) = parameter set at time n
        ▪ K = time step
        ▪ Σ = covariance matrix, pre-calculated from the parameter set of segments with energy above a threshold
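The two rate-of-change measures can be sketched as below. The slide's equations were lost, so this assumes a backward difference Δ(n) = y(n) − y(n − K) and ROC-MD(n) = sqrt(Δ(n)ᵀ Σ⁻¹ Δ(n)), with Σ estimated from the parameter vectors of frames whose energy exceeds a threshold, as the slide describes. The function names, the `energy` and `threshold` arguments, and the use of a pseudo-inverse (to tolerate a near-singular Σ) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def roc_first_difference(y, K):
    """ROC(n) = y(n) - y(n - K) for a 1-D or (frames x params) array; first K frames are NaN."""
    y = np.asarray(y, dtype=float)
    roc = np.full_like(y, np.nan)
    roc[K:] = y[K:] - y[:-K]
    return roc

def roc_mahalanobis(Y, K, energy, threshold):
    """ROC-MD(n) = sqrt(d(n)^T S^-1 d(n)) with d(n) = y(n) - y(n - K).

    S is the covariance of the parameter vectors from frames with energy > threshold.
    """
    Y = np.asarray(Y, dtype=float)
    Y = Y.reshape(Y.shape[0], -1)                               # (frames, params)
    S = np.cov(Y[np.asarray(energy) > threshold], rowvar=False)
    S_inv = np.linalg.pinv(np.atleast_2d(S))                    # robust to near-singular S
    d = roc_first_difference(Y, K)
    q = np.einsum("ni,ij,nj->n", d[K:], S_inv, d[K:])           # quadratic form per frame
    md = np.full(Y.shape[0], np.nan)
    md[K:] = np.sqrt(np.maximum(q, 0.0))                        # clip tiny negative round-off
    return md
```

Because Σ⁻¹ rescales and decorrelates the parameters, ROC-MD is unchanged by any invertible linear change of units, for example band energies in dB alongside a centroid in Hz. This is the "range and correlation" property the slide attributes to it; a plain sum of first differences does not have it.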

  16. Detection of voicing offset and onset
      • Band energy in 0-400 Hz
      • ROC(n) computed with a time step of 50 ms
      • Voicing offset [g-]: ROC(n) ≤ -12 dB
      • Voicing onset [g+]: ROC(n) ≥ +12 dB
      Burst onset landmark detection
      • Most prominent peak in ROC-MD(n) between g- and g+
      Example: /aga/
      [Figure: (a) waveform, (b) ROC-MD, (c) ROC, vs. time (ms)]
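The voicing and burst logic above can be sketched as follows, assuming the 0-400 Hz band energy is already available in dB at one frame per ms (so the 50 ms time step is K = 50 frames) and that ROC(n) = E(n) − E(n − K). Two details are assumptions, because the slide does not specify them: within each region where the threshold is crossed, the frame with the most extreme ROC is taken as g- or g+, and the "most prominent peak" is taken to be the largest ROC-MD value strictly between a g- and the next g+. All function names are hypothetical.

```python
import numpy as np

def _extrema_in_runs(mask, values, pick):
    """For each run of consecutive True frames in mask, return the frame chosen by pick."""
    idx, n = [], 0
    while n < len(mask):
        if mask[n]:
            start = n
            while n < len(mask) and mask[n]:
                n += 1
            idx.append(start + int(pick(values[start:n])))
        else:
            n += 1
    return idx

def voicing_landmarks(low_band_db, K=50, thresh_db=12.0):
    """Return (g_minus, g_plus) frame indices from the 0-400 Hz band energy in dB."""
    e = np.asarray(low_band_db, dtype=float)
    roc = np.full_like(e, np.nan)
    roc[K:] = e[K:] - e[:-K]                      # assumed form: ROC(n) = E(n) - E(n - K)
    off = np.nan_to_num(roc) <= -thresh_db        # NaN -> 0, so the first K frames never fire
    on = np.nan_to_num(roc) >= thresh_db
    g_minus = _extrema_in_runs(off, roc, np.argmin)   # steepest fall in each offset region
    g_plus = _extrema_in_runs(on, roc, np.argmax)     # steepest rise in each onset region
    return g_minus, g_plus

def burst_onsets(roc_md, g_minus, g_plus):
    """For each g-, pick the frame of the largest ROC-MD value before the next g+."""
    bursts = []
    for gm in g_minus:
        nxt = [gp for gp in g_plus if gp > gm + 1]
        if not nxt:
            continue                              # no voicing onset follows this offset
        seg = np.asarray(roc_md[gm + 1:nxt[0]], dtype=float)
        seg = np.where(np.isnan(seg), -np.inf, seg)
        bursts.append(gm + 1 + int(np.argmax(seg)))
    return bursts
```

In a VCV token such as /aga/, one g- (vowel into closure) and one g+ (release into vowel) are expected. Restricting the search to that interval keeps vowel-internal spectral changes from being mistaken for the burst.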

  17. 3. EVALUATION & RESULTS
      Effects of rate of change functions and parameters on burst detection
      • ROC functions and parameter sets
        1) ROC(BE): sum of normalized ROCs of [Eb1, Eb2, Eb3]
        2) ROC-MD(BE): ROC-MD of [Eb1, Eb2, Eb3]
        3) ROC-MD(SM): ROC-MD of [Fc, Fσ, Fk, Fs]
        4) ROC-MD(BE,SM): ROC-MD of [Eb1, Eb2, Eb3, Fc, Fσ, Fk, Fs]
      • Material: VCV utterances, TIMIT sentences
      • Time steps: 3, 6 ms
      • Temporal accuracies: 3, 5, 10, 15, 20 ms
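Detection rates at the listed temporal accuracies can be computed along the lines of the sketch below. The slides do not describe their matching rule, so this assumes greedy one-to-one matching: each reference burst onset is paired with the nearest unused detection, and it counts as detected if the two are within ±tolerance ms. Unmatched detections are reported as false detections. The function name and the example landmark times are hypothetical.

```python
def detection_rate(reference_ms, detected_ms, tol_ms):
    """Return (fraction of reference landmarks detected within +/- tol_ms, false detections).

    Greedy one-to-one matching: each reference takes the nearest unused detection.
    """
    unused = sorted(detected_ms)
    hits = 0
    for ref in sorted(reference_ms):
        if not unused:
            break
        nearest = min(unused, key=lambda d: abs(d - ref))
        if abs(nearest - ref) <= tol_ms:
            hits += 1
            unused.remove(nearest)            # a detection can match only one reference
    rate = hits / len(reference_ms) if reference_ms else 0.0
    return rate, len(unused)

# Hypothetical labelled burst onsets and detector output (ms).
refs = [100, 250, 400, 610]
dets = [102, 246, 409, 700]
for tol in (3, 5, 10, 15, 20):
    print(tol, detection_rate(refs, dets, tol))
# 3 -> (0.25, 3), 5 -> (0.5, 2), 10, 15 and 20 -> (0.75, 1)
```

Reporting false detections together with the rate reflects both requirements on slide 7: a loose tolerance raises the detection rate, but only the tighter tolerances (about 5-10 ms) are useful for landmark-based modification.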

  18. VCV utterances
      ▪ 6 stop consonants (b, d, g, p, t, k)
      ▪ 3 vowel contexts (a, i, u)
      ▪ 10 speakers (5 M, 5 F)
      ▪ 180 tokens (6 × 3 × 10)

  19. TIMIT sentences
      ▪ 5 speakers (2 M, 3 F)
      ▪ 10 sentences from each speaker (50 sentences)
      ▪ 238 tokens

  20. 4. CONCLUSION
      • Increasing the time step reduced detection accuracy.
      • The Mahalanobis-distance-based ROC was more effective than the first-difference-based ROC.
      • Spectral moments were useful as additional parameters for improving burst onset detection.

  21. Thank you
