
Multipitch Tracking for Noisy Speech

DeLiang Wang, The Ohio State University, U.S.A. Joint work with Mingyang Wu (The Ohio State University) and Guy Brown (University of Sheffield, U.K.).


Presentation Transcript


  1. Multipitch Tracking for Noisy Speech. DeLiang Wang, The Ohio State University, U.S.A. Joint work with Mingyang Wu (The Ohio State University) and Guy Brown (University of Sheffield, U.K.).

  2. What is Pitch? • “The attribute of auditory sensation in terms of which sounds may be ordered on a musical scale.” (American Standards Association) • Periodic sound: musical tone, vowel, voiced speech. • Aperiodic sound with pitch sensation: e.g. comb-filtered noise

  3. Pitch of a Periodic Sound. [Figure labels: fundamental frequency (period); pitch frequency (period) d.]

  4. Applications of Pitch Tracking • Computational Auditory Scene Analysis (CASA) • Automatic music transcription • Speech coding, analysis, speaker verification and language identification.

  5. Categories of Pitch Determination Algorithms (PDAs) • Time-domain algorithms • Frequency-domain algorithms • Time-frequency domain algorithms

  6. Time-domain PDAs

  7. Frequency-domain PDAs. [Figure: frequency axis with labels f0, 2f0 and 4f0.]

  8. Time-frequency Domain PDAs. [Diagram: acoustic input → filterbank → periodicity analysis in each channel → integration across channels → pitch estimates.]

  9. Pitch Determination Algorithms • Numerous PDAs have been proposed. For example, see Hess (1983), Hermes (1992), and de Cheveigné & Kawahara (2002). • Many PDAs are designed to detect a single pitch in noisy speech. • Some PDAs are able to track more than one pitch contour. However, their performance is limited when tracking speech mixed with broadband interference.

  10. PDAs for Multipitch in Noisy Environments. [Diagram labels: speech, noise and speech as inputs; PDA; output pitch tracks.]

  11. Diagram of the Proposed Model. [Diagram: speech/interference → cochlear filtering → normalized correlogram → channel/peak selection → channel integration → pitch tracking using HMM → continuous pitch tracks.]

  12. Gammatone Filterbank to Model Cochlear Filtering
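
Slide 12 names the gammatone filterbank without giving its parameters. As a rough illustration, the sketch below samples the standard gammatone impulse response t^(n-1) · exp(-2πbt) · cos(2πf_c·t). The fourth-order filter, the ERB-based bandwidth, the 25 ms duration and the function names are assumptions borrowed from common auditory-modeling practice, not values stated in the slides.

```python
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Sampled gammatone impulse response for one channel.

    Assumed parameters (not from the slides): filter order 4,
    bandwidth b = 1.019 * ERB(fc), duration 25 ms.
    """
    b = 1.019 * erb(fc_hz)
    n_samples = round(duration_s * fs_hz)
    ir = []
    for k in range(n_samples):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    return ir
```

A filterbank would hold one such response per center frequency and convolve each with the input signal; the choice of center frequencies is left open here because the slides do not give it.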

  13. Multi-channel Front-end. [Diagram labels: speech/interference; gammatone filterbank; low-frequency channels; high-frequency channels; envelope extraction.]

  14. Periodicity Extraction. [Figure: normalized correlogram of the response to clean speech; axes: frequency channels vs. delay.]
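
The slide shows the normalized correlogram but not its formula. A minimal sketch of one common definition follows: for each lag, the windowed autocorrelation of a channel response is divided by the geometric mean of the energies of the two windows, so a perfectly periodic response yields peaks of exactly one. The function name, the window handling and the pure-Python style are illustrative assumptions.

```python
import math

def normalized_autocorrelation(x, start, window, max_lag):
    """Normalized autocorrelation of x for lags 0..max_lag.

    Compares x[start:start+window] with the same-length window shifted
    by each lag. Requires len(x) >= start + window + max_lag.
    """
    seg = x[start:start + window]
    e0 = sum(v * v for v in seg)
    out = []
    for lag in range(max_lag + 1):
        shifted = x[start + lag:start + lag + window]
        num = sum(a * b for a, b in zip(seg, shifted))
        e1 = sum(v * v for v in shifted)
        denom = math.sqrt(e0 * e1)
        # Guard against silent windows, where the ratio is undefined.
        out.append(num / denom if denom > 0.0 else 0.0)
    return out
```

Running this once per filterbank channel and stacking the results over channels gives one frame of a correlogram of the kind plotted on the slide.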

  15. Second Stage of the Model. [Diagram: speech/interference → cochlear filtering → normalized correlogram → channel/peak selection → channel integration → pitch tracking using HMM → continuous pitch tracks.]

  16. Channel and Peak Selection for Reducing Noise Interference • Some channels are masked by interference and provide corrupting information on periodicity. These corrupted channels are excluded from pitch determination. • Different strategies are used for selecting valid channels in low- and high-frequency ranges.

  17. Selection of a Low-frequency Channel. [Figure panels: clean channel; corrupted channel. Axis: lag (delay steps).] In a clean channel, peaks at non-zero delays are close to one, whereas in a corrupted channel these peaks are relatively low.
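
The low-frequency selection rule can be sketched as a threshold test on the peaks of one channel's normalized correlogram. The slide only says that clean-channel peaks are "close to one"; the threshold value 0.9, the simple local-maximum peak picker and the function names below are hypothetical choices made for illustration.

```python
def nonzero_lag_peaks(corr):
    """Indices of local maxima at lags >= 1 (the last lag is excluded)."""
    return [i for i in range(1, len(corr) - 1)
            if corr[i] > corr[i - 1] and corr[i] >= corr[i + 1]]

def is_clean_low_freq_channel(corr, threshold=0.9):
    """True if some non-zero-lag peak reaches the threshold.

    corr is one channel's normalized correlogram, indexed by lag.
    threshold=0.9 is an assumed value, not taken from the slides.
    """
    return any(corr[i] >= threshold for i in nonzero_lag_peaks(corr))
```

Channels that fail the test would simply be left out of the later integration step.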

  18. Selection of a High-frequency Channel. [Figure panels: clean channel; corrupted channel. Axis: lag (delay steps).] • In a clean channel, the normalized correlogram within the original time window and that within a longer time window have similar patterns, but in a corrupted channel they have dissimilar patterns. • Further peak selection is performed in a high-frequency channel.

  19. Summary Correlogram of Selected Channels. [Figure panels: all channels; only selected channels. Axis: lag (delay steps).]

  20. Summary Correlogram of Selected Channels with Selected Peaks. [Figure panels: without peak selection; with peak selection. Axis: lag (delay steps).]

  21. Third Stage of the Model. [Diagram: speech/interference → cochlear filtering → normalized correlogram → channel/peak selection → channel integration → pitch tracking using HMM → continuous pitch tracks.]

  22. Integration of Periodicity Information Across Channels • How does a frequency channel contribute to a pitch-period hypothesis? • How to integrate the contributions from different channels?

  23. Peaks and Pitch Delay. [Figure labels: ideal pitch delay; peak delay; relative time lag.]

  24. Relative Time Lag Statistics. [Figure: histogram of relative time lags from natural speech.]

  25. Relative Time Lag Statistics. [Figure: estimated probability distribution of relative time lags (sum of Laplacian and uniform distributions).]
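
A density of the kind described on this slide, a Laplacian peak around zero lag plus a uniform floor, might be written as below. The mixing weight `q`, the Laplacian `scale` and the `support` interval are placeholder values chosen for illustration; the slides do not report the fitted parameters.

```python
import math

def relative_lag_density(delta, q=0.8, scale=0.05, support=(-1.0, 1.0)):
    """Mixture of a Laplacian and a uniform density over relative time lag.

    q, scale and support are assumed values. With scale much smaller than
    the support width, the Laplacian mass outside the support is negligible,
    so the mixture integrates to approximately one.
    """
    lo, hi = support
    if not lo <= delta <= hi:
        return 0.0
    laplace = math.exp(-abs(delta) / scale) / (2.0 * scale)
    uniform = 1.0 / (hi - lo)
    return (1.0 - q) * uniform + q * laplace
```

The uniform term keeps the probability of a large relative lag from collapsing to zero, which matters once many channels are multiplied together.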

  26. Observation Probability in One Channel. [Figure labels: normalized correlogram; p(channel|pitch delay).]

  27. Channel Combination • Step 1: take the product of the observation probabilities of all channels in a time frame. • Step 2: flatten the product probability. The responses of different channels are usually correlated, and this step corrects the resulting probability overshoot.
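
The two steps can be sketched in the log domain. The slide does not say how the flattening is done; taking a root of the product (dividing the log-product by `flatten_root`) is one plausible reading and is an assumption here, as are the final normalization and the function name.

```python
import math

def combine_channels(channel_probs, flatten_root=6.0):
    """Combine per-channel observation probabilities for one time frame.

    channel_probs[c][d] is the probability from channel c for candidate
    pitch delay d; all entries must be strictly positive.
    flatten_root is a hypothetical flattening exponent.
    """
    n_delays = len(channel_probs[0])
    # Step 1: product over channels, computed as a sum of logs.
    log_prod = [sum(math.log(ch[d]) for ch in channel_probs)
                for d in range(n_delays)]
    # Step 2 (assumed form): flatten by taking the flatten_root-th root.
    flat = [lp / flatten_root for lp in log_prod]
    # Normalize over delays, subtracting the maximum for numerical stability.
    m = max(flat)
    weights = [math.exp(v - m) for v in flat]
    total = sum(weights)
    return [w / total for w in weights]
```

With four identical (fully correlated) channels and `flatten_root=4.0`, the result equals the single-channel distribution, whereas the unflattened product is far more peaked. That is the overshoot the second step is meant to counter.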

  28. Integrated Observation Probability Distribution (1 Pitch). [Figure axes: log(probability) vs. pitch delay.]

  29. Integrated Observation Probability Distribution (2 Pitches). [Figure axes: log(probability) vs. pitch delay 1 and pitch delay 2.]

  30. Fourth Stage of the Model. [Diagram: speech/interference → cochlear filtering → normalized correlogram → channel/peak selection → channel integration → pitch tracking using HMM → continuous pitch tracks.]

  31. Prediction and Posterior Probabilities. [Figure panels, each plotted over pitch period d: prior probabilities for time frame t, assuming pitch period d for time frame t-1; observation probabilities for time frame t; posterior probabilities for time frame t.]

  32. Pitch Change Statistics in Consecutive Time Frames. The statistics are consistent with the pitch declination phenomenon in natural speech.

  33. Hidden Markov Model as Tracking Mechanism. [Diagram labels: observed signal; observation probability; pitch state space; pitch dynamics; one time frame.] The Viterbi algorithm is used to find the optimal sequence of pitch states.
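
For readers unfamiliar with the Viterbi algorithm, here is a generic log-domain sketch over a small discrete state space. It treats each state as one candidate pitch period; the actual state space of the model (which also has to cover zero, one or two pitches) and its transition statistics are not specified in enough detail on the slides to reproduce, so everything below is illustrative.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Most likely state sequence of an HMM.

    log_init[s]       : log prior of state s in the first frame
    log_trans[p][s]   : log probability of moving from state p to state s
    log_obs[t][s]     : log observation probability of state s in frame t
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Because the transition term penalizes large jumps between frames, a single frame whose observation weakly favors a distant pitch period is overruled by its neighbors, which is what yields continuous pitch tracks.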

  34. Results • The system is tested on mixtures of 10 speech utterances and 10 interference signals (Cooke, 1993). • The interference signals are a 1 kHz tone, white noise, noise bursts, “cocktail party” noise, rock music, a siren, a trill telephone, and three speech utterances (two female, one male).

  35. A Male Utterance and White Noise (SNR = –2 dB). [Figure panels: Tolonen & Karjalainen (2000); our algorithm. Axes: pitch period (ms) vs. time (s).]

  36. A Male Utterance and White Noise (cont.). [Figure panels: revised Gu & Bokhoven (1991); Gu & Bokhoven (1991). Axes: pitch period (ms) vs. time (s).]

  37. A Male Utterance and White Noise (cont.). [Figure: a single pitch tracker by Rouat, Liu & Morissette (1997). Axes: pitch period (ms) vs. time (s).]

  38. Simultaneous Utterances of a Male and a Female Speaker. [Figure panels: our algorithm; Tolonen & Karjalainen (2000). Axes: pitch period (ms) vs. time (s).]

  39. Simultaneous Utterances of a Male and a Female Speaker (cont.). [Figure panels: revised Gu & Bokhoven (1991); Gu & Bokhoven (1991). Axes: pitch period (ms) vs. time (s).]

  40. Categorization of Interference Signals

  41. Error Rates (in Percentage) for Category 1 Interference

  42. Error Rates (in Percentage) for Category 2 Interference

  43. Error Rates (in Percentage) for Category 3 Interference

  44. A CASA Application Demo. Demo items: original mixture; segregated male utterance using a correlogram-based pitch tracker (Wang & Brown ’99); segregated utterance using our algorithm.

  45. Conclusion • An improved channel/peak selection method reduces noise interference. • A statistical integration method makes effective use of the periodicity information across all channels. • An HMM models continuous pitch tracks. • Our algorithm performs reliably when tracking single and double pitch tracks in noisy acoustic environments. • The algorithm outperforms the other algorithms tested by a substantial margin.
