
Robust HMM classification schemes for speaker recognition using integral decode


Presentation Transcript


  1. Robust HMM classification schemes for speaker recognition using integral decode Marie Roch Florida International University

  2. Who am I?

  3. Speaker Recognition • Types of speaker recognition 

  4. Speaker Recognition • Why is it hard? • Minimal training data • Background noise • Transducer mismatch • Channel distortions • People’s voices change over time and under stress • Performance

  5. Feature Extraction • Extract speech • Spectral analysis • Cepstrum: • Cepstral means removal
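Cepstral mean removal can be sketched in a few lines. A stationary channel acts as a convolution on the signal, which shows up as an approximately constant offset in the cepstral domain, so subtracting the per-coefficient mean over the utterance removes much of it. The function name and the list-of-lists frame layout below are assumptions made for illustration, not taken from the slides.

```python
def cepstral_mean_removal(frames):
    """Subtract the utterance-level mean of each cepstral coefficient.

    frames: list of equal-length lists, one cepstral vector per frame
            (hypothetical layout chosen for this sketch).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(frame[i] for frame in frames) / n for i in range(dim)]
    # Remove the mean from every frame.
    return [[frame[i] - means[i] for i in range(dim)] for frame in frames]
```

After this step every coefficient has zero mean over the utterance, so a constant channel offset no longer separates training and test data.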

  6. Hidden Markov Models • Statistical pattern recognition • State dependent modeling • Distribution/state • Radial basis functions common • State sequence unobservable

  7. HMM • Efficient decoders: • Training • EM algorithm • Convergence to local maxima guaranteed
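The decoder formula on this slide did not survive in the transcript. As one standard example of an efficient decoder, the forward recursion computes the likelihood of an observation sequence in time proportional to (frames x states squared) instead of enumerating every state sequence. The sketch below works in the log domain and assumes the per-frame emission log-probabilities have already been computed; all names are illustrative.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(log_init, log_trans, log_emit):
    """Return log P(observations | model) via the forward recursion.

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_emit[t][j]  : log density of frame t under state j
    """
    n_states = len(log_init)
    alpha = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit[t][j]
            for j in range(n_states)
        ]
    return log_sum_exp(alpha)
```

Replacing `log_sum_exp` with `max` in the recursion gives the Viterbi score of the single best state sequence instead of the sum over all sequences.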

  8. Recognition • Model for each speaker • Maximum a posteriori (MAP) decision rule • [Diagram labels: Features, Models, Scores, Arg Max]
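The decision step can be sketched as an arg max over per-speaker scores. The sketch assumes each speaker model has already produced a log-likelihood for the test features, and that priors are uniform unless supplied; the function and argument names are hypothetical.

```python
def map_decision(log_likelihoods, log_priors=None):
    """Pick the speaker maximizing log p(features | model) + log P(speaker).

    log_likelihoods: dict mapping speaker id -> log-likelihood score
    log_priors:      optional dict of log prior probabilities;
                     uniform priors are assumed when omitted.
    """
    if log_priors is None:
        log_priors = {speaker: 0.0 for speaker in log_likelihoods}
    return max(
        log_likelihoods,
        key=lambda speaker: log_likelihoods[speaker] + log_priors[speaker],
    )
```

With uniform priors the rule reduces to picking the model with the highest likelihood, which is the usual case in closed-set speaker identification.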

  9. The MAP decision rule • Optimal decision rule provided we have accurate distribution parameters & observations. • Problem: • Corruption of feature vectors. • Distribution known to be inaccurate.

  10. A case of mistaken identity

  11. Integral decode • Goal: Include uncorrupted observation ôt. • Problem: ôt unobservable. • Determine a local neighborhood about ot and use a priori information to weight the likelihood:

  12. Integral decode issues • Problems approximating the integral • High frame rate * number of models • Non-trivial dimensionality • Selection of the neighborhood

  13. Approximating the integral • Monte Carlo impractical • Use simplified cubature technique:
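The cubature formula from this slide is not present in the transcript, so the following is only a stand-in that shows the general idea: replace the integral over the neighborhood with a weighted sum of likelihood evaluations at a small fixed set of nodes. The node layout (the center plus one step in each direction along every axis, 2d+1 nodes in d dimensions), the `radius` parameter, and the normalization are all assumptions and should not be read as the author's rule.

```python
def axis_nodes(center, radius):
    """Center point plus +/- radius along each coordinate axis (2d+1 nodes)."""
    nodes = [list(center)]
    for i in range(len(center)):
        for sign in (-1.0, 1.0):
            point = list(center)
            point[i] += sign * radius
            nodes.append(point)
    return nodes

def weighted_likelihood(center, radius, likelihood, weight):
    """Approximate a prior-weighted average of the likelihood near `center`.

    likelihood(x)     : model likelihood at point x
    weight(x, center) : a priori weight of x being the uncorrupted
                        observation given the observed `center`
    Weights are normalized so that a constant likelihood is returned unchanged.
    """
    nodes = axis_nodes(center, radius)
    weights = [weight(x, center) for x in nodes]
    total = sum(weights)
    return sum(w * likelihood(x) for w, x in zip(weights, nodes)) / total
```

The cost is 2d+1 likelihood evaluations per frame and per model, which grows linearly with dimension rather than exponentially as a full grid would.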

  14. Neighborhood choice • Choosing an appropriate neighborhood: • Upper bound difference neighborhoods [Merhav and Lee 93] • Error source modeling

  15. Upper bound difference neighborhoods • Arbitrary signal pairs with a few general conditions. • PSD • Cepstra

  16. Taking the upper bound • Asymptotic difference between cepstral parameters:

  17. Error source modeling • Multiple error sources • Simplifying assumption of one normal distribution with zero mean • Use time series analysis to estimate the noise • Trend

  18. Error Source Modeling • Estimate variance from detrended signal
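Estimating the variance from a detrended signal can be sketched as follows. The slides do not say how the trend is modeled, so a straight-line least-squares fit over the frame index is assumed here purely for illustration; the residual variance then serves as the estimate for the zero-mean normal error model.

```python
def detrended_variance(series):
    """Fit a linear trend by least squares and return the residual variance.

    series: one feature component observed over consecutive frames.
    The divisor n - 2 accounts for the two fitted parameters
    (slope and intercept).
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 samples to detrend and estimate variance")
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
    return sum(r * r for r in residuals) / (n - 2)
```

A series that is exactly linear yields a variance of zero, since the trend explains all of it.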

  19. Error source modeling • Problem: • is infinite • Solution: • Most of the points are outliers • Set percentage of distribution beyond which points are culled.

  20. Complexity of integration • Expensive • Ways to reduce/cope • Implemented • Top K processing • Principal Component Analysis • Possible • Gaussian Selection • Sub-band Models • SIMD or MIMD parallelism

  21. Top K Processing • [Figure panels: 1 second, 3 seconds, 5 seconds]
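The transcript does not define Top K processing. One plausible reading, assumed here, is a two-pass scheme: score every speaker model with the inexpensive conventional decode, then run the expensive integral decode only on the K best candidates. The names below are hypothetical.

```python
def top_k_rescoring(cheap_scores, expensive_score, k):
    """Rescore only the k best candidates from a cheap first pass.

    cheap_scores:    dict mapping speaker id -> first-pass score
                     (higher is better)
    expensive_score: callable returning the costly second-pass score
                     for a speaker id
    Returns (winning speaker id, shortlist in first-pass order).
    """
    shortlist = sorted(cheap_scores, key=cheap_scores.get, reverse=True)[:k]
    rescored = {speaker: expensive_score(speaker) for speaker in shortlist}
    winner = max(rescored, key=rescored.get)
    return winner, shortlist
```

The second pass then costs K expensive evaluations instead of one per enrolled speaker, at the risk of pruning the true speaker when the first pass ranks it below K.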

  22. Principal Component Analysis • Choose P most important directions

  23. Principal Component Analysis • Integrate using new basis set for step function
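Choosing the P most important directions amounts to finding the P leading eigenvectors of the feature covariance matrix. The sketch below uses power iteration with deflation so that it needs only the standard library; a production system would more likely call a linear algebra package. The iteration count and seed are arbitrary illustrative choices, and convergence is slow when leading eigenvalues are nearly equal.

```python
import math
import random

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def top_directions(cov, p, iters=500, seed=0):
    """Return the p leading (eigenvalue, unit eigenvector) pairs.

    cov: symmetric positive semi-definite matrix as a list of rows.
    """
    rng = random.Random(seed)
    d = len(cov)
    work = [row[:] for row in cov]  # copy, because deflation modifies it
    result = []
    for _ in range(p):
        # Random unit start vector, so it is very unlikely to be
        # orthogonal to the dominant eigenvector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        result.append((eigenvalue, v))
        # Deflate: remove the component just found.
        work = [[work[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
                for i in range(d)]
    return result
```

Projecting each feature vector onto the returned directions gives a P-dimensional representation, so the integration only has to cover P axes instead of the full feature dimension.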

  24. Speech Corpus • King-92 • Used San Diego subset • 26 male speakers • Long distance telephone speech • Quiet room environment • 5 sessions recorded one week apart • Sessions 1-3 used for training • Sessions 4-5 partitioned into test segments

  25. Baseline performance

  26. Integral decode performance • [Figure panels: 1 second, 3 seconds, 5 seconds]

  27. Integral decode with other conditions • Performance on • high quality speech • transducer mismatch

  28. Future work • Extensions to the integral decode • Automatic parameter selection • Gaussian selection • distributed computation • Efficient multiple class preclassifiers

  29. Optimal/utterance hyperparameters – 5 seconds • [Figure panels: KingNB26, KingWB51, SpidreF18XDR, SpidreM27XDR]

  30. 95% Confidence Intervals • Caveat: • Per speaker means • Large granularity

  31. Pattern Recognition • Long term statistics [Bricker et al 71, Markel et al 77] • Vector Quantization [Soong et al 87] • HMM [Rosenberg et al 90, Tishby 91, Matsui & Furui 92, Reynolds et al 95] • Connectionist frameworks • Feed forward [Oglesby & Mason 90] • Learning vector quantization [He et al 99]

  32. Pattern Recognition Contd. • Hybrid/Modified HMMs • Min Classification Error discriminant [Liu et al 95] • Tree structured neural classifiers [Liou & Mammone 95] • Trajectory modeling [Russell et al 85, Liu et al 95, Ostendorf et al 96, He et al 99] • Sub-band recognition [Besacier & Bonastre 97]
