



Presentation Transcript


  1. Combining Gaussian Mixture Models with Support Vector Machines for Text-independent Speaker Verification. Jamal Kharroubi, Dijana Petrovska-Delacrétaz, Gérard Chollet. (kharroub, petrovsk, chollet)@tsi.enst.fr. Eurospeech, 3-7 September 2001

  2. Overview • Motivations • SVM principles • Previous experiment with SVMs for speaker identification • Our method: combine GMMs and SVMs for speaker verification • Database • Experimental protocol • Results • Conclusions and perspectives

  3. Motivations • Gaussian Mixture Models (GMM) • State of the art for speaker verification • Support Vector Machines (SVM) • New and promising technique in statistical learning theory • Discriminative method • Good performance in image processing and multi-modal authentication • Seem to be "à la mode" at Eurospeech 2001 • Combine GMM and SVM for Speaker Verification

  4. SVM Principles • Pattern classification problem: given a set of labelled training data, learn to classify unlabelled test data • Finding decision boundaries that separate the classes, minimising the number of classification errors • SVMs are: • Binary classifiers • Capable of automatically determining the complexity of the decision boundary
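The margin-maximising binary classifier described on this slide can be illustrated with a minimal sketch (not the authors' code): a linear SVM trained by stochastic sub-gradient descent on the regularised hinge loss, in the style of Pegasos. All names and hyperparameters here are illustrative assumptions.

```python
# Minimal linear-SVM sketch: stochastic sub-gradient descent on the
# regularized hinge loss (Pegasos-style). Illustrative only.
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """X: (n, d) features; y: labels in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                 # inside margin: hinge gradient
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                          # only the regularizer acts
                w = (1 - lr * lam) * w
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)

# Two well-separated 2-D clusters as toy training data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(+2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)
w, b = train_linear_svm(X, y)
acc = (predict(w, b, X) == y).mean()
```

On separable data like this, the learned hyperplane classifies the training set essentially perfectly; the decision boundary's complexity is controlled by the regulariser `lam` rather than hand-tuned.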

  5. SVM principles, cont'd. [Figure: input space mapped to feature space; separating hyperplane H and optimal hyperplane Ho; input X → y(X) → Class(X)]

  6. SVM Theory [Figure: mapping from input space to feature space, and the resulting classification function]
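The classification function shown on this slide is the standard SVM decision rule; written out (a standard formulation, not transcribed from the slide image):

```latex
f(\mathbf{x}) \;=\; \operatorname{sign}\!\Big(\sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad
K(\mathbf{x}_i, \mathbf{x}) \;=\; \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x})
```

where Φ maps the input space into the feature space, the sum runs over the support vectors, the α_i are their Lagrange multipliers, y_i their labels, and b the bias.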

  7. Previous experiment with SVMs for Speaker Recognition • Speaker identification with SVMs: Schmidt and Gish, 1996 • Goal: identify one among a given closed set of speakers • The input vectors of the SVMs are the spectral parameters • Slightly better performance with SVMs, with the pairwise classifier • Why these disappointing results? • Too short train/test durations • GMMs perhaps better suited to model the data • GMMs perhaps more robust to channel variation

  8. SVM and Speaker Verification • Difficulty: mismatch in the quantity of labelled data, with more data available for impostor accesses than for true target accesses • Our preliminary test, with speech feature frames as input to the SVM => no satisfactory results • Present approach: • Combine GMM and SVM

  9. Our method of combining GMM and SVM for Speaker Verification • Usual GMM modeling • Need an additional labelled development set for the SVM • Construction of the SVM input vector with the GMM models • Model globally with the SVM the client accesses against the impostor accesses (with the development data)

  10. GMM speaker modeling [Diagram: world data → front-end → GMM modeling → world GMM model; target speaker → front-end → GMM modelling or adaptation → target GMM model]

  11. Baseline GMM method [Diagram: test speech → front-end → scored against the hypothesized target GMM model and the world GMM model → LLR score]
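The baseline log-likelihood-ratio (LLR) score on this slide can be sketched as follows: the test utterance is scored against the target GMM and the world GMM, and the average per-frame log-likelihood difference is the score. This is a hedged sketch; the GMM parameters below are illustrative toys, not the paper's trained models.

```python
# Baseline GMM LLR scoring sketch (diagonal-covariance GMMs).
import numpy as np

def gmm_logpdf(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    X: (T, d); weights: (M,); means, variances: (M, d)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * (d * np.log(2 * np.pi)
                         + np.log(variances).sum(axis=1))[None, :]
                - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))
    m = log_comp.max(axis=1, keepdims=True)          # log-sum-exp trick
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))

def llr_score(X, target, world):
    """Average per-frame LLR of the test segment X."""
    return (gmm_logpdf(X, *target) - gmm_logpdf(X, *world)).mean()

# Toy 1-D models: target centred at +1, world at 0.
target = (np.array([1.0]), np.array([[1.0]]), np.array([[0.25]]))
world = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
rng = np.random.default_rng(1)
X = rng.normal(1.0, 0.5, (100, 1))   # frames drawn near the target model
score = llr_score(X, target, world)
```

For frames drawn near the target model, the score comes out positive; a threshold on this score gives the accept/reject decision.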

  12. Construction of the SVM input vectors: SVM(max) method [Diagram: each frame t is scored against the hypothesized target GMM model and the world GMM model; if Gaussian i gives the maximum score for frame t, the frame score St is added to the ith component of the corresponding part of the vector, accumulated over the test segment]

  13. Construction of the SVM input vectors • The input SVM vector is the concatenation of [the two per-model vectors; equation in original slide] • Summation and normalization of the SVM input vector by the number of frames of the test segment T
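The SVM(max) input-vector construction described on this and the previous slide can be sketched as below: per frame, a score is accumulated into the slot of the best-scoring Gaussian of the target model and of the world model, and the concatenated vector is normalised by the number of frames T. The exact per-frame score S_t is not legible from the slide; taking it as a max-component log-likelihood ratio is an assumption of this sketch.

```python
# Hedged sketch of the SVM(max) input-vector construction.
import numpy as np

def component_loglik(X, weights, means, variances):
    """(T, M) per-frame, per-component weighted log-densities (diag. cov.)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(weights)[None, :]
            - 0.5 * (d * np.log(2 * np.pi)
                     + np.log(variances).sum(axis=1))[None, :]
            - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))

def svm_max_input_vector(X, target, world):
    lt = component_loglik(X, *target)     # (T, M) target side
    lw = component_loglik(X, *world)      # (T, M) world side
    T, M = lt.shape
    v = np.zeros(2 * M)
    for t in range(T):
        s_t = lt[t].max() - lw[t].max()   # frame score S_t (assumed LLR-like)
        v[lt[t].argmax()] += s_t          # target part: best target Gaussian i
        v[M + lw[t].argmax()] += s_t      # world part: best world Gaussian i
    return v / T                          # normalize by segment length T

# Toy example: M = 2 Gaussians per model, d = 2 features.
rng = np.random.default_rng(2)
mdl = lambda c: (np.array([0.5, 0.5]),
                 np.array([[c, 0.0], [0.0, c]]),
                 np.ones((2, 2)))
vec = svm_max_input_vector(rng.normal(size=(50, 2)), mdl(1.0), mdl(-1.0))
```

With M mixtures per model this yields a fixed-length 2M-dimensional vector per test segment, which is what makes a segment of arbitrary duration usable as a single SVM input.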

  14. SVM: Train / Test [Diagram: training with client-class and impostor-class examples → SVM classifier; at test, test speech → SVM input vector construction → SVM classifier → decision score]

  15. Database • NIST'99 data split into: • Development data = 100 speakers • 2 min GMM model • Corresponding test data to train the SVM classifier (519 true and 5190 impostor accesses) • World data = 200 speakers • 4 sex/handset-dependent world models • Evaluation data = 100 speakers = 449 true and 4490 impostor accesses

  16. Experimental Protocol: Feature Extraction • LFCC parametrization (32.5 ms windows every 10 ms) • Cepstral mean subtraction for channel compensation • Feature vector dimension is 33 (16 cep, 16 Δcep, Δ log E) (delta cepstral features on 5-frame windows) • Frame removal algorithm applied on feature vectors to discard non-significant frames (bimodal energy distributions)
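The cepstral mean subtraction step above removes a stationary convolutional channel, which becomes an additive constant in the cepstral domain; a minimal sketch (illustrative, not the authors' front-end):

```python
# Cepstral mean subtraction (CMS) sketch: subtract each utterance's
# per-coefficient mean, removing a constant channel offset.
import numpy as np

def cepstral_mean_subtraction(C):
    """C: (T, d) matrix of cepstral feature vectors for one utterance."""
    return C - C.mean(axis=0, keepdims=True)

rng = np.random.default_rng(3)
C = rng.normal(size=(200, 33)) + 5.0       # simulated constant channel offset
C_cms = cepstral_mean_subtraction(C)
```

After CMS each of the 33 coefficients has zero mean over the utterance, so the constant offset of 5.0 injected above is gone.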

  17. Experimental Protocol, cont. • GMM speaker and world models • with 128 mixtures • diagonal covariance matrix • Standard EM algorithm with a maximum of 20 iterations • ML and MAP methods to construct the client model • With the ELISA Consortium toolkit • SVM model was trained using a part of the development corpus • A linear kernel is used • With the svmFu package
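The MAP client-model construction mentioned above can be sketched with the standard relevance-factor formulation of MAP mean adaptation; the paper uses the ELISA toolkit, whose exact settings are not given here, so everything below (relevance factor, toy models) is an illustrative assumption.

```python
# Standard MAP mean-adaptation sketch: adapt a world GMM's means
# toward a speaker's data using soft counts and a relevance factor.
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """One MAP update of the component means. X: (T, d)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * (d * np.log(2 * np.pi)
                         + np.log(variances).sum(axis=1))[None, :]
                - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))
    log_comp -= log_comp.max(axis=1, keepdims=True)
    gamma = np.exp(log_comp)
    gamma /= gamma.sum(axis=1, keepdims=True)        # (T, M) responsibilities
    n = gamma.sum(axis=0)                            # soft counts n_i
    E = gamma.T @ X / n[:, None]                     # data means E_i(x)
    alpha = n / (n + relevance)                      # adaptation coefficients
    return alpha[:, None] * E + (1 - alpha[:, None]) * means

# World model: two 1-D Gaussians at -1 and +1; speaker data near +2.
weights = np.array([0.5, 0.5])
means = np.array([[-1.0], [1.0]])
variances = np.ones((2, 1))
rng = np.random.default_rng(4)
X = rng.normal(2.0, 0.3, (300, 1))
new_means = map_adapt_means(X, weights, means, variances)
```

Only the components responsible for the speaker's data move appreciably; this coupling between the world and speaker models is exactly what slide 21 invokes to explain the MAP results.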

  18. GMM(ml) / hybrid GMM(ml)+ SVM(max) no normalization

  19. GMM(map) / GMM(ml) No normalization

  20. GMM(map) / GMM(map)+SVM(max) No normalization

  21. Why no improvement with GMM(map)+SVM(max)? • ML training - speaker model trained independently of the world model • MAP training - speaker model derived by updating the world model's parameters via adaptation • Stronger coupling of speaker <=> world models

  22. New SVM input vector representation: SVM(maxRatio) [Diagram: each frame t is scored against the hypothesized target GMM model and the world GMM model; the score St is added to the ith components of the vector, accumulated over the test segment]

  23. GMM(map) / GMM(map)+SVM(max) / GMM(map)+SVM(maxRatio), no normalization

  24. Conclusions and Perspectives • Improvements with our hybrid GMM+SVM are dependent on the GMM adaptation method • Better (or similar) results with the GMM+SVM method if an appropriate SVM input vector representation is chosen • Different kernel types and features will be experimented with • Normalization techniques will be applied
