Model Formation and Classification Techniques For Conversation-based Speaker Discrimination

Presentation Transcript


  1. Model Formation and Classification Techniques For Conversation-based Speaker Discrimination • Advisor: Robert Yantorno, Ph.D. • Committee Members: Brian Butz, Ph.D., Dennis Silage, Ph.D., Iyad Obeid, Ph.D. • Presented by Uchechukwu O. Ofoegbu

  2. Acknowledgements • My committee members, for your time and commitment to my research • The Air Force Research Labs, for financially supporting most of this research work • My family, for being there • Dr. Y, the best advisor one could hope for • Members and Friends of the Speech Lab, for your valuable contributions • ECE faculty and staff, for your great support • The audience, for being a part of this

  3. Presentation Outline
  • Introduction: Challenges of Conversational Data; General Applications of Research; Novelty of Research
  • Evaluation Databases: HTIMIT; SWITCHBOARD; New Conversations Database
  • Modeling Speakers: Traditional Speaker Modeling; Proposed Method; Features Used; Distances Used
  • Application Systems: Unsupervised Speaker Indexing; Speaker Count; Generalized Speaker Indexing
  • Fusion of Distance Measures: “Optimized” T Distance; Decision-Based Combination; Weighted Decision-Based Combination
  • Summary
  • Further Research

  4. Introduction

  5. Challenges of Conversational Data • No a priori information available from participating speakers • Training is impossible • No a priori knowledge of change points • Speakers alternate very rapidly • Limited amounts of data for single speaker representations • Distortion • Channel noise, co-channel data

  6. Proposed Solutions • Selective creation of data models • Distance-Based Model Comparison • Development of application-specific system

  7. Novelty of this Research • Selective creation of data models • Distance-Based Model Comparison • Development of application-specific system

  8. Applications • Monitoring criminal conversations • Forensics • Automated Customer Services • Storage/Search/Retrieval of Audio Data • Military Activities • Conference calls

  9. Databases • Standard Speaker Discrimination Databases • HTIMIT • Switchboard • Temple Conversations Database (TCD)

  10. Modeling Speakers

  11. Traditional Speaker Modeling • Examples • Gaussian Mixture Models • Hidden Markov Models • Neural Networks • Prosody-Based Models • Disadvantages • Require large amounts of data • Sometimes require a training procedure • Relatively complex
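
For contrast with the proposed approach, below is a minimal sketch of the kind of training-based model listed above: a GMM speaker model fitted with scikit-learn to a matrix of cepstral feature vectors. This is illustrative only; the component count and diagonal covariances are arbitrary placeholder choices, not taken from this work.

```python
# Illustrative traditional speaker model: a GMM over cepstral feature vectors.
# `features` is assumed to be an (n_frames, n_coeffs) array from one speaker.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm_speaker_model(features, n_components=16):
    """Fit a diagonal-covariance GMM to one speaker's feature vectors."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", max_iter=200)
    gmm.fit(features)
    return gmm

def score_utterance(gmm, features):
    """Average per-frame log-likelihood of an utterance under the model."""
    return gmm.score(features)

# Stand-in random data, only to show the calling pattern:
rng = np.random.default_rng(0)
model = train_gmm_speaker_model(rng.normal(size=(2000, 12)))
print(score_utterance(model, rng.normal(size=(300, 12))))
```

This is precisely what conversational data makes difficult: it presumes a substantial amount of single-speaker data and an explicit training step.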

  12. Conversational Data Modeling • Current Method • Equal segmentation of data • Indiscriminate use of data • Problems • Change points unknown • Not all speech is useful • Poor performance

  13. Proposed Speaker Modeling [Block diagram: the conversation is divided into Segments 1…M; silence (S) and unvoiced (U) frames are discarded and only voiced (V) frames are kept; features are computed for each segment, and a mean and covariance matrix are computed to form Models 1…M.]

  14. Proposed Speaker Modeling • Why voiced only? • Same speech class compared • Contains the most information • What’s the appropriate number of phonemes? • Large enough to sufficiently represent speakers • Small enough to avoid speaker overlap
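
A minimal sketch of the proposed segment-level modeling, assuming voiced-frame selection and cepstral feature computation are handled elsewhere, and that each model consists of the mean vector and covariance matrix of a segment's voiced-frame features (as in the block diagram above):

```python
# Each segment model is the mean vector and covariance matrix of the cepstral
# feature vectors computed from that segment's voiced frames only.
import numpy as np

def build_segment_model(voiced_features):
    """voiced_features: (n_voiced_frames, n_coeffs) array for one segment."""
    mean = voiced_features.mean(axis=0)
    cov = np.cov(voiced_features, rowvar=False)
    return mean, cov

def build_conversation_models(segments):
    """segments: list of per-segment voiced feature arrays -> list of models."""
    return [build_segment_model(seg) for seg in segments]
```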

  15. Features Considered • Linear Predictive Cepstral Coefficients • Model the vocal tract • Mel-Scale Frequency Cepstral Coefficients • Model the human auditory system
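
A hedged sketch of how these two feature types could be computed, assuming librosa is available; the analysis parameters (model order, number of coefficients, frame size) are placeholders rather than the values used in this work, and the LPC-to-cepstrum recursion assumes the predictor convention s[n] ≈ sum_k a_k s[n-k]:

```python
# Illustrative MFCC and LPCC extraction (parameters are placeholders).
import numpy as np
import librosa

def mfcc_features(y, sr, n_mfcc=13):
    """Mel-scale frequency cepstral coefficients: (n_mfcc, n_frames) array."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def lpcc_features(frame, order=12):
    """Linear predictive cepstral coefficients for one voiced frame."""
    a = librosa.lpc(frame, order=order)   # [1, a_1, ..., a_p] error-filter form
    pred = -a[1:]                         # predictor coefficients a_1..a_p
    c = np.zeros(order)
    for m in range(1, order + 1):         # standard LPC-to-cepstrum recursion
        acc = pred[m - 1]
        for k in range(1, m):
            acc += (k / m) * c[k - 1] * pred[m - k - 1]
        c[m - 1] = acc
    return c                              # gain (c_0) term is ignored here
```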

  16. Distance Measurements [Plot: distributions of same-speaker distances and different-speaker distances.]

  17. Distances Used • Mahalanobis Distance • Hotelling’s T-Square Statistic • Kullback-Leibler Distance • Bhattacharyya Distance • Levene’s Test
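
Closed-form expressions for two of these distances exist for the mean/covariance models above. A minimal sketch follows; the pooled covariance used for the Mahalanobis distance is an assumption, and the exact variants used in this work may differ:

```python
# Distances between two segment models, each a (mean, covariance) pair.
import numpy as np

def mahalanobis_distance(m1, cov1, m2, cov2):
    """Mahalanobis distance between model means, using a pooled covariance."""
    pooled = 0.5 * (cov1 + cov2)
    diff = m1 - m2
    return float(np.sqrt(diff @ np.linalg.inv(pooled) @ diff))

def bhattacharyya_distance(m1, cov1, m2, cov2):
    """Bhattacharyya distance between two Gaussian models."""
    sigma = 0.5 * (cov1 + cov2)
    diff = m1 - m2
    term1 = 0.125 * diff @ np.linalg.inv(sigma) @ diff
    term2 = 0.5 * np.log(np.linalg.det(sigma) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(term1 + term2)
```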

  18. Analysis of Cepstral Features • Mahalanobis Distance

  19. Best Number of Phonemes? [Plot: performance vs. number of phonemes; features used: LPCC.]

  20. Application Systems

  21. Unsupervised Speaker Indexing • The Restrained-Relative Minimum Distance (RRMD) Approach [Diagram: reference models and their matrix of pairwise distances D(i,j), with zeros on the diagonal.]

  22. Unsupervised Speaker Indexing • The Restrained-Relative Minimum Distance (RRMD) Approach [Flow diagram: the distances from a test model to Reference 1 and Reference 2 are observed; data failing the minimum-distance check is treated as unusable; a model passing the restraining condition is labeled as the same speaker as the closer reference; otherwise the relative distance condition is applied (see the following slides).]

  23. RRMD Approach • Restraining Condition • Distance Likelihood Ratio (DLR) • DLR > 1 → Same Speaker • DLR < 1 → Check Relative Distance Condition

  24. RRMD Approach • Relative Distance Condition • Relative Distance: Drel = dmax - dmin • Drel > threshold → Same Speaker [Diagram: dmin and dmax are the smaller and larger of the distances from the test model to Reference 1 and Reference 2.]
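
A minimal sketch of the two RRMD conditions described on slides 23 and 24, assuming the same-speaker and different-speaker distance distributions are modeled as Gaussians whose parameters were estimated offline (slide 25 uses HTIMIT for this); the exact form of the likelihood ratio and the ordering of checks in the original flow may differ:

```python
# Illustrative RRMD decision logic for one test model against two references.
from scipy.stats import norm

def distance_likelihood_ratio(d, same_mu, same_sigma, diff_mu, diff_sigma):
    """DLR: likelihood of the observed distance d under the same-speaker
    distance distribution divided by its likelihood under the
    different-speaker distance distribution."""
    return norm.pdf(d, loc=same_mu, scale=same_sigma) / \
           norm.pdf(d, loc=diff_mu, scale=diff_sigma)

def rrmd_decision(d_ref1, d_ref2, dlr_params, rel_threshold):
    """Assign a test model to reference 1 or 2, or mark it unusable (None).
    d_ref1, d_ref2: distances from the test model to the two references."""
    d_min, d_max = min(d_ref1, d_ref2), max(d_ref1, d_ref2)
    closer = 1 if d_ref1 <= d_ref2 else 2

    # Restraining condition: DLR > 1 means the minimum distance is more
    # likely under the same-speaker distribution -> same speaker as `closer`.
    if distance_likelihood_ratio(d_min, *dlr_params) > 1.0:
        return closer

    # Relative distance condition: Drel = dmax - dmin must exceed a threshold
    # for the test model to be confidently assigned to the closer reference.
    if (d_max - d_min) > rel_threshold:
        return closer

    return None  # unusable data: no confident assignment
```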

  25. Experiments and Results • Experiments • HTIMIT used for obtaining likelihood ratio parameters • Distances computed for 1000 same-speaker and 1000 different-speaker utterance pairs • 100 conversations from the Switchboard database used for evaluation

  26. Indexing Results - Mahalanobis [Charts: results for LPCC and MFCC features.]

  27. Indexing Results – T-Square [Charts: results for LPCC and MFCC features.]

  28. Indexing Results - Bhattacharyya [Charts: results for LPCC and MFCC features.]

  29. Indexing Results - Summary • Mahalanobis distance yielded best results • LPCCs outperformed MFCCs

  30. Speaker Count System • The Residual Ratio Algorithm (RRA) • Process is repeated K-1 times for counting up to K speakers [Block diagram: a reference model is selected randomly and compared with the remaining models using DLR-based model comparison; a reference with too little data is removed and another model is selected.]

  31. Speaker Count • Added Residual Ratio • The sum of the residual ratios over all elimination stages • Should be higher for a greater number of speakers
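
A rough sketch of the elimination loop behind the Added Residual Ratio, assuming the per-stage "residual ratio" is the fraction of segment models left over (not matched to the randomly chosen reference) at that stage; the precise definition of the residual ratio in the original algorithm may differ, and `matches_reference` stands in for the DLR-based same-speaker test:

```python
# Illustrative Added Residual Ratio computation over K-1 elimination stages.
import random

def added_residual_ratio(models, matches_reference, max_speakers):
    """models: list of segment models; matches_reference(ref, m): DLR-based
    same-speaker test; returns the sum of residual ratios over the stages."""
    remaining = list(models)
    total = 0.0
    for _ in range(max_speakers - 1):
        if len(remaining) < 2:
            break  # too little data to continue eliminating
        ref = random.choice(remaining)
        leftover = [m for m in remaining
                    if m is not ref and not matches_reference(ref, m)]
        total += len(leftover) / (len(remaining) - 1)  # per-stage residual ratio
        remaining = leftover
    return total
```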

  32. Experiments and Results • Experiments • 4000 conversations generated from HTIMIT • All 40 conversations from the new database (TCD) used

  33. Speaker Count Results - HTIMIT [Charts: results for LPCC and MFCC features.]

  34. Speaker Count Results - HTIMIT [Charts: results for LPCC and MFCC features.]

  35. Speaker Count Results – TCD [Charts: results for LPCC and MFCC features.]

  36. Speaker Count Results – TCD [Charts: results for LPCC and MFCC features.]

  37. Cross Evaluation [Charts: HTIMIT – LPCCs with the WDBC; TCD – MFCCs with the T-Square.]

  38. Speaker Counting-Indexing • The Residual Ratio speaker count algorithm is applied • Test models are associated with their matching reference models • Unmatched models are assigned to the reference from which they have the minimum distance
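
A minimal sketch of this assignment step, with `distance` standing in for one of the model distances above and `matched_ref` (a hypothetical callback) returning the index of the reference a test model was matched to during counting, or None if it went unmatched:

```python
# Illustrative indexing step: matched models keep their reference; unmatched
# models fall back to the nearest reference by model distance.
def index_models(test_models, references, distance, matched_ref):
    """Return, for each test model, the index of the reference it belongs to."""
    labels = []
    for m in test_models:
        ref_idx = matched_ref(m)  # index of matching reference, or None
        if ref_idx is None:
            # Unmatched models are assigned to the minimum-distance reference.
            ref_idx = min(range(len(references)),
                          key=lambda i: distance(m, references[i]))
        labels.append(ref_idx)
    return labels
```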

  39. Speaker Counting/Indexing Results [Chart legend: solid - HTIMIT; patterned - TCD.]

  40. Fusion of Distance Measures

  41. Correlation Analysis [Figure: draftsman’s display (pairwise scatter-plot matrix) of the distance measures - LPCC.]

  42. “Best Distance” • Optimal Criteria for Fusion of Distances • Maximize inter-speaker variation • Minimize intra-speaker variation • Maximize T-test value between inter-class distance distributions

  43. Decision Level Fusion [Example: D1 → match, D2 → no match, D3 → match, D4 → match; Match = 3/4, No Match = 1/4; Final Decision = Match.]

  44. Weighted Decision Level Fusion • Ti = T-value corresponding to each distance
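
A minimal sketch covering both fusion rules on slides 43 and 44, assuming each distance measure contributes a binary match decision and, in the weighted variant, a weight proportional to its T-value; the exact weighting and tie-breaking rules used in the original work may differ:

```python
# Illustrative decision-level fusion: weighted voting over per-distance
# match/no-match decisions; equal weights reduce it to simple majority voting.
import numpy as np

def fused_decision(decisions, weights=None):
    """decisions: booleans (True = match), one per distance measure.
    weights: optional per-distance weights (e.g., T-values)."""
    d = np.asarray(decisions, dtype=float)
    w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
    # Declare a match when the weighted "match" mass exceeds half the total.
    return float(w @ d) > 0.5 * float(w.sum())

# Example from slide 43: three of four distances vote "match".
print(fused_decision([True, False, True, True]))                        # True
print(fused_decision([True, False, True, True], weights=[1, 5, 1, 1]))  # False
```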

  45. Summary

  46. Research Goal • To differentiate between speakers in a conversation • To determine the number of speakers present • To determine who is speaking when • To overcome the following challenges • No a priori information • Limited data size • No knowledge of change points • Co-channel speech

  47. Summary of Accomplishments • Novel model formation technique • Three novel approaches for conversation-based speaker differentiation • Distance combination techniques to enhance performance

  48. Observations • Mahalanobis Distance and LPCCs optimal for standard databases • T-Square Distance and MFCCs optimal for new database • Best fusion technique: weighted voting (decision-based) combination

  49. Conclusion • The developed system yields about 6% EER, whereas state-of-the-art speaker indexing systems yield about 10% error rates • Methods for discriminating between speakers (speaker count or indexing) in conversations with more than two speakers have been introduced

  50. Further Research
