Speaker Recognition

Speaker Recognition. S. Arun Nair, Vaibhav Singh, Dheeraj Mehra, Rohan Paul. Speaker Identification System: Enrollment Phase, Identification Phase. Methodology: Database Creation (30-speaker database; random text, 1 min samples; telephone quality, 8-bit samples at 8 kHz).


Presentation Transcript


  1. Speaker Recognition. S. Arun Nair, Vaibhav Singh, Dheeraj Mehra, Rohan Paul

  2. Speaker Identification System: Enrollment Phase, Identification Phase

  3. Methodology
     • Database Creation: 30-speaker database; random text, 1 min samples; telephone quality, 8-bit samples at 8 kHz
     • Pre-Processing: noise removal (wavelet transform); silence removal (envelope detection); framing
     • Feature Extraction: Mel-cepstral coefficients; Singular Value Decomposition (dimensionality reduction)
     • Learning Problem: Gaussian Mixture Modeling, EM; Bayesian classification

  4. Feature Extraction: Hamming Window Function

  5. Hamming Window: Hamming Window Function

  6. Application of Hamming Window
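
The window formula itself did not survive in the transcript. As a minimal sketch, assuming the standard Hamming definition w[n] = 0.54 - 0.46 cos(2πn/(N-1)), applying the window to one frame could look like this (the frame length and the constant test frame are hypothetical):

```python
import math

def hamming(n_samples):
    """Standard Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window of equal length."""
    window = hamming(len(frame))
    return [s * w for s, w in zip(frame, window)]

# Hypothetical frame: a constant signal, so the output equals the window itself.
frame = [1.0] * 8
windowed = apply_window(frame)
```

The window tapers the frame edges toward 0.08 rather than zero, which reduces spectral leakage in the DFT that follows.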

  7. Mel-Frequency Cepstrum Coefficients: Cepstrum(frame) = IDFT(log(|DFT(frame)|))
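
The formula on this slide is the plain real cepstrum. The sketch below implements only that formula with a naive O(N²) DFT from the standard library. It is not the full Mel-cepstral pipeline, which would also apply the Mel filters of the next slide. The small `eps` offset is an assumption added to avoid taking the log of zero.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_cepstrum(frame, eps=1e-12):
    """Cepstrum(frame) = IDFT(log(|DFT(frame)|)); eps guards against log(0)."""
    log_magnitude = [math.log(abs(c) + eps) for c in dft(frame)]
    return [c.real for c in idft(log_magnitude)]
```

In practice an FFT routine would replace the naive transforms; the structure of the computation stays the same.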

  8. Feature Extraction: Mel Filters
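
The filter-bank figure is not part of the transcript. As an illustration only, the sketch below spaces filter centre frequencies evenly on the Mel scale, assuming the common conversion mel = 2595 log10(1 + f/700). The number of filters (20) is hypothetical. The 4000 Hz upper limit is the Nyquist frequency of the 8 kHz telephone-quality recordings mentioned in the methodology.

```python
import math

def hz_to_mel(f_hz):
    """Assumed Mel-scale convention: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low, f_high):
    """Centre frequencies (Hz) of n_filters filters spaced evenly in Mel."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]

# Hypothetical configuration: 20 filters between 0 Hz and the 4 kHz Nyquist limit.
centers = mel_center_frequencies(n_filters=20, f_low=0.0, f_high=4000.0)
```

Even spacing in Mel gives narrow, dense filters at low frequencies and wider ones at high frequencies.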

  9. Singular Value Decomposition: plot of sigma values. The number of dimensions was selected as 13.
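
The slides do not say how 13 was read off the sigma plot. One common heuristic, shown here purely as an assumption, is to keep the smallest number of dimensions whose squared singular values capture a target fraction of the total. The singular values and the 0.95 target below are hypothetical.

```python
def energy_fraction(sigmas, k):
    """Fraction of the total squared singular values held by the first k."""
    total = sum(s * s for s in sigmas)
    return sum(s * s for s in sigmas[:k]) / total

def smallest_rank(sigmas, target=0.95):
    """Smallest k whose leading singular values reach the target fraction."""
    for k in range(1, len(sigmas) + 1):
        if energy_fraction(sigmas, k) >= target:
            return k
    return len(sigmas)

# Hypothetical singular values, sorted in decreasing order.
sigmas = [3.0, 2.0, 1.0, 0.5]
rank = smallest_rank(sigmas, target=0.95)
```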

  10. Gaussian Mixture Modeling
     • Linear combination of Gaussians
     • Speaker-dependent vocal tract configurations
     • Vocal classes: vowels, nasals, fricatives
     • Modeling noise
     • Smooth approximations to arbitrarily shaped distributions

  11. Gaussian Mixture Modeling
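
The mixture formula from this slide is missing from the transcript. A minimal sketch of a "linear combination of Gaussians", assuming diagonal covariances (the slides do not state the covariance structure), evaluates log p(x) = log Σ_i w_i N(x; μ_i, Σ_i) with a log-sum-exp for numerical stability:

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_density(x, weights, means, variances):
    """log p(x) = log sum_i w_i N(x; mu_i, Sigma_i), computed via log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

Each feature vector `x` would be one frame's coefficient vector; the weights must be positive and sum to one.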

  12. Maximum Likelihood Parameter Estimation
     • Goal: find the model parameters that best match the distribution of the training feature vectors
     • Training set
     • Given the present parameters, the likelihood of obtaining this set
     • Iterate to improve the estimate
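
The "iterate to improve the estimate" step corresponds to the EM algorithm named in the methodology. The sketch below shows one way this could look for a one-dimensional mixture. The data, the initial parameters, and the variance floor are all hypothetical, and real feature vectors would be multi-dimensional.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log likelihood of the training set under the current parameters."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration for a 1-D Gaussian mixture."""
    # E-step: responsibility of each component for each sample.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate weights, means and variances from the responsibilities.
    new_w, new_m, new_v = [], [], []
    for i in range(len(weights)):
        n_i = sum(r[i] for r in resp)
        mean_i = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var_i = sum(r[i] * (x - mean_i) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / len(data))
        new_m.append(mean_i)
        new_v.append(max(var_i, var_floor))  # floor avoids collapsing variances
    return new_w, new_m, new_v

# Hypothetical 1-D training features drawn around two centres.
data = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(10):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```

Tracking `history` makes it easy to check that the likelihood does not decrease from one iteration to the next, which is the property that justifies iterating.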

  13. Recognition Phase
     • Assume that each class is equally likely
     • Log likelihood
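
With equal priors, Bayesian classification reduces to choosing the speaker whose model gives the highest total log likelihood over the test frames. A minimal sketch follows, using one-dimensional mixtures for brevity. The speaker names, model parameters, and test frames are hypothetical.

```python
import math

def gmm_log_pdf(x, model):
    """Log density of a 1-D GMM given as (weight, mean, variance) triples."""
    terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
             for w, m, v in model]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify(frames, speaker_models):
    """Return the best-scoring speaker and all per-speaker log likelihoods."""
    scores = {name: sum(gmm_log_pdf(x, model) for x in frames)
              for name, model in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical enrolled models and test frames.
models = {
    "speaker_a": [(0.6, -1.0, 0.5), (0.4, 0.5, 0.3)],
    "speaker_b": [(0.5, 2.0, 0.4), (0.5, 3.5, 0.6)],
}
best, scores = identify([1.9, 2.3, 3.4, 2.1], models)
```

Summing per-frame log likelihoods assumes the frames are independent given the speaker, which is the usual simplification in this setting.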

  14. Demonstration

  15. Multiple Speaker Recognition
     • Run K-means on the test data
     • Choose B-best samples from each domain
     • Calculate posterior probabilities and discover the class
     Problem: we did not get separate clusters; the distance has to be weighted by the variance.
     Basic assumption: speakers are separable in a higher-dimensional space.
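
The first step, running K-means on the test data, can be sketched as follows. This is plain K-means with unweighted Euclidean distance, the variant the slide reports as failing to separate the speakers. The B-best selection and posterior steps are not shown. The points and initial centroids are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, n_iter=20):
    """Plain K-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical 2-D feature points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Weighting each dimension of the distance by its variance, as the slide suggests, would amount to replacing `squared_distance` with a variance-normalised version.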

  16. Multiple Speaker Recognition, Approach II: Gaussian Mixture Modeling. The number of Gaussians equals the number of speakers in the input.

  17. Multiple Speaker Recognition, Approach II: Gaussian Mixture Modeling. Recognition of speakers:
     • KL-divergence
     • Distance between means
     • Distance between means, incorporating variances
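
For two single Gaussians the KL-divergence has a closed form, which makes it usable as a distance that accounts for variances as well as means. The sketch below assumes diagonal covariances and adds a symmetrised variant. How the slides actually applied the divergence is not stated, so this is illustrative only.

```python
import math

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances."""
    return 0.5 * sum(math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
                     for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q))

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """KL(p || q) + KL(q || p), a symmetric dissimilarity."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))
```

A cluster Gaussian found in the input could then be matched to the enrolled speaker with the smallest divergence. Note that the KL-divergence between two full mixtures has no closed form and would need an approximation.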

  18. Multiple Speaker Recognition: 6-speaker database

  19. Cluster Sizes
     • Input: Ankit, Advait. Two sizeable clusters.
     • Input: Nilay, Priyanka. One cluster dominates.

  20. Open Issues
     • Standardization of mics: signal threshold
     • Clustering: separability; overgrowing; three Gaussians (clusters) for two speakers
     • Appropriate distance metric for the recognition phase
     • Intuition about nearness of samples is difficult

  21. Work Distribution
     Phase I
     • Pre-processing, feature extraction, SVD: Vaibhav, Rohan
     • Database construction, GMM, EM: Dheeraj, Arun, Rohan
     Phase II
     • K-means, clustering using GMM: Dheeraj, Arun
     • KL-divergence, other techniques, experimentation: Vaibhav, Arun, Rohan
     • Front end: Dheeraj, Vaibhav

  22. Thank You
