
by Habib ur Rehman and Abdul Basit, CENTER FOR ADVANCED STUDIES IN ENGINEERING


Presentation Transcript


  1. Digital Signal Processing (Term Project) • Speaker Recognition System • by Habib ur Rehman and Abdul Basit • CENTER FOR ADVANCED STUDIES IN ENGINEERING

  2. Introduction: What is Speaker Recognition? • A process that automatically recognizes who is speaking on the basis of individual information included in the speech waves. • (Figure: a speech signal carries both the words and the speaker's identity; speaker recognition answers "Who are you?")

  3. Speaker Recognition System: Goals • The goal of this project is to build a simple, yet complete and representative, ‘speaker recognition system’. • The system should be able to identify speakers based on the different voice characteristics of each of the known speakers. • This identification should be accomplished regardless of the sentence spoken (text-independent).

  4. Basic Structure of a Speaker Recognition System: Speaker Identification / Speaker Verification

  5. Principles of Speaker Recognition Systems: Introduction • All speaker recognition systems have two distinct phases: • Enrollment (training) phase • Testing phase • In the training phase, each registered speaker has to provide samples of their speech so that the system can build a reference model for that speaker. • In the testing phase, the input speech is matched against the stored reference model(s) and a recognition decision is made.
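The two phases, and the identification/verification distinction from slide 4, can be sketched as follows. This is a minimal illustration under stated assumptions: the reference model here is simply the mean feature vector of a speaker's training frames (a hypothetical placeholder; the later slides use VQ codebooks), and the names `enroll`, `score`, `identify`, `verify` and the threshold value are made up for this sketch.

```python
import numpy as np

# Hypothetical reference model: the mean feature vector of a speaker's
# training frames (a placeholder; the slides use VQ codebooks instead).
def enroll(training_features):
    """Training phase. training_features: speaker -> (frames, coeffs) array."""
    return {spk: feats.mean(axis=0) for spk, feats in training_features.items()}

def score(model, test_features):
    """Average Euclidean distance from the test frames to a reference model."""
    return float(np.mean(np.linalg.norm(test_features - model, axis=1)))

def identify(models, test_features):
    """Testing phase, identification: the closest enrolled speaker wins."""
    return min(models, key=lambda spk: score(models[spk], test_features))

def verify(models, claimed, test_features, threshold):
    """Testing phase, verification: accept the claim if the score is low enough."""
    return score(models[claimed], test_features) < threshold

# Usage with synthetic stand-in features (12 coefficients per frame).
rng = np.random.default_rng(0)
models = enroll({"alice": rng.normal(0.0, 1.0, (200, 12)),
                 "bob": rng.normal(3.0, 1.0, (200, 12))})
test = rng.normal(3.0, 1.0, (50, 12))
print(identify(models, test))                 # bob
print(verify(models, "bob", test, 5.0))       # True
```

Identification picks one speaker out of the enrolled set, while verification only answers yes or no for a claimed identity, which is why it needs a decision threshold.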

  6. Basic Structure of a Speaker Recognition System: Feature Extraction / Feature Matching

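  7. MFCC Processor: Block diagram (continuous speech → Frame Blocking → frame → Windowing → FFT → spectrum → Mel-frequency Wrapping → mel spectrum → Cepstrum → mel cepstrum) • Frame blocking: the continuous signal is blocked into frames of N samples. The 1st frame consists of the first N samples; the 2nd frame begins M samples after the 1st and overlaps it by N − M samples, and so on. Typically N = 256 (a power of 2, suited to a radix-2 FFT) and M = 100. • Windowing: each frame is windowed to minimize the signal discontinuities at its beginning and end; tapering the signal to zero at both ends minimizes spectral distortion: $y[n] = x[n]\,w[n]$, $0 \le n \le N-1$. Typically a Hamming window is used, which has the form $w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$, $0 \le n \le N-1$. • FFT: converts each windowed frame into its spectrum. • Cosine Transform (Mel Cepstrum): converts the log mel spectrum into the mel-frequency cepstral coefficients.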
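The first three blocks (frame blocking, windowing, FFT) can be sketched in NumPy as below. This is a minimal sketch assuming N = 256 and M = 100 as on the slide and a sampling rate of 8000 Hz chosen only for the example; the function names are hypothetical, and trailing samples that do not fill a whole frame are simply dropped.

```python
import numpy as np

def frame_blocking(x, N=256, M=100):
    """Split x into frames of N samples, each starting M samples after the
    previous one, so consecutive frames overlap by N - M samples.
    Trailing samples that do not fill a whole frame are dropped."""
    if len(x) < N:
        raise ValueError("signal shorter than one frame")
    num_frames = 1 + (len(x) - N) // M
    idx = np.arange(N)[None, :] + M * np.arange(num_frames)[:, None]
    return x[idx]

def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def power_spectrum(frames):
    """Window each frame (y[n] = x[n] w[n]) and return |FFT|^2 of the
    non-negative frequency bins."""
    windowed = frames * hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, axis=1)) ** 2

fs = 8000                                   # assumed sampling rate (Hz)
t = np.arange(fs) / fs                      # 1 second of signal
x = np.sin(2 * np.pi * 1000 * t)            # 1 kHz test tone
frames = frame_blocking(x)                  # shape (78, 256)
spec = power_spectrum(frames)               # shape (78, 129)
```

With these settings the bin spacing is 8000 / 256 = 31.25 Hz, so the 1 kHz tone peaks in bin 32 of every frame. The hand-written window matches `numpy.hamming`, which could be used instead.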

  8. Speech Production: A Convolution Process • Speech can be modeled as the convolution between the glottal excitation source $g[n]$ and the vocal tract impulse response $v[n]$: $y[n] = g[n] * v[n]$
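The convolution model can be illustrated with synthetic signals. In this sketch both $g[n]$ (an impulse train with a period of 80 samples) and $v[n]$ (a damped 500 Hz resonance) are hypothetical stand-ins, not measured speech, and the sampling rate is assumed. It also checks the property the next slide relies on: convolution in time becomes multiplication in frequency.

```python
import numpy as np

fs = 8000                       # assumed sampling rate in Hz
n = np.arange(200)

# Hypothetical glottal excitation g[n]: impulse train, one pulse every
# 80 samples (a 100 Hz pitch at fs = 8000).
g = np.zeros(400)
g[::80] = 1.0

# Hypothetical vocal-tract impulse response v[n]: a damped 500 Hz resonance.
v = 0.97 ** n * np.cos(2 * np.pi * 500 * n / fs)

# Source-filter model: y[n] = g[n] * v[n] (linear convolution).
y = np.convolve(g, v)

# Convolution in time is multiplication in frequency: with an FFT length of
# at least len(g) + len(v) - 1, Y[k] = G[k] V[k].
L = len(g) + len(v) - 1
product_matches = np.allclose(np.fft.fft(y, L), np.fft.fft(g, L) * np.fft.fft(v, L))
```

The zero-padding to `L` matters: with a shorter FFT the product of DFTs corresponds to circular rather than linear convolution.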

  9. Cepstrum: A Transformation • It is believed that vocal tract characteristics are important to speech and speaker recognition, so we would like to separate out this filtered response. • The cepstrum does this: taking the logarithm converts the multiplication (convolution in time) $Y(\omega) = G(\omega)\,V(\omega)$ into a sum $\tilde{Y}(\omega) = \log|Y(\omega)| = \log|G(\omega)| + \log|V(\omega)|$
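A small numerical sketch of this separation, assuming synthetic signals: a hypothetical excitation of five decaying pulses (period 80 samples) and a hypothetical damped resonance for the vocal tract. It checks that the log-magnitude spectra add, and computes the real cepstrum (inverse FFT of $\log|Y|$), in which the excitation appears as a peak at the pitch period while the vocal-tract part stays at low quefrency. The FFT length of 4096 is an assumption chosen to keep cepstral aliasing small.

```python
import numpy as np

nfft = 4096                     # FFT length, zero-padded to limit cepstral aliasing
n = np.arange(100)

# Hypothetical excitation g[n]: five decaying pulses, pitch period 80 samples.
g = np.zeros(321)
g[::80] = 0.9 ** np.arange(5)

# Hypothetical vocal-tract response v[n]: a damped resonance at pi/8 rad/sample.
v = 0.9 ** n * np.cos(np.pi / 8 * n)

y = np.convolve(g, v)           # y[n] = g[n] * v[n]

def log_mag(x):
    """Log-magnitude spectrum log|X(w)| on an nfft-point grid."""
    return np.log(np.abs(np.fft.fft(x, nfft)))

# The log turns the product G V into the sum log|G| + log|V| ...
additive = np.allclose(log_mag(y), log_mag(g) + log_mag(v))

def real_cepstrum(x):
    """Inverse FFT of the log-magnitude spectrum (real for real x)."""
    return np.fft.ifft(log_mag(x)).real

# ... so the real cepstrum of y is the sum of the two cepstra. The vocal
# tract dominates low quefrencies; the excitation peaks at the pitch period.
c = real_cepstrum(y)
pitch_period = 20 + int(np.argmax(c[20:200]))
```

Because both parts are now added rather than convolved, keeping only the low-quefrency coefficients (liftering) retains mostly the vocal-tract contribution.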

  10. Mel Cepstrum: Mimicking the Behaviour of the Human Ear

  11. Mel Filter Bank: linear spacing below 1 kHz, logarithmic spacing above 1 kHz • Triangular filters emphasize the center frequency, each spanning from the previous to the next center frequency. • Thus, for each tone with an actual frequency f in Hz, a subjective pitch is measured on the mel scale: $\mathrm{mel}(f) = 2595 \log_{10}(1 + f/700)$ (Fant's expression)
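A sketch of the mel-frequency wrapping and cosine-transform steps, using the mel formula from the slide. Assumptions: filter centers equally spaced on the mel scale from 0 Hz to fs/2 (which gives roughly linear spacing below 1 kHz and logarithmic spacing above), 20 filters, 13 coefficients, an unnormalized DCT-II, and random stand-in spectra in the usage lines. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Mel scale from the slide: mel(f) = 2595 log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, fs):
    """Triangular filters whose centers are equally spaced in mel. Each
    triangle rises from the previous center to 1 at its own center and falls
    to 0 at the next center. Returns shape (num_filters, nfft // 2 + 1)."""
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2), num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)       # frequency of each FFT bin
    fbank = np.zeros((num_filters, len(freqs)))
    for i in range(num_filters):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def mfcc(power_spec, fbank, num_ceps=13):
    """Mel-wrap the power spectrum, take the log, then apply a DCT-II
    (the cosine-transform step) to obtain the mel cepstrum."""
    mel_energies = power_spec @ fbank.T
    log_mel = np.log(np.maximum(mel_energies, 1e-10))   # guard against log(0)
    K = fbank.shape[0]
    k = np.arange(K)
    q = np.arange(num_ceps)[:, None]
    dct_basis = np.cos(np.pi * q * (k + 0.5) / K)       # unnormalized DCT-II
    return log_mel @ dct_basis.T

fbank = mel_filterbank(num_filters=20, nfft=256, fs=8000)   # shape (20, 129)
rng = np.random.default_rng(0)
power_spec = rng.random((5, 129)) + 1e-3                     # stand-in spectra
coeffs = mfcc(power_spec, fbank)                             # shape (5, 13)
```

The constants in the formula make 1000 Hz map to approximately 1000 mel, which is a quick sanity check on `hz_to_mel`.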

  12. Part 2 Speaker Verification

  13. Speaker Verification: Feature Matching • Classification of objects of interest (patterns, here the acoustic vectors extracted from the input speech) into classes, the individual speakers. • Since the classification is applied to extracted features, the process can also be referred to as feature matching. • Various feature matching techniques exist: Dynamic Time Warping (DTW), Hidden Markov Models (HMM), Vector Quantization (VQ), etc. • Vector Quantization is a process of mapping vectors from a large vector space to a small number of regions in that space. • Each region is called a cluster and is represented by its center, called a ‘codeword’. • The collection of all the ‘codewords’ is called a codebook.
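The mapping of acoustic vectors to codewords, and its use for matching, can be sketched as below. This assumes Euclidean distance and, as a decision rule, picking the speaker whose codebook gives the smallest average quantization distortion; the function names and the toy codebooks are hypothetical.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each acoustic vector to its nearest codeword (Euclidean distance).
    Returns the codeword indices and the corresponding distances."""
    # dists[i, j] = || vectors[i] - codebook[j] ||
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    idx = dists.argmin(axis=1)
    return idx, dists[np.arange(len(vectors)), idx]

def vq_distortion(vectors, codebook):
    """Average quantization distortion of the vectors under a codebook."""
    return float(quantize(vectors, codebook)[1].mean())

def match_speaker(vectors, codebooks):
    """Pick the speaker whose codebook gives the smallest average distortion.
    codebooks maps speaker name -> (num_codewords, dim) array."""
    return min(codebooks, key=lambda spk: vq_distortion(vectors, codebooks[spk]))

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
vectors = np.array([[1.0, 0.0], [9.0, 10.0]])
idx, dist = quantize(vectors, codebook)      # idx -> [0, 1], dist -> [1.0, 1.0]
best = match_speaker(vectors, {"speaker_a": codebook, "speaker_b": codebook + 100.0})
```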

  14. Vector Quantization: The Codebook

  15. Vector Quantization (The LBG Algorithm)
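The slide names the LBG (Linde–Buzo–Gray) algorithm without detail. The sketch below follows the commonly described splitting procedure: start from the centroid of all training vectors, split every codeword into $y(1+\epsilon)$ and $y(1-\epsilon)$, then alternate nearest-neighbour assignment and centroid update until the distortion stops improving, repeating until the desired size is reached. The values of $\epsilon$ and the stopping tolerance are assumptions, the target size is assumed to be a power of 2, and multiplicative splitting cannot separate a codeword that is exactly zero.

```python
import numpy as np

def lbg(training, size, eps=0.01, tol=1e-3, max_iter=100):
    """Train a VQ codebook with LBG splitting.
    training: (num_vectors, dim) float array; size: target size (power of 2)."""
    codebook = training.mean(axis=0, keepdims=True)   # 1-codeword codebook: centroid
    while codebook.shape[0] < size:
        # Split every codeword y into y(1 + eps) and y(1 - eps).
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        for _ in range(max_iter):
            # Nearest-neighbour search: assign each vector to its closest codeword.
            dists = np.linalg.norm(training[:, None, :] - codebook[None, :, :], axis=2)
            nearest = dists.argmin(axis=1)
            distortion = dists[np.arange(len(training)), nearest].mean()
            # Centroid update: move each codeword to the mean of its cell
            # (a codeword with an empty cell is left unchanged).
            for j in range(codebook.shape[0]):
                members = training[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
            # Stop when the relative improvement in distortion is small.
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

rng = np.random.default_rng(1)
training = np.vstack([rng.normal(2.0, 0.5, (100, 2)), rng.normal(8.0, 0.5, (100, 2))])
cb = lbg(training, size=2)        # two codewords, near (2, 2) and (8, 8)
```

In a speaker recognition setting, one such codebook would be trained per enrolled speaker from that speaker's acoustic vectors.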
