
Design of Automatic Face/Voice Recognition Systems for Personal Identification

United Arab Emirates University College of Engineering. Design of Automatic Face/Voice Recognition Systems for Personal Identification. Supervisor: Dr. Farhad Kissain Mariam Al Dhuhoori 970724502 Fatema Mohammed 199902260 Laila AL Shehhi 199902258 Mona Atti AL-Rashdi 199904062.


Presentation Transcript


  1. United Arab Emirates University, College of Engineering. Design of Automatic Face/Voice Recognition Systems for Personal Identification. Supervisor: Dr. Farhad Kissain. Mariam Al Dhuhoori 970724502, Fatema Mohammed 199902260, Laila AL Shehhi 199902258, Mona Atti AL-Rashdi 199904062.

  2. Overview • Introduction. • Main Principles of Speaker Recognition. • Selected Method. • Speaker Recognition Models. • Feature Extraction. • Feature Extraction Implementation. • Conclusion.

  3. Introduction:

  4. Objectives of our Project • Design and implement a simple face recognition system. • Design and implement an automatic voice recognition system. • MATLAB is used to implement the project.

  5. Speaker Recognition Methods • Text dependent: the speaker's identity is verified based on his or her speaking one or more specific phrases. • Text independent: speaker models capture characteristics of a person's speech that show up irrespective of what he or she is saying.

  6. Selected Method • Text independent: identify the person who speaks regardless of what he or she is saying.

  7. Speech Feature Extraction • Extracts a small amount of data from the voice signal that can later be used to represent each speaker. • MFCC is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies and logarithmically at high frequencies.

  8. Input Speech Signals • There is silence at the beginning and at the end of the signals. • The word consists of two syllables: 'ro' and 'ze'.

  9. Frame Blocking (processing chain: continuous speech → frame blocking → windowing → FFT → mel-frequency wrapping → cepstrum) The continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). • Frame 1 consists of the first N samples. • Frame 2 begins M samples after frame 1 and overlaps it by N-M samples. • Frame 3 begins 2M samples after frame 1 and overlaps it by N-2M samples.
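The presentation implemented these steps in MATLAB; as an illustration only, the frame-blocking step above can be sketched in Python with NumPy. The values N=256 and M=100 are assumed example parameters, not values taken from the slides.

```python
import numpy as np

def frame_blocking(signal, N=256, M=100):
    """Block a 1-D speech signal into overlapping frames.

    Frame i starts at sample i*M, so adjacent frames overlap
    by N - M samples, as described on the slide.
    """
    num_frames = 1 + (len(signal) - N) // M
    frames = np.empty((num_frames, N))
    for i in range(num_frames):
        frames[i] = signal[i * M : i * M + N]
    return frames

# Frame 1 starts at sample 0, frame 2 at sample M, frame 3 at 2M, ...
x = np.arange(1000.0)
frames = frame_blocking(x)
```

With a 1000-sample signal and these parameters, 8 frames of 256 samples are produced, each shifted 100 samples from the previous one.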


  11. After Frame Blocking • The speech signals were blocked into frames of N samples with overlap.

  12. Windowing • Each individual frame is windowed. • A Hamming window is used in this project.
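The windowing step can be sketched in Python as below; this is an illustrative stand-in for the project's MATLAB code, with the frame count and length chosen arbitrarily for the example.

```python
import numpy as np

# Apply a Hamming window to each frame to taper the frame edges.
# `frames` here is a placeholder (num_frames, N) array standing in
# for the output of the frame-blocking step.
N = 256
window = np.hamming(N)       # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
frames = np.ones((8, N))     # placeholder frames for illustration
windowed = frames * window   # broadcasting windows every frame at once
```

The Hamming window falls to 0.08 at the frame edges, which reduces the spectral leakage that an abrupt cut-off would cause in the FFT step.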

  13. After Windowing

  14. Before Windowing After Windowing

  15. FFT • Convert each frame of N samples from the time domain into the frequency domain.
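A minimal sketch of this step, assuming the windowed frames from the previous stage (random data is used here only as a stand-in):

```python
import numpy as np

# Convert each windowed frame from the time domain to the frequency
# domain. For a real-valued signal the spectrum is symmetric, so only
# the first N//2 + 1 bins (from rfft) need to be kept.
N = 256
frames = np.random.randn(8, N)                       # stand-in windowed frames
spectrum = np.abs(np.fft.rfft(frames, n=N, axis=1))  # magnitude spectrum per frame
```

Each row of `spectrum` is the magnitude spectrum of one frame and is what gets passed to the mel filter bank next.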

  16. Log Scale of S1 speech wave

  17. Mel-Frequency Wrapping • The mel frequency scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz. • A filter bank is applied to the spectrum.
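The slides do not give the mel formula explicitly; a commonly used form of the mel scale, which matches the "linear below 1 kHz, logarithmic above" description, can be sketched as:

```python
import numpy as np

def hz_to_mel(f):
    """Common mel-scale formula: approximately linear below ~1 kHz
    and logarithmic above it."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used when placing filter-bank edges."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

With this formula, 1000 Hz maps to roughly 1000 mel, and the mel-spaced filter bank is built by spacing filter centers uniformly in mel and mapping them back to Hz with `mel_to_hz`.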

  18. Mel –Spaced Filter Bank

  19. Mel-Frequency Wrapping

  20. Before Mel-Frequency wrapping After Mel-Frequency wrapping

  21. Cepstrum • Convert the log mel spectrum back to the time domain; the result is the set of mel-frequency cepstral coefficients (MFCC).
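The conversion back to the time domain is conventionally done with a DCT of the log filter-bank energies. The sketch below uses a 20-filter bank and keeps 13 coefficients; both are illustrative choices, not values from the presentation.

```python
import numpy as np

def dct2(x):
    """Plain DCT-II computed directly from its definition."""
    n = len(x)
    k = np.arange(n)
    return np.array([np.sum(x * np.cos(np.pi * m * (2 * k + 1) / (2 * n)))
                     for m in range(n)])

# Stand-in mel filter-bank outputs for one frame.
mel_energies = np.linspace(1.0, 2.0, 20)

# Log, then DCT: the low-order coefficients carry most of the
# speaker information, so only the first few are kept as features.
mfcc = dct2(np.log(mel_energies))[:13]
```

One such 13-dimensional vector per frame forms the feature sequence that is later quantized in the matching stage.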

  22. Features Matching Method • The Dynamic Time Warping, (DTW) • Hidden Markov Modeling (HMM) • Vector Quantization (VQ)

  23. Clustering of training vector

  24. Data points for all sounds of first set

  25. Data points of speaker 5 and speaker 6 of first set

  26. Data points of all sounds after passing them through the LBG algorithm, for the first set


  28. Vector Quantization (VQ) source modeling • VQ. • cluster. • codeword. • codebook.
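The clustering shown in the preceding slides uses the LBG algorithm to train a codebook. A minimal Python sketch of LBG's binary-splitting scheme, with 2-D toy data standing in for the MFCC vectors:

```python
import numpy as np

def lbg(vectors, codebook_size, eps=0.01, iters=20):
    """LBG vector quantization, a minimal sketch.

    Start from the centroid of all training vectors, repeatedly split
    each codeword into two slightly perturbed copies, then refine the
    codewords by nearest-neighbour reassignment and centroid updates.
    """
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # distance of every vector to every codeword
            d = np.linalg.norm(vectors[:, None, :] - codebook[None], axis=2)
            nearest = d.argmin(axis=1)
            for j in range(len(codebook)):
                members = vectors[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
    return codebook

# Example: cluster 200 toy 2-D feature vectors into a 4-word codebook.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
cb = lbg(data, 4)
```

In the project, one such codebook is trained per speaker from that speaker's training utterances.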

  29. VQ advantages • The model is trained much faster than other methods such as back-propagation. • It is able to reduce large datasets to a smaller number of codebook vectors. • It can handle data with missing values. • The generated model can be updated incrementally. • It is not limited in the number of dimensions of the codebook vectors, unlike some nearest-neighbour techniques. • It is easy to implement and accurate.
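At recognition time, the test utterance is assigned to the speaker whose codebook gives the smallest average quantization distortion. A toy sketch of this decision rule (the speaker names, codebooks, and one-codeword size here are hypothetical, chosen only to keep the example small):

```python
import numpy as np

def avg_distortion(features, codebook):
    """Average distance from each test vector to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Return the speaker whose codebook yields the smallest
    average VQ distortion over the test feature vectors."""
    return min(codebooks, key=lambda name: avg_distortion(features, codebooks[name]))

# Toy example: two one-codeword "speakers"; the test vectors lie near A.
codebooks = {
    "A": np.array([[0.0, 0.0]]),
    "B": np.array([[5.0, 5.0]]),
}
test = np.array([[0.1, -0.2], [0.2, 0.1]])
print(identify(test, codebooks))  # prints A
```

The tables below report how often this minimum-distortion decision picked the correct speaker.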

  30. Performance Rate: sounds (word: "twenty")

      Sound   Laila   Mona   Mariam   Fatema
      S1      S1      S1     S1       S1
      S2      S2      S4     S2       S3
      S3      S3      S3     S3       S2
      S4      S4      S2     S1       S4

      Success rate in recognition:  100%  50%  75%  50%

  31. Testing Phase of Second Set

  32. Testing Phase of Second Set for 2 speakers

  33. Results for test 1 (when the speaker said "twenty"): • Speaker 1 matches speaker 1. • Speaker 2: no match. • Speaker 3 matches speaker 3. • Speaker 4 matches speaker 4. • Speaker 5: no match. • Speaker 6: no match.

  34. Conclusion
