Keyword Spotting Dynamic Time Warping

Ali Akbar Jabini, Alexandre Mercier-Dalphond. Spring 2006.


Presentation Transcript


  1. Keyword Spotting Dynamic Time Warping • Ali Akbar Jabini • Alexandre Mercier-Dalphond • Spring 2006

  2. Introduction • Speech recognition: • Computers can interpret speech • Needs an input device to digitize sounds • Microphone • People can speak faster than they can type • Commercial systems available since the 1990s • People prefer physical interactions • Keyboard/mouse, on/off switch • Low accuracy for large vocabularies with noise (50%)

  3. Introduction • Speech recognition is increasingly used with small vocabularies • Credit card systems • Simple switching commands • Directory assistance • Cheap to implement • High accuracy • Can verify their interpretation • Idea: speech recognition for household appliances

  4. OUTLINE • Area of investigation • Concrete task/Goal • Schematic • Feature extraction • DTW • Training • Evaluation metrics • Conclusion

  5. Area of Investigation • Keyword spotting: • Subfield of speech recognition • Grammar constrained • Keyword spotting in isolated word recognition • Keyword utterances • Keywords separated by silence • Main technique is DTW

  6. Concrete task/Goal • Goal: develop a robust, speaker-independent keyword spotting scheme to operate household appliances • Concrete tasks • Digitize the sound inputs • Implementation in MATLAB • Train the model with the grammar • Analyze the performance of our scheme

  7. Schematic • Microphone -> A/D -> Feature extraction -> DTW -> Output • Grammar

  8. Feature extraction • Pre-emphasis • Flattening the spectrum of the signal • Blocking into frames • Length of the Fourier transform • Windowing • Sample window (maybe Hamming) • Mel-frequency cepstral coefficients (MFCCs) • More reliable than LPC coefficients • These will be input to the DTW algorithm
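
The first three steps on this slide (pre-emphasis, blocking into frames, windowing) can be sketched as below. This is a minimal illustration in Python rather than the MATLAB implementation the slides plan, and it stops before the mel-frequency cepstral step. The function names, the pre-emphasis coefficient 0.97, the frame length of 256 samples and the hop of 128 samples are all assumptions chosen for the example; the slides do not specify them, and they only say the window is "maybe Hamming".

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """First-order filter y[n] = x[n] - alpha * x[n-1], which boosts high
    frequencies and so flattens the spectrum. alpha=0.97 is an assumed value."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame_len):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def windowed_frames(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize, frame, and window a list of samples.
    frame_len and hop are assumed values, not taken from the slides."""
    window = hamming(frame_len)
    emphasized = pre_emphasize(signal, alpha)
    return [[sample * weight for sample, weight in zip(frame, window)]
            for frame in split_into_frames(emphasized, frame_len, hop)]
```

Each windowed frame would then go through a Fourier transform of the same length before the cepstral coefficients are computed.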

  9. DTW • Idea: find the smallest distance between an input and the training bank • Cepstrum features • Dynamic programming: the time axis is not linear, to account for timing differences between utterances • t0 -> t0+5 • t1 -> t1-2
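
A minimal sketch of the dynamic-programming idea on this slide, written in Python for illustration. It assumes each utterance is a list of feature vectors, a Euclidean distance between frames, and the basic step pattern (match, insertion, deletion) with no slope constraint or path-length normalization. None of these choices is stated in the slides, and the names `euclidean` and `dtw_distance` are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors of equal length (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, local_dist=euclidean):
    """Cost of the cheapest monotonic alignment between two frame sequences.

    cost[i][j] holds the best cost of aligning the first i frames of seq_a
    with the first j frames of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: stretch seq_b, stretch seq_a, or step both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because frames may be repeated along the alignment, a slowly spoken version of a word (for example `[[0.0], [0.0], [1.0], [1.0], [2.0]]`) aligns with the faster version `[[0.0], [1.0], [2.0]]` at zero cost, which is the non-linear time axis the slide refers to.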

  10. DTW

  11. DTW

  12. Training • Need to create our own grammar • On: Onnn, Honnn, open, opeeenn • Off: Hooofff, Hoff, offfff, close • As many potential utterances as possible • Use this data with DTW
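
One way the recorded utterances could be used with a sequence distance is nearest-template matching: store every training utterance under its command and return the command of the closest one. The Python sketch below assumes that scheme; the slides do not say how the bank is searched. `spot_keyword`, the bank layout, and the toy `mean_gap` distance are hypothetical, and `mean_gap` merely stands in for a DTW cost so that the example is self-contained.

```python
def spot_keyword(features, bank, distance):
    """Return (label, cost) of the stored template closest to `features`.

    bank maps a command label to a list of template utterances;
    distance is any function giving a non-negative cost between two utterances.
    """
    best_label, best_cost = None, float("inf")
    for label, templates in bank.items():
        for template in templates:
            cost = distance(features, template)
            if cost < best_cost:
                best_label, best_cost = label, cost
    return best_label, best_cost


def mean_gap(a, b):
    """Toy stand-in for a DTW cost: gap between the means of two scalar sequences."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Several variants per command, mirroring "as many potential utterances as possible".
bank = {
    "on": [[1.0, 1.0], [1.0, 1.2, 1.1]],
    "off": [[5.0, 5.0, 5.0], [4.8, 5.1]],
}
label, cost = spot_keyword([1.1, 1.0], bank, mean_gap)
```

Adding more variants per command only lengthens the lists in `bank`; the search itself does not change, at the price of one distance computation per stored template.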

  13. Evaluation metrics • Accuracy • High noise • Low noise • Independent speaker • Training data speaker • Target: 80% accuracy or more
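
The accuracy breakdown on this slide can be sketched as a per-condition tally, shown here in Python. The trial format (condition, expected keyword, recognized keyword), the function names, and the sample trials are made up for illustration and are not results from the project; only the 80% target comes from the slide.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """trials: iterable of (condition, expected_label, predicted_label).
    Returns {condition: fraction of trials recognized correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, expected, predicted in trials:
        total[condition] += 1
        if expected == predicted:
            correct[condition] += 1
    return {condition: correct[condition] / total[condition] for condition in total}


def meets_target(accuracies, target=0.80):
    """Flag, per condition, whether the 80% goal from the slide is reached."""
    return {condition: acc >= target for condition, acc in accuracies.items()}


# Illustrative trials only; these are not measurements.
sample_trials = [
    ("low noise", "on", "on"),
    ("low noise", "off", "off"),
    ("high noise", "on", "off"),
    ("high noise", "off", "off"),
]
sample_accuracy = accuracy_by_condition(sample_trials)
```

The same tally works for the speaker split by using labels such as "independent speaker" and "training data speaker" as the condition.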

  14. Conclusion • Early stage • No code implemented yet • Many challenges ahead • Our methodology may change slightly • There is a big potential market for such a technique -> influence on everyday life.