A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS

Presentation Transcript

  1. A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS 3rd Annual Intelligent Vehicle Systems Symposium Andrew L. Kun Brett Vinciguerra June 11, 2003

  2. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  3. Project54 Overview • UNH (University of New Hampshire) / NHSP (New Hampshire State Police) / DOJ (U.S. Department of Justice) • Integrates • Controls • Standard Interface

  4. Introduction • What was the goal of this research? • Compare SR engine and microphone combinations • Accuracy and efficiency • Quantitatively

  5. Introduction • Why was this research important? • Limit distraction • Limit frustration • Standard Process

  6. Introduction • How was this goal accomplished? • 16 combinations (4 engines x 4 mics) evaluated • Speech Recognition Evaluation Program (SREP) • Simulates • Classifies • Calculates

  7. Introduction • Accuracy • # of correct commands versus total commands • Efficiency • Weighted penalty for false recognitions
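
Accuracy here is simply the share of spoken commands the engine returns verbatim. A minimal sketch of that computation (a hypothetical helper, not the authors' SREP code):

    # Minimal sketch, not SREP itself: accuracy as the percentage of commands
    # whose recognized text exactly matches what was said.
    def accuracy(said, heard):
        """said, heard: parallel lists of command phrases."""
        correct = sum(1 for s, h in zip(said, heard) if s == h)
        return 100.0 * correct / len(said)

    print(accuracy(["LIGHTS", "SIREN ON", "RADAR"],
                   ["LIGHTS", "SIREN ON", "LIGHTS"]))  # two of three correct, approx. 66.7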

  8. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results • Discussion • Conclusion

  9. SR ENGINE OPTIONS • Speed of Speech • Discrete • Continuous • Type of Application • Command-and-control • Dictation • User-Dependency • Speaker dependent • Speaker independent • Field of Application • PC • Telephone • Noise robust • Grammar File
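
Slide 9 contrasts command-and-control recognition, which constrains the engine to a small fixed grammar of phrases, with open dictation. The sketch below is purely illustrative, written in Python rather than any SAPI 4.0 grammar format, with hypothetical phrase and action names:

    # Illustrative only, not SAPI 4.0 grammar syntax: a command-and-control grammar
    # restricts recognition to a small fixed phrase set, mapped to hypothetical actions.
    GRAMMAR = {
        "LIGHTS": "toggle_light_bar",
        "SIREN ON": "activate_siren",
        "SIREN OFF": "deactivate_siren",
        "RADAR": "open_radar_screen",
    }

    def dispatch(phrase):
        # Anything outside the grammar is treated as unrecognized.
        return GRAMMAR.get(phrase, "unrecognized")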

  10. Comparing SR Engines • Field test • Simulated tests • Speaker source • Background noise • Number of speakers

  11. Accuracy Ratings • Not consistent • Different conditions • Hyde’s Law • ‘Because speech recognisers have an accuracy of 98%, tests must be arranged to prove it’

  12. Component Requirements • Speech Recognition Engine • Must be SAPI 4.0 compliant • Microphone • Must be far-field • Mountable on dashboard • Cancel noise • Array • Directional

  13. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  14. SREP program flow (diagram): nested loops, LOOP ENGINES → LOOP BACKGROUND → LOOP COMMANDS
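
The diagram on slide 14 suggests a nested test loop: for every engine, play every command over every background condition. A minimal sketch of that structure, with hypothetical placeholders (recognize, mix_audio) rather than the actual SREP code:

    # Sketch of the nested test loop suggested by slide 14; the engine objects and
    # mix_audio callable are hypothetical placeholders, not the real SREP interfaces.
    def run_evaluation(engines, backgrounds, commands, mix_audio):
        """engines: objects with .name and .recognize(wav); backgrounds: (name, wav, snr);
        commands: (phrase, wav); mix_audio: combines a command and noise at a given SNR."""
        results = []
        for engine in engines:                        # LOOP ENGINES
            for bg_name, bg_wav, snr in backgrounds:  # LOOP BACKGROUND
                for phrase, cmd_wav in commands:      # LOOP COMMANDS
                    heard = engine.recognize(mix_audio(cmd_wav, bg_wav, snr))
                    results.append((engine.name, bg_name, phrase, heard))
        return results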

  15. Obtaining Sound Files • Laptop w/ SoundBlaster • Earthworks M30BX • Background recorded on patrol • Speech commands in lab • Microsoft Audio Collection Tool • 5 Speakers (4 male, 1 female) • 40 phrases

  16. Processing Sound Files • Matlab script: Signal strength = variance(signal) + mean(signal)² • Set volume and signal-to-noise ratio
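
The slide-16 quantity, variance plus squared mean, is the mean-square value of the samples. A short sketch of that computation and of scaling a command to a target SNR (the original was a Matlab script; this Python/NumPy version and the scale_to_snr helper are assumptions for illustration):

    import numpy as np

    # Signal strength as defined on slide 16: variance + mean^2, i.e. the
    # mean-square value of the samples (equals np.mean(x**2)).
    def signal_strength(x):
        x = np.asarray(x, dtype=float)
        return np.var(x) + np.mean(x) ** 2

    # Hypothetical helper: scale a speech command so it sits at a desired SNR
    # (in dB) relative to a given background noise recording.
    def scale_to_snr(speech, noise, snr_db):
        target = signal_strength(noise) * 10 ** (snr_db / 10.0)
        gain = np.sqrt(target / signal_strength(speech))
        return np.asarray(speech, dtype=float) * gain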

  17. Control File Structure • Background Noises • WAV filename • Desired SNR • Signal strength • Description of file • Voice Commands • WAV filename • Number of loops • Signal strength • Phrase
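
The control file format itself is not shown in the slides; the sketch below simply mirrors the fields listed above, assuming a comma-separated layout (an assumption, not the actual SREP format):

    import csv
    from dataclasses import dataclass

    # Field names follow slide 17; the CSV layout is assumed for illustration.
    @dataclass
    class BackgroundNoise:
        wav_file: str
        desired_snr_db: float
        signal_strength: float
        description: str

    @dataclass
    class VoiceCommand:
        wav_file: str
        loops: int
        signal_strength: float
        phrase: str

    def load_background(path):
        with open(path, newline="") as f:
            return [BackgroundNoise(r[0], float(r[1]), float(r[2]), r[3])
                    for r in csv.reader(f)]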

  18. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  19. PRODUCTS TESTED • Four microphones • A, B, C and D. • Four SR engines • 1, 2, 3, and 4. • 16 unique combinations • A1 through D4

  20. SR ENGINES • SR Engine 1 • Microsoft SR Engine 4.0 • SR Engine 2 • Microsoft SR Engine 4.0 (telephone mode) • SR Engine 3 • Dragon NaturallySpeaking 4.0 • SR Engine 4 • IBM ViaVoice 8.01

  21. PREPARATION • Freshly installed engines • Minimum training • Default settings • Microphone Set-up Wizard

  22. TEST SCENARIO • Identical conditions • 42-phrase grammar • 10 speech commands • 5 speakers • 6 background noises • 3 SNR levels
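
Assuming every parameter was crossed with every other, this amounts to 5 speakers × 6 background noises × 3 SNR levels = 90 acoustic conditions for each of the 16 engine and microphone combinations.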

  23. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  24. ACCURACY BY ENGINE

  25. ACCURACY BY MIC

  26. RANKED ACCURACY

  27. Efficiency Score • Specific to Project54 • False recognitions

  28. Efficiency Score • Example: every LIGHTS command said is heard as LIGHTS • All commands recognized correctly • LOSS = 0

  29. Efficiency Score • Example: one LIGHTS command is UNRECOGNIZED, the rest are heard correctly • LOSS = 1

  30. Efficiency Score • Example: one LIGHTS command is falsely recognized as SIREN ON, so an extra SIREN OFF command is needed to undo it • LOSS = 1.5

  31. Efficiency Score • Scoring system • Correctly recognized = 1.5 • Unrecognized = 0.5 • Falsely recognized = 0 Eff. = 100 * ((#correct * 1.5) + (#unrec. * 0.5)) / 13.5 • Extreme scores • All correct => Eff. = 100 • All unrecognized => Eff. = 33 • All falsely recognized => Eff. = 0
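
A short sketch of the slide-31 score: correct recognitions earn 1.5 points, unrecognized commands 0.5, false recognitions 0, and the total is normalized by the maximum raw score and expressed on a 0 to 100 scale. The 13.5 maximum in the slide corresponds to nine scored commands at 1.5 points each (9 × 1.5 = 13.5), an inference from the formula rather than something the slides state directly.

    # Sketch of the slide-31 efficiency score (not the authors' code).
    def efficiency(n_correct, n_unrecognized, n_false, max_raw=13.5):
        raw = 1.5 * n_correct + 0.5 * n_unrecognized + 0.0 * n_false
        return 100.0 * raw / max_raw

    print(efficiency(9, 0, 0))  # all correct            -> 100.0
    print(efficiency(0, 9, 0))  # all unrecognized       -> 33.3
    print(efficiency(0, 0, 9))  # all falsely recognized -> 0.0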

  32. RANKED EFFICIENCY

  33. WINNER • Accuracy • Configuration C2 accuracy = 70.3 % • Efficiency • Configuration C2 efficiency = 72.4 • Logical choices • Microphone C • SR Engine 2

  34. WHY LOW ACCURACIES? • Speakers' SR experience • Limited training • Training Environment • Default settings • Microphone and speaker placement • SNR • Absolute scores not important

  35. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  36. CONCLUSION • The main goal of this research was to • Compare SR engine and microphone combinations • For accuracy and efficiency • Quantitatively

  37. CONCLUSION • This research was important in order to • Limit distraction • Limit frustration

  38. CONCLUSION • The goal was reached by • Evaluating 16 combinations (4 engines x 4 mics) • Speech Recognition Evaluation Program (SREP) • Simulated • Classified • Calculated

  39. CONCLUSION • Configuration C2 • Most accurate • Most efficient • SR Engine 2: Microsoft SR Engine 4.0, telephone mode

  40. CURRENT STATUS • 9 vehicles on road • 300 in production • Now supports non-SAPI 4.0 engines • Evaluating new engines

  41. MORE INFORMATION • www.project54.unh.edu • andrew.kun@unh.edu • brettv@unh.edu