A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS

Presentation Transcript

  1. A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS 3rd Annual Intelligent Vehicle Systems Symposium Andrew L. Kun Brett Vinciguerra June 11, 2003

  2. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  3. Project54 Overview • UNH (University of New Hampshire) / NHSP (New Hampshire State Police) / DOJ (U.S. Department of Justice) • Integrates • Controls • Standard Interface

  4. Introduction • What was the goal of this research? • Compare SR engine and microphone combinations • Accuracy and efficiency • Quantitatively

  5. Introduction • Why was this research important? • Limit distraction • Limit frustration • Standard Process

  6. Introduction • How was this goal accomplished? • 16 combinations (4 engines x 4 mics) evaluated • Speech Recognition Evaluation Program (SREP) • Simulates • Classifies • Calculates

  7. Introduction • Accuracy • # of correct commands versus total commands • Efficiency • Weighted penalty for false recognitions
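
Accuracy here is simply the share of spoken commands the engine returns verbatim. A minimal sketch of that computation (a hypothetical helper, not the authors' SREP code):

    # Minimal sketch, not SREP itself: accuracy as the percentage of commands
    # whose recognized text exactly matches what was said.
    def accuracy(said, heard):
        """said, heard: parallel lists of command phrases."""
        correct = sum(1 for s, h in zip(said, heard) if s == h)
        return 100.0 * correct / len(said)

    print(accuracy(["LIGHTS", "SIREN ON", "RADAR"],
                   ["LIGHTS", "SIREN ON", "LIGHTS"]))  # two of three correct, approx. 66.7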

  8. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results • Discussion • Conclusion

  9. SR ENGINE OPTIONS • Speed of Speech • Discrete • Continuous • Type of Application • Command-and-control • Dictation • User-Dependency • Speaker dependent • Speaker independent • Field of Application • PC • Telephone • Noise robust • Grammar File
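
Slide 9 contrasts command-and-control recognition, which constrains the engine to a small fixed grammar of phrases, with open dictation. The sketch below is purely illustrative, written in Python rather than any SAPI 4.0 grammar format, with hypothetical phrase and action names:

    # Illustrative only, not SAPI 4.0 grammar syntax: a command-and-control grammar
    # restricts recognition to a small fixed phrase set, mapped to hypothetical actions.
    GRAMMAR = {
        "LIGHTS": "toggle_light_bar",
        "SIREN ON": "activate_siren",
        "SIREN OFF": "deactivate_siren",
        "RADAR": "open_radar_screen",
    }

    def dispatch(phrase):
        # Anything outside the grammar is treated as unrecognized.
        return GRAMMAR.get(phrase, "unrecognized")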

  10. Comparing SR Engines • Field test • Simulated tests • Speaker source • Background noise • Number of speakers

  11. Accuracy Ratings • Not consistent • Different conditions • Hyde’s Law • ‘Because speech recognisers have an accuracy of 98%, tests must be arranged to prove it’

  12. Component Requirements • Speech Recognition Engine • Must be SAPI 4.0 compliant • Microphone • Must be far-field • Mountable on dashboard • Cancel noise • Array • Directional

  13. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  14. SREP program flow (diagram): nested loops, LOOP ENGINES → LOOP BACKGROUND → LOOP COMMANDS
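
The diagram on slide 14 suggests a nested test loop: for every engine, play every command over every background condition. A minimal sketch of that structure, with hypothetical placeholders (recognize, mix_audio) rather than the actual SREP code:

    # Sketch of the nested test loop suggested by slide 14; the engine objects and
    # mix_audio callable are hypothetical placeholders, not the real SREP interfaces.
    def run_evaluation(engines, backgrounds, commands, mix_audio):
        """engines: objects with .name and .recognize(wav); backgrounds: (name, wav, snr);
        commands: (phrase, wav); mix_audio: combines a command and noise at a given SNR."""
        results = []
        for engine in engines:                        # LOOP ENGINES
            for bg_name, bg_wav, snr in backgrounds:  # LOOP BACKGROUND
                for phrase, cmd_wav in commands:      # LOOP COMMANDS
                    heard = engine.recognize(mix_audio(cmd_wav, bg_wav, snr))
                    results.append((engine.name, bg_name, phrase, heard))
        return results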

  15. Obtaining Sound Files • Laptop w/ SoundBlaster • Earthworks M30BX • Background recorded on patrol • Speech commands in lab • Microsoft Audio Collection Tool • 5 Speakers (4 male, 1 female) • 40 phrases

  16. Processing Sound Files • Matlab script: Signal strength = variance(signal) + mean(signal)² • Set volume and signal-to-noise ratio
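
The slide-16 quantity, variance plus squared mean, is the mean-square value of the samples. A short sketch of that computation and of scaling a command to a target SNR (the original was a Matlab script; this Python/NumPy version and the scale_to_snr helper are assumptions for illustration):

    import numpy as np

    # Signal strength as defined on slide 16: variance + mean^2, i.e. the
    # mean-square value of the samples (equals np.mean(x**2)).
    def signal_strength(x):
        x = np.asarray(x, dtype=float)
        return np.var(x) + np.mean(x) ** 2

    # Hypothetical helper: scale a speech command so it sits at a desired SNR
    # (in dB) relative to a given background noise recording.
    def scale_to_snr(speech, noise, snr_db):
        target = signal_strength(noise) * 10 ** (snr_db / 10.0)
        gain = np.sqrt(target / signal_strength(speech))
        return np.asarray(speech, dtype=float) * gain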

  17. Control File Structure • Background Noises • WAV filename • Desired SNR • Signal strength • Description of file • Voice Commands • WAV filename • Number of loops • Signal strength • Phrase
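
The control file format itself is not shown in the slides; the sketch below simply mirrors the fields listed above, assuming a comma-separated layout (an assumption, not the actual SREP format):

    import csv
    from dataclasses import dataclass

    # Field names follow slide 17; the CSV layout is assumed for illustration.
    @dataclass
    class BackgroundNoise:
        wav_file: str
        desired_snr_db: float
        signal_strength: float
        description: str

    @dataclass
    class VoiceCommand:
        wav_file: str
        loops: int
        signal_strength: float
        phrase: str

    def load_background(path):
        with open(path, newline="") as f:
            return [BackgroundNoise(r[0], float(r[1]), float(r[2]), r[3])
                    for r in csv.reader(f)]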

  18. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  19. PRODUCTS TESTED • Four microphones • A, B, C and D. • Four SR engines • 1, 2, 3, and 4. • 16 unique combinations • A1 through D4

  20. SR ENGINES • SR Engine 1 • Microsoft SR Engine 4.0 • SR Engine 2 • Microsoft SR Engine 4.0 (telephone mode) • SR Engine 3 • Dragon NaturallySpeaking 4.0 • SR Engine 4 • IBM ViaVoice 8.01

  21. PREPARATION • Freshly installed engines • Minimum training • Default settings • Microphone Set-up Wizard

  22. TEST SCENARIO • Identical conditions • 42-phrase grammar • 10 speech commands • 5 speakers • 6 background noises • 3 SNR levels
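
Assuming every parameter was crossed with every other, this amounts to 5 speakers × 6 background noises × 3 SNR levels = 90 acoustic conditions for each of the 16 engine and microphone combinations.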

  23. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  24. ACCURACY BY ENGINE

  25. ACCURACY BY MIC

  26. RANKED ACCURACY

  27. Efficiency Score • Specific to Project54 • False recognitions

  28. Efficiency Score • Example: every LIGHTS command said is heard as LIGHTS • All commands recognized correctly • LOSS = 0

  29. Efficiency Score • Example: one LIGHTS command is UNRECOGNIZED, the rest are heard correctly • LOSS = 1

  30. Efficiency Score • Example: one LIGHTS command is falsely recognized as SIREN ON, so an extra SIREN OFF command is needed to undo it • LOSS = 1.5

  31. Efficiency Score • Scoring system • Correctly recognized = 1.5 • Unrecognized = 0.5 • Falsely recognized = 0 Eff. = 100 * ((#correct * 1.5) + (#unrec. * 0.5)) / 13.5 • Extreme scores • All correct => Eff. = 100 • All unrecognized => Eff. = 33 • All falsely recognized => Eff. = 0
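
A short sketch of the slide-31 score: correct recognitions earn 1.5 points, unrecognized commands 0.5, false recognitions 0, and the total is normalized by the maximum raw score and expressed on a 0 to 100 scale. The 13.5 maximum in the slide corresponds to nine scored commands at 1.5 points each (9 × 1.5 = 13.5), an inference from the formula rather than something the slides state directly.

    # Sketch of the slide-31 efficiency score (not the authors' code).
    def efficiency(n_correct, n_unrecognized, n_false, max_raw=13.5):
        raw = 1.5 * n_correct + 0.5 * n_unrecognized + 0.0 * n_false
        return 100.0 * raw / max_raw

    print(efficiency(9, 0, 0))  # all correct            -> 100.0
    print(efficiency(0, 9, 0))  # all unrecognized       -> 33.3
    print(efficiency(0, 0, 9))  # all falsely recognized -> 0.0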

  32. RANKED EFFICIENCY

  33. WINNER • Accuracy • Configuration C2 accuracy = 70.3 % • Efficiency • Configuration C2 efficiency = 72.4 • Logical choices • Microphone C • SR Engine 2

  34. WHY LOW ACCURACIES? • Speakers' SR experience • Limited training • Training Environment • Default settings • Microphone and speaker placement • SNR • Absolute scores not important

  35. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  36. CONCLUSION • The main goal of this research was to • Compare SR engine and microphone combinations • For accuracy and efficiency • Quantitatively

  37. CONCLUSION • This research was important in order to • Limit distraction • Limit frustration

  38. CONCLUSION • The goal was reached by • Evaluating 16 combinations (4 engines x 4 mics) • Speech Recognition Evaluation Program (SREP) • Simulated • Classified • Calculated

  39. CONCLUSION • Configuration C2 • Most accurate • Most efficient • SR Engine 2: Microsoft SR Engine 4.0, telephone mode

  40. CURRENT STATUS • 9 vehicles on road • 300 in production • Now supports non-SAPI 4.0 engines • Evaluating new engines

  41. MORE INFORMATION • www.project54.unh.edu • andrew.kun@unh.edu • brettv@unh.edu