Speech Recognition

Mital Gandhi

Brian Romanowski

- Isolated Word Recognition
- Portable and Fast

- Data Acquisition
- Training Hidden Markov Models for the word set
- Recognition & Analysis

- HMMs are used to model semi-stationary random processes, such as speech

- Example:
- cat = / k a t /

- Computes the log-likelihood of the most likely state path for a series of observations, given a particular HMM.
- “Which model did this set of data most likely come from?”
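Written as an equation, isolated-word recognition picks the word model with the highest score. The notation is assumed for illustration: $\lambda_w$ is the HMM for word $w$, $O = o_1, \dots, o_T$ is the observation sequence, and $Q$ is a state sequence. With Viterbi decoding, the full likelihood is approximated by the score of the single best path:

$$
\hat{w} = \arg\max_{w} \log P(O \mid \lambda_w) \approx \arg\max_{w} \max_{Q} \log P(O, Q \mid \lambda_w)
$$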

- Saves time by evaluating only a subset of the possible paths through the HMM network.
- At each new frame, only the most likely path into each state is kept; all other paths are discarded.
- Similar in concept to Dynamic Time Warping
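The following is a minimal pure-Python sketch of Viterbi scoring in the log domain. It makes several assumptions that are not in the slides:

- The per-frame emission log-likelihoods are already computed.
- `viterbi_score` requires the path to end in the last state, as in a left-to-right word model without HTK's non-emitting entry and exit states.
- The two toy word models and their probabilities are invented for illustration.
- All function names are hypothetical.

```python
import math

NEG_INF = float("-inf")  # log(0); math.log(0) would raise ValueError

def viterbi_score(log_init, log_trans, log_emit):
    """Log-likelihood of the single best state path through one HMM.

    log_init[j]: log P(path starts in state j)
    log_trans[i][j]: log P(move from state i to state j)
    log_emit[t][j]: log-likelihood of frame t in state j
    The path is required to end in the last state (left-to-right word model).
    """
    n = len(log_init)
    # delta[j]: best log score of any partial path ending in state j
    delta = [log_init[j] + log_emit[0][j] for j in range(n)]
    for frame in log_emit[1:]:
        # Each state keeps only its best predecessor; every other path is dropped
        delta = [max(delta[i] + log_trans[i][j] for i in range(n)) + frame[j]
                 for j in range(n)]
    return delta[-1]

def recognize(models, log_emits):
    """Return the word whose model gives the highest Viterbi score, plus all scores."""
    scores = {w: viterbi_score(m["init"], m["trans"], log_emits[w])
              for w, m in models.items()}
    return max(scores, key=scores.get), scores

# Toy 2-state left-to-right topology shared by two hypothetical words
L = math.log
model = {"init": [0.0, NEG_INF], "trans": [[L(0.5), L(0.5)], [NEG_INF, 0.0]]}
models = {"yes": model, "no": model}
log_emits = {"yes": [[L(0.9), L(0.1)], [L(0.2), L(0.8)]],
             "no": [[L(0.1), L(0.9)], [L(0.5), L(0.5)]]}
best, scores = recognize(models, log_emits)
print(best)  # prints "yes"
```

Working with log probabilities turns the products along a path into sums, which avoids underflow over long utterances.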

- Sound Input
- Amplifier

- Reference Voltage
- Resistor network (Voltage Dividers)
- Voltage followers

- Comparator
- Microphone voltage vs. Reference

- Output
- LED bargraph
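The volume box's divider-and-comparator chain can be modeled in a few lines. This sketch rests on four assumptions, none of which come from the slides:

- the ladder uses equal resistors and a 5 V supply;
- the voltage followers are ideal, so they do not load the divider;
- the comparators are ideal;
- one LED is driven per comparator.

`reference_voltages` and `bargraph` are hypothetical names.

```python
def reference_voltages(v_supply, resistors):
    """Tap voltages of a series resistor ladder, lowest first.

    resistors are listed from the ground end upward; each tap is one voltage divider.
    """
    total = sum(resistors)
    taps, accumulated = [], 0.0
    for r in resistors[:-1]:
        accumulated += r  # resistance between this tap and ground
        taps.append(v_supply * accumulated / total)
    return taps

def bargraph(v_mic, taps):
    """One comparator per tap: its LED lights when the microphone voltage exceeds the reference."""
    return [v_mic > v_ref for v_ref in taps]

taps = reference_voltages(5.0, [1000] * 5)  # 1 V steps between taps
print(bargraph(2.5, taps))  # the two lowest LEDs are lit
```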

- Data Acquisition
- Data Preparation
- Parameter Enhancements
- Recognition & Analysis

- Data Acquisition
- Recording using HSLab
- Live audio input using HVite

- Data Preparation
- External files: dictionary, config, word lists
- Initialization of prototype models (HCompV)
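HCompV's flat-start initialization computes the global mean and variance of the training data and copies them into every emitting state of the prototype. The sketch below shows only that computation and is not HCompV's code. It assumes the training frames are already loaded as lists of floats. The variance floor mirrors the idea behind HCompV's `-f` option; the 0.01 scale is an assumed value. Both helper names are hypothetical.

```python
def flat_start(frames, var_floor_scale=0.01):
    """Global mean, variance and variance floor over all training frames.

    frames: feature vectors (lists of floats) pooled from every training file.
    The floor (var_floor_scale times the global variance) keeps later
    re-estimation from shrinking any variance towards zero.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    floor = [var_floor_scale * v for v in var]
    return mean, var, floor

def flat_start_states(frames, n_emitting=3):
    """Give every emitting state the same global mean and variance (a 'flat start')."""
    mean, var, _ = flat_start(frames)
    return [{"mean": list(mean), "var": list(var)} for _ in range(n_emitting)]
```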

- Config

- Prototype Model
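The prototype model fixes the number of states and the initial transition matrix. This sketch writes a prototype in the plain-text HMM definition format described in the HTK Book. The following choices are illustrative assumptions, not this project's settings:

- the hypothetical `make_proto` helper;
- the 5-state topology (3 emitting states);
- the 39-dimensional `MFCC_0_D_A` vector;
- the 0.6 self-loop probability.

```python
def make_proto(n_states=5, vec_size=39, kind="MFCC_0_D_A", self_loop=0.6):
    """Return a left-to-right prototype HMM as HTK-format text.

    n_states includes HTK's non-emitting entry and exit states, so 5 gives 3
    emitting states. Means start at 0.0 and variances at 1.0; HCompV later
    replaces them with global statistics of the training data.
    """
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [f"~o <VecSize> {vec_size} <{kind}>", '~h "proto"', "<BeginHMM>",
             f"<NumStates> {n_states}"]
    for s in range(2, n_states):  # emitting states are numbered 2 .. n_states-1
        lines += [f"<State> {s}", f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    lines.append(f"<TransP> {n_states}")
    for i in range(n_states):
        row = [0.0] * n_states
        if i == 0:
            row[1] = 1.0                  # entry state always enters the first emitting state
        elif i < n_states - 1:
            row[i] = self_loop            # stay in the current state
            row[i + 1] = 1.0 - self_loop  # or advance one state (left-to-right)
        # the exit state (last row) keeps all-zero outgoing probabilities
        lines.append(" ".join(f"{p:.3f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"

print(make_proto(vec_size=3))
```

The number of states and the transition matrix are the two knobs most directly tied to the "states per word model" and "transition probability assignments" concerns.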

- HERest – parameter re-estimation (Baum-Welch) tool used to refine the models
- Trains on cepstral features augmented with energy, delta, and acceleration coefficients
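Delta and acceleration coefficients are regression slopes computed over neighboring frames. This sketch uses the regression formula the HTK Book gives for delta features and, like HTK's default, replicates the first and last frames at the edges. The window of 2 frames and the function names are assumptions. With 13 static coefficients (12 MFCCs plus energy), appending deltas and accelerations gives 39 values per frame.

```python
def deltas(frames, window=2):
    """Regression deltas: d_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..window."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for k in range(1, window + 1):
            ahead = frames[min(t + k, T - 1)]   # replicate the last frame past the end
            behind = frames[max(t - k, 0)]      # replicate the first frame before the start
            for i in range(len(d)):
                d[i] += k * (ahead[i] - behind[i])
        out.append([x / denom for x in d])
    return out

def add_deltas_and_accelerations(frames, window=2):
    """Append delta and acceleration (delta-of-delta) coefficients to each frame."""
    d = deltas(frames, window)
    a = deltas(d, window)
    return [f + dd + aa for f, dd, aa in zip(frames, d, a)]
```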

- HVite for Recognition
- Recognition of pre-recorded files or live audio input
- Several supporting files are required for recognition (e.g., the dictionary, word network, and HMM list)
- Analysis tool HResults to compute accuracy & correctness results

- HResults
- Computes % values for recognition accuracy and correctness

- Results Analysis
- %Corr = percentage of the N reference labels correctly recognized (H/N × 100)
- %Corr does not penalize insertion errors; Acc = (H - I)/N × 100 does

```text
====================== HTK Results Analysis ======================
Date: Mon Sep 30 16:50:59 2002
Ref : 4word_word.mlf
Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=25.00 [H=1, S=3, N=4]
WORD: %Corr=25.00, Acc=25.00 [H=9, D=0, S=3, I=0, N=12]
======================
```
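Below is a small Python check of the HResults word-level definitions, using its hit (H), deletion (D), substitution (S), insertion (I), and reference-label (N = H + D + S) counts. `hresults_scores` is a hypothetical helper, not part of HTK.

```python
def hresults_scores(H, D, S, I, N):
    """Return (%Corr, Acc) as defined for HTK's HResults."""
    if H + D + S != N:
        raise ValueError("expected N == H + D + S")
    correct = 100.0 * H / N          # ignores insertions
    accuracy = 100.0 * (H - I) / N   # insertions lower accuracy
    return round(correct, 2), round(accuracy, 2)

# Word-level counts from the pre-recorded test run reported later in these slides
print(hresults_scores(H=286, D=0, S=5, I=0, N=291))  # (98.28, 98.28)
```

When I = 0 the two figures coincide; they diverge only when the recognizer inserts extra words.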

- Input File Specifications
- Config
- Cepstral mean subtraction, energy normalization

- Prototype model
- Number of states per word model
- “Optimality” of the initial transition probability assignments (transition matrix)

- Data
- “Noise-free” data
- As many tokens (samples) of each word as possible for training
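Cepstral mean subtraction removes a fixed channel (microphone) effect, because a convolutional channel becomes an additive offset in the cepstral domain. The sketch below pairs CMS with a simplified log-energy normalization that shifts each utterance's peak log energy to 1.0. HTK's own energy normalization also applies a silence floor and a scale factor (its SILFLOOR and ESCALE settings), which this sketch omits. Function names are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    n = len(frames)
    means = [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]
    return [[x - m for x, m in zip(f, means)] for f in frames]

def normalize_log_energy(log_energies):
    """Shift log energies so the utterance maximum becomes 1.0 (no silence floor)."""
    peak = max(log_energies)
    return [e - peak + 1.0 for e in log_energies]
```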

- Config

- Initialization
- Threshold/Recording
- MFCC
- Viterbi
- Output

- Prototype of all important algorithms
- Pre-calculated data
- Run-time altering of data (debugging)
- Downloading and visualization of data
- MFCCs

- Speech Input
- Process
- Poll A/D for input data (TI-provided code used)
- Take only one channel as input
- Downsample
- Save samples only after the signal crosses the threshold
- Lead buffer
- Tail buffer
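This sketch of the threshold-triggered capture assumes interleaved stereo samples already read from the A/D as floats, and a ring buffer serves as the lead buffer. The slides do not specify the remaining details, so the following are also assumptions:

- The tail buffer means "stop after `tail` consecutive below-threshold samples".
- Downsampling is done by simple block averaging.
- `capture_word` and its parameters are hypothetical; this is not the TI-provided code.

```python
from collections import deque

def capture_word(samples, channels=2, keep_channel=0, decimate=2,
                 threshold=0.1, lead=4, tail=4):
    """Return one thresholded utterance from an interleaved sample stream."""
    mono = list(samples[keep_channel::channels])  # keep a single channel
    if decimate > 1:
        # Average each block of `decimate` samples (crude low-pass), one value per block
        mono = [sum(mono[i:i + decimate]) / decimate
                for i in range(0, len(mono) - decimate + 1, decimate)]
    lead_buf = deque(maxlen=lead)  # ring buffer of the most recent pre-trigger samples
    out, recording, quiet = [], False, 0
    for x in mono:
        if not recording:
            if abs(x) >= threshold:
                recording = True
                out.extend(lead_buf)  # prepend the lead buffer so the onset is not clipped
                out.append(x)
            else:
                lead_buf.append(x)
        else:
            out.append(x)
            quiet = quiet + 1 if abs(x) < threshold else 0
            if quiet >= tail:  # tail buffer: stop after `tail` consecutive quiet samples
                break
    return out
```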

- PROBLEMS
- Sample transfer modes, single channel selection, threshold values, external microphones

- TESTING
- Visual and audio inspection in Matlab

- Thank You to Takuya Ooura for his Public Domain FFT code.
- MFCCs provide a small, largely decorrelated set of observation vectors for the HMMs
- Process:
- Remove DC offset
- Pre-emphasize
- Hamming window
- FFT magnitude
- Mel-filter bank
- DCT
- Lifter
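The MFCC steps listed above can be sketched per frame in Python with NumPy. This is not the DSP code: it uses NumPy's FFT instead of Ooura's. The parameters (26 mel filters, 12 cepstra, pre-emphasis coefficient 0.97, lifter 22) are assumed values of the kind commonly used with HTK. The pre-emphasis line shows the intended difference equation y[n] = x[n] - a*x[n-1].

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale, over rfft bins 0..n_fft/2."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_frame(frame, fs, n_filters=26, n_ceps=12, preemph=0.97, lifter=22):
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                                # remove DC offset
    x = np.append(x[0], x[1:] - preemph * x[:-1])   # pre-emphasis: y[n] = x[n] - a*x[n-1]
    x = x * np.hamming(len(x))                      # Hamming window
    n_fft = 1 << (len(x) - 1).bit_length()          # next power of two
    mag = np.abs(np.fft.rfft(x, n_fft))             # FFT magnitude
    fbank = mel_filterbank(n_filters, n_fft, fs) @ mag
    log_fbank = np.log(np.maximum(fbank, 1e-10))    # log mel filter-bank outputs
    j = np.arange(n_filters)
    i = np.arange(1, n_ceps + 1)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * i * (j + 0.5) / n_filters)
    c = dct @ log_fbank                             # DCT: cepstral coefficients c_1..c_n
    return c * (1.0 + (lifter / 2.0) * np.sin(np.pi * i[:, 0] / lifter))  # lifter
```

A sign or index slip in the pre-emphasis line is easy to miss because the output still looks plausible, which is why comparing against reference MFCCs is worthwhile.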

- PROBLEMS:
- An incorrectly coded pre-emphasis filter

- TESTING:
- Graphically compared DSP-generated MFCCs to:
- Matlab MFCCs (to expose DSP numerical issues)
- HTK MFCCs (the reference implementation)


- Uses HTK-derived HMMs whose parameters are stored in a Matlab-generated #include file
- PROBLEMS
- Numerical concerns
- Errors in deriving and coding the formulas
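A common numerical concern in HMM scoring is underflow. A product of many per-frame likelihoods quickly falls below what floating point can represent, so scores are usually accumulated as log-likelihoods. This sketch evaluates a diagonal-covariance Gaussian entirely in the log domain. The constant term corresponds to what HTK stores as `<GConst>`. `log_gaussian_diag` is a hypothetical helper, not the project's DSP code.

```python
import math

def log_gaussian_diag(obs, mean, var):
    """log N(obs; mean, diag(var)), computed without ever forming the raw density."""
    # Constant per state: n*log(2*pi) + sum(log var); can be precomputed once
    gconst = len(obs) * math.log(2.0 * math.pi) + sum(math.log(v) for v in var)
    mahal = sum((o - m) ** 2 / v for o, m, v in zip(obs, mean, var))
    return -0.5 * (gconst + mahal)

# 200 frames of 39-dimensional features: the summed log score is an ordinary number,
# while exponentiating it (i.e. multiplying raw likelihoods) underflows to 0.0
frames = [[0.5] * 39] * 200
total_log = sum(log_gaussian_diag(f, [0.0] * 39, [1.0] * 39) for f in frames)
print(total_log, math.exp(total_log))
```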

- Pre-recorded Files:
```text
====================== HTK Results Analysis ======================
Date: Mon Dec 02 11:37:46 2002
Ref : testwords.mlf
Rec : testwordsoutput.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=94.85 [H=92, S=5, N=97]
WORD: %Corr=98.28, Acc=98.28 [H=286, D=0, S=5, I=0, N=291]
======================
```

- Live Audio Input: ~83%
- DSP MFCC Files: ~65%

- 95% recognition accuracy over 90 trials
- 4 words
- Trained speaker

- Speaker Independence
- Some recognition of speakers not in the training data, but accuracy is limited

- Speech-to-decision time is approximately 0.88 seconds

- Speed
- Complex project
- System integration
- Microphone input
- Volume Box
- HTK
- MATLAB & DSP

- HTK and DSP
- Larger training corpus
- Multiple Gaussian mixtures
- Channel independence
- Continuous Recognition
- Real-time MFCC transmission from DSP to HTK

- DSP
- Code style fixes
- Better user interface

- Dan Block – For use of his lab and equipment
