## PowerPoint Slideshow about 'Speech Recognition' - cutler


Presentation Transcript

Objective - Speech Recognition

- Isolated Word Recognition
- Portable and Fast

Recognition – Conceptually

- Data Acquisition
- Training Hidden Markov Models for word set
- Recognition & Analysis

Theory – Hidden Markov Models

- Used to model semi-stationary random processes, like speech
- Example:
  - cat = / k a t /

Viterbi-based Recognition

- Computes the maximum log-likelihood of a series of observations given a particular HMM, taken over the single best state path.
- “Which model did this set of data most likely come from?”
- Saves time by calculating only a subset of possible paths through the HMM network.
- At each new frame, only the most likely transition/observation state pairs are used.
- Concepts similar to Dynamic Time Warping
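The scoring step described above can be sketched as follows. This is a minimal illustration, not the project's DSP code: it assumes discrete observation symbols instead of MFCC vectors, and the names `score`, `recognize`, and the two toy word models are hypothetical. For each frame it keeps only the best predecessor of every state, then picks the word whose model gives the highest best-path log-likelihood.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -infinity."""
    return math.log(p) if p > 0.0 else NEG_INF

def score(model, observations):
    """Log-likelihood of the single best state path through one HMM.

    model = (init, trans, emit) with emit[state][symbol] probabilities
    (a toy discrete HMM; an assumption made for this sketch).
    """
    init, trans, emit = model
    n = len(init)
    # Best-path scores after the first frame.
    delta = [safe_log(init[j]) + safe_log(emit[j][observations[0]])
             for j in range(n)]
    for obs in observations[1:]:
        # For every state keep only the most likely way of reaching it.
        delta = [
            max(delta[i] + safe_log(trans[i][j]) for i in range(n))
            + safe_log(emit[j][obs])
            for j in range(n)
        ]
    return max(delta)

def recognize(models, observations):
    """Answer: which model did this data most likely come from?"""
    return max(models, key=lambda word: score(models[word], observations))

# Two hypothetical left-to-right word models over the symbols 0 and 1.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.1, 0.9]]),
    "no":  ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.9, 0.1]]),
}

print(recognize(models, [0, 0, 1, 1]))
```

Working in the log domain turns products of small probabilities into sums, which is one common way to limit numerical underflow.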

System Components I: Volume Box

- Sound Input
  - Amplifier
- Reference Voltage
  - Resistor network (Voltage Dividers)
  - Voltage followers
- Comparator
  - Microphone voltage vs. Reference
- Output
  - LED bargraph
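The comparator ladder can be mimicked in software to see how the bargraph responds to an input level. The sketch below is only a behavioral model under assumed values: a 5 V reference, ten LEDs, and a divider of equal resistors are not stated on the slide, and `bargraph` and `render` are hypothetical names.

```python
def bargraph(mic_voltage, v_ref=5.0, n_leds=10):
    """Return one boolean per LED, lowest threshold first."""
    # Taps of a divider made of n_leds equal resistors between v_ref and ground
    # (assumed layout; the real resistor values are not given).
    thresholds = [v_ref * k / n_leds for k in range(1, n_leds + 1)]
    # Each comparator lights its LED when the input exceeds its reference tap.
    return [mic_voltage > t for t in thresholds]

def render(leds):
    """Draw the bargraph as text: '#' for a lit LED, '.' for a dark one."""
    return "".join("#" if on else "." for on in leds)

print(render(bargraph(2.6)))
```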

System Components II: Hidden Markov Model Toolkit (HTK)

- Data Acquisition
- Data Preparation
- Parameter Enhancements
- Recognition & Analysis

System Components II (cont.) HTK: Data Acquisition & Preparation

- Data Acquisition
  - Recording using HSLab
  - Live audio input using HVite
- Data Preparation
  - External files: dictionary, config, word lists
  - Initialization of prototype models (HCompV)

System Components II (cont.) HTK: Training & Recognition

- HERest – parameter re-estimation and enhancement tool
  - Uses information from the energy, delta, & acceleration features in the cepstral domain
- HVite for Recognition
  - Recognition of pre-recorded files or live audio input
  - A host of external files to support the recognition
- Analysis tool HResults to compute accuracy & correctness results
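Delta and acceleration features are time derivatives of the static cepstral coefficients. A minimal sketch of the usual regression formula, d_t = Σ θ (c_{t+θ} − c_{t−θ}) / (2 Σ θ²), is given below. The window of 2 frames and the edge handling (repeating the first or last frame) are assumptions for this example, and the function names are hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors."""
    n = len(frames)
    denom = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    out = []
    for t in range(n):
        vec = []
        for k in range(len(frames[t])):
            acc = 0.0
            for theta in range(1, window + 1):
                # Repeat the first/last frame when the window runs off the ends.
                ahead = frames[min(t + theta, n - 1)][k]
                behind = frames[max(t - theta, 0)][k]
                acc += theta * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out

def add_dynamic_features(frames, window=2):
    """Append delta and acceleration (delta of delta) to each static vector."""
    d = deltas(frames, window)
    a = deltas(d, window)
    return [f + dv + av for f, dv, av in zip(frames, d, a)]

ramp = [[float(t)] for t in range(6)]
print(add_dynamic_features(ramp)[2])
```

With 12 cepstra plus energy as static features, this triples the vector to 39 components, a common HTK-style configuration; the project's exact feature size is not stated here.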

System Components II (cont.) HTK: Results & Analysis

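- HResults
  - Computes % values for recognition accuracy and correctness
- Results Analysis
  - N = total number of reference labels; H = number of labels correctly recognized
  - %Correct = H / N × 100: percentage of reference labels correctly recognized
  - Correctness does not penalize insertion errors; accuracy does: Acc = (H − I) / N × 100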
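The two HResults figures follow directly from the counts in its output line: H (hits), D (deletions), S (substitutions), I (insertions), with N = H + D + S. A small sketch, with `hresults_scores` as a hypothetical helper name, checked here against the word-level counts of the final pre-recorded test in this deck (H=286, D=0, S=5, I=0):

```python
def hresults_scores(hits, deletions, substitutions, insertions):
    """Return (N, %Correct, Accuracy) from HResults-style counts."""
    n = hits + deletions + substitutions          # total reference labels
    correct = 100.0 * hits / n                    # ignores insertions
    accuracy = 100.0 * (hits - insertions) / n    # penalizes insertions
    return n, correct, accuracy

n, corr, acc = hresults_scores(286, 0, 5, 0)
print(f"N={n} %Corr={corr:.2f} Acc={acc:.2f}")
```

Because there were no insertions in that run, correctness and accuracy coincide; any inserted word would lower accuracy only.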

System Components II (cont.) HTK: Preliminary Results

======================

HTK Results Analysis

======================

Date: Mon Sep 30 16:50:59 2002

Ref : 4word_word.mlf

Rec : recout.mlf

------------------------ Overall Results --------------------------

SENT: %Correct=25.00 [H=1, S=3, N=4]

WORD: %Corr=25.00, Acc=25.00 [H=9, D=0, S=3, I=0, N=12]

======================

System Components II (cont.) HTK: Techniques, Solutions

- Input File Specifications
  - Config
    - Cepstral mean subtraction, energy normalization
  - Prototype model
    - Number of states per word model
    - “Optimality” in transition probability assignments (matrix)
- Data
  - “Noise-free” data
  - As many tokens/samples of each word as possible for training
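To make the prototype-model choices concrete, the sketch below builds an initial left-to-right transition matrix for a word model with a given number of states. It assumes the HTK-style convention of a non-emitting entry state and a non-emitting exit state; the self-loop probability of 0.6 is an arbitrary starting value, and `left_to_right_transitions` is a hypothetical name. Training then re-estimates these probabilities.

```python
def left_to_right_transitions(n_states, stay=0.6):
    """Initial transition matrix: each emitting state loops or moves one step on."""
    if n_states < 3:
        raise ValueError("need an entry state, an exit state and at least one emitting state")
    a = [[0.0] * n_states for _ in range(n_states)]
    a[0][1] = 1.0                      # entry state always enters the first emitting state
    for i in range(1, n_states - 1):
        a[i][i] = stay                 # remain in the same state
        a[i][i + 1] = 1.0 - stay       # advance to the next state
    # The last row stays all zero: the exit state has no outgoing transitions.
    return a

for row in left_to_right_transitions(5):
    print(" ".join(f"{p:.1f}" for p in row))
```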

DSP – System Overview

- Initialization
- Threshold/Recording
- MFCC
- Viterbi
- Output

DSP – Matlab

- Prototype of all important algorithms
- Pre-calculated data
- Run-time altering of data (debugging)
- Downloading and visualization of data
- MFCCs

DSP – Recording/Thresholding

- Speech Input
- Process
  - Poll A/D for input data (TI-provided code used)
  - Take only one channel as input
  - Downsample
  - Save samples only when signal threshold has been crossed
    - Lead buffer
    - Tail buffer
- PROBLEMS
  - Sample transfer modes, single channel selection, threshold values, external microphones
- TESTING
  - Visual and audio inspection in Matlab
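The threshold-triggered recording with lead and tail buffers can be sketched offline on a list of samples. This is an illustration of the idea only: the real system polls an A/D converter, and the parameter names `threshold`, `lead`, and `tail`, their values, and the rule "stop after `tail` consecutive quiet samples" are assumptions for this example.

```python
from collections import deque

def capture_utterance(samples, threshold, lead=3, tail=3):
    """Keep samples from `lead` before the first threshold crossing
    until `tail` consecutive quiet samples have been seen."""
    lead_buf = deque(maxlen=lead)   # rolling history kept while waiting
    recorded = []
    recording = False
    quiet = 0
    for s in samples:
        loud = abs(s) >= threshold
        if not recording:
            if loud:
                recording = True
                recorded.extend(lead_buf)   # prepend the lead buffer
                recorded.append(s)
            else:
                lead_buf.append(s)
        else:
            recorded.append(s)
            quiet = 0 if loud else quiet + 1
            if quiet >= tail:               # tail buffer is full: stop
                break
    return recorded

signal = [0, 0, 1, 0, 9, 8, 0, 7, 0, 0, 0, 0, 5]
print(capture_utterance(signal, threshold=5, lead=2, tail=3))
```

The lead buffer preserves the soft onset of a word that precedes the first loud sample, and the tail buffer keeps short pauses inside a word from ending the recording early.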

DSP – MFCC calculation (1)

- Thank You to Takuya Ooura for his Public Domain FFT code.
- MFCCs provide an uncorrelated and small set of observation vectors for the HMMs
- Process:
  - Remove DC offset
  - Pre-emphasize
  - Hamming window
  - FFT magnitude
  - Mel-filter bank
  - DCT
  - Lifter
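The processing chain above can be sketched for a single frame as follows. This is not the project's DSP code: it uses a naive DFT instead of an FFT for brevity, and the sample rate, number of filters, number of cepstra, pre-emphasis coefficient, and lifter length are all assumed values. The function name `mfcc_frame` is hypothetical.

```python
import cmath
import math

def mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate=8000, n_filters=20, n_ceps=12,
               pre=0.97, lifter=22):
    n = len(frame)
    # 1. Remove the DC offset.
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    # 2. Pre-emphasize: y[t] = x[t] - pre * x[t-1].
    y = [x[0]] + [x[t] - pre * x[t - 1] for t in range(1, n)]
    # 3. Hamming window.
    w = [y[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
         for t in range(n)]
    # 4. Magnitude spectrum (naive DFT; an FFT gives the same values faster).
    half = n // 2
    mag = [abs(sum(w[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))) for k in range(half + 1)]
    # 5. Triangular mel filter bank, then log of each filter output.
    top = mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = sample_rate / n
    log_energy = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k in range(half + 1):
            f = k * bin_hz
            if lo < f <= mid:
                e += mag[k] * (f - lo) / (mid - lo)
            elif mid < f < hi:
                e += mag[k] * (hi - f) / (hi - mid)
        log_energy.append(math.log(max(e, 1e-10)))
    # 6. DCT of the log filter outputs gives cepstra c1..c_n_ceps.
    ceps = [sum(log_energy[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters)) for i in range(1, n_ceps + 1)]
    # 7. Lifter: re-weight the cepstra with a raised sine.
    return [c * (1.0 + lifter / 2.0 * math.sin(math.pi * i / lifter))
            for i, c in enumerate(ceps, start=1)]

frame = [math.sin(2 * math.pi * 437 * t / 8000)
         + 0.5 * math.sin(2 * math.pi * 1210 * t / 8000) for t in range(256)]
print(len(mfcc_frame(frame)))
```

Comparing such a reference against the fixed-point or embedded version, coefficient by coefficient, is one way to isolate a stage that was coded incorrectly.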

DSP – MFCC calculation (2)

- PROBLEMS:
  - An incorrectly coded pre-emphasis filter
- TESTING:
  - Graphically compared DSP generated MFCCs to:
    - Matlab MFCCs -> DSP numerical issues
    - HTK MFCCs -> reference implementation

DSP – Viterbi/Recognition

- Uses HTK-derived HMMs whose data is contained in a Matlab-generated #include file
- PROBLEMS
  - Numerical concerns
  - Errors in deriving and coding the formulas
Final Component Results I: HTK

- Pre-recorded Files:

======================

HTK Results Analysis

======================

Date: Mon Dec 02 11:37:46 2002

Ref : testwords.mlf

Rec : testwordsoutput.mlf

------------------------ Overall Results --------------------------

SENT: %Correct=94.85 [H=92, S=5, N=97]

WORD: %Corr=98.28, Acc=98.28 [H=286, D=0, S=5, I=0, N=291]

======================

- Live Audio Input: ~ 83%
- DSP MFCC Files: ~ 65 %

Final Component Results II: DSP

- 95% recognition accuracy over 90 trials
  - 4 words
  - Trained speaker
- Speaker Independence
  - Indication of some recognition for non-modeled speakers, but not much
- Speech => Decision takes approximately 0.88 seconds

Challenges

- Speed
- Complex project
- System integration
- Microphone input
- Volume Box
- HTK
- MATLAB & DSP

Recommendations

- HTK and DSP
  - Larger training corpus
  - Multiple Gaussian mixtures
  - Channel independence
  - Continuous recognition
  - Real-time MFCC transmission from DSP to HTK
- DSP
  - Code style fixes
  - Better user interface

Thank You

- Dan Block – For use of his lab and equipment
