Speech Recognition

Speech Recognition. Mital Gandhi Brian Romanowski. Objective - Speech Recognition. Isolated Word Recognition Portable and Fast. System Block Diagram. Recognition – Conceptually. Data Acquisition Training Hidden Markov Models for word set Recognition & Analysis.

Speech Recognition

Mital Gandhi

Brian Romanowski

Objective - Speech Recognition
  • Isolated Word Recognition
  • Portable and Fast
Recognition – Conceptually
  • Data Acquisition
  • Training Hidden Markov Models for word set
  • Recognition & Analysis
Theory – Hidden Markov Models
  • Used to model semi-stationary random processes, like speech
  • Example:
    • cat = / k a t /
Viterbi-based Recognition
  • Calculates the maximum log-likelihood, taken over state paths, of a series of observations given a particular HMM.
    • “Which model did this set of data most likely come from?”
  • Saves time by calculating only a subset of possible paths through the HMM network.
    • At each new frame, only the most likely transition/observation state pairs are used.
    • Concepts similar to Dynamic Time Warping
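The scoring idea above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses discrete observation symbols instead of MFCC vectors with Gaussian densities, it evaluates every transition rather than pruning unlikely ones, and the two word models ("up", "down") and all of their probabilities are made up for the example.

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to -infinity."""
    return math.log(p) if p > 0.0 else float("-inf")

def viterbi_log_score(model, observations):
    """Log-likelihood of the single best state path through one HMM.

    model = (init, trans, emit) with
      init[i]      : P(start in state i)
      trans[i][j]  : P(next state j | current state i)
      emit[j][sym] : P(observing symbol sym | state j)
    """
    init, trans, emit = model
    states = range(len(init))
    # Best score of any path ending in each state after the first frame.
    score = [safe_log(init[j]) + safe_log(emit[j][observations[0]]) for j in states]
    for symbol in observations[1:]:
        # For each state keep only its best predecessor (the Viterbi max).
        score = [
            max(score[i] + safe_log(trans[i][j]) for i in states)
            + safe_log(emit[j][symbol])
            for j in states
        ]
    return max(score)

def recognize(models, observations):
    """Answer 'which model did this data most likely come from?'."""
    return max(models, key=lambda word: viterbi_log_score(models[word], observations))

# Hypothetical two-state left-to-right models over the symbols 0 and 1.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "up": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "down": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

print(recognize(MODELS, [0, 0, 1, 1]))
```

Working in the log domain turns products of small probabilities into sums, which is one common way to sidestep underflow on long utterances.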
System Components I: Volume Box
  • Sound Input
    • Amplifier
  • Reference Voltage
    • Resistor network (Voltage Dividers)
    • Voltage followers
  • Comparator
    • Microphone voltage vs. Reference
  • Output
    • LED bargraph
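The comparator chain above can be modeled numerically. The sketch below assumes a ladder of equal resistors, a 5 V supply, and ten LEDs; none of those values are given in the slides, so treat them as placeholders.

```python
def reference_voltages(v_supply, n_levels):
    """Tap voltages of a chain of n_levels + 1 equal resistors across v_supply."""
    return [v_supply * k / (n_levels + 1) for k in range(1, n_levels + 1)]

def leds_lit(v_in, references):
    """Each comparator switches its LED on when the input exceeds its reference."""
    return [v_in > v_ref for v_ref in references]

def bargraph(v_in, v_supply=5.0, n_levels=10):
    """Text rendering of the LED bar: '#' for lit, '.' for dark."""
    refs = reference_voltages(v_supply, n_levels)
    return "".join("#" if lit else "." for lit in leds_lit(v_in, refs))

print(bargraph(2.5))
```

In hardware the voltage followers buffer each tap so that the comparator inputs do not load the divider; the model above simply assumes ideal taps.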
System Components II: Hidden Markov Model Toolkit (HTK)
  • Data Acquisition
  • Data Preparation
  • Parameter Enhancements
  • Recognition & Analysis
System Components II (cont.) HTK: Data Acquisition & Preparation
  • Data Acquisition
    • Recording using HSLab
    • Live audio input using HVite
  • Data Preparation
    • External files: dictionary, config, word lists
    • Initialization of prototype models (HCompV)
System Components II (cont.) HTK: Training & Recognition
  • HERest – parameter re-estimation and enhancement tool
    • Uses information from the energy, delta, & acceleration features in the cepstral domain
  • HVite for Recognition
    • Recognition of pre-recorded files or live audio input
    • A host of external files to support the recognition
    • Analysis tool HResults to compute accuracy & correctness results
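The delta and acceleration features mentioned above are usually derived from the static cepstra by a short regression over neighboring frames. The sketch below follows the standard regression formula, with the first and last frames replicated at the edges; the window width of 2 is an assumption, not a setting taken from the slides.

```python
def deltas(frames, window=2):
    """Regression deltas for a list of equal-length feature vectors.

    d[t] = sum_{k=1..window} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..window} k^2)
    """
    denom = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for dim in range(len(frames[0])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, last)][dim]   # replicate the last frame
                behind = frames[max(t - k, 0)][dim]     # replicate the first frame
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out

static = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy one-dimensional "cepstra"
delta = deltas(static)
accel = deltas(delta)  # acceleration = delta of the deltas
print(delta, accel)
```

Appending `delta` and `accel` to each static vector gives the recognizer information about how the spectrum is changing, not just its current shape.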
System Components II (cont.) HTK: Results & Analysis
  • HResults
    • Computes % values for recognition accuracy and correctness
  • Results Analysis
    • %Corr = H/N: percentage of the N reference labels correctly recognized
    • Acc = (H - I)/N: accuracy also penalizes insertion errors; %Corr does not
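The two figures can be recomputed from the bracketed counts that HResults prints (H = hits, D = deletions, S = substitutions, I = insertions, N = reference labels). As a check, the sketch below uses the counts from the final pre-recorded-file run reported later in this deck; the function names are illustrative.

```python
def percent_correct(hits, total):
    """%Corr = H / N * 100; insertions are ignored."""
    return 100.0 * hits / total

def percent_accuracy(hits, insertions, total):
    """Acc = (H - I) / N * 100; insertions count against the score."""
    return 100.0 * (hits - insertions) / total

# Word-level counts from the final HTK run on pre-recorded files.
word = {"H": 286, "D": 0, "S": 5, "I": 0, "N": 291}
assert word["H"] + word["D"] + word["S"] == word["N"]

print(f"%Corr={percent_correct(word['H'], word['N']):.2f}")
print(f"Acc={percent_accuracy(word['H'], word['I'], word['N']):.2f}")
```

With no insertions the two numbers coincide, which is why that run reports the same value for both.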
System Components II (cont.) HTK: Preliminary Results

======================

HTK Results Analysis

======================

Date: Mon Sep 30 16:50:59 2002

Ref : 4word_word.mlf

Rec : recout.mlf

------------------------ Overall Results --------------------------

SENT: %Correct=25.00 [H=1, S=3, N=4]

WORD: %Corr=25.00, Acc=25.00 [H=9, D=0, S=3, I=0, N=12]

======================

System Components II (cont.) HTK: Techniques, Solutions
  • Input File Specifications
    • Config
      • Cepstral mean subtraction, energy normalization
    • Prototype model
      • Number of states per word model
      • “Optimality” in transition probability assignments (matrix)
    • Data
      • “Noise-free” data
      • As many tokens/samples of each word as possible for training
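Cepstral mean subtraction, listed under the config settings above, is simple enough to show directly. The sketch below is a generic per-utterance version, not HTK's implementation: it subtracts the mean of each cepstral coefficient over the utterance, which removes a constant offset such as one introduced by a fixed microphone or channel response.

```python
def cepstral_mean_subtract(frames):
    """Subtract the per-coefficient mean over one utterance."""
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_dims)]
    return [[f[k] - means[k] for k in range(n_dims)] for f in frames]

utterance = [[1.0, 10.0], [3.0, 14.0]]  # toy two-frame, two-coefficient input
print(cepstral_mean_subtract(utterance))
```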
DSP – System Overview
  • Initialization
  • Threshold/Recording
  • MFCC
  • Viterbi
  • Output
DSP - Matlab
  • Prototype of all important algorithms
  • Pre-calculated data
  • Run-time altering of data (debugging)
  • Downloading and visualization of data
    • MFCCs
DSP – Recording/Thresholding
  • Speech Input
  • Process
    • Poll A/D for input data (TI-provided code used)
    • Take only one channel as input
    • Downsample
    • Save samples only when signal threshold has been crossed
      • Lead buffer
      • Tail buffer
  • PROBLEMS
    • Sample transfer modes, single channel selection, threshold values, external microphones
  • TESTING
    • Visual and audio inspection in Matlab
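The threshold-triggered recording with lead and tail buffers can be sketched as follows. This is an offline illustration over a list of samples, not the polling loop that ran on the DSP; the threshold and buffer lengths are placeholder values, and the tail is ended after a fixed run of quiet samples, which is one possible stopping rule.

```python
from collections import deque

def capture_utterance(samples, threshold, lead=3, tail=3):
    """Keep `lead` samples before the first threshold crossing and stop
    after `tail` consecutive quiet samples following the speech."""
    lead_buf = deque(maxlen=lead)  # most recent quiet samples before speech
    recorded = []
    quiet_run = 0
    recording = False
    for s in samples:
        loud = abs(s) >= threshold
        if not recording:
            if loud:
                recording = True
                recorded.extend(lead_buf)  # prepend the lead buffer
                recorded.append(s)
            else:
                lead_buf.append(s)
        else:
            recorded.append(s)
            quiet_run = 0 if loud else quiet_run + 1
            if quiet_run >= tail:          # tail buffer is full: stop
                break
    return recorded

signal = [0, 1, 0, 1, 9, 8, 0, 7, 0, 0, 0, 5]
print(capture_utterance(signal, threshold=5, lead=2, tail=3))
```

The lead buffer protects weak word onsets that fall below the threshold, and the tail keeps short pauses inside a word from ending the recording early.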
DSP – MFCC calculation (1)
  • Thank You to Takuya Ooura for his Public Domain FFT code.
  • MFCCs provide an uncorrelated and small set of observation vectors for the HMMs
  • Process:
    • Remove DC offset
    • Pre-emphasize
    • Hamming window
    • FFT magnitude
    • Mel-filter bank
    • DCT
    • Lifter
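The steps listed above can be traced for a single frame in plain Python. This is a readable sketch under stated assumptions, not the DSP implementation: it uses a naive DFT where the project used Takuya Ooura's FFT, and the pre-emphasis coefficient (0.97), filter count (20), cepstral order (12), and lifter length (22) are common defaults chosen for illustration.

```python
import cmath
import math

def mel(f_hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=12, preemph=0.97, lifter=22):
    n = len(frame)
    # 1. Remove the DC offset.
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    # 2. Pre-emphasize: y[t] = x[t] - a * x[t-1] (first sample left unchanged).
    x = [x[0]] + [x[t] - preemph * x[t - 1] for t in range(1, n)]
    # 3. Apply a Hamming window.
    x = [x[t] * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1))) for t in range(n)]
    # 4. Magnitude spectrum for bins 0..n/2 (naive O(n^2) DFT).
    n_bins = n // 2 + 1
    mag = [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n_bins)
    ]
    # 5. Triangular filters spaced evenly on the mel scale, then log.
    top = mel(sample_rate / 2.0)
    edges = [top * m / (n_filters + 1) for m in range(n_filters + 2)]
    bin_mel = [mel(k * sample_rate / n) for k in range(n_bins)]
    log_energy = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k in range(n_bins):
            if lo < bin_mel[k] <= mid:
                total += mag[k] * (bin_mel[k] - lo) / (mid - lo)
            elif mid < bin_mel[k] < hi:
                total += mag[k] * (hi - bin_mel[k]) / (hi - mid)
        log_energy.append(math.log(max(total, 1e-10)))  # floor avoids log(0)
    # 6. DCT of the log filter-bank energies (coefficients 1..n_ceps).
    scale = math.sqrt(2.0 / n_filters)
    ceps = [
        scale * sum(
            log_energy[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
            for j in range(n_filters)
        )
        for i in range(1, n_ceps + 1)
    ]
    # 7. Lifter: rescale so higher-order coefficients have comparable magnitude.
    return [
        c * (1.0 + (lifter / 2.0) * math.sin(math.pi * i / lifter))
        for i, c in enumerate(ceps, start=1)
    ]

# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
print(mfcc_frame(tone, 8000))
```

A reference like this is mainly useful for the kind of comparison described in the testing notes: run the same frame through both implementations and plot the coefficients side by side.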
DSP – MFCC calculation (2)
  • PROBLEMS:
    • An incorrectly coded pre-emphasis filter
  • TESTING:
    • Graphically compared DSP-generated MFCCs to:
      • Matlab MFCCs, to expose DSP numerical issues
      • HTK MFCCs, as the reference implementation
DSP – Viterbi/Recognition
  • Uses HTK-derived HMMs whose data is contained in a Matlab-generated #include file
  • PROBLEMS
    • Numerical concerns
    • Errors in deriving and coding the formulas.
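Baking model parameters into a generated header is a common way to get trained data onto an embedded target. The project generated its file from Matlab; the sketch below uses Python as a stand-in, and the array names, layout, and example transition matrix are all hypothetical.

```python
def c_array(name, values):
    """Format a list of floats as a C array definition."""
    body = ", ".join(f"{v:.6f}f" for v in values)
    return f"static const float {name}[{len(values)}] = {{ {body} }};"

def hmm_header(word, trans):
    """Emit header text for one word model's transition matrix (row-major)."""
    flat = [p for row in trans for p in row]
    lines = [
        f"/* Auto-generated model data for '{word}' (illustrative layout). */",
        f"#define {word.upper()}_NUM_STATES {len(trans)}",
        c_array(f"{word}_trans", flat),
    ]
    return "\n".join(lines) + "\n"

print(hmm_header("yes", [[0.6, 0.4], [0.0, 1.0]]))
```

Generating the header from the trained models, rather than typing values by hand, removes one source of the transcription errors that are otherwise easy to confuse with numerical problems.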
Final Component Results I: HTK
  • Pre-recorded Files:

======================

HTK Results Analysis

======================

Date: Mon Dec 02 11:37:46 2002

Ref : testwords.mlf

Rec : testwordsoutput.mlf

------------------------ Overall Results --------------------------

SENT: %Correct=94.85 [H=92, S=5, N=97]

WORD: %Corr=98.28, Acc=98.28 [H=286, D=0, S=5, I=0, N=291]

======================

  • Live Audio Input: ~ 83%
  • DSP MFCC Files: ~ 65 %
Final Component Results II: DSP
  • 95% recognition accuracy over 90 trials
    • 4 words
    • Trained speaker
  • Speaker Independence
    • Indication of some recognition for non-modeled speakers, but not much
  • Speech => Decision takes approximately 0.88 seconds
Challenges
  • Speed
  • Complex project
  • System integration
    • Microphone input
    • Volume Box
    • HTK
    • MATLAB & DSP
Recommendations
  • HTK and DSP
    • Larger training corpus
    • Multiple Gaussian mixtures
    • Channel independence
    • Continuous Recognition
    • Real-time MFCC transmission from DSP to HTK
  • DSP
    • Code style fixes
    • Better user interface
Thank You
  • Dan Block – For use of his lab and equipment