Speech Recognition

Mital Gandhi

Brian Romanowski


Objective - Speech Recognition

  • Isolated Word Recognition

  • Portable and Fast


System Block Diagram


Recognition – Conceptually

  • Data Acquisition

  • Training Hidden Markov Models for word set

  • Recognition & Analysis


Theory – Hidden Markov Models

  • Used to model semi-stationary random processes, like speech

  • Example:

    • cat = / k a t /
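The "cat" example can be pictured as a left-to-right HMM with one state per phone, where each state either repeats or advances to the next. The following is a minimal sketch under that assumption; the function name, the self-loop probability of 0.6, and the one-state-per-phone layout are illustrative choices, not values from the project.

```python
def left_to_right_transitions(n_states, stay=0.6):
    """Build a row-stochastic transition matrix for a left-to-right HMM.

    Each state either loops on itself (probability `stay`) or advances to
    the next state. `stay` is a hypothetical value chosen for illustration.
    """
    rows = []
    for i in range(n_states):
        row = [0.0] * n_states
        if i == n_states - 1:
            row[i] = 1.0              # the final state absorbs
        else:
            row[i] = stay             # remain in the same phone
            row[i + 1] = 1.0 - stay   # move on to the next phone
        rows.append(row)
    return rows


# One state per phone of "cat" (an assumed layout).
STATES = ["k", "a", "t"]
A = left_to_right_transitions(len(STATES))

for name, row in zip(STATES, A):
    print(name, row)
```

No state can jump backwards, which matches the idea that the phones of a word occur in a fixed order while their durations vary.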


Viterbi-based Recognition

  • Calculates the log-likelihood of the most likely state path for a series of observations, given a particular HMM.

    • “Which model did this set of data most likely come from?”

  • Saves time by calculating only a subset of possible paths through the HMM network.

    • At each new frame, only the most likely path into each state is kept; all other paths are discarded.

    • Concepts similar to Dynamic Time Warping
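The scoring idea above can be sketched in a few lines. This is an illustration only, not the project's DSP code: it assumes discrete observation symbols (the project used continuous MFCC vectors), and the two toy word models, their probabilities, and the function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Log-likelihood of the single best state path for `obs`.

    start[i]    : P(first state is i)
    trans[i][j] : P(next state is j | current state is i)
    emit[i][o]  : P(observing symbol o | state i)
    """
    n = len(start)
    # Best log score of any path ending in each state after the first frame.
    score = [log(start[i]) + log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        # For each state j keep only the best predecessor, then emit o.
        score = [
            max(score[i] + log(trans[i][j]) for i in range(n)) + log(emit[j][o])
            for j in range(n)
        ]
    return max(score)


def recognize(obs, models):
    """Return the word whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda w: viterbi_log_score(obs, *models[w]))


# Two hypothetical 2-state left-to-right word models over symbols {0, 1}.
TRANS = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "up":   ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.2, 0.8]]),
    "down": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.8, 0.2]]),
}

print(recognize([0, 0, 1, 1], MODELS))
print(recognize([1, 1, 0, 0], MODELS))
```

Working in the log domain turns products of small probabilities into sums, which avoids numerical underflow on long utterances.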


System Components I: Volume Box

  • Sound Input

    • Amplifier

  • Reference Voltage

    • Resistor network (Voltage Dividers)

    • Voltage followers

  • Comparator

    • Microphone voltage vs. Reference

  • Output

    • LED bargraph


System Components II: Hidden Markov Modeling ToolKit

  • Data Acquisition

  • Data Preparation

  • Parameter Enhancements

  • Recognition & Analysis


System Components II (cont.) HTK: Data Acquisition & Preparation

  • Data Acquisition

    • Recording using HSLab

    • Live audio input using HVite

  • Data Preparation

    • External files: dictionary, config, word lists

    • Initialization of prototype models (HCompV)


System Components II (cont.) HTK: Sample External Files

  • Config

  • Prototype Model


System Components II (cont.) HTK: Training & Recognition

  • HERest – parameter re-estimation and enhancement tool

    • Uses information from the energy, delta, & acceleration features in the cepstral domain

  • HVite for Recognition

    • Recognition of pre-recorded files or live audio input

    • A host of external files to support the recognition

    • Analysis tool HResults to compute accuracy & correctness results
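Delta and acceleration features of the kind mentioned above are commonly computed by a regression over neighboring frames. The sketch below uses the standard regression formula with a window of two frames and replicates the edge frames; the function name, the window size, and the toy input are assumptions, not values taken from the project's configuration.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a feature sequence.

    frames : list of equal-length feature vectors, one per time frame
    window : number of frames on each side used in the regression
             (2 is an assumed, commonly used value)
    """
    n_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n_frames):
        vec = []
        for d in range(len(frames[0])):
            num = 0.0
            for k in range(1, window + 1):
                # Replicate the first/last frame beyond the edges.
                ahead = frames[min(t + k, n_frames - 1)][d]
                behind = frames[max(t - k, 0)][d]
                num += k * (ahead - behind)
            vec.append(num / denom)
        out.append(vec)
    return out


# Toy one-dimensional "cepstral" track that rises by 1 per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
delta = deltas(ramp)    # first derivative (delta features)
accel = deltas(delta)   # second derivative (acceleration features)
print(delta[2], accel[2])
```

Appending the delta and acceleration vectors to the static coefficients gives the HMMs information about how the spectrum is changing, not just its shape in one frame.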


System Components II (cont.) HTK: Results & Analysis

  • HResults

    • Computes % values for recognition accuracy and correctness

  • Results Analysis

    • %Correct = H / N × 100: percentage of the N reference labels that were correctly recognized (H = hits)

    • Correctness does not penalize insertion errors; accuracy, (H − I) / N × 100, does
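The two HResults figures can be reproduced from the bracketed counts in its output, where H is hits, D deletions, S substitutions, I insertions, and N = H + D + S the number of reference labels. A minimal sketch follows; the function name is hypothetical, and the second set of counts is invented purely to show the effect of insertions.

```python
def hresults_scores(hits, deletions, substitutions, insertions):
    """Return (%Correct, Accuracy) from HResults-style error counts."""
    n_ref = hits + deletions + substitutions          # total reference labels
    percent_correct = 100.0 * hits / n_ref            # ignores insertions
    accuracy = 100.0 * (hits - insertions) / n_ref    # penalizes insertions
    return percent_correct, accuracy


# Word-level counts from the final pre-recorded test in this presentation:
# [H=286, D=0, S=5, I=0, N=291]
corr, acc = hresults_scores(286, 0, 5, 0)
print(round(corr, 2), round(acc, 2))   # 98.28 98.28

# Invented counts showing how insertions lower accuracy but not correctness.
corr2, acc2 = hresults_scores(9, 1, 2, 3)
print(corr2, acc2)                     # 75.0 50.0
```

With no insertions the two figures coincide, which is why isolated-word tests that emit exactly one label per utterance report identical %Corr and Acc values.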


System Components II (cont.) HTK: Preliminary Results

======================

HTK Results Analysis

======================

Date: Mon Sep 30 16:50:59 2002

Ref : 4word_word.mlf

Rec : recout.mlf

------------------------ Overall Results --------------------------

SENT: %Correct=25.00 [H=1, S=3, N=4]

WORD: %Corr=25.00, Acc=25.00 [H=9, D=0, S=3, I=0, N=12]

======================


System Components II (cont.) HTK: Techniques, Solutions

  • Input File Specifications

    • Config

      • Cepstral mean subtraction, energy normalization

    • Prototype model

      • Number of states per word model

      • “Optimality” in transition probability assignments (matrix)

    • Data

      • “Noise-free” data

      • As many tokens/samples of each word as possible for training


DSP – System Overview

  • Initialization

  • Threshold/Recording

  • MFCC

  • Viterbi

  • Output


DSP - Matlab

  • Prototype of all important algorithms

  • Pre-calculated data

  • Run-time altering of data (debugging)

  • Downloading and visualization of data

    • MFCCs


DSP – Recording/Thresholding

  • Speech Input

  • Process

    • Poll A/D for input data (TI-provided code used)

    • Take only one channel as input

    • Downsample

    • Save samples only when signal threshold has been crossed

      • Lead buffer

      • Tail buffer

  • PROBLEMS

    • Sample transfer modes, single channel selection, threshold values, external microphones

  • TESTING

    • Visual and audio inspection in Matlab
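The threshold-triggered recording with lead and tail buffers can be sketched as follows. This is an assumed reconstruction of the idea, not the project's DSP code: the function name, the buffer lengths, and the rule that recording stops after `tail` consecutive quiet samples are illustrative choices.

```python
from collections import deque


def capture_utterance(samples, threshold, lead=2, tail=3):
    """Keep samples from shortly before the first loud sample until the
    signal has stayed below `threshold` for `tail` consecutive samples.

    lead : number of pre-trigger samples to keep (lead buffer)
    tail : number of quiet samples recorded before stopping (tail buffer)
    """
    lead_buf = deque(maxlen=lead)   # rolling history of pre-trigger samples
    recorded = []
    quiet_run = 0
    recording = False
    for s in samples:
        loud = abs(s) >= threshold
        if not recording:
            if loud:
                recording = True
                recorded.extend(lead_buf)   # prepend the lead buffer
                recorded.append(s)
            else:
                lead_buf.append(s)
        else:
            recorded.append(s)
            quiet_run = 0 if loud else quiet_run + 1
            if quiet_run >= tail:           # tail buffer is full: stop
                break
    return recorded


signal = [0, 1, 0, 1, 0, 9, 8, 0, 7, 0, 0, 0, 0, 5]
print(capture_utterance(signal, threshold=5))
# [1, 0, 9, 8, 0, 7, 0, 0, 0]
```

The lead buffer preserves quiet word onsets that fall below the threshold, and the tail buffer keeps short pauses inside a word from ending the recording early.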


DSP – MFCC calculation (1)

  • Thank you to Takuya Ooura for his public-domain FFT code.

  • MFCCs provide an uncorrelated and small set of observation vectors for the HMMs

  • Process:

    • Remove DC offset

    • Pre-emphasize

    • Hamming window

    • FFT magnitude

    • Mel-filter bank

    • DCT

    • Lifter
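The processing chain listed above can be sketched for a single frame using only the Python standard library. This is an illustration under stated assumptions, not the project's DSP or Matlab code: the sample rate, filter count, pre-emphasis coefficient, and lifter length are typical values rather than the project's settings; a naive DFT stands in for the FFT; and the logarithm of the filter-bank outputs, which the slide does not list but the usual MFCC definition includes, is applied before the DCT.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate=8000, n_filters=20, n_ceps=12,
               preemph=0.97, lifter=22):
    """MFCCs for one frame; all default parameter values are assumptions."""
    n = len(frame)
    # 1. Remove the DC offset.
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    # 2. Pre-emphasize: y[t] = x[t] - a * x[t-1] (first sample left as is).
    y = [x[0]] + [x[t] - preemph * x[t - 1] for t in range(1, n)]
    # 3. Apply a Hamming window.
    y = [v * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
         for t, v in enumerate(y)]
    # 4. Magnitude spectrum for bins 0..n/2 (naive DFT instead of an FFT).
    half = n // 2
    mag = [abs(sum(y[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n)))
           for k in range(half + 1)]
    # 5. Triangular mel filter bank; edges are in fractional bin units.
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n / sample_rate
             for i in range(n_filters + 2)]
    log_energies = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k in range(half + 1):
            if lo < k <= mid:
                e += mag[k] * (k - lo) / (mid - lo)
            elif mid < k < hi:
                e += mag[k] * (hi - k) / (hi - mid)
        log_energies.append(math.log(max(e, 1e-10)))  # floor avoids log(0)
    # 6. DCT of the log energies (one common scaling convention).
    scale = math.sqrt(2.0 / n_filters)
    ceps = [scale * sum(log_energies[j] *
                        math.cos(math.pi * i * (j + 0.5) / n_filters)
                        for j in range(n_filters))
            for i in range(1, n_ceps + 1)]
    # 7. Lifter: re-scale so higher-order coefficients have similar range.
    return [c * (1.0 + lifter / 2.0 * math.sin(math.pi * i / lifter))
            for i, c in enumerate(ceps, start=1)]


# A 256-sample test tone at 440 Hz, sampled at the assumed 8 kHz rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc_frame(tone)
print(len(coeffs))
```

Because step 1 subtracts the frame mean, adding a constant offset to the input leaves the coefficients unchanged, which is a convenient sanity check when comparing implementations.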


DSP – MFCC calculation (2)

  • PROBLEMS:

    • An incorrectly coded pre-emphasis filter

  • TESTING:

    • Graphically compared DSP-generated MFCCs to:

      • Matlab MFCCs -> DSP numerical issues

      • HTK MFCCs -> reference implementation


DSP – Viterbi/Recognition

  • Uses HTK-derived HMMs whose data is contained in a Matlab-generated #include file

  • PROBLEMS

    • Numerical concerns

    • Errors in deriving and coding the formulas.


Final Component Results I: HTK

  • Pre-recorded Files:

    ======================

    HTK Results Analysis

    ======================

    Date: Mon Dec 02 11:37:46 2002

    Ref : testwords.mlf

    Rec : testwordsoutput.mlf

    ------------------------ Overall Results --------------------------

    SENT: %Correct=94.85 [H=92, S=5, N=97]

    WORD: %Corr=98.28, Acc=98.28 [H=286, D=0, S=5, I=0, N=291]

    ======================

  • Live Audio Input: ~ 83%

  • DSP MFCC Files: ~ 65 %


Final Component Results II: DSP

  • 95% recognition accuracy over 90 trials

    • 4 words

    • Trained speaker

  • Speaker Independence

    • Indication of some recognition for non-modeled speakers, but not much

  • Speech => Decision takes approximately 0.88 seconds


Challenges

  • Speed

  • Complex project

  • System integration

    • Microphone input

    • Volume Box

    • HTK

    • MATLAB & DSP


Recommendations

  • HTK and DSP

    • Larger training corpus

    • Multiple Gaussian mixtures

    • Channel independence

    • Continuous Recognition

    • Real-time MFCC transmission from DSP to HTK

  • DSP

    • Code style fixes

    • Better user interface


Thank You

  • Dan Block – For use of his lab and equipment

