ASR Front End Processing

ASR Front End Processing Implemented on Texas Instruments OMAP-L137
Jacob Zurasky – 12/12/11

Project Goals
• Create a front end for embedded ASR
• Extract feature vectors from speech data
• Allow for many different specifications
• Extract features in real time, while leaving enough CPU time for analysis

Hardware Platform
• Texas Instruments OMAP-L137 DSP, dual core
  • TMS320C6747
  • ARM9
• AIC3106 Audio Codec
• 64 MB SDRAM

Signal Flow Block Diagram
Audio → Framing → Pre-Emphasis → Window → FFT → Mel Filter → Log → DCT → Deltas
Output per frame: 13 MFCCs, 13 Deltas, 13 Delta-Deltas

Data Streams
• Streams are a way to transfer blocks of data efficiently
• Streams use enhanced direct memory access (EDMA)
• A block of data can be accessed with SIO_reclaim(…)
• A block of data can be sent with SIO_issue(…)
[Diagram: Input Stream and Output Stream connecting the Audio Codec and the DSP]

Stream Example
• After SIO_reclaim, pIn points to the input data and pOut points to the output data
• After SIO_issue, those buffers are reused by the audio codec
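The reclaim/issue hand-off can be modelled off-target with a toy buffer queue. This is a Python sketch for illustration only, not DSP/BIOS code: `ToyStream`, its `issue` and `reclaim` methods, and `process_block` are hypothetical stand-ins, and the real SIO_issue and SIO_reclaim calls take different arguments.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a stream: a FIFO of buffers held by the device side."""

    def __init__(self):
        self._queue = deque()

    def issue(self, buf):
        # Hand a buffer to the device side; the caller should not touch it afterwards.
        self._queue.append(buf)

    def reclaim(self):
        # Take back the oldest buffer the device side has finished with.
        return self._queue.popleft()


def process_block(in_stream, out_stream, work):
    # Reclaim one input buffer and one output buffer.
    p_in = in_stream.reclaim()
    p_out = out_stream.reclaim()
    # Fill the output buffer from the input block.
    p_out[:] = work(p_in)
    # Issue both buffers again so the device side can reuse them.
    in_stream.issue(p_in)
    out_stream.issue(p_out)
    return p_out  # returned only so the sketch can be inspected
```

In this model the application owns a buffer only between `reclaim` and `issue`, which mirrors the ownership rule stated on the slide.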

Pre-Emphasis
• y[n] = x[n] – a·x[n-1]
• First-order high-pass filter
• Used to compensate for the high-frequency roll-off in human speech production
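A minimal Python sketch of the pre-emphasis difference equation, for illustration rather than the DSP implementation. The default coefficient a = 0.97 is an assumed, commonly used value (the slides do not give one), and the sample before the first one is treated as zero.

```python
def pre_emphasis(x, a=0.97):
    """Apply y[n] = x[n] - a*x[n-1], treating x[-1] as 0.

    The default a=0.97 is an assumption, not a value from the slides.
    """
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y
```

A constant input is strongly attenuated after the first sample, which is the high-pass behaviour the slide describes.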

Windowing Function
• Rectangular, Hann, Hamming, Cosine, Gaussian…
[Figure: Hamming Window]
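As an illustration of the Hamming window shown on the slide, the sketch below uses the standard 0.54 / 0.46 coefficients. The function names are hypothetical, and the window length must be at least 2.

```python
import math


def hamming(n_samples):
    """Return a symmetric Hamming window of n_samples points (n_samples >= 2)."""
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]
```

On a fixed-point or memory-constrained target the window would normally be computed once and stored, since it depends only on the frame size.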

FFT
• Magnitude of Frequency Spectrum
• Texas Instruments' DSPLIB for C67x
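The project uses TI's DSPLIB for the transform. The sketch below is not that library: it is a plain recursive radix-2 FFT in Python, included only to show how a magnitude spectrum is obtained from a frame. It assumes the frame length is a power of two, such as 256.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Magnitudes of bins 0..N/2 for a real-valued frame."""
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]
```

Only the first N/2 + 1 bins are kept because the spectrum of a real signal is symmetric.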

Mel Filter
• Triangular Bandpass Filters along Mel Frequency Scale
• Mimics the logarithmic nature of human hearing
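A sketch of one common way to build triangular mel filters, under stated assumptions: the mel mapping 2595·log10(1 + f/700) is a widely used formula but is not given in the slides, and the filter count is a free parameter here (the slides do not state how many filters were used). Filter edges are kept as fractional bin positions so that narrow low-frequency filters never collapse to zero width.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters rows of triangular weights over FFT bins 0..n_fft/2."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mels]  # fractional bins
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if left < k <= centre:
                row.append((k - left) / (centre - left))  # rising slope
            elif centre < k < right:
                row.append((right - k) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_energies(magnitudes, bank, floor=1e-10):
    """Apply each filter to the magnitude spectrum, then take the natural log."""
    return [
        math.log(max(sum(w * m for w, m in zip(row, magnitudes)), floor))
        for row in bank
    ]
```

The `floor` argument is a hypothetical guard against taking the log of zero on silent frames.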

Discrete Cosine Transform (DCT)
• Transforms back from the frequency domain
• Typically the first 12 values are used as the Mel Frequency Cepstral Coefficients
• Look-up table for efficiency
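The look-up-table idea can be sketched as follows: the cosine terms depend only on the coefficient index and the filter index, so they can be computed once and reused for every frame. This is an unnormalized DCT-II in Python for illustration; the function names and the table layout are assumptions, not the project's code.

```python
import math


def make_dct_table(n_coeffs, n_filters):
    """Precompute cos(pi * k * (m + 0.5) / n_filters) for each k and m."""
    return [
        [math.cos(math.pi * k * (m + 0.5) / n_filters) for m in range(n_filters)]
        for k in range(n_coeffs)
    ]


def dct_from_table(log_energies, table):
    """Unnormalized DCT-II of the log mel energies using the precomputed table."""
    return [sum(c * e for c, e in zip(row, log_energies)) for row in table]
```

With the table built once at start-up, each frame costs only multiply-accumulate operations, with no trigonometric calls in the real-time path.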

Deltas
• Produce 13 MFCCs per frame
• 13 more from the first derivative
• 13 additional from the second derivative
• 39-dimensional vector to represent the current frame
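The slides do not say which derivative estimate is used, so the sketch below assumes the simplest one: a central difference across neighbouring frames, with the first and last frames repeated at the edges. Applying it twice gives the delta-deltas, and concatenation yields the 39-dimensional vector.

```python
def deltas(frames):
    """Central-difference derivative over time.

    frames: list of equal-length coefficient lists, one per frame.
    Edge frames are repeated (an assumption made for this sketch).
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def feature_vectors(mfcc_frames):
    """Concatenate MFCCs, deltas and delta-deltas for each frame."""
    d1 = deltas(mfcc_frames)
    d2 = deltas(d1)
    return [c + d + dd for c, d, dd in zip(mfcc_frames, d1, d2)]
```

Because the central difference looks one frame ahead, a real-time version would emit each full vector one frame late.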

Observations
• Pre-emphasis and windowing of an input frame
[Figure panels: Input Frame; Pre-Emphasized and Windowed Frame]

Observations
• FFT and Log, Mel Filter
[Figure panels: Magnitude of Frequency Spectrum; Log, Mel Filtered Spectrum]

Observations
• Discrete Cosine Transform to produce MFCCs
[Figure panels: Mel Frequency Cepstral Coefficients; Full Feature Vector for 1 Frame]

Observations
• Frame Size = 256 samples @ 16 kHz Fs
• 1 Frame = 16 ms
• Feature Extraction Time
  • Debug – 1.55 ms
  • Release – 0.25 ms
• Real-Time Feature Extraction
  • 0.25 ms / 16 ms = 1.56% usage
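The timing figures above follow from simple arithmetic, reproduced here as a small Python check. The helper names are hypothetical, and the calculation assumes one feature extraction per non-overlapping 256-sample frame, as the slide's own division implies.

```python
def frame_duration_ms(frame_size, sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz


def cpu_usage_percent(extraction_ms, frame_ms):
    """Share of each frame period spent on feature extraction."""
    return 100.0 * extraction_ms / frame_ms


frame_ms = frame_duration_ms(256, 16000)       # 256 samples at 16 kHz
release_usage = cpu_usage_percent(0.25, frame_ms)
debug_usage = cpu_usage_percent(1.55, frame_ms)
```

By the same formula the debug build's 1.55 ms works out to just under 10% of the frame period, so both builds leave most of the CPU time free for analysis.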

Future Goals
• Complete training code for DSP
• Load training data to SDRAM
• DSP calculates all feature vectors associated with a given phone
• Calculates Gaussian mixture model
• Save acoustic model off-chip
• Evaluate the acoustic model (digital recognition)
• Complete embedded ASR on limited vocabulary
