Advanced Speech Processing Techniques Utilizing HTK for Feature Extraction

Speech Processing Using HTK Trevor Bowden 12/08/2008

Outline • Concept of Project • HTK Feature Extraction Capabilities • Details of Feature Extraction Script • Future Development

Concept of Project • Explore HTK Feature Extraction Capabilities • Feature Output Types • Additional Feature Parameters • Ideal Solution • Derive Any Feature Type from Any Corpus

HTK Feature Extraction Models • Linear Prediction Analysis • Cepstral Analysis • [Figure: block diagrams with stages labeled Hamming Window, Hamming Window, FFT(), Log()]

HTK Feature Extraction Capabilities • Feature Extraction Methods • Linear Prediction Analysis • Cepstral Analysis • Mel-Scaling • Perceptual Linear Prediction Analysis • Additional Feature Information • Signal Energy • Derivative Information

Linear Prediction Analysis • Vocal Tract Transfer Function • Transfer Function Coefficients Solution • Autocorrelation Matrices • Autocorrelation of Speech • Amplitude of Model
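
The autocorrelation route listed on this slide can be illustrated with a short Python sketch. It is not HTK's own code: the function names `autocorrelation` and `levinson_durbin` are hypothetical, and the sketch assumes the predictor convention s[t] ≈ a[1]·s[t-1] + ... + a[p]·s[t-p]. The Levinson-Durbin recursion solves the Toeplitz autocorrelation equations, and the residual prediction error it returns is commonly used as the squared gain (amplitude) of the all-pole model.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one (already windowed) speech frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by Levinson-Durbin recursion.

    r     : autocorrelation values r[0..order]
    order : number of predictor coefficients p
    Returns (a, err): a[i-1] multiplies s[t-i]; err is the residual energy.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has no energy; LP analysis is undefined")
    a = [0.0] * (order + 1)   # a[0] is unused; a[1..p] are the predictor taps
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err
```

A typical call would be `levinson_durbin(autocorrelation(frame, p), p)` on a Hamming-windowed frame, with `p` chosen by the user.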

Cepstral Analysis • Logarithmic Spectral Domain (Cepstral Domain) • Allows for Separation of Convolved Signals
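
The separation property named on this slide follows from the fact that convolution in time becomes multiplication of spectra, and the logarithm turns that product into a sum. The Python sketch below illustrates this with the real cepstrum. It is a toy illustration, not HTK's implementation: the helper names are hypothetical, and a naive DFT is used so that the example needs only the standard library.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of a real sequence.

    Assumes no DFT bin of x is exactly zero (log would be undefined).
    """
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    # The log magnitude of a real signal is even-symmetric, so the inverse is real.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def convolve(x, h):
    """Linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def zero_pad(x, n):
    """Pad x with zeros to length n so that all spectra share the same bins."""
    return list(x) + [0.0] * (n - len(x))
```

When a source sequence and a filter are convolved and all three sequences are zero-padded to a common length that holds the full convolution, the cepstrum of the result equals the sum of the two individual cepstra, which is what allows the components to be separated.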

Mel-Scaling • Human perception of pitch is non-linear in frequency: tones that listeners judge to be equally spaced in pitch are not equally spaced in hertz. • The Mel scale warps the frequency axis so that equal steps correspond to equal perceived pitch differences.
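
The warping can be made concrete with the commonly used formula Mel(f) = 2595 · log10(1 + f/700), which is also the form given in the HTK Book. The Python sketch below uses hypothetical function names and shows how points spaced evenly in Mel spread out in hertz as frequency rises.

```python
import math


def hz_to_mel(hz):
    """Map a frequency in hertz onto the Mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Return `count` (>= 2) frequencies in hertz, equally spaced on the Mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]
```

For example, `mel_spaced_frequencies(0.0, 8000.0, 5)` yields points whose spacing in hertz grows from one point to the next, even though they are equidistant in Mel.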

Perceptual Linear Prediction Analysis • Perceptual linear prediction combines linear prediction and cepstral analysis. • The spectrum of the speech data is first warped onto the Mel scale. • The warped spectrum is then compressed by taking its cube root, and linear prediction coefficients are computed from the result. • These coefficients are then converted to cepstral coefficients.

Signal Energy and Derivatives • Signal Energy • Delta Coefficients • Acceleration Coefficients • Third Differential Coefficients
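
The derivative features can be sketched with the regression formula described in the HTK Book, d_t = Σ θ·(c_{t+θ} − c_{t−θ}) / (2·Σ θ²), summed over θ = 1..Θ. The Python below is a minimal illustration, not HTK's code: the function name `delta` is hypothetical, and it assumes that frames beyond either end of the utterance are replaced by the first or last frame. Acceleration coefficients are obtained by applying the same regression to the deltas, and third differential coefficients by applying it once more.

```python
def delta(frames, window=2):
    """Regression-based time derivatives of a sequence of feature vectors.

    frames : list of equal-length lists of floats, one list per frame
    window : regression half-width (theta runs from 1 to window)
    """
    n = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    out = []
    for t in range(n):
        vec = []
        for d in range(dim):
            num = 0.0
            for theta in range(1, window + 1):
                ahead = frames[min(t + theta, n - 1)][d]   # clamp at last frame
                behind = frames[max(t - theta, 0)][d]      # clamp at first frame
                num += theta * (ahead - behind)
            vec.append(num / denom)
        out.append(vec)
    return out
```

Chaining the calls as `acc = delta(delta(frames))` gives acceleration coefficients under the same assumptions.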

Speech Processing of the AMI Corpus • Ideal Solution Yields Generic Feature Types from Generic Corpus • Corpora Have Varying Audio File Types and Varying Organizational Structures • Corpora Have Varying Methods for Annotation

Speech Processing of the AMI Corpus • Project Solution Yields Generic Feature Types from Corpora with RIFF-Format WAV Audio Files • Two Main Functions of Script • Traverse Corpus Directory Tree • Generate List of Audio Files • Produce Feature Data • Using User-Defined Configuration File
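
The two functions of the script can be sketched as follows. This is not the presenter's script, which the slides do not show: the function names, the `.mfc` extension, and the values in `EXAMPLE_CONFIG` are illustrative assumptions. The sketch relies only on HTK's `HCopy` tool with its `-C` (configuration file) and `-S` (script file) options, where each script line pairs a source audio file with a target feature file.

```python
import os
import subprocess

# Illustrative HTK configuration (values are assumptions, not from the slides).
# TARGETKIND qualifiers: _E energy, _D delta, _A acceleration, _T third differential.
EXAMPLE_CONFIG = """\
SOURCEFORMAT = WAV
TARGETKIND   = MFCC_E_D_A
TARGETRATE   = 100000.0
WINDOWSIZE   = 250000.0
USEHAMMING   = T
NUMCHANS     = 26
NUMCEPS      = 12
"""


def find_wav_files(corpus_root):
    """Traverse the corpus directory tree and list every .wav file found."""
    wavs = []
    for dirpath, _dirnames, filenames in os.walk(corpus_root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                wavs.append(os.path.join(dirpath, name))
    return sorted(wavs)


def write_hcopy_script(wav_paths, corpus_root, feature_root, scp_path, ext=".mfc"):
    """Write 'source target' lines for HCopy, mirroring the corpus layout.

    Paths containing spaces are not handled by this simple format.
    """
    pairs = []
    for wav in wav_paths:
        rel = os.path.relpath(wav, corpus_root)
        target = os.path.join(feature_root, os.path.splitext(rel)[0] + ext)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        pairs.append((wav, target))
    with open(scp_path, "w") as scp:
        for wav, target in pairs:
            scp.write(f"{wav} {target}\n")
    return pairs


def run_hcopy(config_path, scp_path):
    """Produce feature data with HCopy (requires HTK to be installed and on PATH)."""
    subprocess.run(["HCopy", "-C", config_path, "-S", scp_path], check=True)
```

In use, the configuration text would be saved to a file chosen by the user, `find_wav_files` and `write_hcopy_script` would build the file list, and `run_hcopy` would then generate the features. Changing `TARGETKIND` in the configuration is what selects a different feature type.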

Future Development • Expand Script to Handle Audio Inputs of Any File Type • Include Processing for Specific Corpus Annotations
