This project investigates the application of the Eigen-faces method in classifying speech styles, particularly focusing on voice disorders such as stuttering and pausing. It aims to enhance speech recognition techniques through advanced feature extraction and classification methods, utilizing previous successes with neural networks and wavelet transformations. By applying Eigen-faces to audio data, we analyze its effectiveness in distinguishing variations in speech, leading to potential applications in personalized therapy for individuals with Parkinson's disease and other related conditions.
Project 1: Eigen-Faces Applied to Speech Style Classification • Brad Keserich, Senior, Computer Engineering, College of Engineering and Applied Science; University of Cincinnati; Cincinnati, Ohio • Suryadip Chakraborty, School of Computing Sciences and Informatics • Dr. Dharma Agrawal, Professor, School of Computing Sciences and Informatics • Sponsored by the National Science Foundation, Grant ID No.: DUE-0756921
Introduction • Speech recognition • Voice disorders • Stuttering • Pausing • Other lesser-known forms • Research group focuses on Parkinson’s patients
Techniques • Previous work • Good results from neural network classifiers operating on fuzzy values • Wavelet transformations are effective • For this project • Eigen-faces method adapted to audio
Goals Investigate the usefulness of the eigen-faces method for speech classification
Objectives • Acquire data • Extract salient features • Analyze Eigen-faces effectiveness
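Eigen-faces for audio • [Figure: audio signal over time t divided into windows w1, w2, w3, w4, w5] • Each window wi yields a feature vector $v_i = (f_1, f_2, \ldots, f_r)^{\mathsf{T}}$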
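The windowing idea on this slide can be sketched as follows. This is a minimal illustration, not the project's code: the function name `frame_features`, the Hann window, the frame length, the hop size, and the use of a power spectrum as the features f1..fr are all assumptions, and the test signal is synthetic.

```python
import numpy as np

def frame_features(signal, frame_len, hop):
    """Split a 1-D signal into overlapping windows w_i and return one
    power-spectrum feature vector v_i per window (one row per window)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # taper each window to reduce leakage
    feats = []
    for i in range(n_frames):
        w = signal[i * hop : i * hop + frame_len] * window
        spectrum = np.fft.rfft(w)
        feats.append(np.abs(spectrum) ** 2)  # power in each frequency bin
    return np.array(feats)

# Synthetic stand-in for a recorded segment: one second of a 440 Hz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
V = frame_features(tone, frame_len=256, hop=128)
```

Each row of `V` plays the role of one vi. In this sketch r is `frame_len // 2 + 1`, the number of bins returned by the real FFT.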
Classifiers using Abstract Features • Training • Training set of feature vectors • Convert to zero-mean truth set • Top k principal components (using principal component analysis (PCA)) • Classifying • Project new vectors onto eigenbasis • Residuals indicate closeness to a class
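The training and classification steps above can be sketched in NumPy. The slide does not say whether one eigenbasis is shared or one is built per class; this sketch assumes one per class and picks the class with the smallest residual. The function names and the synthetic two-class data are hypothetical.

```python
import numpy as np

def fit_eigenbasis(X, k):
    """X: (n_samples, n_features) training vectors for one class.
    Returns the class mean and the top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean  # zero-mean truth set
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k]

def residual(x, mean, basis):
    """Distance between x and its projection onto the class subspace."""
    xc = x - mean
    proj = basis.T @ (basis @ xc)
    return np.linalg.norm(xc - proj)

def classify(x, models):
    """models: dict label -> (mean, basis). Smallest residual wins."""
    return min(models, key=lambda label: residual(x, *models[label]))

# Synthetic data (assumption): each class varies along one axis of a 5-D space.
rng = np.random.default_rng(0)
noise = lambda: 0.01 * rng.normal(size=(50, 5))
X_a = rng.normal(size=(50, 1)) * np.array([1.0, 0, 0, 0, 0]) + noise()
X_b = rng.normal(size=(50, 1)) * np.array([0, 1.0, 0, 0, 0]) + noise()
models = {"a": fit_eigenbasis(X_a, k=1), "b": fit_eigenbasis(X_b, k=1)}

x_new = np.array([2.0, 0, 0, 0, 0])
label = classify(x_new, models)
```

A small residual means the new vector is well explained by that class's top k components, which is the sense of "closeness" used on the slide.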
Data • Recorded word: “Ta-Be-Mo-No” • Consonant + vowel sounds • Easy to do segmentation • Use “Ta” portion only • Use voice acting for data collection • Same person • Vary the way the word is spoken • Variance of speaking style • Stuttering • Pausing • Pace • Pitch inflections
Pipeline • [Diagram; box labels as extracted, original layout not recoverable: Voice Acting; Abstract Features; Stutter Detection; Audio Recording Software; Ground Truthing; Segmentation; Signal Duration; Eigen-faces; Speech Detection; Power Spectrum]
Segmentation and Labeling • Automation • Works well for slow clear cases • Not as well for more realistic cases • Slow cases are close to hand segmentation • By Hand • More reliable segmentation at this point • Done with sample counts in Logic 8 • Label the segments with correct sound
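The slides do not describe how the automated segmentation works. As one common approach, and purely as an assumption, a short-time energy threshold can mark where sound is present. The function name, frame length, and threshold below are hypothetical, and the signal is synthetic.

```python
import numpy as np

def active_segments(signal, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames whose mean
    energy exceeds the threshold. The end index is exclusive."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    active = (frames ** 2).mean(axis=1) > threshold
    segments, start = [], None
    for i, is_on in enumerate(active):
        if is_on and start is None:
            start = i  # a sounding run begins
        elif not is_on and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None:  # run continues to the end of the signal
        segments.append((start * frame_len, n_frames * frame_len))
    return segments

# Synthetic example: two bursts of sound separated by silence.
sig = np.concatenate([np.zeros(100), np.ones(200), np.zeros(300),
                      np.ones(100), np.zeros(100)])
segs = active_segments(sig, frame_len=100, threshold=0.5)
```

A fixed threshold like this tends to behave well on slow, clearly separated syllables and poorly on fast or noisy speech, which matches the slide's observation about the automated approach.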
Modifications • Use additional features in the Eigen-faces method • Stutter detection • Pauses and spacing within the spoken word • Pitch inflections • Utilize Mel-Cepstrum to pick up features • Substitute Laplacian Eigenmap for PCA
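To make the last proposed modification concrete, here is a minimal sketch of a Laplacian eigenmap in the spirit of Belkin and Niyogi. It assumes a fully connected graph with heat-kernel weights; the `sigma` bandwidth, the function name, and the toy two-cluster data are assumptions, and the project may have intended a different graph construction.

```python
import numpy as np

def laplacian_eigenmap(X, n_components, sigma=1.0):
    """Embed the rows of X using a fully connected heat-kernel graph."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))  # edge weights
    np.fill_diagonal(W, 0.0)            # no self-loops
    d = W.sum(axis=1)                   # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map back to solutions of L y = lambda D y and skip the trivial
    # constant solution that belongs to the smallest eigenvalue.
    return d_inv_sqrt[:, None] * vecs[:, 1 : n_components + 1]

# Toy data (assumption): two tight clusters of 2-D feature vectors.
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(10, 2))
B = 0.1 * rng.normal(size=(10, 2)) + np.array([3.0, 0.0])
Y = laplacian_eigenmap(np.vstack([A, B]), n_components=1)
```

Unlike PCA, the embedding depends only on neighborhood structure, so it can separate classes that do not lie along linear directions. The embedded coordinates would then replace the PCA projections as inputs to the classifier.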
Results • Features performing well • Blatant stutter detection • Long durations • Spectrum analysis • Good class separability
Conclusions • Eigen-faces work for spoken audio data • More tweaking required • Further research • Mel-Cepstrum features • Laplacian Eigenmapping to replace PCA • May be useful as a front end to Fuzzy-Neuro classifiers
References • Wu, H., Siegel, M., & Khosla, P. (1999). Vehicle sound signature recognition by frequency vector principal component analysis. IEEE Transactions on Instrumentation and Measurement, 48(5). doi: http://dx.doi.org/10.1109/19.799662. • Belkin, M., & Niyogi, P. (2002). Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. • Prahallad, K. Speech Technology: A Practical Introduction. Topics: Spectogram, Cepstrum and Mel-Frequency Analysis. http://www.speech.cs.cmu.edu/11-492/slides/03_mfcc.pdf.