Structure-Based Speech Classification Using Nonlinear Embedding Techniques

This research aims to develop a structure-based speech classification system using nonlinear embedding techniques. The study focuses on voiced and unvoiced speech, usable and unusable speech, and nonlinearities in speech. The proposed research includes feature extraction methods and classification algorithms based on difference-mean comparison and nodal density measures.

Presentation Transcript


  1. Structure-Based Speech Classification Using Nonlinear Embedding Techniques • Uchechukwu Ofoegbu • Advisor: Dr. Robert E. Yantorno • Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula

  2. Acknowledgment • Dr. Robert Yantorno • Dr. Saroj Biswas • Dr. Henry Sendaula • Speech Lab Members • Air Force Research Laboratory, Rome, NY

  3. Overview • Voiced and Unvoiced Speech • Usable and Unusable Speech • Nonlinearities in Speech • Non-Linear Embedding • Research Goal • Proposed Research

  4. Voiced and Unvoiced Speech

  5. Voiced/Unvoiced Characteristics • Voiced • Quasi-periodic excitation • Modulation by vocal tract • Production of vowels, voiced fricatives & plosives • Unvoiced • No periodic vibration of vocal cords • Noise-like nature • Production of unvoiced fricatives and plosives

  6. Usable Speech • Portions of co-channel speech still usable for applications such as Speaker ID and Speech Recognition. • Low-energy (unvoiced/silence) segments overlap with high-energy (voiced) segments • Target-to-interferer Ratio (TIR) > 20dB
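The TIR criterion on the slide above can be illustrated with a short sketch. It assumes the target and interferer signals are available separately and time-aligned (as in a simulated co-channel mix), and it uses the standard energy-ratio definition of TIR in dB. The function names, the frame length of 320 samples, and the small `eps` guard are illustrative choices, not taken from the presentation.

```python
import numpy as np

def frame_tir_db(target, interferer, frame_len=320, eps=1e-12):
    """Per-frame target-to-interferer ratio (dB) for two aligned signals.

    frame_len and eps are hypothetical choices made for this sketch.
    """
    n_frames = min(len(target), len(interferer)) // frame_len
    n = n_frames * frame_len
    t = np.asarray(target[:n], dtype=float).reshape(n_frames, frame_len)
    i = np.asarray(interferer[:n], dtype=float).reshape(n_frames, frame_len)
    e_t = np.sum(t ** 2, axis=1)  # target energy per frame
    e_i = np.sum(i ** 2, axis=1)  # interferer energy per frame
    return 10.0 * np.log10((e_t + eps) / (e_i + eps))

def usable_frames(target, interferer, threshold_db=20.0, frame_len=320):
    """Boolean mask of frames whose TIR exceeds the threshold (20 dB on the slide)."""
    return frame_tir_db(target, interferer, frame_len) > threshold_db
```

In practice only the mixed signal is observed, which is why the presentation looks for features that separate usable from unusable frames without access to the true TIR.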

  7. Nonlinearities in Speech • Glottal waveform changes • Shape varies with amplitude • Physical observations • Flow in vocal tract is non-laminar • Coupling between vocal tract and folds • When glottis is open, prominent changes are observed in formant characteristics

  8. Nonlinear Embedding • Nonlinear Systems • Point moving along some trajectory in an abstract state space • Coordinates of the point are independent degrees of freedom of the system • State space could be reconstructed from a scalar signal

  9. Nonlinear Embedding (cont’d) • Takens’ Method of Delays • A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension • Vectors in m-dimensional state space are formed from time-delayed values of a signal

  10. Nonlinear Embedding (cont’d) • m = embedding dimension • d = delay value
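The method of delays can be sketched in a few lines. For a scalar signal s(n), the n-th reconstructed state vector is [s(n), s(n + d), ..., s(n + (m - 1)d)], with m the embedding dimension and d the delay. The function name `delay_embed` is hypothetical, and the transcript gives no value for d, so the one used in the usage note below is arbitrary.

```python
import numpy as np

def delay_embed(signal, m, d):
    """Return the delay-embedded trajectory of a 1-D signal.

    Row n of the result is [s(n), s(n + d), ..., s(n + (m - 1) * d)].
    """
    x = np.asarray(signal, dtype=float)
    n_vectors = len(x) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen m and d")
    # Column k holds the signal delayed by k * d samples.
    return np.column_stack([x[k * d : k * d + n_vectors] for k in range(m)])
```

For example, `delay_embed(frame, m=3, d=12)` would turn one speech frame into a set of points in three-dimensional space that can be plotted as a trajectory.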

  11. Nonlinear Embedding (Cont’d) • Delay value, d: • Dependent on sampling rate and signal properties • Large enough such that nonlinearities are taken into account by the reconstructed trajectory • Small enough to retain reasonable time resolution

  12. Nonlinear Embedding (Cont’d) • Dimension, m: • Generation of voiced speech constitutes a low-dimensional system • Generation of unvoiced speech constitutes a relatively high-dimensional system • Using a low dimension (such as m = 3) sufficiently reconstructs voiced but not unvoiced speech

  13. Embedded Voiced and Unvoiced Speech

  14. Embedded Usable and Unusable Speech

  15. Research Goal • Feature Extraction • Difference-Mean Comparison (DMC) Measure • Voiced/unvoiced classification • Nodal Density Measure • Voiced/unvoiced classification • Usable/unusable classification

  16. Difference-Mean Comparison (DMC) Measure • Voiced/Unvoiced Classification

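  17. Introduction • 3rd-order difference computed along the first non-singleton dimension • 1st-order difference of an N×N matrix X given by the row differences X(i+1,:) - X(i,:), i = 1, ..., N-1 • Length(3rd-order diff. > mean), i.e. the number of 3rd-order differences exceeding the mean, is observed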
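A minimal sketch of how such a measure might be computed is given below. The transcript does not say which mean the differences are compared against, so this version assumes the mean of the 3rd-order differences themselves. It also assumes the input is the embedded frame with one state vector per row. The name `dmc_measure` is hypothetical.

```python
import numpy as np

def dmc_measure(embedded):
    """Count 3rd-order differences that exceed their own mean.

    Assumption: the comparison mean is the mean over all 3rd-order
    difference values; the presentation does not specify this.
    """
    X = np.asarray(embedded, dtype=float)
    # Differences between successive rows, applied three times.
    diff3 = np.diff(X, n=3, axis=0)
    return int(np.count_nonzero(diff3 > diff3.mean()))
```

The resulting count per frame would then be compared against a decision threshold, which the surviving slide text does not state.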

  18. Embedded Voiced and Unvoiced Speech

  19. Difference-Mean Comparison Distribution

  20. Difference-Mean Comparison Distribution

  21. Difference-Mean Comparison Distribution

  22. DMC-Based Decisions

  23. DMC-Based Decisions

  24. DMC-Based Decisions

  25. DMC-Based Decisions

  26. DMC-Based Decisions

  27. DMC-Based Decisions

  28. Results

  29. Results (Cont’d)

  30. Nodal Density Measure • Voiced/Unvoiced Classification • Usable/Unusable Classification

  31. Introduction • Smallest cube which encloses the signal is determined • This cube is divided into N smaller cubes • Edges of the smaller cubes are defined as nodes • Number of nodes spanned by the signal is determined • Ratio of number of nodes spanned to total number of nodes is defined as nodal density
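The steps on the slide above can be sketched as follows. Several details are assumptions: nodes are taken to be the vertices of the grid of smaller cubes, a node counts as spanned when it is the nearest grid vertex to some trajectory point, and the grid has `divisions` cells per axis (so N = divisions³ smaller cubes). The function name and the default of 10 divisions are hypothetical.

```python
import numpy as np

def nodal_density(points, divisions=10):
    """Fraction of grid nodes visited by an embedded trajectory.

    points: array of shape (n_points, dim), e.g. a delay-embedded frame.
    Assumption: nodes are grid vertices, and each point is snapped to
    its nearest vertex.
    """
    P = np.asarray(points, dtype=float)
    mins = P.min(axis=0)
    # Side of an axis-aligned cube that encloses the whole trajectory.
    side = (P.max(axis=0) - mins).max()
    if side == 0:
        side = 1.0  # degenerate case: all points coincide
    idx = np.rint((P - mins) / side * divisions).astype(int)
    visited = np.unique(idx, axis=0)
    total_nodes = (divisions + 1) ** P.shape[1]
    return len(visited) / total_nodes
```

Under these assumptions, a trajectory that fills the cube yields a density near 1, while one confined to a thin orbit yields a much lower value.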

  32. Voiced/Unvoiced Classification

  33. Embedded Voiced and Unvoiced Speech Frames with Grids

  34. Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames

  35. Nodal-Density Distribution

  36. Nodal-Density Distribution

  37. Nodal-Density Distribution

  38. Filtering • Moving Average Filter • Order, M = 10
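A moving average filter of this kind might look like the sketch below. It assumes that "order M = 10" means a window of M samples and that the filter is causal with zero initial conditions. The transcript does not say what is being filtered; the frame-by-frame sequence of nodal-density values is a plausible reading, but that is an assumption. The function name is hypothetical.

```python
import numpy as np

def moving_average(values, M=10):
    """Causal M-point moving average: y[n] = (1/M) * sum of x[n-k], k = 0..M-1.

    Samples before the start of the sequence are treated as zero.
    """
    x = np.asarray(values, dtype=float)
    kernel = np.ones(M) / M
    # Full convolution, truncated so the output has the input's length.
    return np.convolve(x, kernel, mode="full")[: len(x)]
```

The first M - 1 outputs are biased toward zero by the start-up transient, so they may need to be discarded or handled separately before thresholding.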

  39. Nodal-Density Distributions after Filtering

  40. Nodal-Density Distributions after Filtering

  41. Nodal-Density Distributions After Filtering

  42. Results

  43. Results (Cont’d)

  44. Proposed Research • Usable/Unusable Classification

  45. Embedded Usable and Unusable Speech Frames with Grids

  46. Nodes Spanned by Embedded Usable and Unusable Speech Frames • Figure: three panels, each titled "Nodes Spanned by Embedded Co-channel Speech of 30dB TIR"

  47. Preliminary Results

  48. Summary • Speech • Nonlinear Embedding • Difference-Mean Comparison • V/UV Classification • Nodal Density • V/UV Classification • Usable/Unusable Classification

  49. Future Proposed Research • Determine optimum filter for nodal density-based voiced/unvoiced classification • Develop nodal density measure for usable/unusable classification • Investigate the presence of complementary information between the two features (DMC and nodal density) for voiced/unvoiced classification • Perform decision-level fusion of both features
