Improved ASR in noise using harmonic decomposition





  1. Improved ASR in noise using harmonic decomposition
     • Introduction
     • Pitch-Scaled Harmonic Filter
     • Recognition Experiments
     • Results
     • Conclusion
     [Figure: production of /z/, showing the periodic and aperiodic contributions]

  2. Introduction: Motivation & Aims
     • Most speech sounds are predominantly voiced or unvoiced.
     • What happens when the two components are “mixed”?
     • Voiced and unvoiced components have different natures:
       • unvoiced: aperiodic signal from turbulence-noise sources
       • voiced: quasi-periodic signal from vocal-fold vibration
     • Why not extract their features separately?
     • Do the two contributions contain complementary information?
     • Human speech recognition still performs well in noise.
       • How? Does it take advantage of harmonic properties?

  3. Introduction: Voiced and unvoiced parts of a speech signal
     [Figure: production of /z/, showing the periodic and aperiodic contributions]

  4. Introduction: Automatic Speech Recognition
     speech signal → Front End → Pattern Recognition → speech labels
     • Front End (Feature Extraction): conversion of speech signals to a sequence of parameter vectors
     • Pattern Recognition (Dynamic Programming): matching of observation sequences to models of known utterances
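The dynamic-programming step on this slide can be illustrated with a small Viterbi search. This is only a sketch under the assumption that the "models of known utterances" are HMMs scored in the log domain; the function name `viterbi` and its arguments are hypothetical and not taken from the presentation.

```python
import math

def viterbi(log_obs, log_trans, log_init):
    """Return the best state path and its log score.

    log_obs[t][j]   : log-likelihood of frame t under state j
    log_trans[i][j] : log-probability of moving from state i to state j
    log_init[j]     : log-probability of starting in state j
    """
    n_states = len(log_init)
    score = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for t in range(1, len(log_obs)):
        prev = score
        ptr, score = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + log_obs[t][j])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    best = score[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best
```

In a recogniser, `log_obs` would come from the front end's parameter vectors scored against each model state; here it is simply passed in.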

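  5. PSHF: block diagram
     [Figure: PSHF block diagram. Labels: input waveform s(n); raw pitch f0raw → Pitch optimisation → optimised pitch f0opt and window length Nopt; window w(n) giving the windowed signal sw(n); Harmonic Decomposition giving the windowed estimates v̂w(n) and ûw(n), with a −/+ summing node; outputs: periodic waveform v(n) and aperiodic waveform u(n)]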
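The idea behind the harmonic-decomposition block can be sketched in a few lines. This is a deliberately simplified illustration, not the PSHF itself. It assumes the pitch period is already known, that the frame spans exactly four pitch periods (so harmonics of f0 fall on every fourth DFT bin), and that a rectangular window is used. It also omits pitch optimisation, tapering and overlap-add, and it assigns the DC bin to the periodic part. The names `harmonic_split`, `dft` and `idft` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Plain O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def harmonic_split(frame, periods=4):
    """Split a frame covering `periods` whole pitch periods into
    a periodic estimate v and an aperiodic remainder u = frame - v."""
    if len(frame) % periods != 0:
        raise ValueError("frame length must be a multiple of `periods`")
    spectrum = dft(frame)
    # Harmonics of f0 sit on bins 0, periods, 2*periods, ...; keep only those.
    harmonic = [X if k % periods == 0 else 0 for k, X in enumerate(spectrum)]
    v = idft(harmonic)
    u = [s - p for s, p in zip(frame, v)]
    return v, u
```

For a frame that is exactly periodic at the assumed pitch, `u` comes out as zero; any energy between the harmonic bins ends up in `u`.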

  6. PSHF: Decomposition example (waveforms)
     [Figure panels: original, periodic part, aperiodic part]

  7. PSHF: Decomposition example (spectrograms)
     [Figure panels: original, periodic part, aperiodic part]

  8. PSHF: Decomposition example (MFCC specs.)
     [Figure panels: original, periodic part, aperiodic part]

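  9. Method: Parameterisations (waveform → features)
     [Figure: one block chain per parameterisation; blocks listed as extracted, chain order not recoverable]
     • BASE: MFCC, +Δ, +Δ²
     • SPLIT: MFCC, +Δ, +Δ², PSHF, cat
     • PCA26: MFCC, PSHF, +Δ, +Δ², cat, PCA
     • PCA78: MFCC, +Δ, +Δ², PSHF, cat, PCA
     • PCA13: MFCC, +Δ, +Δ², PSHF, cat, PCA
     • PCA39: MFCC, +Δ, +Δ², PSHF, cat, PCA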
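The "+Δ, +Δ²" and "cat" blocks can be sketched as follows. This is a minimal illustration that makes several assumptions: deltas are computed with the common regression formula over ±2 frames with edge frames replicated, each stream carries 13 static MFCCs per frame (inferred from the names PCA26 and PCA78, not stated on the slide), and dynamics are appended per stream before concatenation. The function names are hypothetical, and the PCA step is not shown.

```python
def deltas(frames, width=2):
    """Regression deltas: d_t = sum_th th*(c[t+th] - c[t-th]) / (2*sum_th th^2)."""
    n = len(frames)
    denom = 2 * sum(th * th for th in range(1, width + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for th in range(1, width + 1):
                ahead = frames[min(t + th, n - 1)][d]   # replicate last frame
                behind = frames[max(t - th, 0)][d]      # replicate first frame
                num += th * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out

def add_dynamics(frames, width=2):
    """Append first- and second-order deltas to each static frame."""
    d1 = deltas(frames, width)
    d2 = deltas(d1, width)
    return [c + a + b for c, a, b in zip(frames, d1, d2)]

def cat_streams(periodic, aperiodic):
    """Concatenate synchronous periodic and aperiodic feature frames."""
    return [p + a for p, a in zip(periodic, aperiodic)]
```

Under the 13-coefficient assumption, `cat_streams(add_dynamics(v_mfcc), add_dynamics(u_mfcc))` yields 78-dimensional frames, and concatenating the static frames alone yields 26 dimensions.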

  10. Method: Speech Database (Aurora 2.0)
      • TIdigits database at 8 kHz, filtered with G.712 channel
      • Connected English digit strings (male & female speakers)

  11. Method: Description of the experiments
      • Baseline experiment: [base]
        • standard parameterisation of the original waveforms (i.e., MFCC+D+A)
      • Split experiments: [split]
        • adjustment of stream weights (voiced vs. unvoiced)
      • PCA experiments: [pca26, pca78, pca13 and pca39]
        • decorrelation of the feature vectors, and reduction of the number of coefficients

  12. Results Split experiments results

  13. Results Split experiments results

  14. Results Split experiments results

  15. Results Summary of results

  16. Conclusions
      • The PSHF module split Aurora’s speech waveforms into two synchronous streams (periodic and aperiodic).
      • Used separately, the streams gave slightly degraded accuracy; used together, they gave substantially increased accuracy in noisy conditions.
      • Periodic speech segments provide robustness to noise.
      Further Work
      • Apply Linear Discriminant Analysis (LDA) to the two-stream feature vector.
      • Evaluate the performance of this front end in a more general task, such as phoneme recognition.
      • Test the technique for speaker recognition.

  17. COLUMBO PROJECT: Harmonic Decomposition applied to ASR
      http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/
      • David M. Moreno (1) <davidm@talp.upc.es>
      • Philip J.B. Jackson (2) <p.jackson@surrey.ac.uk>
      • Javier Hernando (1) <javier@talp.upc.es>
      • Martin J. Russell (3) <m.j.russell@bham.ac.uk>
      [Affiliation logos 1, 2, 3]

  18. Pitch Optimisation: vowel /u/
      [Figure panels: cost function; spectrum derived from a 268-point DFT]

  19. Harmonic Decomposition: vowel /u/

  20. Word accuracy results (%)

  21. Observation probability, with stream weights
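The equation on this slide did not survive extraction. For orientation, the standard multi-stream form of an HMM state's observation probability with stream weights is given below. It is an assumption that the slide used this form. Here $S$ is the number of streams (two in this work: periodic and aperiodic), $\gamma_s$ is the weight of stream $s$, and $c_{jsm}$, $\boldsymbol{\mu}_{jsm}$, $\boldsymbol{\Sigma}_{jsm}$ are the weight, mean and covariance of mixture component $m$ for stream $s$ in state $j$.

```latex
b_j(\mathbf{o}_t) \;=\; \prod_{s=1}^{S}
  \left[ \sum_{m=1}^{M_s} c_{jsm}\,
         \mathcal{N}\!\left(\mathbf{o}_{st};\, \boldsymbol{\mu}_{jsm},\, \boldsymbol{\Sigma}_{jsm}\right)
  \right]^{\gamma_s}
```

In the log domain the exponents become multipliers, so adjusting $\gamma_s$ rescales each stream's log-likelihood contribution.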
