Covariation and weighting of harmonically decomposed streams for ASR
  • Introduction
  • Pitch-scaled harmonic filter
  • Recognition experiments
  • Results
  • Conclusion

[Figure: production of /z/, showing its periodic and aperiodic components]

Motivation and aims
  • Most speech sounds are either voiced or unvoiced, which have very different properties:
    • voiced: quasi-periodic signal from phonation
    • unvoiced: aperiodic signal from turbulence noise
  • Do these properties allow humans to recognize speech in noise?

Maybe we can use this information to help ASR, by computing separate features for the two parts.

  • Are their two contributions complementary?

INTRODUCTION

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/

Voiced and unvoiced parts of a speech signal

[Figure: production of /z/, with its periodic and aperiodic contributions]

Pitch-scaled harmonic filter

[Block diagram: the speech waveform s(n) feeds pitch extraction (f0raw) and pitch optimisation (f0opt, Nopt); with the optimised pitch, time-shifted frames are passed through the PSHF and re-spliced to give the periodic waveform v̂(n) and the aperiodic waveform û(n).]
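
As a rough illustration of the idea behind the filter, the sketch below decomposes a single frame whose length is a whole number of pitch periods. With such a frame, the harmonics of f0 fall on every b-th DFT bin (b being the number of periods in the frame), so those bins can be taken as the periodic estimate and the remainder as the aperiodic estimate. The function name `split_frame`, the choice of four periods per frame, and the omission of windowing, pitch optimisation and re-splicing are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def split_frame(frame, periods_per_frame=4):
    """Split one frame into periodic and aperiodic estimates.

    Assumes len(frame) spans exactly `periods_per_frame` pitch periods,
    so the harmonics of f0 sit on DFT bins b, 2b, 3b, ... (b = periods_per_frame).
    """
    b = periods_per_frame
    spectrum = np.fft.rfft(frame)
    harmonic = np.zeros_like(spectrum)
    harmonic[b::b] = spectrum[b::b]          # keep only the harmonic bins
    periodic = np.fft.irfft(harmonic, n=len(frame))
    aperiodic = frame - periodic             # everything else
    return periodic, aperiodic

# Toy example (hypothetical values): 8 kHz sampling, f0 = 100 Hz,
# so one period is 80 samples and four periods are 320 samples.
fs, f0 = 8000, 100.0
n = np.arange(320)
rng = np.random.default_rng(0)
voiced = np.sin(2 * np.pi * f0 * n / fs) + 0.5 * np.sin(2 * np.pi * 2 * f0 * n / fs)
noise = 0.1 * rng.standard_normal(n.size)
periodic, aperiodic = split_frame(voiced + noise)
```

The two outputs sum back to the input frame by construction. Noise energy that happens to fall on the harmonic bins remains in the periodic estimate, which is one reason the real filter needs a carefully optimised pitch value.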

METHOD

Decomposition example (waveforms)

[Figure: waveforms of the original signal and of its periodic and aperiodic components]

Decomposition example (spectrograms)

[Figure: spectrograms of the original signal and of its periodic and aperiodic components]

Decomposition example (MFCC specs.)

[Figure: MFCC-based representations of the original signal and of its periodic and aperiodic components]

Speech database: Aurora 2.0
  • Derived from the TIdigits database of connected English digit strings (male and female speakers), filtered with G.712 at 8 kHz.

[Figure: composition of the TRAIN and TEST sets]

Description of the experiments
  • Baseline experiment: [base]
    • standard parameterisation of the original waveforms (i.e., MFCC, +Δ, +ΔΔ)
  • PCA experiments: [pca26, pca78, pca13 and pca39]
    • decorrelation of the feature vectors, and reduction of the number of coefficients
  • Split experiments: [split, split1]
    • adjustment of stream weights (periodic vs. aperiodic)

Caveat: pitch values were derived from clean speech files, for entire database!
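
The baseline parameterisation appends first and second time derivatives (Δ and ΔΔ) to the static MFCCs. A minimal sketch of the usual regression-based delta computation follows. The half-width of 2 frames, the edge padding, the 13 static coefficients, and the function names `deltas` and `add_deltas` are assumptions for illustration, since the slides do not give the front-end settings.

```python
import numpy as np

def deltas(feats, half_width=2):
    """Regression deltas over +/- half_width frames; feats is (frames, coeffs)."""
    num_frames = feats.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(feats, ((half_width, half_width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, half_width + 1))
    out = np.zeros_like(feats, dtype=float)
    for k in range(1, half_width + 1):
        ahead = padded[half_width + k: half_width + k + num_frames]
        behind = padded[half_width - k: half_width - k + num_frames]
        out += k * (ahead - behind)
    return out / denom

def add_deltas(static):
    """Stack static, delta and delta-delta coefficients column-wise."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])

# Stand-in for 13 static MFCCs over 10 frames: a linear ramp in time.
static = np.arange(10, dtype=float)[:, None] * np.ones((1, 13))
full = add_deltas(static)   # shape (10, 39)
```

For the linear ramp, the first derivative is 1 and the second derivative is 0 away from the edges, which gives a quick sanity check of the regression formula.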

Parameterisations

[Diagram: waveform-to-feature processing chains for BASE, SPLIT1, SPLIT, PCA26, PCA78, PCA13 and PCA39, built from PSHF decomposition, MFCC extraction, +Δ/+Δ2 computation, concatenation (cat) and PCA blocks]
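
The PCA parameterisations decorrelate the concatenated periodic and aperiodic feature vectors and, in some cases, keep fewer components. The sketch below shows one way to do this with an eigendecomposition of the covariance matrix. The random stand-in data, the 13-coefficient streams, and the helper names `fit_pca` and `apply_pca` are assumptions; the exact ordering of the PCA and delta stages in each experiment is not recoverable from the slide.

```python
import numpy as np

def fit_pca(feats, n_keep):
    """Return the mean, the top n_keep principal directions and their variances."""
    mean = feats.mean(axis=0)
    cov = np.cov(feats - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]       # largest variance first
    return mean, eigvecs[:, order], eigvals[order]

def apply_pca(feats, mean, basis):
    """Project centred features onto the retained principal directions."""
    return (feats - mean) @ basis

# Hypothetical data: two correlated 13-dimensional streams, 500 frames.
rng = np.random.default_rng(0)
periodic = rng.standard_normal((500, 13))
aperiodic = 0.5 * periodic + 0.1 * rng.standard_normal((500, 13))
cat = np.hstack([periodic, aperiodic])               # 26-dimensional vectors

mean, basis, variances = fit_pca(cat, n_keep=13)     # 26 -> 13 components
projected = apply_pca(cat, mean, basis)
```

After projection the components are uncorrelated on the data used to fit the transform, which suits diagonal-covariance Gaussian models.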

Full-sized PCA results

RESULTS

Variance of Principal Components

[Figure: variance of the principal components for PCA39 and PCA26 (legend: • clean, + multi)]

Summary of best PCA results

Sample Split results

Note: same value of stream weights used in training as in testing, for Split.
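
In multi-stream HMMs, stream weights typically act as exponents on the per-stream observation likelihoods, which means they multiply the per-stream log-likelihoods. The sketch below illustrates that combination for a single state with diagonal-covariance Gaussians. The single-Gaussian state model, the 39-dimensional streams, the weight values and the function names are illustrative assumptions rather than the configuration used in the experiments.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def weighted_state_loglik(obs_p, obs_a, model_p, model_a, w_p, w_a):
    """Combine periodic and aperiodic stream scores with weights w_p and w_a."""
    return (w_p * diag_gauss_loglik(obs_p, *model_p)
            + w_a * diag_gauss_loglik(obs_a, *model_a))

dim = 39                                     # assumed per-stream dimension
model_p = (np.zeros(dim), np.ones(dim))      # (mean, variance), periodic stream
model_a = (np.zeros(dim), np.ones(dim))      # (mean, variance), aperiodic stream
obs_p = np.zeros(dim)                        # periodic observation at the model mean
obs_a = np.ones(dim)                         # aperiodic observation off the mean

equal = weighted_state_loglik(obs_p, obs_a, model_p, model_a, 1.0, 1.0)
periodic_only = weighted_state_loglik(obs_p, obs_a, model_p, model_a, 1.0, 0.0)
```

Setting a weight to zero removes that stream's influence on the state score, so sweeping the two weights trades off reliance on the periodic and aperiodic features.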

Summary of PCA & Split results

Conclusions
  • The PSHF module split Aurora’s speech waveforms into two synchronous streams (periodic and aperiodic)
    • large improvements over the single-stream Baseline
  • Split was better than all PCA combinations:
    • PCA26/13 better than PCA78/39, and PCA13 best
    • Split1 marginally better than Split
  • Periodic speech segments give robustness to noise.
  • Further work
    • Modelling: how best to combine the streams?
    • LVCSR: evaluate the front end on TIMIT (phone recognition).
    • Robust pitch tracking

CONCLUSION

COLUMBO PROJECT:

Harmonic decomposition applied to ASR

Philip J.B. Jackson 1 <p.jackson@surrey.ac.uk>

David M. Moreno 2 <davidm@talp.upc.es>

Javier Hernando 2 <javier@talp.upc.es>

Martin J. Russell 3 <m.j.russell@bham.ac.uk>
