SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE

Jaume Escofet Carmona

IDIAP, Martigny, Switzerland

UPC, Barcelona, Spain

Contents
  • Bayesian Networks
  • Automatic Speech Recognition using Dynamic BNs
  • Auxiliary variables
  • Experiments with energy as an auxiliary variable
  • Conclusions
What is a Bayesian Network?
  • A BN is a type of graphical model composed of:
    • A directed acyclic graph (DAG)
    • A set of variables V = {v1,…,vN}
    • A set of probability density functions P(vn | parents(vn))

The joint distribution of V factorizes over the graph:

P(V) = ∏n=1..N P(vn | parents(vn))

Example (graph with edges v2 → v1 and v2 → v3):

P(V) = P(v1,v2,v3) = P(v1|v2) × P(v2) × P(v3|v2)
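The factorization can be made concrete with a toy example. The conditional probability tables below are invented for illustration; only the graph structure (v2 → v1, v2 → v3) comes from the slide:

```python
# Toy Bayesian network over three binary variables, matching the example
# graph v2 -> v1 and v2 -> v3, so P(V) = P(v1|v2) * P(v2) * P(v3|v2).
# All probability values are made up for illustration.

P_v2 = {0: 0.6, 1: 0.4}                    # P(v2)
P_v1_given_v2 = {0: {0: 0.9, 1: 0.1},      # P(v1 | v2): outer key is v2
                 1: {0: 0.3, 1: 0.7}}
P_v3_given_v2 = {0: {0: 0.5, 1: 0.5},      # P(v3 | v2)
                 1: {0: 0.2, 1: 0.8}}

def joint(v1, v2, v3):
    """P(v1, v2, v3) from the DAG factorization."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# The joint must sum to 1 over all assignments, as for any distribution.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 0, 1), total)
```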

Automatic Speech Recognition (ASR)

[Figure: pipeline — speech signal → Feature extraction (LPC, MFCC, ...) → X = {x1,…,xT} → Statistical models (HMM, ANN, ...), one per word: M1: ‘cat’, M2: ‘dog’, …, MK: ‘tiger’ → recognized word Mj]

Mj = argmax{Mk} P(Mk|X) = argmax{Mk} P(X|Mk) × P(Mk)

P(X|Mk) = ∏t=1..T p(xt|qt) × p(qt|qt-1)
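The decision rule above can be sketched in a few lines. The per-model log-likelihoods and priors here are invented; in a real system log P(X|Mk) would come from the HMM/DBN evaluation:

```python
import math

# Hypothetical per-model scores: log P(X|Mk) and priors P(Mk).
# The numbers are made up purely to illustrate the Bayes decision rule.
models = {
    'cat':   {'loglik': -120.0, 'prior': 0.5},
    'dog':   {'loglik': -118.0, 'prior': 0.3},
    'tiger': {'loglik': -119.0, 'prior': 0.2},
}

def decode(models):
    """Return argmax_k [log P(X|Mk) + log P(Mk)], i.e. the MAP word."""
    return max(models, key=lambda k: models[k]['loglik'] + math.log(models[k]['prior']))

print(decode(models))
```

Working in the log domain avoids underflow when T is large, which is why the product over frames is normally accumulated as a sum of log-probabilities.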

ASR with Dynamic Bayesian Networks

[Figure: hidden phone states qt = /k/, /a/, /a/, /t/ for t = 1, …, 4, each emitting an acoustic observation xt]

Equivalent to a standard HMM.

ASR with Dynamic Bayesian Networks

[Figure: state chain qt-1 → qt with transitions P(qt | qt-1); each state emits its observation xt]

p(xt | qt=k) ~ N(μk, Σk)
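A minimal 1-D sketch of evaluating P(X|M) for this Gaussian-emission chain with the forward algorithm, in the log domain. All parameter values below are invented for illustration:

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def _logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def forward_loglik(X, log_pi, log_A, means, variances):
    """log P(X | M) by the forward recursion for a Gaussian-emission HMM.
    log_pi[k]: initial log-prob; log_A[i][j]: log P(q_t=j | q_{t-1}=i)."""
    K = len(log_pi)
    # Initialization with the first observation.
    alpha = [log_pi[k] + gauss_logpdf(X[0], means[k], variances[k]) for k in range(K)]
    for x in X[1:]:
        alpha = [
            _logsumexp([alpha[i] + log_A[i][j] for i in range(K)])
            + gauss_logpdf(x, means[j], variances[j])
            for j in range(K)
        ]
    return _logsumexp(alpha)

# Two-state toy model with made-up parameters.
log_pi = [math.log(0.7), math.log(0.3)]
log_A = [[math.log(0.9), math.log(0.1)],
         [math.log(0.2), math.log(0.8)]]
means, variances = [0.0, 3.0], [1.0, 1.0]
print(forward_loglik([0.1, 0.2, 2.9], log_pi, log_A, means, variances))
```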

Auxiliary information (1)
  • Main advantage of BNs:
    • Flexibility in defining dependencies between variables
  • Energy can degrade system performance if it is simply appended to the feature vector
  • BNs allow us to use it in an alternative way:
    • Conditioning the emission distributions upon this auxiliary variable
    • Marginalizing it out in recognition
Auxiliary information (2)

[Figure: qt → xt and at → xt; the value of at affects the value of xt]

p(xt | qt=k, at=z) ~ N(μk + Bk z, Σk)
Auxiliary information (3)

[Figure: qt → at, and both qt and at → xt; the value of the auxiliary variable can be influenced by the hidden state qt]

p(at | qt=k) ~ N(μk^a, Σk^a)

p(xt | qt=k, at=z) ~ N(μk + Bk z, Σk)
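The effect of conditioning the emission on the auxiliary variable can be seen in one dimension. The state parameters below (μk, Bk, Σk) are invented; the point is only that z shifts the emission mean:

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical parameters for one state k: base mean mu_k, regression
# weight B_k, variance sigma2_k (1-D, made up for illustration).
mu_k, B_k, sigma2_k = 1.0, 0.5, 2.0

def emission_logpdf(x, z):
    """log p(x | q=k, a=z) with mean mu_k + B_k*z: the auxiliary value z
    shifts the emission mean but does not enter the variance."""
    return gauss_logpdf(x, mu_k + B_k * z, sigma2_k)

# A frame with a larger auxiliary value makes larger x more likely.
print(emission_logpdf(2.0, 0.0), emission_logpdf(2.0, 2.0))
```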

Auxiliary information (4)

[Figure: qt → (xt, at) modeled jointly; equivalent to appending the auxiliary variable to the feature vector]

p(xt, at | qt=k) ~ N(μk^xa, Σk^xa)
Hiding auxiliary information
  • We can also marginalize out (hide) the auxiliary variable in recognition
  • Useful when:
    • It is noisy
    • It is not accessible

p(xt|qt) = ∫ p(xt|qt,at) × p(at|qt) dat
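For the linear-Gaussian case above, the marginalization integral has a Gaussian closed form. This sketch (with invented 1-D parameters) checks a brute-force numerical integration against it:

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Hypothetical 1-D parameters for one state k (invented for illustration):
mu_k, B_k, sigma2_k = 1.0, 0.5, 2.0   # p(x | q=k, a=z) = N(mu_k + B_k*z, sigma2_k)
mu_a, sigma2_a = 0.0, 1.0             # p(a | q=k)      = N(mu_a, sigma2_a)

def marginal_numeric(x, lo=-10.0, hi=10.0, n=20000):
    """p(x | q=k) = ∫ p(x|q,a) p(a|q) da, by the trapezoid rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        a = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * gauss_pdf(x, mu_k + B_k * a, sigma2_k) * gauss_pdf(a, mu_a, sigma2_a)
    return total * h

def marginal_closed(x):
    """Closed form: x | q=k ~ N(mu_k + B_k*mu_a, sigma2_k + B_k^2 * sigma2_a)."""
    return gauss_pdf(x, mu_k + B_k * mu_a, sigma2_k + B_k ** 2 * sigma2_a)

print(marginal_numeric(1.5), marginal_closed(1.5))
```

In practice the closed form is what makes hiding the auxiliary variable cheap at recognition time: no numerical integration is needed per frame.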

Experimental setup
  • Isolated word recognition
  • Small vocabulary (75 words)
  • Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
  • p(xt|qt) modeled with a 4-component Gaussian mixture
  • p(at|qt) modeled with a single Gaussian
Experiments with Energy as an auxiliary variable

[Figure: DBN topologies of the Baseline and of Systems 1–3]

Frame energy is computed from the windowed signal (samples s[n], window w[n]):

E = log Σn=1..N s²[n] w²[n]

WER        Observed Energy   Hidden Energy
System 1   6.9 %             5.3 %
System 2   6.1 %             5.6 %
System 3   5.8 %             5.9 %
Baseline   5.9 %
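The energy formula above translates directly to code. The Hamming window and the frame samples below are illustrative choices; the talk does not specify which window was used:

```python
import math

def frame_log_energy(s, w):
    """E = log sum_n s[n]^2 * w[n]^2 for one frame, per the slide's formula."""
    return math.log(sum(sn * sn * wn * wn for sn, wn in zip(s, w)))

# A Hamming window of the frame length (a common choice, assumed here).
N = 8
window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
frame = [0.1, 0.4, -0.3, 0.8, -0.5, 0.2, -0.1, 0.05]  # made-up samples

print(frame_log_energy(frame, window))
```

Because of the log, scaling the signal by a gain g only shifts E by 2·log(g), which is one reason energy is often normalized or treated separately from the spectral features.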
Conclusions
  • BNs are more flexible than HMMs. You can easily:
    • Change the topology of the distributions
    • Hide variables when necessary
  • Energy can improve the system performance if used in a non-traditional way