
Prosody dependent language modeling based on the correlation between prosody and syntax



Presentation Transcript


  1. Prosody dependent language modeling based on the correlation between prosody and syntax Ken Chen and Mark Hasegawa-Johnson IEEE ASRU 2003 12/03/2003

  2. A Bayesian network view of a speech utterance [figure: network over the variables Y, X, H, Q, P, W, S, M, spanning the frame level, segmental level, and word level]
  • X: acoustic-phonetic observations
  • Y: acoustic-prosodic observations
  • Q: allophone sequence
  • H: phone-level prosody sequence
  • W: word sequence
  • P: prosody sequence
  • S: syntax
  • M: meaning (including all high-level information)

  3. Prosody dependent speech recognition framework [figure: chain M → S → (W,P) → (Q,H) → (X,Y)]
  • Advantages:
  • A natural extension of prosody-independent ASR (PI-ASR)
  • Allows convenient integration of useful linguistic knowledge at different levels
  • Flexible

  4. Prosody modeled in our system
  • Two types (ToBI labeled): the pitch accent and the intonational phrase boundary (IPB)
  • Both are highly correlated with acoustics and syntax.
  • Pitch accents: pitch excursions (H*, L*); encode syntactic information (e.g., the content/function word distinction).
  • IPBs: preboundary lengthening, boundary tones, pauses, etc.; highly correlated with syntactic phrase boundaries.

  5. Prosody tagged word transcription
  • Prosody independent word transcription: well what is next
  • Prosody dependent word transcription (obtained by tagging the prosody independent word transcription with the corresponding ToBI transcription): well_af what_um is_um next_af
  • Tag legend: "a/u": accented/unaccented; "m/f": IP-medial/IP-final
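The tagging step on this slide can be sketched as follows; the helper function and tag layout are illustrative, not the authors' code (it assumes word-aligned ToBI-derived accent and boundary labels are already available):

```python
# Hypothetical sketch: suffix each word with an accent tag ("a"/"u") and a
# boundary tag ("m"/"f") taken from a word-aligned ToBI transcription.

def tag_words(words, accents, boundaries):
    """Combine a word sequence with word-level ToBI-derived prosody tags."""
    return [f"{w}_{a}{b}" for w, a, b in zip(words, accents, boundaries)]

tagged = tag_words(
    ["well", "what", "is", "next"],
    ["a", "u", "u", "a"],   # accented / unaccented
    ["f", "m", "m", "f"],   # IP-final / IP-medial
)
print(tagged)  # ['well_af', 'what_um', 'is_um', 'next_af']
```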

  6. Prosody dependent language model
  • A prosody dependent language model p(wj,pj | w1,p1, …, wj-1,pj-1) models the probability of the current prosody-tagged word given its word and prosody history.
  • The primary reason for building prosody dependent language models: our experiments have shown that the interaction of the prosody dependent language model with a prosody dependent acoustic model p(O|W,P) is the key to improving word recognition accuracy.
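A minimal maximum-likelihood bigram version of this model can be sketched as below (a toy illustration over prosody-tagged tokens, not the authors' implementation; real models would need the smoothing discussed later):

```python
from collections import Counter

def bigram_lm(tagged_sentences):
    """ML bigram estimates over prosody-tagged words:
    p(w_j,p_j | w_{j-1},p_{j-1}) = c(prev, cur) / c(prev)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in tagged_sentences:
        toks = ["<s>"] + sent
        unigrams.update(toks[:-1])            # count history tokens
        bigrams.update(zip(toks[:-1], toks[1:]))  # count adjacent pairs
    def prob(prev, cur):
        return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

lm = bigram_lm([["well_af", "what_um", "is_um", "next_af"]])
print(lm("what_um", "is_um"))  # 1.0 on this one-sentence toy corpus
```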

  7. The problem of data sparseness in estimating p(W,P)
  • Modeling prosody tagged word tokens increases the size of the vocabulary by a factor of |P| (|P|: the number of word-level prosody labels)
  • [figure: each word w in the prosody-independent LM expands to w_um, w_uf, w_am, w_af in the prosody-dependent LM]
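The vocabulary blow-up is easy to see concretely (toy numbers, with |P| = 4 as in the um/uf/am/af tag set above):

```python
# Tagging each word with one of |P| = 4 word-level prosody labels
# multiplies the vocabulary size by |P|.
vocab = ["well", "what", "is", "next"]
prosody_tags = ["um", "uf", "am", "af"]  # (un)accented x IP-medial/IP-final
pd_vocab = [f"{w}_{p}" for w in vocab for p in prosody_tags]
print(len(vocab), len(pd_vocab))  # 4 16
```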

  8. Data sparseness problem in estimating p(W,P)
  Using traditional methods, prosody dependent N-gram models cannot be estimated as robustly from a limited data set as prosody independent N-gram models, and the number of unseen prosody dependent bigrams increases.

  9. Factorial prosodic language model
  • Motivation: prosody can be predicted from part-of-speech (POS) tags with high accuracy:
  • 91% for phrasal stress prediction [Arnfield 94]
  • 84% for pitch accent [Hirschberg 93]
  • 84% for pitch accent, 90% for IPB on RNC
  • POS tags can be inferred from word transcriptions with very high accuracy using automatic syntactic parsers
  • Solution: bridge word and prosody using syntax

  10. The algorithm
  • Conduct syntactic analysis using automatic syntactic parsers (Charniak's parser, Roth's parser, etc.)
  • Estimate the syntactic-prosodic models: p(pj|ci,cj,pi), p(pj|ci,cj), p(ci,cj|wi,wj)
  • Compute prosody dependent N-gram probabilities p(wj,pj|wi,pi) from prosody independent N-gram probabilities p(wj|wi) or p(wj|wi,pi) and the syntactic-prosodic models
  • Smoothing
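The bridging step might look like the sketch below. This is one plausible reading of the factorization implied by the component models listed above, with invented toy probability tables; it is not the authors' code:

```python
def pd_bigram(wi, pi, wj, pj, p_word, p_pros, p_pos):
    """Hedged sketch of bridging word and prosody through syntax:
        p(wj,pj|wi,pi) ~= p(wj|wi) * sum_{ci,cj} p(pj|ci,cj,pi) * p(ci,cj|wi,wj)
    summing over candidate POS-tag pairs (ci, cj) for the word pair."""
    bridge = sum(
        p_pros.get((pj, ci, cj, pi), 0.0) * pcc
        for (ci, cj), pcc in p_pos.get((wi, wj), {}).items()
    )
    return p_word.get((wi, wj), 0.0) * bridge

# Toy tables (invented numbers): "next" is usually tagged JJ and accented.
p_word = {("is", "next"): 0.2}                                   # p(wj|wi)
p_pos = {("is", "next"): {("VBZ", "JJ"): 0.9, ("VBZ", "NN"): 0.1}}  # p(ci,cj|wi,wj)
p_pros = {("a", "VBZ", "JJ", "m"): 0.8, ("a", "VBZ", "NN", "m"): 0.6}  # p(pj|ci,cj,pi)
print(pd_bigram("is", "m", "next", "a", p_word, p_pros, p_pos))
```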

  11. Factorial prosodic language model
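The equation on this slide did not survive extraction. A plausible reconstruction, assembled from the component models named on the previous slide (a hedged reading, not a verbatim restoration):

```latex
p(w_j, p_j \mid w_i, p_i) \approx p(w_j \mid w_i)
  \sum_{c_i, c_j} p(p_j \mid c_i, c_j, p_i)\, p(c_i, c_j \mid w_i, w_j)
```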

  12. Factorial prosodic language model
  • Both POS and prosody have small label sets (around 30 tags in the Penn Treebank POS set)
  • Hence the syntactic-prosodic models p(pj|ci,cj,pi), p(pj|ci,cj), and p(ci,cj|wi,wj) can be robustly estimated from a small corpus.

  13. PDLM Smoothing • Katz backoff • Linear interpolation
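A minimal sketch of the linear-interpolation option (toy probabilities and a fixed interpolation weight; Katz backoff would instead discount seen bigram counts and back off to unigrams for unseen pairs):

```python
def interpolated_bigram(prev, cur, bigram_p, unigram_p, lam=0.7):
    """Linear interpolation of bigram and unigram estimates:
    p(cur|prev) = lam * p_bigram(cur|prev) + (1 - lam) * p_unigram(cur)."""
    return lam * bigram_p.get((prev, cur), 0.0) + (1 - lam) * unigram_p.get(cur, 0.0)

bigram_p = {("is_um", "next_af"): 0.5}   # toy estimates
unigram_p = {"next_af": 0.1}
print(interpolated_bigram("is_um", "next_af", bigram_p, unigram_p))  # ~0.38
```

An unseen bigram still gets nonzero mass through the unigram term, which is exactly what the sparseness slides call for.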

  14. The Corpus
  • The Boston University Radio News Corpus (RNC)
  • Stories read by 7 professional radio announcers
  • 5k vocabulary
  • 25k word tokens
  • 3 hours of clean speech
  • No disfluencies
  • Expressive and well-behaved prosody
  • 85% of the utterances were selected randomly for training, 5% for development-test, and the remaining 10% for testing.
  • Small, but the largest prosodically transcribed English corpus

  15. Reduction of Perplexity • Joint perplexity: 2^H(W,P) • Word perplexity: 2^H(W)
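The perplexity definitions above can be computed directly from per-token log probabilities (a generic illustration with a made-up token distribution, not the paper's numbers):

```python
import math

def perplexity(logprobs_base2):
    """Perplexity 2^H from per-token base-2 log probabilities,
    where H = -(1/N) * sum_i log2 p(token_i)."""
    H = -sum(logprobs_base2) / len(logprobs_base2)
    return 2.0 ** H

# Four tokens, each assigned probability 1/8 -> H = 3 bits, perplexity 8.
print(perplexity([math.log2(1 / 8)] * 4))  # 8.0
```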

  16. Prosody dependent speech recognition experiments on RNC
  • API: prosody independent allophone set (SPHINX monophone models)
  • 3-state left-to-right HMMs
  • 3-mixture Gaussians per state
  • 32-dimensional MFCC_E_D_Z features
  • APD: prosody dependent allophone set (able to capture prosody-induced pitch and durational variation)
  • State transition matrices (or duration PDFs) depend on prosody
  • One-dimensional single-Gaussian acoustic-prosodic observation PDFs over nonlinearly transformed pitch features (using ANNs)

  17. Prosody dependent acoustic modeling [figure: per-allophone network over q, h, Xq, Yq]
  • Prosody dependent allophone models Λ(q) => Λ(q,h):
  • Acoustic-phonetic observation PDF: b(X|q) => b(X|q,h)
  • Duration PDF: d(q) => d(q,h)
  • Acoustic-prosodic observation PDF: f(Y|q,h)

  18. Prosody dependent pronunciation modeling: p(Qw|w) => p(Qw|w,p) => p(Qw,Hw|w,p)
  • Model the lexical stress: above: ax b! ah! v!
  • Model phrasal pitch accent and phrase boundaries through prosody-dependent allophone models:
  above ax b ah v
  above! ax b! ah! v!
  above% ax b ah% v%
  above!% ax b! ah!% v!%
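The lexicon expansion on this slide can be sketched as below; the helper and its index parameters are illustrative assumptions ("!" marks accented allophones from the stressed syllable on, "%" marks phrase-final allophones from the final rime on, matching the "above" example):

```python
def expand_entry(word, phones, accent_from, boundary_from):
    """Expand one pronunciation into its four prosodic variants:
    plain, accented (!), phrase-final (%), and accented phrase-final (!%)."""
    def mark(bang, pct):
        out = []
        for i, p in enumerate(phones):
            s = p
            if bang and i >= accent_from:
                s += "!"   # accent mark on the stressed portion
            if pct and i >= boundary_from:
                s += "%"   # boundary mark on the final rime
            out.append(s)
        return out
    return {
        word: mark(False, False),
        word + "!": mark(True, False),
        word + "%": mark(False, True),
        word + "!%": mark(True, True),
    }

lexicon = expand_entry("above", ["ax", "b", "ah", "v"], accent_from=1, boundary_from=2)
print(lexicon["above!%"])  # ['ax', 'b!', 'ah!%', 'v!%']
```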

  19. Prosody dependent speech recognition experiments on RNC
  • Word and prosody recognition: the approach proposed in this paper improves word recognition accuracy (WRA) by 1%. The WRA of PD-ASR improves by 2.5% over a PI-ASR system with a comparable acoustic model parameter count.
