A Database of Vocal Tract Resonance Trajectories for Research in Speech Processing

L. Deng (Microsoft Research, Redmond)

X. Cui & A. Alwan (U. California, Los Angeles)

R. Pruvenok (Georgia Institute of Tech, Atlanta)

J. Huang (Carnegie Mellon U., Pittsburgh)

S. Momen (Princeton U., Princeton)

Y. Chen (Cornell U., Ithaca)

Introduction
  • Joint research project between MSR & IPAM of UCLA
  • Carried out during 2005 NSF-RIPS summer program
  • Main Goals:
    • Create a database of VTR/formant trajectories for research in speech processing (ground truth).
    • Quantitatively assess various existing automatic VTR/formant tracking algorithms
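A quantitative assessment of a tracker against the hand-corrected reference could be as simple as a per-formant mean absolute difference in Hz. The sketch below is only an illustration: the function name, the array layout (frames by formants), and the toy values are assumptions, not the metric or data used in this project.

```python
import numpy as np

def mean_abs_error_hz(reference, estimate):
    """Per-formant mean absolute difference in Hz.

    Both inputs have shape (num_frames, num_formants) and are assumed to be
    sampled on the same frame grid (an assumption of this sketch).
    """
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    if ref.shape != est.shape:
        raise ValueError("reference and estimate must have the same shape")
    # Average over frames (axis 0), keeping one error value per formant.
    return np.mean(np.abs(ref - est), axis=0)

# Toy values for two frames and three formants; not taken from the database.
reference = [[500.0, 1500.0, 2500.0],
             [520.0, 1480.0, 2500.0]]
estimate = [[510.0, 1450.0, 2600.0],
            [500.0, 1500.0, 2400.0]]
errors = mean_abs_error_hz(reference, estimate)
```

Reporting the error separately for each formant, rather than as one pooled number, keeps large errors in the higher formants from hiding small errors in the lower ones.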
Background
  • Vocal tract resonance (VTR, or formant-I): an acoustic resonance of the human vocal tract during speech production
  • May differ from spectral peaks measured from the speech signal (formant-II)
  • Importance of VTR/formants for speech perception and production
  • Many techniques for automatic VTR or formant-II extraction
Background (cont’d)
  • Difficulty of automatic VTR/formant tracking
    • When two formants are close to each other (e.g., /iy,y,uw,r/)
    • Consonant sounds when VTRs are not directly visible from spectrogram (e.g., nasals, fricatives, stops)
    • CV or VC transitions
  • Lack of standard database for quantitative evaluation of tracking algorithms
  • Requirement for extensive human expertise
Data Selection
  • Subset of TIMIT utterances
  • 538 utterances in total
  • 192 utterances in core test set
  • 346 utterances in training set (173 speakers; one SX & one SI for each)
  • Balance of speaker, dialect, gender, & phoneme distributions
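The utterance counts listed above are mutually consistent, which a few lines of arithmetic confirm. The variable names are illustrative only.

```python
core_test_utterances = 192
training_speakers = 173
utterances_per_training_speaker = 2  # one SX and one SI sentence per speaker

training_utterances = training_speakers * utterances_per_training_speaker
total_utterances = core_test_utterances + training_utterances
```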
VTR Trajectory Labeling
  • Start from the results of a previous VTR tracking algorithm (ICASSP 2004 paper)
  • Develop a software tool for manual error correction using spectrogram display
  • Use human expertise
Human Expertise
  • Prior knowledge of nominal VTR target values for individual phones
  • Contextual effects of VTR values (target directed trajectories)
  • Overall spectral properties across the entire utterance (same phones at different times)
  • Effects of anti-resonances in splitting VTRs of nasalized vowels
  • Special formant movement patterns (e.g., velar pinch, etc.)
  • Etc.
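One simple way to picture a target-directed trajectory is a first-order recursion in which the resonance moves a fixed fraction of the way toward the current phone's target at every frame, so that short phones undershoot their targets. This is a minimal sketch under that assumption; the recursion, the `rate` parameter, and the target values are hypothetical and are not claimed to be the model the labelers or the MSR tracker used.

```python
def target_directed_trajectory(start_hz, targets_hz, rate=0.8):
    """Generate z[t+1] = rate * z[t] + (1 - rate) * target[t].

    start_hz   -- resonance value before the first frame
    targets_hz -- one target per frame (piecewise constant per phone)
    rate       -- value in (0, 1); closer to 1 means slower movement
    """
    trajectory = []
    z = float(start_hz)
    for target in targets_hz:
        z = rate * z + (1.0 - rate) * target
        trajectory.append(z)
    return trajectory

# Hypothetical targets: 10 frames aiming at 500 Hz, then 10 frames at 1500 Hz.
targets = [500.0] * 10 + [1500.0] * 10
trajectory = target_directed_trajectory(700.0, targets)
```

With these settings the trajectory approaches each target without reaching it within the phone, which is the contextual undershoot a labeler has to keep in mind when correcting a track.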
Two Automatic Algorithms
  • WaveSurfer (http://www.speech.kth.se/wavesurfer) (same algorithm as ESPS/xwaves, Talkin et al.)
    • based on LPC analysis and dynamic programming
  • MSR Hidden dynamic model based algorithm
    • Implemented by Kalman filter/smoother
    • Piecewise-linearized mapping from VTR to cepstra
    • By-product of a speech recognizer
    • Typing all phone VTR targets
    • Details in ICASSP 2004 paper
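The LPC analysis step of an LPC-based tracker can be sketched as follows: fit an all-pole model to one frame, then read candidate formant frequencies and bandwidths from the angles and radii of the model's poles. This is a generic textbook sketch, not WaveSurfer's or ESPS's code; the function names, thresholds, and the synthetic test signal are assumptions, and the dynamic programming step that picks one candidate per formant across frames is omitted.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a with a[0] == 1, the coefficients of A(z) = sum a[k] z^-k.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # updated prediction error
    return a

def formant_candidates(frame, sample_rate, order=8, max_bandwidth_hz=400.0):
    """Return sorted candidate frequencies (Hz) from the LPC pole angles."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    roots = np.roots(lpc_coefficients(windowed, order))
    roots = roots[np.imag(roots) > 0]       # one pole per conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * sample_rate / np.pi
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])

def synthesize_resonances(freqs_hz, bandwidth_hz, sample_rate, num_samples, seed=0):
    """Seeded white noise passed through an all-pole filter (test signal only)."""
    radius = np.exp(-np.pi * bandwidth_hz / sample_rate)
    angles = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / sample_rate
    poles = radius * np.exp(1j * angles)
    a = np.real(np.poly(np.concatenate([poles, np.conj(poles)])))
    excitation = np.random.default_rng(seed).standard_normal(num_samples)
    out = np.zeros(num_samples)
    for n in range(num_samples):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Synthetic signal with resonances at 500 Hz and 1500 Hz (hypothetical values).
sample_rate = 8000
signal = synthesize_resonances([500.0, 1500.0], 80.0, sample_rate, 4000)
candidates = formant_candidates(signal, sample_rate, order=4)
```

On real speech the pole list usually contains spurious or merged candidates, for example when two formants are close together, which is why a tracker adds a continuity constraint across frames on top of this per-frame analysis.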
Comparisons of Two Algorithms

[Figure: comparison of the two algorithms on the utterance "His failure to open the store by eight cost him his job"]

Comparisons of Two Algorithms (cont’d)

[Figure: comparison of the two algorithms on the utterance "We always thought we would die with our boots on"]

Summary and Conclusion
  • VTR/Formants are critical for speech production, perception, and processing
  • Prior to this work, lack of standard database
  • Creating a database using human expertise
  • Immediate application: quantitative evaluation of automatic VTR/formant tracking algorithms
  • Second-pass verification & correction at MSR recently completed
  • Data soon to be publicly released from both MSR and UCLA sites.