
SPECTRUM?

Hynek Hermansky with Jordan Cohen, Sangita Sharma, and Pratibha Jain

Presentation Transcript


  1. SPECTRUM? Hynek Hermansky with Jordan Cohen, Sangita Sharma, and Pratibha Jain

  2. Radio Rex (1917)
     Figure labels: Newton; l/2; beer; Helmholtz; /u/ /o/ /a/ /e/ /iy/
     • “limited commercial success” (John Pierce, 1969)

  3. SHORT TERM SPECTRUM
     Figure labels: short-term spectrum (about 20 ms); classify. Axes: frequency, time.
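A minimal sketch of the short-term spectrum named on slide 3, assuming NumPy, a Hamming window, and a 10 ms frame step. Only the frame length of about 20 ms comes from the slide; the function name and the other parameters are illustrative choices.

```python
import numpy as np

def short_term_spectrum(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Power spectrum of overlapping short frames.

    frame_ms follows the slide ("about 20 ms"); hop_ms and the Hamming
    window are assumptions made for this sketch.
    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    # Slice the signal into windowed frames, one row per frame.
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    # Squared magnitude of the one-sided FFT of each frame.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2
```

Each row is one time step and each column one frequency bin, which matches the frequency and time axes drawn on the slide.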

  4. Cortical receptive fields

  5. ASR from TempoRAl Patterns (TRAP)
     Figure labels: temporal pattern of critical band energies (window, 1 sec); short-term spectrum (about 20 ms); classify; phone “boundaries”. Axes: frequency, time.
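The temporal patterns on slide 5 can be sketched as fixed-length trajectories of each critical-band energy around every frame. The sketch below assumes the band energies are already computed at a 10 ms frame step, so that 101 frames span roughly 1 sec. The function name, the frame step, and the edge padding are assumptions, not details from the slides.

```python
import numpy as np

def temporal_patterns(band_energies, context_frames=50):
    """Cut a long temporal trajectory per band around every frame.

    band_energies: array (n_frames, n_bands) of critical-band energies.
    context_frames: frames taken on each side of the center frame; with a
    hypothetical 10 ms frame step, 50 gives a window of about 1 sec.
    Returns an array of shape (n_frames, n_bands, 2 * context_frames + 1).
    """
    n_frames, n_bands = band_energies.shape
    width = 2 * context_frames + 1
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(band_energies,
                    ((context_frames, context_frames), (0, 0)),
                    mode="edge")
    # For frame t, rows t .. t + width - 1 of the padded array are centered on t.
    return np.stack([padded[t : t + width].T for t in range(n_frames)])
```

Each slice `patterns[t, b]` is the input that a per-band classifier would see for frame `t` and band `b`.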

  6. WHY 200-1000 ms?
     Figure labels: 200 – 1000 ms. Axes: frequency, time.
     • because that’s where the information is (coarticulation)
        • mutual info studies (Bilmes, Yang et al.)
     • psychophysics of hearing
        • 200 ms “critical time window” (forward masking, perception of loudness, perception of gaps, …)
     • physiology of hearing
        • time component of cortical receptive fields (Klein)
     • because “it works”
        • ETSI Aurora work

  7. WHY narrow frequency bands?
     Figure labels: 1-3 Bark. Axes: frequency, time.
     • psychophysics of hearing
        • independence of processing within critical bands
     • physiology of hearing
        • mechanical selectivity of cochlea
        • cortical receptive fields (e.g. Shamma)
     • because “it works”
        • multi-band ASR (Bourlard and Dupont, Hermansky et al., …)
        • decrease in ASR accuracy for wider frequency spans (Jain and Hermansky, Eurospeech 2003)

  8. Which features?
     Figure labels: data-guided processing; features. Axes: frequency, time.
     • no knowledge is better than wrong knowledge
     • data cannot lie
     • speech evolved to be heard
     • data-derived processing is consistent with human-like processing (minus the irrelevant components of the human cognitive processing)

  9. WHY data-guided processing?
     Figure labels: data-guided (trained on data) processing; features. Axes: frequency, time.
     • some function of class posteriors
     • class posteriors form the most efficient feature set [e.g. Fukunaga]
     • posteriors of which classes?
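Slide 9 leaves "some function of class posteriors" open. As one illustrative possibility, the sketch below takes the logarithm of the posteriors and decorrelates them with a principal-component rotation. The choice of log plus PCA, the function name, and the `eps` floor are assumptions made for this example; the slides do not specify the function.

```python
import numpy as np

def posterior_features(posteriors, eps=1e-10):
    """Turn per-frame class posteriors into decorrelated features.

    posteriors: array (n_frames, n_classes), each row summing to 1.
    The log followed by PCA is an illustrative choice, not taken from
    the slides. Returns an array of the same shape.
    """
    # The log spreads out the skewed posterior values; eps avoids log(0).
    log_post = np.log(np.clip(posteriors, eps, 1.0))
    centered = log_post - log_post.mean(axis=0)
    # Eigenvectors of the covariance matrix give the decorrelating rotation.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return centered @ eigvecs[:, order]
```

The rotated features have an approximately diagonal covariance, which suits back-end models that assume uncorrelated inputs.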

  10. Speech Events
      Figure labels: signal; frequency selective hearing; event detection; p(event,frequency); class (phoneme?) detection.

  11. TRAP TANDEM
      Figure labels: data; processing (trained system), shown three times; class posteriors; some function of phoneme posteriors. Axes: frequency, time.
