
Speech Recognition



Presentation Transcript


  1. Speech Recognition - Ajay Iyer

  2. Outline • What is a Spectrogram? • Types of Spectrogram • Linguistic and Acoustic Category • Prosodic Analysis • Pitch Estimation

  3. What is a Spectrogram? • A spectrogram is a visual representation of an acoustic signal. • It displays how the amplitude of the signal is distributed over frequency and time. • Depending on the size of the Fourier analysis window, different resolutions in frequency and time are achieved. • A long analysis window resolves frequency at the expense of time, giving a “narrowband spectrogram”. • A short analysis window, on the other hand, resolves time at the expense of frequency, giving a “wideband spectrogram”.
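The window-length trade-off can be put into rough numbers. The sketch below is illustrative only: the 16 kHz sampling rate and the 4 ms and 32 ms window lengths are assumed values, not figures from the slides, and the DFT bin spacing Fs/N is used as an approximate stand-in for frequency resolution (the true resolution also depends on the window shape).

```python
def window_resolution(window_ms, fs_hz):
    """Return (samples, time span in ms, DFT bin spacing in Hz) for one analysis window."""
    n_samples = round(window_ms * fs_hz / 1000)
    bin_spacing_hz = fs_hz / n_samples  # approximate frequency resolution
    return n_samples, window_ms, bin_spacing_hz


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    for label, length_ms in (("wideband (short window)", 4), ("narrowband (long window)", 32)):
        n, t_ms, df = window_resolution(length_ms, fs)
        print(f"{label}: {n} samples, {t_ms} ms in time, about {df} Hz per bin")
```

Under these assumptions the short window localizes events to within a few milliseconds but separates only frequencies a few hundred hertz apart, while the long window separates closely spaced harmonics but smears events over tens of milliseconds.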

  4. Types of Spectrograms Narrowband Spectrogram Wideband Spectrogram

  5. Spectrograms

  6. Linguistic/Acoustic Categories • Labeling the linguistic and/or acoustic categories speeds up the search and decoding algorithms by discarding impossible and highly unlikely phoneme combinations. • Implementation: the given phoneme is compared against the categories defined in the TIMIT lexicon. • The category thus obtained is displayed along with the phoneme, as shown in the following slide.
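The category lookup described above can be sketched as a simple table search. The slides do not list the exact category table that was used, so the mapping below is an assumed, partial grouping of common TIMIT phone symbols into broad classes, and the names `PHONE_CATEGORIES`, `categorize` and `label_sequence` are hypothetical.

```python
# Assumed, partial grouping of TIMIT phone symbols into broad classes.
PHONE_CATEGORIES = {
    "stop": {"b", "d", "g", "p", "t", "k"},
    "affricate": {"jh", "ch"},
    "fricative": {"s", "sh", "z", "zh", "f", "th", "v", "dh"},
    "nasal": {"m", "n", "ng"},
    "semivowel/glide": {"l", "r", "w", "y", "hh"},
    "vowel": {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
              "er", "ey", "ay", "oy", "aw", "ow"},
}


def categorize(phone):
    """Return the broad class of a phone symbol, or 'unknown' if it is not in the table."""
    for category, phones in PHONE_CATEGORIES.items():
        if phone in phones:
            return category
    return "unknown"


def label_sequence(phones):
    """Pair each phone with its class so both can be displayed together."""
    return [(p, categorize(p)) for p in phones]


if __name__ == "__main__":
    for phone, category in label_sequence(["sh", "iy", "k"]):
        print(phone, category)
```

A decoder could use such labels to prune candidate phoneme sequences early, for example by rejecting class transitions that a given lexicon never produces.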

  7. Linguistic/Acoustic Categories

  8. Prosodic Analysis • Acoustically speaking, prosody refers to variation in syllable duration, loudness, pitch and the formant frequencies of the speech signal. • Prosodic features are suprasegmental, i.e., they are not restricted to any one segment of speech. They occur at some higher level of an utterance. • For example: “No!”, “Don’t!”

  9. Pitch • Of the various prosodic features, the most important is pitch. • Knowing the pitch makes it possible to distinguish the contexts in which a word is spoken, viz. alerting or referential contexts. • Thus, incorporating pitch information increases the accuracy of the recognizer.

  10. Implementation • The pitch.m file uses cepstral analysis to extract pitch information. • pitch.m performs the analysis on a single analysis frame. • Frame-based analysis has been coded to estimate the pitch over the entire speech signal. • The estimated fundamental frequency (pitch) is assigned to the time instant tpitch= tinterval(frameNum - 1) + fo/Fs;
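The contents of pitch.m are not shown in the slides, so the following is only a generic sketch of cepstral pitch estimation, written in Python rather than MATLAB. It computes the real cepstrum of one Hamming-windowed frame and picks the strongest peak in a plausible pitch range. The 60 to 400 Hz search range, the naive DFT, the frame start time used as the time stamp, and all function names are assumptions made for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame from its real cepstrum."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]            # Hamming window
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(windowed)]
    cepstrum = [c.real for c in dft(log_mag, inverse=True)]
    q_lo = int(fs / fmax)                                # shortest period searched, in samples
    q_hi = min(int(fs / fmin), n // 2)                   # longest period searched, in samples
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return fs / peak


def track_pitch(signal, fs, frame_len, hop):
    """Frame-based analysis: list of (frame start time in s, estimated pitch in Hz)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [(s / fs, cepstral_pitch(signal[s:s + frame_len], fs)) for s in starts]


if __name__ == "__main__":
    fs = 8000
    # Synthetic pulse train with a 40-sample period, i.e. a nominal pitch of 200 Hz.
    pulses = [1.0 if i % 40 == 0 else 0.0 for i in range(768)]
    for t, f0 in track_pitch(pulses, fs, frame_len=512, hop=256):
        print(f"t = {t:.3f} s, f0 = {f0:.1f} Hz")
```

A practical implementation would normally add a voiced/unvoiced decision and use an FFT in place of the naive transform; both are omitted here to keep the sketch short.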

  11. Pitch Estimation

  12. Pitch Estimation

  13. References • Prosodic_Modeling_for_Improved_Speech_Recogntion_and_Understanding_Wang_phd_thesis.pdf • Prosodic Analysis of Alerting and Referential Context of Sentinel Words_final_draft.pdf • Discrimination_of_Sentinel_Word_Contexts_using_Prosodic_Features_Journal_v1.pdf • http://home.cc.umanitoba.ca/~robh/howto.html • http://en.wikipedia.org/wiki/Prosody_(linguistics)

  14. Thank You
