
Acoustic / Lexical Model




  1. Acoustic / Lexical Model Derk Geene

  2. Speech recognition • P(words|signal)= P(signal|words) P(words) / P(signal) • P(signal|words): Acoustic model • P(words): Language model • Idea: Maximize P(signal|words) P(words) • Today: Acoustic model
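
Because P(signal) is the same for every candidate word sequence, it can be dropped when comparing candidates, which is why only the product P(signal|words) P(words) is maximized. A minimal sketch of that decision rule follows; the function name `decode`, the candidate sentences, and all probabilities are made up for illustration and are not from the slides.

```python
import math

def decode(acoustic_loglik, language_logprob):
    """Pick the word sequence with the highest
    log P(signal|words) + log P(words).

    Both arguments are dicts keyed by candidate word sequence.
    Log-probabilities are summed instead of multiplying raw
    probabilities, which avoids numerical underflow.
    """
    candidates = acoustic_loglik.keys() & language_logprob.keys()
    return max(candidates,
               key=lambda w: acoustic_loglik[w] + language_logprob[w])

# Hypothetical scores for one utterance (invented numbers).
acoustic = {
    "recognize speech": math.log(0.020),    # P(signal|words)
    "wreck a nice beach": math.log(0.025),
}
language = {
    "recognize speech": math.log(0.0010),   # P(words)
    "wreck a nice beach": math.log(0.0001),
}

best = decode(acoustic, language)
print(best)
```

In this toy case the acoustic model alone slightly prefers the second candidate, but the language model outweighs it, so the combined score selects the first.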

  3. Variability • Variation • Speaker • Pronunciation • Environmental • Context • Static acoustic model will not work in real applications. • Dynamically adapt P(signal|words) while using the system.

  4. Measuring errors (1) • 500 sentences of 6 – 10 words each from 5 to 10 different speakers. • 10% relative error reduction • Training set / Development set • First decide optimal parameter settings.
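
The relative error reduction mentioned above compares a new error rate against a baseline, rather than subtracting percentage points. A small sketch of the arithmetic, with a hypothetical helper name and invented error rates:

```python
def relative_error_reduction(baseline_error, new_error):
    """Relative error reduction in percent.

    Both arguments are error rates in the same unit (for example
    percent word error rate). The baseline must be non-zero.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error

# Invented example: going from 20% to 18% word error rate is a
# 2-point absolute reduction but a 10% relative reduction.
reduction = relative_error_reduction(20.0, 18.0)
print(reduction)
```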

  5. Measuring errors (2) • Word recognition errors: • Substitution • Deletion • Insertion • Correct: Did mob mission area of the Copeland ever go to m4 in nineteen eighty one? • Recognized: Did mob mission area ** the copyland ever go to m4 in nineteen east one?

  6. Measuring errors (3) • Correct: The effect is clear • Recognised: Effect is not clear • Error rate, one by one: 75% • Word error rate = 100% × (Subs + Dels + Ins) / (# words in correct sentence)
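
The word error rate needs an alignment between the correct and the recognized sentence before substitutions, deletions, and insertions can be counted. A minimal sketch, assuming the usual minimum-edit-distance alignment over words (the slides do not say which alignment method is used); the function names are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """100% x (Subs + Dels + Ins) / #words in the reference,
    using the minimum number of word edits (Levenshtein distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference sentence must not be empty")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)

def one_by_one_error_rate(reference, hypothesis):
    """Naive position-by-position comparison, without alignment."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    mismatches = sum(r != h for r, h in zip(ref, hyp))
    mismatches += abs(len(ref) - len(hyp))
    return 100.0 * mismatches / len(ref)

correct = "The effect is clear"
recognised = "Effect is not clear"
naive = one_by_one_error_rate(correct, recognised)
wer = word_error_rate(correct, recognised)
print(naive, wer)
```

For the example sentence the one-by-one comparison gives 75%, while the aligned count finds one deletion ("The") and one insertion ("not"), so the word error rate is 50%.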

  7. Units of speech (1) • Modeling is language dependent. • Modeling unit • Accurate • Trainable • Generalizable

  8. Units of speech (2) • Whole-word models • Only suitable for small vocabulary recognition • Phone models • Suitable for large vocabulary recognition • Problem: over-generalize → less accurate • Syllable models

  9. Context dependency (1) • Recognition accuracy can be improved by using context-dependent parameters. • Important in fast / spontaneous speech. • Example: the phoneme /ee/

  10. Peat • Wheel

  11. Context dependency (2) • Triphone model: a phonetic model that takes into consideration both the left and the right neighbouring phones. • If two phones have the same identity but different left or right contexts, they are considered different triphones. • Interword context-dependent phones. • Place in the word: • Beginning • Middle • End
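
To make the triphone idea concrete, the sketch below expands a phone sequence into triphone labels written as left-center+right. The function name, the "sil" boundary symbol, and the phone symbols ("iy" for /ee/) are assumptions for illustration, not taken from the slides.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into left-center+right triphone labels.

    The sequence is padded with a boundary symbol on both sides so
    that the first and last phones also get a full context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# The same phone /ee/ (written "iy" here) in two different words.
peat = to_triphones(["p", "iy", "t"])
wheel = to_triphones(["w", "iy", "l"])
print(peat)
print(wheel)
```

The middle phone has the same identity in both words, but the labels `p-iy+t` and `w-iy+l` differ, so a triphone system would treat them as two separate units.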

  12. Context dependency (3) • Stress • Longer duration • Higher pitch • More intensity • Word-level stress • Import – Import • Italy – Italian • Sentence-level stress • I did have dinner. • I did have dinner.

  13. Radio • Radio

  14. Context dependency (4) • Very many triphones: 50³ = 125,000 • Many phonemes have the same effects • /b/ & /p/: labial (pronounced using the lips) • /r/ & /w/: liquids • Clustered acoustic-phonetic units • Is the left-context phone a fricative? • Is the right-context phone a front vowel?
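
The count and the clustering questions above can be sketched in a few lines. The phone classes listed here are a hypothetical, incomplete inventory chosen for illustration, and grouping by only two yes/no questions is a simplification; a real system would typically choose among many such questions based on training data.

```python
# With 50 phones, every (left, center, right) combination is a triphone.
n_phones = 50
n_triphones = n_phones ** 3

# Hypothetical phone classes (illustrative, not a full inventory).
FRICATIVES = {"f", "v", "s", "z", "sh", "zh", "th", "dh", "hh"}
FRONT_VOWELS = {"iy", "ih", "eh", "ae"}

def cluster_key(left, center, right):
    """Map a triphone to a cluster by answering two questions:
    is the left-context phone a fricative, and is the
    right-context phone a front vowel?"""
    return (center, left in FRICATIVES, right in FRONT_VOWELS)

# Two different triphones of /t/ that land in the same cluster,
# and a third one that does not.
same_a = cluster_key("s", "t", "iy")
same_b = cluster_key("f", "t", "ih")
other = cluster_key("n", "t", "iy")

# Upper bound on clusters with two yes/no questions per center phone.
max_clusters = n_phones * 2 * 2
print(n_triphones, max_clusters)
```

Triphones that share a cluster can share parameters and training data, which is what keeps the number of units to estimate far below the full triphone count.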

  15. Acoustic model • After feature extraction, we have a sequence of feature vectors, such as the MFCC vector, as input data. • Feature stream → segmentation and labeling → phonemes / units → lexical access problem → words

  16. Acoustic model • Signal → Phonemes • Problem: phonemes can be pronounced differently • Speaker differences • Speaking rate • Microphone

  17. Acoustic model • Phonemes → Words • The three major ways to do this: • Vector Quantization • Hidden Markov Models • Neural Networks

  18. Acoustic model • Problem: Multiple pronunciations: • Dialect variation • Coarticulation • [Figure: pronunciation network; surviving labels: phone "m", probabilities 0.5, 0.2, 0.5]

  19. The End
