This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #16: Speech Recognition Overview (cont.).
Thanks to Alex Acero (Microsoft Research), Jeff Adams (Nuance), Simon Arnfield (Sheffield), Dan Klein (UC Berkeley), Mazin Rahim (AT&T Research) for many of the materials used in this lecture.
(… or pentaphones: use 2 phonemes before & after)
How do we know how to segment words into phones?
Word Lexicon
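The two ideas above (a word lexicon that maps words to phones, and context-dependent units such as pentaphones) can be sketched together in a few lines. This is only an illustration under stated assumptions: the lexicon entries, the ARPAbet-style phone symbols, the "sil" padding symbol, and the function names are all hypothetical and not taken from the lecture.

```python
# Hypothetical toy lexicon: words and ARPAbet-style pronunciations are
# illustrative only, not from the lecture materials.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "lab": ["L", "AE", "B"],
}


def word_to_phones(word, lexicon=LEXICON):
    """Look up a word's phone sequence (KeyError if the word is unknown)."""
    return lexicon[word.lower()]


def context_units(phones, width=1, pad="sil"):
    """Return one context-dependent unit per phone.

    width=1 yields triphones (1 phone of context on each side);
    width=2 yields pentaphones (2 phones of context on each side).
    The sequence is padded with an assumed silence symbol at both ends.
    """
    padded = [pad] * width + list(phones) + [pad] * width
    units = []
    for i in range(width, len(padded) - width):
        left = tuple(padded[i - width:i])
        right = tuple(padded[i + 1:i + 1 + width])
        units.append((left, padded[i], right))
    return units


triphones = context_units(word_to_phones("lab"), width=1)
pentaphones = context_units(word_to_phones("lab"), width=2)
```

Each unit is a (left context, phone, right context) triple, so the pentaphone for AE in "lab" carries two symbols on each side. Real systems usually cluster such units, because most wide contexts are rarely or never seen in training data.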
[Figure: timeline of speech recognition systems, tasks, and techniques. Axis: 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2003]
System types, earliest to latest:
  Small Vocabulary, Acoustic Phonetics-based
  Medium Vocabulary, Template-based
  Large Vocabulary, Statistical-based
  Large Vocabulary; Syntax, Semantics
  Very Large Vocabulary; Semantics, Multimodal Dialog
Tasks:
  Isolated Words, Connected Digits, Continuous Speech
  Connected Words, Continuous Speech
  Continuous Speech, Speech Understanding
  Spoken dialog; Multiple modalities
Techniques:
  Filter-bank analysis, Time-normalization, Dynamic programming
  Pattern recognition, LPC analysis, Clustering algorithms, Level building
  Hidden Markov models, Stochastic language modeling
  Stochastic language understanding, Finite-state machines, Statistical learning
  Concatenative synthesis, Machine learning, Mixed-initiative dialog
DOMAIN 78 89
* WERR means relative word error rate reduction on an in-house evaluation set.
Results from Jeff Adams, ca. 2006
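For readers unfamiliar with the WERR measure in the footnote above, here is a minimal sketch of the arithmetic, assuming the usual definition of a relative reduction (the drop in word error rate divided by the baseline word error rate). The function name and the numbers are made up for illustration and are not the results cited above.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are word error rates on the same evaluation set,
    in the same units (for example, both in percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Made-up numbers for illustration only: going from 20% WER to 15% WER
# is a 5-point absolute drop, which is a 25% relative reduction.
example = relative_wer_reduction(20.0, 15.0)
```

A relative figure is useful when comparing domains with very different baseline error rates, since the same absolute drop means more when the baseline is already low.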
ca. 1980 ca. 2004
profits rose to twenty eight million dollars .\period see figure one a\a on page one twenty four .\period
Profits rose to $28 million. See fig. 1a on p. 124.
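The pair of lines above shows raw recognizer output next to its formatted written form. A toy, rule-based sketch of that conversion is given below. It is not how any particular product works: the rule set, the table contents, and the function names are assumptions chosen only to cover this one example, including the assumption that a token of the form written\spoken attaches to the preceding token.

```python
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"million", "billion"}
ABBREV = {"figure": "fig.", "page": "p."}  # assumed abbreviation table


def read_number(tokens, i):
    """Consume number words from tokens[i:]; return (digit string, next index).

    Simplification: chunks are concatenated as digits, so "one twenty four"
    reads as "124". Words such as "hundred" are not handled.
    """
    digits = ""
    while i < len(tokens):
        word = tokens[i]
        if word in TENS:
            value = TENS[word]
            if i + 1 < len(tokens) and tokens[i + 1] in UNITS:
                value += UNITS[tokens[i + 1]]
                i += 1
        elif word in UNITS:
            value = UNITS[word]
        elif word in TEENS:
            value = TEENS[word]
        else:
            break
        digits += str(value)
        i += 1
    return digits, i


def format_dictation(spoken):
    """Turn spoken-form recognizer output into written form (toy rules)."""
    tokens = spoken.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if "\\" in tok:
            # written\spoken token: keep the written part, glued to the
            # previous output token (assumed behaviour for this example).
            written = tok.split("\\", 1)[0]
            if out:
                out[-1] += written
            else:
                out.append(written)
            i += 1
        elif tok in UNITS or tok in TEENS or tok in TENS:
            digits, i = read_number(tokens, i)
            if (i + 1 < len(tokens) and tokens[i] in SCALES
                    and tokens[i + 1] == "dollars"):
                out.extend(["$" + digits, tokens[i]])  # "$28 million"
                i += 2
            else:
                out.append(digits)
        else:
            out.append(ABBREV.get(tok, tok))
            i += 1
    # Capitalize the first word of each sentence; abbreviations ending in
    # "." do not start a new sentence.
    result = []
    sentence_start = True
    for word in out:
        if sentence_start and word:
            word = word[0].upper() + word[1:]
        result.append(word)
        sentence_start = word.endswith(".") and word not in ABBREV.values()
    return " ".join(result)


raw = r"profits rose to twenty eight million dollars .\period see figure one a\a on page one twenty four .\period"
formatted = format_dictation(raw)
```

Even this small example needs number parsing, currency reordering, abbreviation, token attachment, and capitalization, which hints at why formatting is treated as a separate component after recognition.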