COMP 4060 Natural Language Processing. Speech Processing. Speech and Language Processing. Spoken Language Processing from speech to text to syntax and semantics to speech Speech Recognition human speech recognition and production acoustics signal analysis phonetics
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
COMP 4060 Natural Language Processing
e.g. Fast-Fourier Transform FFT Spectrum
e.g. Neural Networks, Hidden-Markov Models
Acoustic / sound wave
Filtering, Sampling Spectral Analysis; FFT
Features (Phonemes; Context)
Signal Processing / Analysis
HMM, Neural Networks
Grammar or Statistics
Phoneme Sequences / Words
Grammar or Statistics for
likely word sequences
Word Sequence / Sentence
"She just had a baby."
From Signal Representation derive, e.g.
strong frequency components; characterize particular vowels; gender of speaker
baseline for higher frequency harmonics like formants; gender characteristic
characteristic for e.g. plosives (form of articulation)
Speech ProductionPhonetic Features
Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)
Recognition Process based on
The Viterbi Algorithm finds an optimal sequence of states in continuous Speech Recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence.
a[s,s'] is the transition probability (in the phonetic word model) from current state s to next state s', and b[s',ot] is the observation likelihood of s' given ot. b[s',ot] is 1 if the observation symbol matches the state, and 0 otherwise.
function VITERBI (observations of len T, state-graph) returns best-path
num-states NUM-OF-STATES (state-graph)
Create a path probability matrix viterbi[num-states+2,T+2]
for each time step tfrom 0 to Tdo
for each state sfrom 0 to num-statesdo
for each transition s' from s in state-graph
new-score viterbi[s,t] * a[s,s'] * b[s',(ot)]
if ((viterbi[s',t+1] = 0) || (new-score > viterbi[s',t+1]))
Backtrace from highest probability state in the final column of viterbiand return path
single word vs. continuous speech
unlimited vs. large vs. small vocabulary
speaker-dependent vs. speaker-independent
training (or not)
Speech Recognition vs. Speaker Identification
Hong, X. & A. Acero & H. Hon: Spoken Language Processing. A Guide to Theory, Algorithms, and System Development. Prentice-Hall, NJ, 2001
Figures taken from:
Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7.
NL and Speech Resources and Tools:
German Demonstration Center for Speech and Language Technologies:http://www.lt-demo.org/