HUCAA 2014
GPU-Accelerated HMM for Speech Recognition
Leiming Yu, Yash Ukidave and David Kaeli
ECE, Northeastern University

Outline
* Background & Motivation
* HMM
* GPGPU
* Results
* Future Work

Background
* Translate Speech to Text
* Speaker Dependent
* Speaker Independent
Dynamic Time Warping: a template-based approach to measure the similarity between two temporal sequences that may vary in time or speed.
Dynamic Time Warping
1) Handle timing variation
2) Recognize Speech at reasonable cost
1) Template Choosing
2) Ending point detection (VAD, acoustic noise)
3) Words with weak fricatives, close to acoustic background
for i := 1 to n
    for j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j], DTW[i, j-1], DTW[i-1, j-1])
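The pseudocode above can be turned into a small runnable sketch. The slide's last line is cut off, so this version assumes the standard three-neighbour recurrence (insertion, deletion, match); the function name `dtw_distance` and the default absolute-difference distance are illustrative choices, not taken from the talk.

```python
import math


def dtw_distance(s, t, dist=lambda a, b: abs(a - b)):
    """Return the accumulated DTW cost of aligning sequence s with sequence t."""
    n, m = len(s), len(t)
    # dtw[i][j] is the cheapest cost of aligning the first i items of s
    # with the first j items of t. Row 0 and column 0 act as a border.
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(s[i - 1], t[j - 1])
            dtw[i][j] = cost + min(
                dtw[i - 1][j],      # insertion
                dtw[i][j - 1],      # deletion
                dtw[i - 1][j - 1],  # match
            )
    return dtw[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0.0: the repeated 2 is absorbed by the warping, which is the timing variation DTW is meant to handle.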
Neural networks: algorithms that mimic the brain.
Simplified Interpretation:
* takes a set of input features
* goes through a set of hidden layers
* produces the posterior probabilities as the output
a_i^(j): “activation” of unit i in layer j
Θ^(j): matrix of weights controlling the function mapping from layer j to layer j + 1
[Machine Learning, Coursera]
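The simplified interpretation above (input features, hidden layers, posterior probabilities) can be sketched as a tiny forward pass. The sigmoid hidden activation, the softmax output, and the toy weights are all assumptions made for illustration; the slides do not specify them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    # One row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x, hidden_layers, output_layer):
    """Propagate features x through the hidden layers, then a softmax output."""
    a = x
    for weights, bias in hidden_layers:
        a = [sigmoid(z) for z in affine(weights, bias, a)]
    weights, bias = output_layer
    return softmax(affine(weights, bias, a))


# Toy network (hypothetical weights): 2 inputs, 3 hidden units, 2 classes.
hidden = [([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])]
output = ([[0.2, -0.5, 0.3], [-0.4, 0.6, 0.1]], [0.0, 0.0])
posteriors = forward([1.0, 2.0], hidden, output)
```

The returned list holds one posterior probability per class, and the values sum to 1.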
Hint:
* effective in recognizing short-time units such as individual phones and isolated words
* not ideal for continuous recognition tasks, largely due to a poor ability to model temporal dependencies
In a Hidden Markov Model,
* the states are hidden
* outputs that depend on the states are visible
x — states
y — possible observations
a — state transition probabilities
b — output probabilities
The temporal transition of the hidden states fits well with the nature of phoneme transition.
Hint:
* Handles temporal variability of speech well
* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input.
* Hybrid with NN to leverage each modeling technique
Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922), a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.
1966, Leonard Baum described the underlying mathematical theory.
1989, Lawrence Rabiner wrote a tutorial paper giving a comprehensive description of it.
* causal transitional probabilities between states
* observation depends on current state, not predecessor
[Figure: HMM state diagram over time steps t - 1 to t + 2]
* Initial Probability
* Transition Probability
* Observation Probability
* Forward Variable
* Backward Variable
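As a minimal sketch of how these quantities fit together, the forward variable can be computed for a small discrete-observation HMM. The names `pi`, `A`, `B` and the toy numbers are assumptions for illustration (the slide's own symbols did not survive), and discrete observations stand in for the acoustic features used in the talk.

```python
def forward(pi, A, B, obs):
    """Forward algorithm for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of observing symbol k in state i
    Returns the forward variables alpha[t][i] and the sequence likelihood.
    """
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ])
    return alpha, sum(alpha[-1])


# Toy two-state model with two observation symbols (hypothetical values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
alpha, likelihood = forward(pi, A, B, [0, 1])
```

Each step only needs the previous time step's forward variables, while the work across states within a step is independent, which is the kind of parallelism a GPU implementation can exploit.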
Other Variables During Estimation:
* the estimated state transition probability matrix, epsilon
* the estimated probability in a particular state at time t, gamma
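The two estimation variables can be sketched with a plain forward-backward pass over a discrete HMM. The name `epsilon` follows the slide (the same quantity is often written xi in the literature); the function name, the discrete observation model, and the unscaled arithmetic are simplifying assumptions that suit only short sequences.

```python
def forward_backward(pi, A, B, obs):
    """Return gamma[t][i] and epsilon[t][i][j] for a discrete HMM."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    likelihood = sum(alpha[T - 1])
    # gamma[t][i]: probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # epsilon[t][i][j]: probability of moving from state i to j at time t.
    epsilon = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                 / likelihood for j in range(n)] for i in range(n)]
               for t in range(T - 1)]
    return gamma, epsilon


def reestimate_transitions(gamma, epsilon):
    """Re-estimated A[i][j]: expected i-to-j moves over expected visits to i."""
    n = len(gamma[0])
    steps = range(len(epsilon))
    return [[sum(epsilon[t][i][j] for t in steps) /
             sum(gamma[t][i] for t in steps) for j in range(n)]
            for i in range(n)]


# Toy two-state model (hypothetical values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
gamma, epsilon = forward_backward(pi, A, B, [0, 1, 1, 0])
new_A = reestimate_transitions(gamma, epsilon)
```

Summing `epsilon[t][i][j]` over `j` recovers `gamma[t][i]`, so each row of the re-estimated transition matrix sums to 1.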
* Multivariate Normal Probability Density Function
Update Obs. Prob. From Gaussian Mixture Models
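A minimal sketch of the observation probability computed from a Gaussian mixture is given below. It assumes diagonal covariance matrices so that no matrix inverse is needed; the talk may well use full covariances, and the function names are illustrative.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Multivariate normal density with a diagonal covariance (assumption)."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    mahalanobis = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var))
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis))


def gmm_observation_prob(x, weights, means, variances):
    """Observation probability b(x) as a weighted sum of Gaussian densities."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Two-component mixture over 2-D features (hypothetical parameters).
b_x = gmm_observation_prob(
    [0.0, 0.0],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [1.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

In an HMM with one mixture per state, this value would be evaluated for every state and every feature frame, which makes it a natural candidate for data-parallel execution.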
GPU Hierarchical Memory System
GPU-powered Ecosystem
1) Programming Model
* OpenACC, etc.
2) High Performance Libraries
* MAGMA (CUDA/OpenCL/Intel Xeon Phi)
* Armadillo (C++ Linear Algebra Library), drop-in libraries etc.
3) Tuning/Profiling Tools
* NVIDIA: nvprof / nvvp
* AMD: CodeXL
4) Consortium Standards
Heterogeneous System Architecture (HSA) Foundation
Mitigate Data Transfer Latency
Pinned Memory Size
current process limit: ulimit -l (in KB)
hard limit: ulimit -H -l
increase the limit: ulimit -S -l 16384
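The same locked-memory limits can also be read from inside a program. The sketch below uses Python's standard `resource` module on a Unix-like system; the helper name `locked_memory_limits_kb` is illustrative, and the values are converted from bytes to KB to match what `ulimit -l` reports.

```python
import resource


def locked_memory_limits_kb():
    """Return (soft, hard) locked-memory limits in KB; None means unlimited."""
    def to_kb(value):
        return None if value == resource.RLIM_INFINITY else value // 1024

    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return to_kb(soft), to_kb(hard)


soft_kb, hard_kb = locked_memory_limits_kb()
print("soft limit (ulimit -S -l):", "unlimited" if soft_kb is None else f"{soft_kb} KB")
print("hard limit (ulimit -H -l):", "unlimited" if hard_kb is None else f"{hard_kb} KB")
```

Checking the limit before allocating large pinned buffers gives an early, readable error instead of a failed allocation later on.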
A Practice to Efficiently Utilize Memory System
Running Multiple Word