GPU-Accelerated HMM for Speech Recognition

HUCAA 2014

Leiming Yu, Yash Ukidave and David Kaeli

ECE, Northeastern University

- Background & Motivation
- HMM
- GPGPU
- Results
- Future Work

- Translate Speech to Text
- Speaker Dependent / Speaker Independent
- Applications
  * Natural Language Processing
  * Home Automation
  * In-car Voice Control
  * Speaker Verification
  * Automated Banking
  * Personal Intelligent Assistants: Apple Siri, Samsung S Voice
  * etc.

[http://www.kecl.ntt.co.jp]

Dynamic Time Warping

A template-based approach to measuring the similarity between two temporal sequences that may vary in time or speed.

[opticalengineering.spiedigitallibrary.org]

Dynamic Time Warping

DTW Pros:

1) Handles timing variation

2) Recognizes speech at reasonable cost

DTW Cons:

1) Template selection

2) Endpoint detection (VAD, acoustic noise)

3) Words with weak fricatives, which are close to the acoustic background

for i := 1 to n
    for j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j],
                                    DTW[i, j-1],
                                    DTW[i-1, j-1])
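The DTW recurrence above can be made runnable as a minimal Python sketch. The function name `dtw_distance`, the default absolute-difference cost standing in for `D`, and the boundary initialization (infinite cost everywhere except the empty-prefix cell) are assumptions; the slides do not specify them.

```python
import math

def dtw_distance(s, t, dist=lambda a, b: abs(a - b)):
    """Cumulative DTW cost of aligning sequence s with sequence t."""
    n, m = len(s), len(t)
    # dtw[i][j] holds the minimal cumulative cost of aligning s[:i] with t[:j].
    # Row 0 and column 0 are boundary cells (assumed initialization).
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(s[i - 1], t[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # s advances, t repeats
                                   dtw[i][j - 1],      # t advances, s repeats
                                   dtw[i - 1][j - 1])  # both advance
    return dtw[n][m]

# A time-stretched copy of the same pattern aligns at zero cost.
same_pattern_cost = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

The table has (n + 1) x (m + 1) cells, so both time and memory grow with the product of the two sequence lengths.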

Neural networks: algorithms that mimic the brain.

Simplified Interpretation:
* takes a set of input features
* goes through a set of hidden layers
* produces the posterior probabilities as the output

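[Figure: multi-class neural network with output classes Parking Meter, Bike, Pedestrian, and Car; example output shown for the Pedestrian case]

$a_i^{(j)}$: "activation" of unit $i$ in layer $j$

$\Theta^{(j)}$: matrix of weights controlling the function mapping from layer $j$ to layer $j+1$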

[Machine Learning, Coursera]

Equation Example

Hint:

* effective at recognizing individual phones and isolated words as short-time units

* not ideal for continuous recognition tasks, largely because of its poor ability to model temporal dependencies
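The simplified interpretation of a neural network given earlier (input features, hidden layers, posterior probabilities) can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the sigmoid hidden activation, the softmax output layer, the helper names, and the toy weights are hypothetical and not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(zs)
    exps = [math.exp(z - peak) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, x):
    # One weight row per output unit: row . x + bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def forward(x, layers):
    """Propagate input features x through the layers.

    layers is a list of (weights, bias) pairs; hidden layers use a sigmoid,
    and the last layer uses a softmax so the outputs sum to 1.
    """
    activation = x
    for weights, bias in layers[:-1]:
        activation = [sigmoid(z) for z in affine(weights, bias, activation)]
    weights, bias = layers[-1]
    return softmax(affine(weights, bias, activation))

# Hypothetical toy network: 2 input features -> 3 hidden units -> 4 classes.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.2], [-0.4, 0.6, 0.5], [0.1, 0.1, 0.1]],
     [0.0, 0.0, 0.0, 0.0]),
]
posteriors = forward([1.0, 2.0], layers)
```

Each frame is scored on its own here, which matches the hint above: nothing in this forward pass carries information from one time step to the next.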

In a Hidden Markov Model,
* the states are hidden
* outputs that depend on the states are visible

x — states

y — possible observations

a — state transition probabilities

b — output probabilities

[wikipedia]

The temporal transition of the hidden states fits well with the nature of phoneme transition.

Hint:

* Handles the temporal variability of speech well

* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input

* Hybrid with NN to leverage each modeling technique

- Parallel Architecture: multi-core CPU to many-core GPU (graphics + general purpose)
- Massive Parallelism in Speech Recognition Systems: Neural Networks, HMMs, etc. are both computation and memory intensive
- GPGPU Evolution
  * Dynamic Parallelism
  * Concurrent Kernel Execution
  * Hyper-Q
  * Device Partitioning
  * Virtual Memory Addressing
  * GPU-GPU Data Transfer, etc.
- Previous works
- Our goal is to use modern GPU features to accelerate speech recognition

HMM

Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922), a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.

In 1966, Leonard Baum described the underlying mathematical theory.

In 1989, Lawrence Rabiner wrote a paper with the most comprehensive description of it.

HMM Stages

* causal transitional probabilities between states

* observation depends on current state, not predecessor

- Forward
- Backward
- Expectation-Maximization
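As a minimal sketch of the Forward stage, the following Python function computes the likelihood of an observation sequence under a discrete-output HMM. The function name, the list-of-lists parameter layout, and the toy model values are assumptions for illustration; a GMM-based system would replace the table lookup `B[j][o]` with a density evaluation.

```python
def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a discrete-output HMM via the forward recursion.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of emitting symbol k in state j
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final forward variables.
    return sum(alpha)

# Hypothetical two-state, three-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]
likelihood = forward_likelihood(pi, A, B, [0, 1, 2])
```

This version multiplies raw probabilities, so it will underflow on long utterances; practical implementations usually rescale the forward variables at each step or work in the log domain.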

[Figure: trellis of states I and J across time steps t - 1, t, t + 1, t + 2]

Variable Definitions:

* Initial Probability

* Transition Prob.

* Observation Prob.

* Forward Variable

* Backward Variable

Other Variables During Estimation:

* the estimated state transition probability matrix, epsilon

* the estimated probability of being in a particular state at time t, gamma

* Multivariate Normal Probability Density Function
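The two estimation variables listed above can be sketched for a discrete-output HMM as follows. This assumes the standard Baum-Welch definitions, in which the quantity the slides call epsilon is the joint posterior over consecutive states (often written as xi in the literature); the function names and the toy model are hypothetical.

```python
def forward_backward(pi, A, B, obs):
    """Return the forward (alpha) and backward (beta) variables, unscaled."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # beta at the last step is 1
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta

def state_posteriors(pi, A, B, obs):
    """gamma[t][i]: P(state i at t | obs); xi[t][i][j]: P(i at t, j at t+1 | obs)."""
    alpha, beta = forward_backward(pi, A, B, obs)
    n, T = len(pi), len(obs)
    likelihood = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    return gamma, xi

def reestimate_transitions(gamma, xi):
    """New a_ij = expected i->j transitions / expected visits to state i."""
    n = len(gamma[0])
    return [[sum(step[i][j] for step in xi) / sum(g[i] for g in gamma[:-1])
             for j in range(n)] for i in range(n)]

# Hypothetical two-state, three-symbol model and a short observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]
gamma, xi = state_posteriors(pi, A, B, [0, 1, 2, 1])
new_A = reestimate_transitions(gamma, xi)
```

Every time step reduces over all state pairs, which is the kind of regular, data-parallel work that maps naturally onto GPU threads.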

Update Obs. Prob. From Gaussian Mixture Models
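A minimal sketch of how an observation probability can be evaluated from a Gaussian mixture. It assumes diagonal covariance matrices to keep the density evaluation short; the slides do not say which covariance structure was used, and the function names and parameter layout are hypothetical.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Multivariate normal density with a diagonal covariance (assumption).

    x, mean, var are equal-length lists; var holds per-dimension variances.
    """
    log_p = 0.0
    for xi, mu, v in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return math.exp(log_p)

def gmm_observation_prob(x, weights, means, variances):
    """b_j(x) for one state: weighted sum of its component densities."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture over 2-dimensional feature vectors.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
b = gmm_observation_prob([0.5, -0.5], weights, means, variances)
```

In a full system this value takes the place of the discrete emission table entry in the forward and backward recursions, evaluated once per state and per frame.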

GPGPU

Programming Model

GPU Hierarchical Memory System

- Visibility
- Performance Penalty

[http://www.biomedcentral.com]

[www.math-cs.gordon.edu]

GPU-powered Ecosystem

1) Programming Model

* CUDA

* OpenCL

* OpenACC, etc.

2) High Performance Libraries

* cuBLAS

* Thrust

* MAGMA (CUDA/OpenCL/Intel Xeon Phi)

* Armadillo (C++ Linear Algebra Library), drop-in libraries etc.

3) Tuning/Profiling Tools

* NVIDIA: nvprof / nvvp

* AMD: CodeXL

4) Consortium Standards

Heterogeneous System Architecture (HSA) Foundation

Results

Platform Specs

Mitigate Data Transfer Latency

Pinned Memory Size

current process limit: ulimit -l (in KB)

hardware limit: ulimit -H -l

increase the limit: ulimit -S -l 16384
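The same locked-memory limits can be inspected from inside a process. This is a minimal sketch using Python's standard `resource` module on a Unix-like system; the helper names are hypothetical, and the conversion assumes `ulimit -l` reports 1024-byte units as stated above.

```python
import resource

def memlock_limits_kb():
    """Return (soft, hard) locked-memory limits in KB; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kb(value):
        # getrlimit reports bytes; RLIM_INFINITY marks an unlimited setting.
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kb(soft), to_kb(hard)

def set_soft_memlock_kb(kb):
    """Set the soft limit for this process, clamped to the hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    wanted = kb * 1024
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_MEMLOCK, (wanted, hard))
    return wanted // 1024

soft_kb, hard_kb = memlock_limits_kb()
print("soft limit (KB):", soft_kb, "hard limit (KB):", hard_kb)
```

A change made with `set_soft_memlock_kb` applies only to the calling process and its children, just as `ulimit` applies only to the current shell session.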

A Practice to Efficiently Utilize Memory System

Hyper-Q Feature

Running Multiple Word Recognition Tasks

Future Work

- Integrate with Parallel Feature Extraction
- Power Efficiency Implementation and Analysis
- Embedded System Development, Jetson TK1 etc.
- Improve generality, LMs
- Improve robustness, Front-end noise cancelation
- Go with the trend!

QUESTIONS?