
The Fast Evaluation of Hidden Markov Models on GPU


Presentation Transcript


  1. The Fast Evaluation of Hidden Markov Models on GPU Presented by Ting-Yu Mu & Wooyoung Lee

  2. Introduction • Hidden Markov Model (HMM): • A statistical, probability-based method • Used in a wide range of applications: • Speech recognition • Computer vision • Medical image analysis • One of the problems that needs to be solved: • Evaluating the probability of an observation sequence on a given HMM (the evaluation problem) • The solution to this problem is the key to choosing the best-matched model among a set of HMMs (formalized below)
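
  The slides give no formulas, so as a grounding note: in the standard notation of Rabiner's tutorial, a model is λ = (A, B, π) with N states, and the evaluation problem asks for the probability of an observation sequence O = O_1 O_2 ... O_T. Direct evaluation sums over every state sequence Q:

      % Evaluation problem (standard HMM notation; not shown on the slides)
      P(O \mid \lambda) = \sum_{Q = q_1 \cdots q_T} \pi_{q_1}\, b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(O_t)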

  3. Introduction – Example • Application: Speech recognition • Goal: Recognize words one by one • Input: • The speech signal of a given word • Represented as a time sequence of coded spectral vectors • Output: • The observation sequence • Represented as a sequence of indices into the spectral codebook

  4. Introduction – Example • The tasks: • Design an individual HMM for each word of the vocabulary • Perform recognition of an unknown word: • Use the solution of the evaluation problem to score each HMM against the observation sequence of the word • The model that scores the highest is selected as the result (see the formula below) • Accuracy: • Measured by the correctness of the chosen result
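
  In symbols (with λ_i as above, one trained model per vocabulary word, and V the vocabulary size; V is our label, not the slides'), recognition reduces to picking the highest-scoring model:

      % Choose the word whose model best explains the observations
      \hat{v} = \operatorname*{arg\,max}_{1 \le i \le V} P(O \mid \lambda_i)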

  5. Computational Load • The computational load of the previous example consists of two parts: • Estimating the parameters of each HMM to build the models; the load varies from HMM to HMM • Executed only once • Evaluating the probability of an observation sequence on each HMM • Executed many times during the recognition process • The performance therefore depends on the complexity of the evaluation algorithm

  6. Efficient Algorithm • An algorithm with a lower order of complexity: • The forward-backward algorithm • Consists of two passes: • Forward probability • Backward probability • Used extensively, yet computationally intensive • One way to increase the performance: • Design a parallel algorithm • Utilizing present-day multi-core systems (a sequential baseline is sketched below)
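
  For reference, here is a minimal sequential sketch of the forward pass in C# (the array layout and names are our assumptions, not taken from the slides). The recursion alpha_{t+1}(j) = [sum_i alpha_t(i) a_{ij}] b_j(O_{t+1}) costs O(N^2 T), versus a cost exponential in T for direct enumeration of all state paths:

      // Sequential forward pass for one HMM -- a sketch, not the authors' code.
      // pi[i]  : initial state probabilities (length N)
      // a[i,j] : transition probabilities    (N x N)
      // b[i,k] : emission probabilities      (N x M)
      // obs[t] : observation symbol index at time t (length T)
      static double ForwardProbability(double[] pi, double[,] a, double[,] b, int[] obs)
      {
          int N = pi.Length, T = obs.Length;
          var alpha = new double[N];
          var next = new double[N];

          for (int i = 0; i < N; i++)                    // initialization: alpha_1(i)
              alpha[i] = pi[i] * b[i, obs[0]];

          for (int t = 1; t < T; t++)                    // induction: O(N^2) per step
          {
              for (int j = 0; j < N; j++)
              {
                  double sum = 0.0;
                  for (int i = 0; i < N; i++)
                      sum += alpha[i] * a[i, j];         // sum_i alpha_t(i) * a(i,j)
                  next[j] = sum * b[j, obs[t]];          // * b_j(O_{t+1})
              }
              var tmp = alpha; alpha = next; next = tmp; // reuse the two buffers
          }

          double p = 0.0;                                // termination: sum_i alpha_T(i)
          for (int i = 0; i < N; i++) p += alpha[i];
          return p;                                      // P(O | lambda); production code
      }                                                  // would scale alpha against underflow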

  7. General-Purpose GPU • Why choose the Graphics Processing Unit (GPU)? • Rapid increases in performance • Support for floating-point operations • High computational power and memory bandwidth • The GPU is specialized for compute-intensive, highly parallel computation • More transistors are devoted to data processing rather than data caching

  8. CUDA Programming Model • The GPU is seen as a compute device that executes the parts of an application that: • Have to be executed multiple times • Can be isolated as a function • Work independently on different data • Such a function can be compiled to run on the device; the resulting program is called a kernel • The batch of threads that executes a kernel is organized as a grid of blocks (a minimal kernel is sketched below)
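
  As an illustration of the kernel idea, a minimal sketch in the CUDAfy.NET style used later in this deck (the function and parameter names are ours). The function is compiled for the device, and every thread of the grid runs it on a different element:

      [Cudafy]  // marks the function for translation to CUDA C (see slide 18)
      public static void AddVectors(GThread thread, float[] a, float[] b, float[] c, int n)
      {
          // Each thread derives one global index and works on different data.
          int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
          if (i < n)
              c[i] = a[i] + b[i];
      }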

  9. CUDA Programming Model • Thread block: • Contains a batch of threads that can cooperate with one another through: • Fast shared memory • Synchronization points • Thread IDs • A block can be a one-, two-, or three-dimensional array of threads (a cooperation example follows)
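
  To make the cooperation concrete, a hypothetical CUDAfy.NET kernel in which the threads of one block sum a tile of an array through fast shared memory, synchronizing between steps (256 is an assumed block size, not a value from the slides):

      [Cudafy]
      public static void BlockSum(GThread thread, float[] data, float[] blockSums, int n)
      {
          // One shared buffer per block; 256 must match the launch block size.
          float[] cache = thread.AllocateShared<float>("cache", 256);
          int tid = thread.threadIdx.x;
          int gid = thread.blockIdx.x * thread.blockDim.x + tid;

          cache[tid] = gid < n ? data[gid] : 0f;
          thread.SyncThreads();                       // wait until the tile is loaded

          for (int stride = 128; stride > 0; stride /= 2)
          {
              if (tid < stride)
                  cache[tid] += cache[tid + stride];  // pairwise partial sums
              thread.SyncThreads();                   // all partial sums visible
          }
          if (tid == 0)
              blockSums[thread.blockIdx.x] = cache[0];
      }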

  10. CUDA Programming Model • Grid of thread blocks: • A single block can contain only a limited number of threads • A grid allows a much larger number of threads to execute the same kernel with one invocation • Blocks are identifiable through their block IDs • The price is reduced thread cooperation: threads in different blocks cannot cooperate • A grid can be a one- or two-dimensional array of blocks (a common sizing pattern is shown below)
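
  A common sizing pattern for such a grid (our illustration, reusing the hypothetical AddVectors kernel sketched above): round the element count up to a whole number of blocks so that a single invocation covers all the data:

      int blockSize = 256;                              // threads per block (limited)
      int gridSize = (n + blockSize - 1) / blockSize;   // number of blocks in the grid
      gpu.Launch(gridSize, blockSize).AddVectors(devA, devB, devC, n);  // host call, slide 18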

  11. CUDA Programming Model

  12. CUDA Memory Model

  13. Parallel Algorithm on GPU • The task of computing the evaluation probability is split into pieces that are delivered to several threads • A thread block evaluates one Markov model • The dimension of the grid is obtained by dividing the number of states N by the block size • Each forward probability is computed by a thread within the thread block • The threads need to be synchronized because they share data (the forward probabilities of the previous time step); a kernel sketch follows
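
  A hypothetical CUDAfy.NET sketch of this scheme, simplified to one model and one block (in a multi-model launch, blockIdx.x would select that model's parameter slice). Thread j owns state j, the alpha values live in shared memory, and SyncThreads() orders the time steps because they are shared data; it assumes N does not exceed the block size of 256:

      [Cudafy]
      public static void ForwardKernel(GThread thread, float[] pi, float[] a, float[] b,
                                       int[] obs, float[] result, int n, int m, int T)
      {
          // a and b are row-major flattened: a[i*n + j] = a(i,j), b[j*m + k] = b_j(k).
          float[] alpha = thread.AllocateShared<float>("alpha", 256);
          float[] next  = thread.AllocateShared<float>("next", 256);
          int j = thread.threadIdx.x;

          if (j < n) alpha[j] = pi[j] * b[j * m + obs[0]];  // initialization
          thread.SyncThreads();

          for (int t = 1; t < T; t++)
          {
              if (j < n)
              {
                  float sum = 0f;
                  for (int i = 0; i < n; i++)
                      sum += alpha[i] * a[i * n + j];       // sum_i alpha_t(i) a(i,j)
                  next[j] = sum * b[j * m + obs[t]];        // * b_j(O_t)
              }
              thread.SyncThreads();                         // alpha is shared data
              if (j < n) alpha[j] = next[j];
              thread.SyncThreads();
          }

          if (j == 0)                                       // termination by one thread
          {
              float p = 0f;
              for (int i = 0; i < n; i++) p += alpha[i];
              result[0] = p;                                // P(O | lambda)
          }
      }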

  14. CUDAfy.NET

  15. What is CUDAfy.NET? • Made by Hybrid DSP Systems in the Netherlands • A set of libraries and tools that permit general-purpose programming of NVIDIA CUDA GPUs from the Microsoft .NET framework • Combines flexibility, performance, and ease of use • First release: March 17, 2011

  16. Cudafy.NET SDK • Cudafy.NET Library: • Cudafy Translator (converts .NET code to CUDA C) • Cudafy Library (CUDA support for .NET) • Cudafy Host (host-side GPU wrapper) • Cudafy Math (FFT + BLAS) • The translator converts .NET code into CUDA code; it is based on ILSpy (an open-source .NET assembly browser and decompiler)

  17. Cudafy Translator

  18. General Cudafy Process • Two main components to the Cudafy SDK: • Translation from .NET to CUDA C and compilation with the NVIDIA compiler (this produces a Cudafy module XML file) • Loading Cudafy modules and communicating with the GPU from the host • The target machine does not need to perform the first step. • 1. Add a reference to Cudafy.NET.dll in your .NET project • 2. Add the Cudafy, Cudafy.Host, and Cudafy.Translator namespaces to your source files (using directives in C#) • 3. Add a parameter of type GThread to GPU functions and use it to access thread, block, and grid information, as well as synchronization and local shared-memory features • 4. Place a Cudafy attribute on those functions • 5. In your host code, before using the GPU functions, call CudafyTranslator.Cudafy(); this returns a CudafyModule instance • 6. Load the module into a GPGPU instance; the GPGPU type allows you to interact seamlessly with the GPU from your .NET code (a complete host-side sketch follows)
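
  Putting steps 1 through 6 together, a minimal host-side sketch (method and variable names are ours; the API calls follow the CUDAfy.NET SDK as documented on its project page):

      using Cudafy;
      using Cudafy.Host;
      using Cudafy.Translator;

      public static class GpuDemo
      {
          [Cudafy]                                          // step 4: attribute on GPU function
          public static void Scale(GThread thread, float[] data, float factor, int n)
          {
              int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;  // step 3
              if (i < n) data[i] *= factor;
          }

          public static void Main()
          {
              CudafyModule km = CudafyTranslator.Cudafy();  // step 5: translate and compile
              GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target, CudafyModes.DeviceId);
              gpu.LoadModule(km);                           // step 6: load the module

              var host = new float[1024];
              for (int i = 0; i < host.Length; i++) host[i] = i;

              float[] dev = gpu.CopyToDevice(host);         // host -> device copy
              gpu.Launch(host.Length / 256, 256).Scale(dev, 2f, host.Length);
              gpu.CopyFromDevice(dev, host);                // device -> host copy
              gpu.FreeAll();                                // release device memory
          }
      }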

  19. Development Requirements • NVIDIA CUDA Toolkit 4.1 • Visual Studio 2010 • Microsoft VC++ compiler (used by the NVIDIA CUDA compiler) • Windows (XP SP3, Vista, or 7; 32-bit/64-bit) • An NVIDIA GPU • NVIDIA graphics driver

  20. GPU vs PLINQ vs LINQ

  21. GPU vs PLINQ vs LINQ

  22. References • ILSpy: http://wiki.sharpdevelop.net/ilspy.ashx • Cudafy.NET: http://cudafy.codeplex.com/ • Using Cudafy for GPGPU Programming in .NET: http://www.codeproject.com/Articles/202792/Using-Cudafy-for-GPGPU-Programming-in-NET • Base64 Encoding on a GPU: http://www.codeproject.com/Articles/276993/Base64-Encoding-on-a-GPU • High Performance Queries: GPU vs PLINQ vs LINQ: http://www.codeproject.com/Articles/289551/High-Performance-Queries-GPU-vs-PLINQ-vs-LINQ
