
CUDA (Compute Unified Device Architecture)




Presentation Transcript


  1. CUDA (Compute Unified Device Architecture) PARALLEL PROGRAMMING USING THE GPU By: • Matt Sirabella • Neil Weber • Christian Casseus • Jordan Dubique

  2. CUDA (Compute Unified Device Architecture) • History of Parallel Computing • Computing using the GPU • What is CUDA? • Key Features • Purpose of CUDA • Example(s)

  3. CUDA (Compute Unified Device Architecture): History of Parallel Computing

  4. History of GPU Computing • Parallel Programming • 1980s and early 1990s: the golden age of data-parallel computing, where the same computation is performed on different data elements • Supercomputers • Powerful, but expensive • Despite their limited availability, supercomputers created excitement about parallel computing.

  5. History of GPU Computing • Parallel Programming • The complexity of parallel computing is much higher than that of sequential computing. This is where CUDA comes in!

  6. GPU Computing • Why use GPUs in computing? • GPUs are massively multithreaded many-core chips. • Many-Core Chips (GPU) vs. Multi-Core Chips (CPU) • Many-core chips contain hundreds of processor cores. • Multi-core chips contain far fewer cores (e.g. dual core, quad core, eight core). • Increased application efficiency

  7. GPU Computing • Why GPU Computing? • GPUs can run tens of thousands of threads concurrently. [Diagram: a CPU devotes most of its die to cache and control logic serving a few threads, while a GPU devotes it to thousands of lightweight threads]

  8. Vector (Example) • Add Vector A to Vector B, store the result in Vector C (C = A + B). Vector size: n. [Diagram: A + B = C]

  9. Vector (Example) • Add Vector A to Vector B, store the result in Vector C. Vector size: n. Sequential execution: for (i = 0; i < n; i++) { C[i] = A[i] + B[i]; }

  10. Vector (Example) • Add Vector A to Vector B, store the result in Vector C. Vector size: n. Sequential execution: for (i = 0; i < n; i++) { C[i] = A[i] + B[i]; } In CUDA: __global__ void VecAdd(float *A, float *B, float *C) { int i = threadIdx.x; C[i] = A[i] + B[i]; } launched with n threads as VecAdd<<<1, n>>>(A, B, C);

  11. What is CUDA? • CUDA is a parallel computing platform and programming model invented by NVIDIA. • NVIDIA GPUs implement this architecture and programming model. • CUDA works with all NVIDIA GPUs from the G8x series onwards • By downloading the CUDA Toolkit you can code algorithms for execution on the GPU.

  12. History • The CUDA project was announced in November 2006 • A public beta of the CUDA SDK was released in February 2007 as the world's first solution for general-purpose computing on GPUs • Later that year came the CUDA 1.1 beta, which added CUDA functions to the common NVIDIA drivers • Current version: CUDA 5.0 • developer.nvidia.com/cuda-downloads

  13. CUDA • The developer still programs in the familiar C, C++, Fortran, or another supported language, and incorporates extensions of these languages in the form of a few basic keywords. • "GPUs have evolved to the point where many real-world applications are easily implemented on them and run significantly faster than on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs." -- Jack Dongarra, Professor, University of Tennessee

  14. See the Difference • Many people mistake CUDA for a language, or perhaps an API. It is neither.

  15. Where to Learn • developer.nvidia.com/cuda-education-training • NVIDIA hosts regular webinars for developers • "The key thing customers said was they didn't want to have to learn a whole new language or API. Some of them were hiring gaming developers because they knew GPUs were fast but didn't know how to get to them. Providing a solution that was easy, that you could learn in one session and see it outperform your CPU code, was critical." -- Ian Buck, General Manager, NVIDIA

  16. CUDA in Action

  17. CUDA • What sets CUDA apart

  18. Accessible in many ways • The CUDA platform is accessible to software developers through CUDA-accelerated libraries and compiler directives. • Provides accessibility through extensions to commonly used programming languages: C and C++ (CUDA C/C++) and Fortran (CUDA Fortran). • The CUDA platform supports other computational interfaces: Khronos Group's OpenCL, Microsoft's DirectCompute, and C++ AMP. • Third-party wrappers are available for other languages: Python, Perl, Fortran, Java, Ruby, Lua, Haskell, MATLAB, and IDL.

  19. Distinct features • Parallelism. • Data locality. • Thread cooperation.

  20. Parallelism • A parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very quickly. • Facilitates heterogeneous computing: CPU + GPU. • Parallel portions of an application are executed on the device as kernels.

  21. Data locality • The CUDA model encourages data locality and reuse for good performance on the GPU. • Expressing data tiling, locality, and computational regularity in an effective CUDA kernel achieves much of the performance benefit of hand-tuning code for the architecture, and most of those benefits carry over to the CPU architecture as well.

  22. Thread cooperation • CUDA threads are extremely lightweight. • Very little creation overhead. • Fast switching. • CUDA uses thousands of threads to achieve efficiency. • Multi-core CPUs can only use a few. • Thread cooperation is valuable (see the sketch below). • Cooperate on memory accesses. • Share results to avoid redundant computation.
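A minimal sketch of such cooperation: a block-level sum reduction in which threads share partial results through on-chip shared memory. The kernel name is illustrative, and it assumes a power-of-two block size of 256 (this is a standard CUDA idiom, not code taken from the slides).

    // Each block sums 256 elements of in[] and writes one result to out[].
    __global__ void blockSum(const float *in, float *out) {
        __shared__ float partial[256];                 // one slot per thread
        int tid = threadIdx.x;
        partial[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                               // all loads visible before reducing

        // Tree reduction: halve the number of active threads each step.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                partial[tid] += partial[tid + stride]; // reuse a neighbor's result
            __syncthreads();
        }
        if (tid == 0)
            out[blockIdx.x] = partial[0];              // one partial sum per block
    }

Because each element is loaded from global memory exactly once and all intermediate sums stay in shared memory, the threads avoid the redundant reads a naive per-thread sum would incur.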

  23. CUDA Syntax

  24. CUDA C/C++ Compiler
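The compiler in question is nvcc, which ships with the CUDA Toolkit mentioned earlier: it separates a .cu source file into host code, handed off to the system C/C++ compiler, and device code, compiled for the GPU. A typical invocation, assuming a source file named vecadd.cu:

    nvcc vecadd.cu -o vecadd
    ./vecadd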

  25. CUDA Kernels • __global__ void testkernel(void) { } is a CUDA kernel declaration. • The __global__ keyword identifies a function that will run on the device. • testkernel<<<Blocks, Threads>>>(); is the syntax for calling a kernel. • The parameters inside the angle brackets give the number of blocks followed by the number of threads per block that concurrently execute the function. • testkernel<<<X, Y>>>(); indicates Y threads per block across X blocks. • A stream can also be specified to increase concurrency: testkernel<<<1, 1, 0, stream1>>>(); (the third parameter is the dynamic shared-memory size).
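Putting those pieces together, a minimal runnable sketch (the kernel name and launch configuration are illustrative; device-side printf requires compute capability 2.0 or later):

    #include <cstdio>

    // Each of the launched threads prints its own coordinates.
    __global__ void testkernel(void) {
        printf("block %d, thread %d\n", blockIdx.x, threadIdx.x);
    }

    int main() {
        testkernel<<<2, 4>>>();   // 2 blocks x 4 threads = 8 concurrent executions
        cudaDeviceSynchronize();  // wait for the kernel (and its printf) to finish
        return 0;
    }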

  26. Block/Thread Structure

  27. Memory • The core memory functions in CUDA are cudaMalloc(), cudaFree(), and cudaMemcpy(); cudaMemcpyAsync() increases concurrency. • Threads within a block can share memory declared with the keyword __shared__. • void __syncthreads(); acts as a barrier: every thread in the block waits at it, which guarantees that all threads see the same shared data before proceeding.
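A minimal sketch of the usual allocate / copy / compute / copy back / free pattern built from those core functions (buffer names and sizes are illustrative):

    #include <cuda_runtime.h>

    int main() {
        const int n = 1024;
        size_t bytes = n * sizeof(float);

        float h_a[1024];                                      // host buffer
        for (int i = 0; i < n; i++) h_a[i] = 1.0f;

        float *d_a;
        cudaMalloc(&d_a, bytes);                              // allocate device memory
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // host -> device
        // ... launch kernels that read and write d_a here ...
        cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);  // device -> host
        cudaFree(d_a);                                        // release device memory
        return 0;
    }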

  28. Other Keywords and Variables • gridDim and blockDim: contain the dimensions of the grid and of each block • blockIdx: contains the block index within the grid (e.g. blockIdx.x) • threadIdx: contains the thread index within the block • __device__: declares a variable that resides in the device's global memory • __constant__: declares a variable that resides in the device's constant memory
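These built-in variables are usually combined to give each thread a unique global index; a minimal, hypothetical sketch:

    // Scales n elements in place; the kernel name and parameters are illustrative.
    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)                                      // guard: the grid may overshoot n
            data[i] *= factor;
    }

    // Launched with enough blocks to cover all n elements:
    //   int threads = 256;
    //   int blocks  = (n + threads - 1) / threads;     // round up
    //   scale<<<blocks, threads>>>(d_data, 2.0f, n);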

  29. Memory Structure

  30. Code Examples.

  31. Code Examples
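A minimal, self-contained vector-add program in the spirit of the earlier VecAdd sketch (the array size and launch configuration are illustrative, not taken from the slides):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void VecAdd(const float *A, const float *B, float *C, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) C[i] = A[i] + B[i];
    }

    int main() {
        const int n = 1 << 20;                 // one million elements
        size_t bytes = n * sizeof(float);

        // Allocate and fill host arrays.
        float *h_A = (float *)malloc(bytes);
        float *h_B = (float *)malloc(bytes);
        float *h_C = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

        // Allocate device arrays and copy the inputs over.
        float *d_A, *d_B, *d_C;
        cudaMalloc(&d_A, bytes); cudaMalloc(&d_B, bytes); cudaMalloc(&d_C, bytes);
        cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

        // One thread per element, rounded up to whole blocks.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        VecAdd<<<blocks, threads>>>(d_A, d_B, d_C, n);

        cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);
        printf("C[100] = %f (expected %f)\n", h_C[100], h_A[100] + h_B[100]);

        cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
        free(h_A); free(h_B); free(h_C);
        return 0;
    }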

  32. Larger Applications • Sequence Analysis and Alignment • Database Searching and Indexing • Next-generation Sequencing and its Applications • Phylogeny Reconstruction • Computational Genomics and Proteomics • Gene expression, Microarrays and Gene Regulatory Networks • Protein Structure Prediction • Production-level GPU Parallelization of widely used algorithms and tools. • Bioinformatics Research • GPUGRID.com
