
CUDA Overview





  1. CUDA Overview: A Fast Introduction. João Gabriel Felipe Machado Gazolla. Advisor: Dr. Esteban Clua

  2. Topics • What is CUDA? • Where to Download? • How to Install • Architecture • Performance • Visual Studio Integration • Examples • How to Learn More about CUDA GPUs? • Study Plan • References • Discussion

  3. Goal “...Explain the Basics of CUDA...”

  4. What is CUDA? Compute Unified Device Architecture. CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through industry-standard programming languages.

  5. CUDA Performance

  6. CPU Scenario • Specific code, e.g.: population = 1024 soldiers (Soldier[0...1023]); soldierScore(x) = fitness function, returning a score in unit points (e.g. 12387). A single processor evaluates soldierScore(soldier[i]) for one soldier after another, so the total time is (1024/1) × time(soldierScore()).
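In code, the CPU scenario is a sequential loop: one processor scores soldier after soldier, so the total time grows as 1024 × time(soldierScore()). A minimal sketch, assuming a hypothetical `Soldier` struct and a placeholder `soldierScore` formula (the slides define neither):

```cuda
#include <cstddef>

// Hypothetical soldier representation; the slides do not define one.
struct Soldier {
    float strength;
    float agility;
};

// Placeholder fitness function standing in for the slides' soldierScore(x).
float soldierScore(const Soldier& s) {
    return 2.0f * s.strength + s.agility;
}

// CPU version: a single processor scores every soldier in turn,
// so the running time grows linearly with the population size.
void scoreAllOnCpu(const Soldier* soldiers, float* scores, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        scores[i] = soldierScore(soldiers[i]);
    }
}
```

Calling `scoreAllOnCpu(population, scores, 1024)` scores the whole population on one core.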

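  7. GPU Scenario • Specific code, e.g.: population = 1024 soldiers (Soldier[0...1023]); soldierScore(x) = fitness function. A GeForce XXXX++ with 256 processors evaluates soldierScore(soldier[i]) ... soldierScore(soldier[i+n]) in parallel (scores such as 12387 ... 12494 ... 15912 unit points), so the total time is (1024/256) × time(soldierScore()) = 4 × time(soldierScore()): ideally 256 times faster than the CPU scenario, ignoring the cost of moving data to and from the GPU.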
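On the GPU, each thread can score one soldier, so 1024 soldiers need only 4 blocks of 256 threads. A minimal sketch under the same assumptions (hypothetical `Soldier` struct and placeholder fitness formula; `scoreAllKernel` and `scoreAllOnGpu` are illustrative names, not from the slides):

```cuda
#include <cuda_runtime.h>

// Hypothetical soldier representation (same assumption as the CPU sketch).
struct Soldier {
    float strength;
    float agility;
};

// Placeholder fitness function; __device__ makes it callable from kernels.
__device__ float soldierScore(const Soldier& s) {
    return 2.0f * s.strength + s.agility;
}

// Each GPU thread scores exactly one soldier.
__global__ void scoreAllKernel(const Soldier* soldiers, float* scores, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < count) {                                 // guard against surplus threads
        scores[i] = soldierScore(soldiers[i]);
    }
}

// Host wrapper: copies the population to the GPU, launches one thread per
// soldier (256 threads per block), and copies the scores back.
// Returns false if an allocation or the final copy fails.
bool scoreAllOnGpu(const Soldier* hostSoldiers, float* hostScores, int count) {
    Soldier* devSoldiers = nullptr;
    float* devScores = nullptr;
    if (cudaMalloc(&devSoldiers, count * sizeof(Soldier)) != cudaSuccess) return false;
    if (cudaMalloc(&devScores, count * sizeof(float)) != cudaSuccess) {
        cudaFree(devSoldiers);
        return false;
    }
    cudaMemcpy(devSoldiers, hostSoldiers, count * sizeof(Soldier), cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;
    const int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;  // 1024 soldiers -> 4 blocks
    scoreAllKernel<<<blocks, threadsPerBlock>>>(devSoldiers, devScores, count);

    // This copy waits for the kernel and reports any error it raised.
    cudaError_t err = cudaMemcpy(hostScores, devScores, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devSoldiers);
    cudaFree(devScores);
    return err == cudaSuccess;
}
```

The `if (i < count)` guard lets the same kernel handle population sizes that are not a multiple of 256.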

  8. What do I need to run CUDA?

  9. Where to Download CUDA?

  10. What to Download?

  11. Is It Worth It? 5% faster? 20% faster? 300% faster? 900% faster?

  12. Unified Architecture - CUDA • Low Cost, Supercomputing for the Masses

  13. Is It Worth It? Speedups: 1 year → 3 days; 1 day → 15 minutes; 2 minutes → 1.2 seconds (about 100x)
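Reading the slide's figures as before/after pairs (an interpretation, since the slide's layout was lost in the transcript), each pair works out to roughly two orders of magnitude:

```latex
\frac{1\ \text{year}}{3\ \text{days}} = \frac{365\ \text{days}}{3\ \text{days}} \approx 122\times, \qquad
\frac{1\ \text{day}}{15\ \text{min}} = \frac{1440\ \text{min}}{15\ \text{min}} = 96\times, \qquad
\frac{2\ \text{min}}{1.2\ \text{s}} = \frac{120\ \text{s}}{1.2\ \text{s}} = 100\times
```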

  14. Unified Architecture - CUDA

  15. Unified Architecture - CUDA • Low Cost, Supercomputing for the Masses

  16. Example: Crowd Simulation. 1,000,000 bodies

  17. Architecture • CPUs vs GPUs

  18. GPU – The Evolution • Fixed-function GPUs → programmable GPUs → unified architecture

  19. GPU – The Evolution • Fixed-function GPUs • Not a programmable architecture • No access to the processor • Only APIs

  20. GPU – The Evolution • Programmable GPUs • Architecture oriented to computer graphics

  21. Unified Architecture - CUDA

  22. Getting VS2008 for Free

  23. VS2008 Integration • Install VS2008

  24. VS2008 Integration

  25. VS2008 Integration

  26. VS2008 Integration

  27. VS2008 Integration

  28. VS2008 Integration

  29. VS2008 Integration

  30. VS2008 Integration • Command line: • $(CUDA_BIN_PATH)\nvcc.exe -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Od,/Zi,/RTC1,/MDd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\kernel.obj kernel.cu • Outputs: • $(ConfigurationName)\kernel.obj

  31. VS2008 Integration

  32. CUDA VS Wizard

  33. CUDA and Linux • Turn off Compiz • Downgrade g++ and gcc from 4.3 to 4.1

  34. CUDA and Linux

  35. CUDA and Eclipse

  36. Software Architecture

  37. CUDA and Threads • Why program with threads? • Load balancing: share the load among processors • Maximum use of each processor

  38. CUDA and Threads • How many threads have you ever created? • CUDA allows thousands and thousands of threads = a cluster of threads
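Creating thousands of threads in CUDA is a matter of the launch configuration rather than extra code. A minimal sketch (the kernel `fillWithIndex`, the helper `fillOnGpu`, and the sizes are illustrative assumptions) that gives each of 100,000 elements its own thread:

```cuda
#include <cuda_runtime.h>

// Each thread writes its own global index; threads past the end do nothing.
__global__ void fillWithIndex(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Launches ceil(n / 256) blocks of 256 threads, one thread per element.
bool fillOnGpu(int* hostOut, int n) {
    int* devOut = nullptr;
    if (cudaMalloc(&devOut, n * sizeof(int)) != cudaSuccess) return false;
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    fillWithIndex<<<blocks, threadsPerBlock>>>(devOut, n);
    cudaError_t err = cudaMemcpy(hostOut, devOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devOut);
    return err == cudaSuccess;
}
```

For n = 100,000 this launches 391 blocks, 100,096 threads in total; the last 96 fail the `i < n` test and do nothing.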

  39. Threads – Management Costs • CPU: few threads; if we need 1000 instructions to switch threads, that's OK • GPU: thousands of threads; 1000 instructions per switch is NOT OK

  40. CUDA - Synchronization • Must be explicit: “…synchronization is accomplished using the function __syncthreads, which acts as a barrier or memory fence…” • The barrier applies to the threads of a single block.
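Why the barrier matters: in the sketch below (the kernel `reverseSegments`, the helper, and the 256-element block size are illustrative assumptions), each block reverses its own segment through shared memory. Without `__syncthreads()`, a thread could read a `tile` slot before the thread responsible for it has written it.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Reverses each BLOCK_SIZE-element segment of `data` in place.
// Assumes n is a multiple of BLOCK_SIZE (a simplification for this sketch).
__global__ void reverseSegments(int* data) {
    __shared__ int tile[BLOCK_SIZE];
    int base = blockIdx.x * BLOCK_SIZE;
    int t = threadIdx.x;

    tile[t] = data[base + t];                   // every thread loads one element
    __syncthreads();                            // barrier: wait until the whole tile is loaded
    data[base + t] = tile[BLOCK_SIZE - 1 - t];  // read an element loaded by another thread
}

bool reverseSegmentsOnGpu(int* hostData, int n) {
    if (n % BLOCK_SIZE != 0) return false;
    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(dev, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    reverseSegments<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(dev);
    cudaError_t err = cudaMemcpy(hostData, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```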

  41. CUDA – Important Definitions • CUDA extends the C language through kernels • *.cu – CUDA files • Each kernel is a function that will be executed N times on the device, in parallel, by N different CUDA threads

  42. Conventions • Host: the CPU and its memory • Device: the GPU and its memory

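  43. Functions in CUDA (where each kind of function is executed / where it is called from) • __device__: executed on the device, called from the device • __global__: executed on the device, called from the host • __host__: executed on the host, called from the host • Combinations are also possible (e.g. __host__ __device__) • No recursion on the device (GPU) • No static variables (both restrictions of the GPUs and CUDA versions of the time) • Device memory is managed with cudaMalloc() and cudaFree()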
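The qualifiers can be mixed in one .cu file. A minimal sketch with illustrative function names (not from the slides) showing each kind of function, plus device memory handled with cudaMalloc() and cudaFree():

```cuda
#include <cuda_runtime.h>

// __host__ __device__: compiled for both sides, callable from CPU and GPU code.
__host__ __device__ float square(float x) {
    return x * x;
}

// __device__: runs on the GPU, callable only from GPU code.
__device__ float squarePlusOne(float x) {
    return square(x) + 1.0f;
}

// __global__: a kernel, runs on the GPU, launched from the host.
__global__ void squarePlusOneKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = squarePlusOne(in[i]);
}

// Plain host function (implicitly __host__): manages device memory
// with cudaMalloc()/cudaFree() and launches the kernel.
bool squarePlusOneOnGpu(const float* hostIn, float* hostOut, int n) {
    float *devIn = nullptr, *devOut = nullptr;
    if (cudaMalloc(&devIn, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, n * sizeof(float)) != cudaSuccess) { cudaFree(devIn); return false; }
    cudaMemcpy(devIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    squarePlusOneKernel<<<(n + 127) / 128, 128>>>(devIn, devOut, n);
    cudaError_t err = cudaMemcpy(hostOut, devOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    return err == cudaSuccess;
}
```

Because `square` is `__host__ __device__`, the host can also call it directly, e.g. `square(3.0f)`.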

  44. CUDA and the Limits of Memory Bandwidth • Reuse your data!
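One common way to reuse data is to stage it in on-chip shared memory, so that each value read from global memory serves several threads. A minimal sketch, assuming a 3-point moving sum as the workload (the kernel `stencil3`, the helper, and the tile size are illustrative):

```cuda
#include <cuda_runtime.h>

#define TILE 256  // threads per block; the kernel assumes blockDim.x == TILE

// 3-point moving sum: out[i] = in[i-1] + in[i] + in[i+1], treating
// out-of-range neighbours as 0. Each block loads its tile plus one halo
// element on each side into shared memory once; every loaded value is then
// read by up to three threads.
__global__ void stencil3(const float* in, float* out, int n) {
    __shared__ float tile[TILE + 2];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int s = threadIdx.x + 1;                        // index inside the shared tile

    tile[s] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0) {
        tile[0] = (g > 0) ? in[g - 1] : 0.0f;                   // left halo
        tile[TILE + 1] = (g + TILE < n) ? in[g + TILE] : 0.0f;  // right halo
    }
    __syncthreads();  // the whole tile must be loaded before neighbours are read

    if (g < n) {
        out[g] = tile[s - 1] + tile[s] + tile[s + 1];
    }
}

bool stencil3OnGpu(const float* hostIn, float* hostOut, int n) {
    size_t bytes = n * sizeof(float);
    float *devIn = nullptr, *devOut = nullptr;
    if (cudaMalloc(&devIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, bytes) != cudaSuccess) { cudaFree(devIn); return false; }
    cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);
    stencil3<<<(n + TILE - 1) / TILE, TILE>>>(devIn, devOut, n);
    cudaError_t err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    return err == cudaSuccess;
}
```

Without the shared tile, each input element would be fetched from global memory up to three times.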

  45. Architecture • Hide implementation details • HW evolution

  46. Threads, Blocks and Grids • One kernel: one grid • Each block: many threads • All threads inside a block share the same memory area (shared memory) • Threads in different blocks do not share their block's memory with one another • Threads in different blocks cannot cooperate
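Because blocks cannot see each other's shared memory, a common pattern is to let each block reduce its own slice and combine the per-block results afterwards. A minimal sketch (the names and the 256-thread block size are illustrative assumptions) that sums an array this way, finishing on the host:

```cuda
#include <cuda_runtime.h>
#include <vector>

#define THREADS 256  // must be a power of two for the tree reduction below

// Each block sums its own slice in shared memory; blocks cannot see each
// other's shared memory, so every block writes one partial sum to global memory.
__global__ void blockSums(const int* in, int* partial, int n) {
    __shared__ int cache[THREADS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (g < n) ? in[g] : 0;
    __syncthreads();

    // Tree reduction inside the block: threads of the same block cooperate.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// The host combines the per-block results, since blocks cannot cooperate directly.
bool sumOnGpu(const int* hostIn, int n, long long* result) {
    const int blocks = (n + THREADS - 1) / THREADS;
    int *devIn = nullptr, *devPartial = nullptr;
    if (cudaMalloc(&devIn, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&devPartial, blocks * sizeof(int)) != cudaSuccess) { cudaFree(devIn); return false; }
    cudaMemcpy(devIn, hostIn, n * sizeof(int), cudaMemcpyHostToDevice);
    blockSums<<<blocks, THREADS>>>(devIn, devPartial, n);

    std::vector<int> partial(blocks);
    cudaError_t err = cudaMemcpy(partial.data(), devPartial, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devPartial);
    if (err != cudaSuccess) return false;

    long long total = 0;
    for (int p : partial) total += p;
    *result = total;
    return true;
}
```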

  47. Threads, Blocks and Grids • Each block: up to 512 threads (on the compute capability 1.x GPUs of the time; later GPUs allow up to 1024)
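Rather than hard-coding the limit, a program can ask the runtime for it through `cudaGetDeviceProperties` and size its grid with a ceiling division. A minimal sketch (the helper names are illustrative, and device 0 is assumed to be the GPU of interest):

```cuda
#include <cuda_runtime.h>

// Queries the per-block thread limit of device 0 instead of hard-coding it.
// Returns -1 if the query fails.
int maxThreadsPerBlock() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.maxThreadsPerBlock;
}

// Number of blocks needed to give each of n elements its own thread
// when every block holds threadsPerBlock threads (ceiling division).
int blocksNeeded(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

For example, 1000 elements with 512 threads per block need 2 blocks, and 1025 elements need 3.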

  48. Threads, Blocks and Grids
      __global__ void KernelFunction(...);
      dim3 DimGrid(100, 10);       // grid: 100 x 10 = 1000 blocks
      dim3 DimBlock(4, 8, 8);      // each block: 4 x 8 x 8 = 256 threads
      size_t SharedMemBytes = 32;  // dynamic shared memory per block, in bytes
      KernelFunction<<<DimGrid, DimBlock, SharedMemBytes>>>(...);
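The slide's configuration can be launched as written once `KernelFunction` has a body. The body below is hypothetical (the slide elides it): it counts the threads actually launched, using the dynamic shared memory as a per-block counter. `countLaunchedThreads` is likewise an illustrative helper name.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel body: every thread checks in, and each block adds
// its count to a global total.
__global__ void KernelFunction(unsigned int* threadCount) {
    extern __shared__ unsigned int blockCount[];  // sized by SharedMemBytes at launch

    // Linear index of this thread inside its (4 x 8 x 8) block.
    unsigned int t = threadIdx.x
                   + threadIdx.y * blockDim.x
                   + threadIdx.z * blockDim.x * blockDim.y;
    if (t == 0) blockCount[0] = 0;
    __syncthreads();

    atomicAdd(&blockCount[0], 1u);  // count this thread in shared memory
    __syncthreads();

    if (t == 0) atomicAdd(threadCount, blockCount[0]);  // one global update per block
}

unsigned int countLaunchedThreads() {
    dim3 DimGrid(100, 10);       // 100 x 10 = 1000 blocks
    dim3 DimBlock(4, 8, 8);      // 4 x 8 x 8 = 256 threads per block
    size_t SharedMemBytes = 32;  // dynamic shared memory per block

    unsigned int* devCount = nullptr;
    unsigned int hostCount = 0;
    if (cudaMalloc(&devCount, sizeof(unsigned int)) != cudaSuccess) return 0;
    cudaMemset(devCount, 0, sizeof(unsigned int));
    KernelFunction<<<DimGrid, DimBlock, SharedMemBytes>>>(devCount);
    cudaMemcpy(&hostCount, devCount, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(devCount);
    return hostCount;  // 1000 blocks x 256 threads = 256000 if every thread ran
}
```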

  49. Some code...
      // Kernel definition
      __global__ void vecAdd(float* A, float* B, float* C) {...}
      int main() {
          // Kernel invocation
          vecAdd<<<1, N>>>(A, B, C);
      }
      __global__ defines that the function is a kernel: called on the host, executed on the device.
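A runnable version of the slide's `vecAdd` needs the host to allocate device memory and copy data in and out around the launch. A minimal sketch (the helper `vecAddOnGpu` and the host-side names are illustrative; the kernel body fills the slide's `{...}` with the usual element-wise sum, which is an assumption):

```cuda
#include <cuda_runtime.h>

// Kernel definition: each of the N threads adds one pair of elements.
__global__ void vecAdd(const float* A, const float* B, float* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

// Host side: allocate device memory, copy the inputs, launch one block of
// N threads, and copy the result back.
bool vecAddOnGpu(const float* hA, const float* hB, float* hC, int N) {
    size_t bytes = N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }
    if (cudaMalloc(&dC, bytes) != cudaSuccess) { cudaFree(dA); cudaFree(dB); return false; }
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<1, N>>>(dA, dB, dC);  // kernel invocation, as on the slide

    cudaError_t err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess;
}
```

`<<<1, N>>>` launches a single block, so N is bounded by the per-block thread limit; larger vectors need several blocks and a global index.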

  50. Some code...
