Fast Introduction to CUDA: Basics, Performance, and Integration

CUDA Overview: A Fast Introduction
João Gabriel Felipe Machado Gazolla
Advisor: Dr. Esteban Clua

Topics
• What is CUDA?
• Where to Download?
• How to Install
• Architecture
• Performance
• Visual Studio Integration
• Examples
• How to Learn More about CUDA and GPUs?
• Study Plan
• References
• Discussion

Goal
“...Explain the Basics of CUDA...”

What is CUDA?
• Compute Unified Device Architecture
• CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through industry-standard programming languages

CUDA Performance

CPU Scenario
• Specific code example: Population → 1024 soldiers
• soldierScore(x) → fitness function
• Soldier[i] → soldierScore(soldier[i]) → 12387 unit points
• Soldier[0...1023] → (1024/1) * time(soldierScore())

GPU Scenario
• Specific code example: Population → 1024 soldiers
• soldierScore(x) → fitness function
• GeForce XXXX++ (256 processors)
• Soldier[i] ... Soldier[i+n] → soldierScore(soldier[i]) → 12387 ... 12494 ... 15912 unit points
• Soldier[0...1023] → (1024/256) * time(soldierScore())
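The two scenarios above can be sketched in CUDA C as follows. This is only an illustration under stated assumptions: the slides never define soldierScore, so the body below is a hypothetical placeholder, and the names scoreAll and NUM_SOLDIERS are invented for this sketch. The kernel assigns one thread to each soldier, while the host loop calls the same function 1024 times in sequence to check the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_SOLDIERS 1024

// Hypothetical fitness function: the slides do not define soldierScore,
// so this placeholder simply derives a score from the soldier's value.
__host__ __device__ float soldierScore(float soldier) {
    return soldier * 2.0f + 1.0f;
}

// GPU scenario: one thread scores one soldier.
__global__ void scoreAll(const float* soldiers, float* scores, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) scores[i] = soldierScore(soldiers[i]);
}

int main() {
    float h_soldiers[NUM_SOLDIERS], h_scores[NUM_SOLDIERS];
    for (int i = 0; i < NUM_SOLDIERS; ++i) h_soldiers[i] = (float)i;

    float *d_soldiers, *d_scores;
    size_t bytes = NUM_SOLDIERS * sizeof(float);
    cudaMalloc(&d_soldiers, bytes);
    cudaMalloc(&d_scores, bytes);
    cudaMemcpy(d_soldiers, h_soldiers, bytes, cudaMemcpyHostToDevice);

    // 4 blocks of 256 threads cover all 1024 soldiers.
    scoreAll<<<NUM_SOLDIERS / 256, 256>>>(d_soldiers, d_scores, NUM_SOLDIERS);
    cudaMemcpy(h_scores, d_scores, bytes, cudaMemcpyDeviceToHost);

    // CPU scenario: the same function, called 1024 times in sequence.
    int mismatches = 0;
    for (int i = 0; i < NUM_SOLDIERS; ++i)
        if (h_scores[i] != soldierScore(h_soldiers[i])) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_soldiers);
    cudaFree(d_scores);
    return mismatches != 0;
}
```

The (1024/256) factor on the slide is an idealized estimate; the sketch does not measure time, and real speedups also depend on transfer costs between host and device.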

What do I need to run CUDA?

Where to Download CUDA?

What to Download?

Is It Worth It?
• 5% faster?
• 20% faster?
• 300% faster?
• 900% faster?

Unified Architecture - CUDA
• Low Cost, Supercomputing for the Masses

Is It Worth It? Speedups (100x)
• 1 year → 3 days
• 1 day → 15 minutes
• 2 minutes → 1.2 seconds

Unified Architecture - CUDA

Unified Architecture - CUDA
• Low Cost, Supercomputing for the Masses

Example: Crowd Simulation
• 1,000,000 bodies

Architecture
• CPUs vs GPUs

GPU – The Evolution
• Fixed-Function GPUs
• Programmable GPUs
• Unified Architecture

GPU – The Evolution: Fixed-Function GPUs
• Not a programmable architecture
• No access to the processor
• Only APIs

GPU – The Evolution: Programmable GPUs
• Architecture oriented to computer graphics

Unified Architecture - CUDA

Getting VS2008 for Free

VS2008 Integration
• Install VS2008

VS2008 Integration

VS2008 Integration
• Command line:
$(CUDA_BIN_PATH)\nvcc.exe -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Od,/Zi,/RTC1,/MDd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\kernel.obj kernel.cu
• Outputs:
$(ConfigurationName)\kernel.obj

VS2008 Integration

CUDA VS Wizard

CUDA and Linux
• Turn off Compiz
• Downgrade G++ and GCC from 4.3 to 4.1

CUDA and Linux

CUDA and Eclipse

Software Architecture

CUDA and Threads
• Why program in threads?
• Load balancing
• Share the load among processors
• Maximum use of each processor

CUDA and Threads
• How many threads have you ever created?
• CUDA allows thousands and thousands of threads = a cluster of threads

Threads – Management Costs
• CPU: few threads. If we need 1000 instructions to switch threads, it's OK.
• GPU: thousands of threads. 1000 instructions per switch is NOT OK.

CUDA - Synchronization
• Must be explicit
• “…synchronization is accomplished using the function syncthreads, which acts as a barrier or memory fence…” (in CUDA C the call is written __syncthreads())
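A minimal sketch of explicit synchronization, assuming a single block of 256 threads that sums 256 values in shared memory. The names blockSum, gpuBlockSum, and THREADS are hypothetical and chosen only for this illustration. Each __syncthreads() call is a barrier: no thread in the block continues until all of them have reached it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256  // must be a power of two for this reduction scheme

// Sums the THREADS values handled by one block into out[blockIdx.x].
__global__ void blockSum(const float* in, float* out) {
    __shared__ float partial[THREADS];
    int t = threadIdx.x;
    partial[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();  // all loads must finish before any thread reads a neighbor's slot

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride) partial[t] += partial[t + stride];
        __syncthreads();  // barrier between reduction steps
    }
    if (t == 0) out[blockIdx.x] = partial[0];
}

// Host helper: copies THREADS floats to the device and sums them with one block.
float gpuBlockSum(const float* h_in) {
    float *d_in, *d_out, result = 0.0f;
    cudaMalloc(&d_in, THREADS * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, THREADS * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<1, THREADS>>>(d_in, d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main() {
    float h_in[THREADS];
    for (int i = 0; i < THREADS; ++i) h_in[i] = 1.0f;
    printf("sum = %f\n", gpuBlockSum(h_in));
    return 0;
}
```

Without the barriers, a thread could read partial[t + stride] before the thread responsible for that slot had written it.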

CUDA – Important Definitions
• CUDA extends the C language through kernels
• *.cu – CUDA files
• Each kernel is a function that is executed N times on the device, once by each of N parallel threads

Conventions
• Host (the CPU)
• Device (the GPU)

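Functions in CUDA
• Each function is classified by where it is executed and where it is called from
• Combinations are also possible
• No recursion on the device (GPU); this applies to the hardware of the time, since devices of compute capability 2.0 and later support recursion
• No static variables
• cudaMalloc()
• cudaFree()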
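The function kinds can be illustrated with the standard CUDA C qualifiers __device__, __global__, and __host__. The sketch below is an assumption-laden example rather than the slide's own code: the function names square, addOne, and transform are hypothetical. It also shows cudaMalloc() and cudaFree() managing a small device buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: executed on the device, callable only from device code.
__device__ float square(float x) { return x * x; }

// __host__ __device__: compiled for both sides (one of the possible combinations).
__host__ __device__ float addOne(float x) { return x + 1.0f; }

// __global__: a kernel, executed on the device and launched from the host.
__global__ void transform(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = addOne(square(data[i]));
}

int main() {
    const int n = 8;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // cudaMalloc allocates device memory; cudaFree releases it.
    float* d = NULL;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    transform<<<1, n>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Each element now holds i*i + 1 for its original value i.
    for (int i = 0; i < n; ++i) printf("%g ", h[i]);
    printf("\n");
    return 0;
}
```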

CUDA and the Limits of Memory Bandwidth
• Reuse your data!
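One common way to reuse data is to stage it in per-block shared memory so that several threads read each value without going back to global memory. The sketch below is a hypothetical three-point average (the names average3 and BLOCK are invented here, and the slide does not prescribe this technique). Each input element is loaded from global memory into a shared tile and then read by up to three threads; the two halo elements at a block's edges are also loaded by the neighboring block. Values outside the array are treated as zero.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // the kernel must be launched with exactly BLOCK threads per block

__global__ void average3(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2];            // block's elements plus one halo cell per side
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)
        tile[0] = (g > 0 && g - 1 < n) ? in[g - 1] : 0.0f;      // left halo
    if (threadIdx.x == blockDim.x - 1)
        tile[l + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;           // right halo
    __syncthreads();  // the tile must be complete before neighbors are read

    // Three reads per thread, all served from shared memory.
    if (g < n) out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;
}

int main() {
    const int n = 1024;
    std::vector<float> h_in(n, 3.0f), h_out(n, 0.0f);

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    average3<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("interior value: %f, boundary value: %f\n", h_out[n / 2], h_out[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```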

Architecture
• Hide implementation details
• HW evolution

Threads, Blocks and Grids
• One kernel runs as one grid
• Each block contains many threads
• All threads inside a block share the same memory area
• Threads in different blocks do not share their local memory among themselves
• Threads in different blocks cannot cooperate

Threads, Blocks and Grids
• Each block: up to 512 threads (the limit on GPUs of this generation; compute capability 2.0 and later allow 1024)

Threads, Blocks and Grids
• __global__ void KernelFunction(...);
• dim3 DimGrid(100, 10); // Grid → 1000 blocks
• dim3 DimBlock(4, 8, 8); // Each block has 256 threads
• size_t SharedMemBytes = 32;
• KernelFunction<<<DimGrid, DimBlock, SharedMemBytes>>>(...);
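To make the launch configuration above concrete, the following sketch uses the same grid (100 x 10 blocks) and block (4 x 8 x 8 threads) shapes and has every thread compute a unique linear index from the built-in variables blockIdx, blockDim, threadIdx, and gridDim. The kernel name recordIds and the helper countCorrectIds are hypothetical; the 32 bytes of dynamic shared memory are requested only to mirror the slide and are not used.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Every thread derives a unique linear index and stores it.
__global__ void recordIds(int* ids) {
    int blockId = blockIdx.y * gridDim.x + blockIdx.x;            // 0..999
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;   // 256
    int threadId = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x
                   + threadIdx.x;                                  // 0..255
    int globalId = blockId * threadsPerBlock + threadId;
    ids[globalId] = globalId;
}

// Launches the kernel and returns how many slots hold their own index.
int countCorrectIds() {
    dim3 DimGrid(100, 10);    // 1000 blocks
    dim3 DimBlock(4, 8, 8);   // 256 threads per block
    size_t SharedMemBytes = 32;
    const int total = 100 * 10 * 4 * 8 * 8;  // 256000 threads

    int* d_ids;
    cudaMalloc(&d_ids, total * sizeof(int));
    cudaMemset(d_ids, 0xFF, total * sizeof(int));  // fill with -1 so untouched slots fail the check
    recordIds<<<DimGrid, DimBlock, SharedMemBytes>>>(d_ids);

    std::vector<int> h_ids(total);
    cudaMemcpy(h_ids.data(), d_ids, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_ids);

    int correct = 0;
    for (int i = 0; i < total; ++i)
        if (h_ids[i] == i) ++correct;
    return correct;
}

int main() {
    printf("threads that wrote their own index: %d\n", countCorrectIds());
    return 0;
}
```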

Some code...
// Kernel definition
__global__ void vecAdd(float* A, float* B, float* C){...}
int main(){
    // Kernel invocation
    vecAdd<<<1, N>>>(A, B, C);
}
• __global__ defines that it's a kernel: called on the host, executed on the device
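The slide elides the kernel body and the host-side setup. One possible completion is sketched below, under the assumptions that N is 256 (so a single block suffices), that each thread adds one pair of elements, and that the buffers are copied explicitly with cudaMemcpy. The helper name vecAddOnGpu is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256  // assumed size; one block of N threads

// Kernel definition: thread i adds element i.
__global__ void vecAdd(const float* A, const float* B, float* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

// Host helper: copies inputs to the device, launches the kernel, copies the result back.
void vecAddOnGpu(const float* hA, const float* hB, float* hC) {
    size_t bytes = N * sizeof(float);
    float *A, *B, *C;
    cudaMalloc(&A, bytes);
    cudaMalloc(&B, bytes);
    cudaMalloc(&C, bytes);
    cudaMemcpy(A, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(B, hB, bytes, cudaMemcpyHostToDevice);

    // Kernel invocation: called on the host, executed on the device.
    vecAdd<<<1, N>>>(A, B, C);

    cudaMemcpy(hC, C, bytes, cudaMemcpyDeviceToHost);
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
}

int main() {
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    vecAddOnGpu(hA, hB, hC);

    int errors = 0;
    for (int i = 0; i < N; ++i)
        if (hC[i] != hA[i] + hB[i]) ++errors;
    printf("errors: %d\n", errors);
    return errors != 0;
}
```

For vectors longer than one block can hold, the index would normally be computed as blockIdx.x * blockDim.x + threadIdx.x and the kernel launched with several blocks.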

Some code...
