ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 25, 2011, DeviceRoutines.pptx
Presentation Transcript
slide1

Device Routines

and device variables

  • These notes will introduce:
  • Declaring routines to be executed on the device and on the host
  • Declaring local variables on the device

ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 25, 2011

DeviceRoutines.pptx

slide2

CUDA extensions to declare kernel routines

Host = CPU; Device = GPU

__global__ indicates routine can only be called from host and only executed on device

__device__ indicates routine can only be called from device and only executed on device

__host__ indicates routine can only be called from host and only executed on host

(generally only used in combination with __device__ , see later)

Two underscores each

Note: a routine executing on the device cannot call a routine to be executed on the host
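The three qualifiers can be sketched together as follows (the function names here are illustrative, not from the slides):

```cuda
// __device__: callable only from device code, executed on the device
__device__ int double_it(int i) { return i * 2; }

// __global__: launched from the host with <<<...>>>, executed on the device
__global__ void kernel_entry(int *data) {
    data[threadIdx.x] = double_it(threadIdx.x);  // device calling device: OK
}

// __host__ is the default for ordinary functions; the explicit form shown here
__host__ void host_only(void) {
    // runs on the CPU; cannot be called from a kernel
}
```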

slide3

So far we have seen __global__:

#define N 10    // number of elements (value assumed; not given on the slide)

__global__ void add(int *a, int *b, int *c) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N) c[tid] = a[tid] + b[tid];
}

int main(int argc, char *argv[]) {
    int T = 10, B = 1;    // threads per block and blocks per grid
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    cudaMalloc((void**)&dev_a, N * sizeof(int));
    cudaMalloc((void**)&dev_b, N * sizeof(int));
    cudaMalloc((void**)&dev_c, N * sizeof(int));

    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_c, c, N * sizeof(int), cudaMemcpyHostToDevice);

    add<<<B,T>>>(dev_a, dev_b, dev_c);

    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    return 0;
}

__global__ routines must have a void return type. Why? Because a kernel launch is asynchronous: the call returns before the kernel completes, so there is nothing to return at the call site.

Executed on device; called from host.
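Because a kernel launch returns immediately, host code that depends on the kernel's result must wait for it. A minimal sketch using cudaDeviceSynchronize() (cudaThreadSynchronize() in 2011-era CUDA); the mark kernel and its names are illustrative:

```cuda
#include <stdio.h>

__global__ void mark(int *flag) { *flag = 1; }

int main(void) {
    int *dev_flag, flag = 0;
    cudaMalloc((void**)&dev_flag, sizeof(int));
    cudaMemcpy(dev_flag, &flag, sizeof(int), cudaMemcpyHostToDevice);

    mark<<<1,1>>>(dev_flag);   // returns at once; kernel may still be running
    cudaDeviceSynchronize();   // block the host until the kernel has finished

    cudaMemcpy(&flag, dev_flag, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d\n", flag);      // flag has been set by the kernel
    cudaFree(dev_flag);
    return 0;
}
```

A blocking cudaMemcpy() in the same stream also implicitly waits for preceding kernels, which is why copying results back without an explicit synchronize is safe in the earlier example.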

slide4

Routines to be executed on device

Generally cannot call C library routines from device!

However, CUDA provides device math routines equivalent to the standard C math routines, with the same names, so in practice math routines such as sin(x) can be called from device code. Check the CUDA docs* before use.

CUDA also provides faster but less accurate GPU-only intrinsic versions (names prefixed with two underscores, e.g. __sinf())*

* See NVIDIA CUDA C Programming Guide for more details
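For example, in a kernel (the wave function and its scaling factor are illustrative; sinf() is the accurate single-precision routine, __sinf() the faster intrinsic):

```cuda
__global__ void wave(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = sinf(0.1f * i);       // accurate device math routine
        // out[i] = __sinf(0.1f * i);  // faster intrinsic, reduced accuracy
    }
}
```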

slide5

__device__ routines

__device__ void swap(int *x, int *y) {   // defined before gpu_sort so the call sees it
    int temp = *x;
    *x = *y;
    *y = temp;
}

__global__ void gpu_sort(int *a, int *b, int N) {
    ...
    swap(&a[m], &a[j]);   // m and j come from the (elided) sorting logic
    ...
}

int main(int argc, char *argv[]) {
    ...
    gpu_sort<<<B,T>>>(dev_a, dev_b, N);
    ...
    return 0;
}

Recursion is possible with __device__ routines so far as I can tell
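A sketch of a recursive __device__ routine (device-side recursion requires compute capability 2.0 or later; the factorial example is illustrative):

```cuda
__device__ int fact(int n) {   // device routine calling itself
    return (n <= 1) ? 1 : n * fact(n - 1);
}

__global__ void compute(int *out) {
    out[threadIdx.x] = fact(threadIdx.x + 1);  // thread i computes (i+1)!
}
```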

slide6

Routines executable on both host and device

__device__ and __host__ qualifiers can be used together

Then routine callable and executable on both host and device. Routine will be compiled for both.

Feature might be used to create code that optionally uses a GPU or for test purposes.

Generally will need statements that differentiate between host and device

Note: __global__ and __host__ qualifiers cannot be used together

slide7

__CUDA_ARCH__ macro

Indicates the compute capability for which the code is currently being compiled. It is defined only during device compilation (undefined in host code).

Can be used to create different paths through device code for different capabilities.

__CUDA_ARCH__ = 100 for 1.0 compute capability

__CUDA_ARCH__ = 110 for 1.1 compute capability

slide8

Example

__host__ __device__ void func() {

#ifdef __CUDA_ARCH__

… // Device code

#else

… // Host code

#endif

}

Could also select specific compute capabilities
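A sketch of selecting on specific capabilities (the threshold of 200, i.e. compute capability 2.0, is an illustrative choice):

```cuda
__host__ __device__ void func() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
    // device code path for compute capability 2.0 and above
#elif defined(__CUDA_ARCH__)
    // device code path for earlier compute capabilities
#else
    // host code path
#endif
}
```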

slide10

Local variables on host

In C, the scope of a variable is the block it is declared in, and does not extend to routines called from that block. If the scope is to include main and all within it, including called routines, place the declaration outside main:

#include <stdio.h>
#include <stdlib.h>

#define N 10    // value assumed; not given on the slide

int cpuA[10];

...

void clearArray() {
    for (int i = 0; i < 10; i++)
        cpuA[i] = 0;
}

void setArray(int n) {
    for (int i = 0; i < 10; i++)
        cpuA[i] = n;
}

int main(int argc, char *argv[]) {
    clearArray();
    setArray(N);
    return 0;
}

slide11

Declaring local kernel variables

#include <stdio.h>
#include <stdlib.h>

__device__ int gpuA[10];

...

__global__ void clearArray() {
    for (int i = 0; i < 10; i++)
        gpuA[i] = 0;
}

int main(int argc, char *argv[]) {
    clearArray<<<1,1>>>();   // kernels must be launched with <<<...>>>
    ...
    return 0;
}

Declare the variable outside main, but use the __device__ keyword (here used as a variable type qualifier rather than a function type qualifier).

Without further qualification, the variable is in global (GPU) memory.

Accessible by all threads

slide12

Accessing kernel variables from host

  • Accessible by host using:
    • cudaMemcpyToSymbol(),
    • cudaMemcpyFromSymbol(),…
  • where the device variable is given as an argument (early CUDA versions also accepted its name as a string, e.g. "gpuA", but that form has since been removed):

int main(int argc, char *argv[]) {
    int cpuA[10];
    ...
    cudaMemcpyFromSymbol(cpuA, gpuA, sizeof(cpuA), 0, cudaMemcpyDeviceToHost);
    ...
    return 0;
}
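A self-contained sketch of the round trip (the names gpuA and cpuA follow the slides; the doubling kernel is illustrative):

```cuda
#include <stdio.h>

__device__ int gpuA[10];

__global__ void doubleArray() {
    int i = threadIdx.x;
    if (i < 10) gpuA[i] *= 2;
}

int main(void) {
    int cpuA[10];
    for (int i = 0; i < 10; i++) cpuA[i] = i;

    cudaMemcpyToSymbol(gpuA, cpuA, sizeof(cpuA));    // host array -> device variable
    doubleArray<<<1,10>>>();
    cudaMemcpyFromSymbol(cpuA, gpuA, sizeof(cpuA));  // device variable -> host array

    printf("%d\n", cpuA[3]);   // element 3, doubled on the device
    return 0;
}
```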

slide13

Example of both local host and device variables

#include <stdio.h>
#include <cuda.h>
#include <stdlib.h>

int cpu_hist[10];              // globally accessible on cpu
                               // histogram computed on cpu

__device__ int gpu_hist[10];   // globally accessible on gpu
                               // histogram computed on gpu

void cpu_histogram(int *a, int N) {
    ...
}

__global__ void gpu_histogram(int *a, int N) {
    ...
}

int main(int argc, char *argv[]) {
    ...
    gpu_histogram<<<B,T>>>(dev_a, N);
    cpu_histogram(a, N);
    ...
    return 0;
}
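The elided bodies might be filled in along these lines (a minimal sketch assuming the cpu_hist and gpu_hist declarations above; atomicAdd() is the usual way to avoid races when many threads update the same bin, and the binning rule a[i] % 10 is an assumption):

```cuda
void cpu_histogram(int *a, int N) {
    for (int i = 0; i < 10; i++) cpu_hist[i] = 0;
    for (int i = 0; i < N; i++)
        cpu_hist[a[i] % 10]++;                  // assumed binning rule
}

__global__ void gpu_histogram(int *a, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        atomicAdd(&gpu_hist[a[i] % 10], 1);     // atomic: many threads may hit one bin
}
```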