
Programming With CUDA

Chris Kerkhoff, Matthew Sullivan. 12/2/2009. Basic flow: the host computer initializes an array with data; the array is copied from main memory to the memory on the GPU; the GPU performs operations on the array; the array is copied back to main memory.


Presentation Transcript


  1. Chris Kerkhoff, Matthew Sullivan. 12/2/2009. Programming With CUDA

  2. Basic Flow
     • The host computer initializes an array with data.
     • The array is copied from the main memory to the memory on the GPU.
     • The GPU performs operations on the array.
     • The array is copied back to the main memory on the computer.

  3. #include <stdio.h>
     #include <stdlib.h>  // malloc, free
     #include <cuda.h>

     // Kernel that executes on the CUDA device:
     __global__ void cube_array(float *a, int N)
     {
         int idx = blockIdx.x * blockDim.x + threadIdx.x;
         if (idx < N) a[idx] = a[idx] * a[idx] * a[idx];
     }

     // Main routine that executes on the host:
     int main(void)
     {
         float *a_h, *a_d;                  // Pointers to host & device arrays
         const int N = 10;                  // Number of elements in arrays
         size_t size = N * sizeof(float);
         a_h = (float *)malloc(size);       // Allocate array on host
         cudaMalloc((void **)&a_d, size);   // Allocate array on device

         // Initialize host array and copy it to CUDA device:
         for (int i = 0; i < N; i++) a_h[i] = (float)i;
         cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);

         // Do calculation on device:
         int block_size = 4;
         int n_blocks = N / block_size + (N % block_size == 0 ? 0 : 1);
         cube_array<<<n_blocks, block_size>>>(a_d, N);

         // Retrieve result from device and store it in host array:
         cudaMemcpy(a_h, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
         for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);  // Print results

         // Cleanup:
         free(a_h);
         cudaFree(a_d);
         return 0;
     }

  4. 0 0.000000
     1 1.000000
     2 8.000000
     3 27.000000
     4 64.000000
     5 125.000000
     6 216.000000
     7 343.000000
     8 512.000000
     9 729.000000
     Press any key to continue . . .

  5. This only works if you have an NVIDIA card
