1 / 7

GPU&CUDA Labwork Week3

GPU&CUDA Labwork Week3. Bin ZHOU USTC: Fall 2012. 上机实验. 目的 自主进行基本CUDA程序开发 工具 cuda 4. 2; Linux 为主, windows 为辅 GTX 640 方法 上机实验+教师 讲解. 实验内容. 1)完成课件 Code walk through 1 2)完成课件 code walk through 2 3) 完成基本的 GPU 矩阵相乘 A = B*C 其中 B,C 要求为任意维度 ( 以内存放得下为准 )

kalei
Download Presentation

GPU&CUDA Labwork Week3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GPU&CUDA Labwork Week3 Bin ZHOU USTC: Fall2012

  2. 上机实验 • 目的 • 自主进行基本CUDA程序开发 • 工具 • cuda 4.2;Linux为主,windows为辅 • GTX 640 • 方法 • 上机实验+教师讲解

  3. 实验内容 1)完成课件 Code walk through 1 2)完成课件 code walk through 2 3) 完成基本的GPU矩阵相乘 A = B*C 其中 B,C要求为任意维度(以内存放得下为准) 与CPU比较结果, 使用一个线程处理一个元素的模式

  4. Attention!!! 1) Code Walk through 1,2 里面由于教师懒惰原因,可能存在多年未改的小bug…… 2) 实验要求完成基本的matrix multiply

  5. Code Walk through 1 #include <stdio.h> int main() { intdimx = 16; intnum_bytes = dimx*sizeof(int); int *d_a=0, *h_a=0; h_a = (int*)malloc(num_bytes); cudaMalloc( (void**)&d_a, num_bytes); if( 0==h_a || 0==d_a ) { printf("couldn't allocate memory\n"); return 1; } cudaMemset( d_a, 0, num_bytes ); cudaMemcpy( h_a, d_a, num_bytes, cudaMemcpyDeviceToHost ); for(inti=0; i<dimx; i++) printf("%d ", h_a[i] ); printf("\n"); free( h_a ); cudaFree( d_a ); return 0; }

  6. Code Walk through 2 #include <stdio.h> __global__ void kernel( int *a ){ intidx=blockIdx.x*blockDim.x+threadIdx.x; a[idx] = 7;} int main() { intdimx = 16; intnum_bytes = dimx*sizeof(int); int *d_a=0, *h_a=0 h_a = (int*)malloc(num_bytes); cudaMalloc( (void**)&d_a, num_bytes ); if( 0==h_a || 0==d_a ){ printf("couldn't allocate memory\n"); return 1; } cudaMemset( d_a, 0, num_bytes ); dim3 grid, block; block.x = 4; grid.x = dimx / block.x; kernel<<<grid, block>>>( d_a ); cudaMemcpy( h_a, d_a, num_bytes, cudaMemcpyDeviceToHost ); for(inti=0; i<dimx; i++) printf("%d ", h_a[i] ); printf("\n"); free( h_a ); cudaFree( d_a ); return 0; }

  7. 自己动手 Matrix Multiply 代码????

More Related