Measuring Performance of Constant Memory Experiment

These notes introduce the results of an experiment using constant memory.

ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, March 3, 2011. ConstantMemTiming.ppt

Program

The test program adds two vectors, A and B, to produce a third vector, C.
One version holds A and B in constant memory.
Another version holds A and B in regular global memory.

Note: the maximum constant memory available on the GPU (all compute capabilities so far) is 64 Kbytes in total.
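The 64 Kbyte limit can be checked against the vector size used in these notes (N = 8192 ints per vector). The following is a minimal sketch, assuming a 4-byte int and that device 0 is the GPU under test; the helper names constBytesNeeded and fitsInConstantMemory are hypothetical and do not come from the original program.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

#define N 8192  // elements per vector, as in the notes

// Hypothetical helper: bytes needed to hold both A and B in constant memory.
constexpr size_t constBytesNeeded(size_t n) { return 2 * n * sizeof(int); }

// With a 4-byte int: 2 * 8192 * 4 = 65536 bytes, exactly 64 Kbytes.
static_assert(constBytesNeeded(N) <= 64 * 1024, "A and B do not fit in 64 Kbytes");

// Hypothetical helper: asks device 0 how much constant memory it reports
// and returns true if both vectors of n ints fit.
bool fitsInConstantMemory(size_t n) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    printf("totalConstMem = %zu bytes, needed = %zu bytes\n",
           prop.totalConstMem, constBytesNeeded(n));
    return constBytesNeeded(n) <= prop.totalConstMem;
}
```

Two vectors of 8192 ints use the whole 64 Kbyte budget, which is why N cannot be made larger in this experiment.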

Code

Array declarations

#define N 8192 // max size allowed for two vectors in const. mem

// Constants held in constant memory
__device__ __constant__ int dev_a_Cont[N];
__device__ __constant__ int dev_b_Cont[N];

// regular global memory for comparison
__device__ int dev_a[N];
__device__ int dev_b[N];

// result in device global memory
__device__ int dev_c[N];

// kernel routines

__global__ void add_Cont() { // using constant memory
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N) {
        dev_c[tid] = dev_a_Cont[tid] + dev_b_Cont[tid];
    }
}

__global__ void add() { // not using constant memory
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N) {
        dev_c[tid] = dev_a[tid] + dev_b[tid];
    }
}
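The notes launch both kernels as <<<B,T>>> but do not show how B and T are chosen. One common approach, sketched here as an assumption rather than the original configuration, is to fix the threads per block and derive the block count by ceiling division. The value T = 256 and the helper blocksFor are illustrative only.

```cuda
#include <cuda_runtime.h>

#define N 8192  // vector length, as in the notes

// Hypothetical helper: number of blocks needed to cover n elements with
// threadsPerBlock threads in each block (ceiling division).
constexpr int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

const int T = 256;              // threads per block (illustrative, not from the notes)
const int B = blocksFor(N, T);  // 8192 / 256 = 32 blocks
```

When T does not divide N exactly, the last block contains threads with tid >= N, and the if (tid < N) guard in the kernels keeps those threads from touching memory outside the arrays.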

/*----------- GPU using constant memory ------------------------*/

printf("GPU using constant memory\n");

for (int i = 0; i < N; i++) { // load arrays with some numbers
    a[i] = i;
    b[i] = i*2;
}

// copy vectors to constant memory
cudaMemcpyToSymbol(dev_a_Cont, a, N*sizeof(int), 0, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(dev_b_Cont, b, N*sizeof(int), 0, cudaMemcpyHostToDevice);

cudaEventRecord(start, 0); // start time
add_Cont<<<B,T>>>(); // does not need array ptrs
cudaDeviceSynchronize(); // wait for all threads to complete (replaces the deprecated cudaThreadSynchronize)
cudaEventRecord(stop, 0); // end time

// copy back; symbols are passed directly, because string names such as "dev_c" are no longer accepted as of CUDA 5.0
cudaMemcpyFromSymbol(a, dev_a_Cont, N*sizeof(int), 0, cudaMemcpyDeviceToHost);
cudaMemcpyFromSymbol(b, dev_b_Cont, N*sizeof(int), 0, cudaMemcpyDeviceToHost);
cudaMemcpyFromSymbol(c, dev_c, N*sizeof(int), 0, cudaMemcpyDeviceToHost);

cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed_time_Cont, start, stop);

Slide callout: Watch for this zero. I missed it off originally and it took some time to spot. (The callout appears to refer to the 0 offset argument in the symbol copy calls.)
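The timing code relies on start and stop events whose creation is not shown in the notes. The sketch below is one way the event handling could be wrapped, assuming the same constant-memory kernel as above; the function timeAddCont is hypothetical. Unlike the original code, it relies on cudaEventSynchronize(stop) alone to wait for the kernel, and it checks a few return codes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8192

__constant__ int dev_a_Cont[N];
__constant__ int dev_b_Cont[N];
__device__ int dev_c[N];

__global__ void add_Cont() {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N) dev_c[tid] = dev_a_Cont[tid] + dev_b_Cont[tid];
}

// Hypothetical helper: times one launch of add_Cont and returns the elapsed
// time in milliseconds, or a negative value if a CUDA call fails.
float timeAddCont(int blocks, int threads) {
    cudaEvent_t start, stop;
    float ms = -1.0f;

    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start, 0);            // start time, default stream
    add_Cont<<<blocks, threads>>>();
    cudaEventRecord(stop, 0);             // end time, queued after the kernel
    cudaEventSynchronize(stop);           // block the host until stop is reached

    // Only report a time if the launch itself did not raise an error.
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

A call such as timeAddCont(32, 256) would time a single launch; calling it several times in a row is one way to separate the first launch from later runs.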

/*----------- GPU not using constant memory ------------------------*/

printf("GPU not using constant memory\n");

for (int i = 0; i < N; i++) { // load arrays with some numbers
    a[i] = i;
    b[i] = i*2;
}

// copy vectors to regular global memory (the arrays that add() reads)
cudaMemcpyToSymbol(dev_a, a, N*sizeof(int), 0, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(dev_b, b, N*sizeof(int), 0, cudaMemcpyHostToDevice);

cudaEventRecord(start, 0); // start time
add<<<B,T>>>(); // does not need array ptrs
cudaDeviceSynchronize(); // wait for all threads to complete (replaces the deprecated cudaThreadSynchronize)
cudaEventRecord(stop, 0); // end time

// copy back; symbols are passed directly rather than as string names
cudaMemcpyFromSymbol(a, dev_a, N*sizeof(int), 0, cudaMemcpyDeviceToHost);
cudaMemcpyFromSymbol(b, dev_b, N*sizeof(int), 0, cudaMemcpyDeviceToHost);
cudaMemcpyFromSymbol(c, dev_c, N*sizeof(int), 0, cudaMemcpyDeviceToHost);

cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed_time, start, stop);
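Because both versions load a[i] = i and b[i] = i*2, every element of the result copied back into c should equal 3*i. A small host-side check, sketched here with a hypothetical helper countMismatches that is not part of the original program, can confirm that each version read the arrays it was meant to read.

```cuda
#include <cstdio>

// Hypothetical helper: counts elements of c that differ from the expected
// sum a[i] + b[i] = i + 2*i = 3*i, and reports the first mismatch found.
int countMismatches(const int *c, int n) {
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (c[i] != 3 * i) {
            if (bad == 0)
                printf("first mismatch at i=%d: got %d, expected %d\n", i, c[i], 3 * i);
            bad++;
        }
    }
    return bad;
}
```

Calling countMismatches(c, N) after each cudaMemcpyFromSymbol of dev_c should return 0 for both the constant-memory and the global-memory version.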

Speedup around 1.2 (20%) after the first launch:

1st launch: 1.6
2nd run: 1.217
3rd run: 1.225
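The notes do not state how the speedup is computed. Assuming it is the ratio of the two measured times, elapsed_time (global memory) over elapsed_time_Cont (constant memory), a speedup of 1.2 can be read as follows, where the symbols t_global and t_const are introduced here only for illustration:

```latex
S = \frac{t_{\text{global}}}{t_{\text{const}}}, \qquad
\frac{t_{\text{const}}}{t_{\text{global}}} = \frac{1}{S} = \frac{1}{1.2} \approx 0.83
```

Under that assumption, the constant-memory kernel takes about 83% of the global-memory kernel's time after the first launch, which is roughly 17% less time.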

Questions
