
GMAC Global Memory for Accelerators


Presentation Transcript


  1. GMAC: Global Memory for Accelerators. Isaac Gelado, PUMPS Summer School, Barcelona

  2. Vector Addition CUDA code • Vector addition • Really simple kernel code • But, what about the CPU code? • GMAC is a complement to the CUDA run-time • Simplifies the CPU code • Exploits advanced CUDA features for free
     __global__ void vector(float *c, float *a, float *b, size_t size) {
         int idx = threadIdx.x + blockIdx.x * blockDim.x;
         if(idx < size) c[idx] = a[idx] + b[idx];
     }

  3. Some easy CUDA code (I) • Read from disk, transfer to GPU and compute
     int main(int argc, char *argv[]) {
         float *h_a, *h_b, *h_c, *d_a, *d_b, *d_c;
         size_t size = LENGTH * sizeof(float);
         assert((h_a = malloc(size)) != NULL);
         assert((h_b = malloc(size)) != NULL);
         assert((h_c = malloc(size)) != NULL);
         assert(cudaMalloc((void **)&d_a, size) == cudaSuccess);
         assert(cudaMalloc((void **)&d_b, size) == cudaSuccess);
         assert(cudaMalloc((void **)&d_c, size) == cudaSuccess);
         read_file(argv[A], h_a);
         read_file(argv[B], h_b);
         assert(cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice) == cudaSuccess);
         assert(cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice) == cudaSuccess);

  4. Some easy CUDA code (and II) • Read from disk, transfer to GPU and compute
         dim3 Db(BLOCK_SIZE);
         dim3 Dg(LENGTH / BLOCK_SIZE);
         if(LENGTH % BLOCK_SIZE) Dg.x++;
         vector<<<Dg, Db>>>(d_c, d_a, d_b, LENGTH);
         assert(cudaThreadSynchronize() == cudaSuccess);
         assert(cudaMemcpy(h_c, d_c, LENGTH * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess);
         save_file(argv[C], h_c);
         free(h_a); cudaFree(d_a);
         free(h_b); cudaFree(d_b);
         free(h_c); cudaFree(d_c);
         return 0;
     }

  5. Some really easy GMAC code
     int main(int argc, char *argv[]) {
         float *a, *b, *c;
         size_t size = LENGTH * sizeof(float);
         assert(gmacMalloc((void **)&a, size) == gmacSuccess);
         assert(gmacMalloc((void **)&b, size) == gmacSuccess);
         assert(gmacMalloc((void **)&c, size) == gmacSuccess);
         read_file(argv[A], a);
         read_file(argv[B], b);
         dim3 Db(BLOCK_SIZE);
         dim3 Dg(LENGTH / BLOCK_SIZE);
         if(LENGTH % BLOCK_SIZE) Dg.x++;
         vector<<<Dg, Db>>>(c, a, b, LENGTH);
         assert(gmacThreadSynchronize() == gmacSuccess);
         save_file(argv[C], c);
         gmacFree(a); gmacFree(b); gmacFree(c);
         return 0;
     }
     Note: there are no explicit memory copies.

  6. Getting GMAC • GMAC is at http://adsm.googlecode.com/ • Debian / Ubuntu binary and development .deb files • UNIX (also MacOS X) source code package • Experimental versions from the Mercurial repository

  7. Outline • Introduction • GMAC Memory Model • Asymmetric Memory • Global Memory • GMAC Execution Model • Multi-threading • Inter-thread communication • Conclusions

  8. GMAC Memory Model • Unified CPU / GPU virtual address space • Asymmetric address space visibility [Figure: memory holding Shared Data, visible to both the CPU and the GPU, and CPU Data, visible to the CPU only]

  9. GMAC Consistency Model • Implicit acquire / release primitives at accelerator call / return boundaries [Figure: timeline of execution alternating between CPU and accelerator]
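
  A minimal sketch of what the implicit acquire / release means in practice, using only the gmacMalloc / gmacThreadSynchronize / gmacFree calls shown elsewhere in this deck; the kernel name scale, the array length, and the launch configuration are illustrative, not part of GMAC:

     #include <assert.h>
     #include <gmac.h>

     #define N 1024

     __global__ void scale(float *v, int n) {
         int idx = threadIdx.x + blockIdx.x * blockDim.x;
         if(idx < n) v[idx] *= 2.0f;
     }

     int main(void) {
         float *v = NULL;
         assert(gmacMalloc((void **)&v, N * sizeof(float)) == gmacSuccess);

         for(int i = 0; i < N; i++) v[i] = (float)i;  /* the CPU owns the data here */

         scale<<<N / 256, 256>>>(v, N);   /* call boundary ("release"): CPU updates become
                                             visible to the accelerator; the CPU must not
                                             touch v until it synchronizes */

         assert(gmacThreadSynchronize() == gmacSuccess);  /* return boundary ("acquire"):
                                                             kernel results become visible */
         float first = v[0];                              /* safe to read again */
         (void)first;

         gmacFree(v);
         return 0;
     }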

  10. GMAC Memory API • Memory allocation: gmacError_t gmacMalloc(void **ptr, size_t size) • ptr: allocated memory address (returned by reference) • size: size of the data to be allocated • Returns an error code, gmacSuccess if no error • Example usage:
     #include <gmac.h>
     int main(int argc, char *argv[]) {
         float *foo = NULL;
         gmacError_t error;
         if((error = gmacMalloc((void **)&foo, FOO_SIZE)) != gmacSuccess)
             FATAL("Error allocating memory %s", gmacErrorString(error));
         . . .
     }

  11. GMAC Memory API • Memory release: gmacError_t gmacFree(void *ptr) • ptr: memory address to be released • Returns an error code, gmacSuccess if no error • Example usage:
     #include <gmac.h>
     int main(int argc, char *argv[]) {
         float *foo = NULL;
         gmacError_t error;
         if((error = gmacMalloc((void **)&foo, FOO_SIZE)) != gmacSuccess)
             FATAL("Error allocating memory %s", gmacErrorString(error));
         . . .
         gmacFree(foo);
     }

  12. GMAC Memory API • Memory translation (workaround for Fermi): void *gmacPtr(void *ptr) and template<typename T> T *gmacPtr(T *ptr) • ptr: CPU memory address • Returns the corresponding GPU memory address • Example usage:
     #include <gmac.h>
     int main(int argc, char *argv[]) {
         . . .
         kernel<<<Dg, Db>>>(gmacPtr(buffer), size);
         . . .
     }

  13. GMAC Execution Example • Get advanced CUDA features for free • Asynchronous data transfers • Pinned memory
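
  For comparison, a hedged sketch of what these features look like when written by hand in plain CUDA (page-locked host memory plus asynchronous copies on a stream). This is not GMAC code and not taken from the slides; the buffer names, sizes, and kernel are illustrative of the techniques GMAC is said to apply internally:

     #include <assert.h>
     #include <cuda_runtime.h>

     #define N (1 << 20)

     __global__ void scale(float *v, int n) {
         int idx = threadIdx.x + blockIdx.x * blockDim.x;
         if(idx < n) v[idx] *= 2.0f;
     }

     int main(void) {
         float *h_v = NULL, *d_v = NULL;
         cudaStream_t stream;

         /* Pinned (page-locked) host memory enables true asynchronous DMA transfers */
         assert(cudaHostAlloc((void **)&h_v, N * sizeof(float), cudaHostAllocDefault) == cudaSuccess);
         assert(cudaMalloc((void **)&d_v, N * sizeof(float)) == cudaSuccess);
         assert(cudaStreamCreate(&stream) == cudaSuccess);

         for(int i = 0; i < N; i++) h_v[i] = (float)i;

         /* Copies and kernel are queued on the same stream; the CPU is free
            to do other work until the synchronization point */
         assert(cudaMemcpyAsync(d_v, h_v, N * sizeof(float), cudaMemcpyHostToDevice, stream) == cudaSuccess);
         scale<<<(N + 255) / 256, 256, 0, stream>>>(d_v, N);
         assert(cudaMemcpyAsync(h_v, d_v, N * sizeof(float), cudaMemcpyDeviceToHost, stream) == cudaSuccess);
         assert(cudaStreamSynchronize(stream) == cudaSuccess);

         cudaStreamDestroy(stream);
         cudaFree(d_v);
         cudaFreeHost(h_v);
         return 0;
     }

  GMAC's point is precisely that none of this boilerplate appears in user code.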

  14. Outline • Introduction • GMAC Memory Model • Asymmetric Memory • Global Memory • GMAC Execution Model • Multi-threading • Inter-thread communication • Conclusions

  15. GMAC Global Memory • Data accessible by all accelerators, but owned by the CPU [Figure: global memory reachable from the CPU and every GPU]

  16. GMAC Global Memory API • Memory allocation: gmacError_t gmacGlobalMalloc(void **ptr, size_t size) • ptr: allocated memory address (returned by reference) • size: size of the data to be allocated • Returns an error code, gmacSuccess if no error • Example usage:
     #include <gmac.h>
     int main(int argc, char *argv[]) {
         float *foo = NULL;
         gmacError_t error;
         if((error = gmacGlobalMalloc((void **)&foo, FOO_SIZE)) != gmacSuccess)
             FATAL("Error allocating memory %s", gmacErrorString(error));
         . . .
     }

  17. Outline • Introduction • GMAC Memory Model • Asymmetric Memory • Global Memory • GMAC Execution Model • Multi-threading • Inter-thread communication • Conclusions

  18. GMAC and Multi-threading • In the past: one thread, one CPU • In GMAC, one thread: • One CPU • One GPU • A thread runs on the GPU or the CPU, but not on both at the same time • Create threads using what you already know: pthread_create(...) (see the sketch below)
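
  A hedged sketch of this model, reusing the GMAC calls shown earlier: each POSIX thread gets its own virtual GPU and independently allocates, launches, and synchronizes. The kernel scale, the sizes, and the two-thread structure are illustrative only:

     #include <assert.h>
     #include <pthread.h>
     #include <gmac.h>

     #define N 1024

     __global__ void scale(float *v, int n) {
         int idx = threadIdx.x + blockIdx.x * blockDim.x;
         if(idx < n) v[idx] *= 2.0f;
     }

     /* Each thread owns one virtual GPU: it allocates, launches, and
        synchronizes independently of the other threads */
     void *worker(void *arg) {
         (void)arg;
         float *v = NULL;
         assert(gmacMalloc((void **)&v, N * sizeof(float)) == gmacSuccess);
         for(int i = 0; i < N; i++) v[i] = (float)i;
         scale<<<N / 256, 256>>>(v, N);
         assert(gmacThreadSynchronize() == gmacSuccess);
         gmacFree(v);
         return NULL;
     }

     int main(void) {
         pthread_t threads[2];
         for(int i = 0; i < 2; i++)
             pthread_create(&threads[i], NULL, worker, NULL);
         for(int i = 0; i < 2; i++)
             pthread_join(threads[i], NULL);
         return 0;
     }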

  19. GMAC and Multi-threading • Virtual memory accessibility: • Complete address space in CPU code • Partial address space in GPU code [Figure: two CPU threads sharing the whole memory, each GPU mapping only part of it]

  20. Outline • Introduction • GMAC Memory Model • Asymmetric Memory • Global Memory • GMAC Execution Model • Multi-threading • Inter-thread communication • Conclusions

  21. GPU Passing and Copying • GPU passing: • Send the thread's virtual GPU to another thread • Do not move data, move computation • API calls: • Virtual GPU sending: gmacError_t gmacSend(thread_id dest) • Virtual GPU receiving: gmacError_t gmacReceive() • Virtual GPU copying: gmacError_t gmacCopy(thread_id dest) (see the sketch below)
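
  A heavily hedged sketch of GPU passing between two threads, using only the calls listed above. The slide does not say what thread_id is or how a thread learns its peer's identifier, so this sketch assumes it is interchangeable with pthread_t and passes it through pthread_create; the kernel, sizes, and thread structure are likewise illustrative:

     #include <assert.h>
     #include <pthread.h>
     #include <gmac.h>

     #define N 1024

     static float *data = NULL;   /* allocated by the producer, read by the consumer */

     __global__ void produce(float *v, int n) {
         int idx = threadIdx.x + blockIdx.x * blockDim.x;
         if(idx < n) v[idx] = (float)idx;
     }

     void *producer(void *arg) {
         pthread_t consumer = *(pthread_t *)arg;   /* destination thread */
         assert(gmacMalloc((void **)&data, N * sizeof(float)) == gmacSuccess);
         produce<<<N / 256, 256>>>(data, N);
         /* Hand this thread's virtual GPU (and its in-flight kernel) over:
            no data moves, only the computation does.
            ASSUMPTION: thread_id is interchangeable with pthread_t here. */
         assert(gmacSend(consumer) == gmacSuccess);
         return NULL;
     }

     void *consumer_fn(void *arg) {
         (void)arg;
         assert(gmacReceive() == gmacSuccess);            /* block until a virtual GPU arrives */
         assert(gmacThreadSynchronize() == gmacSuccess);  /* kernel results now visible */
         float first = data[0];                           /* data allocated by the producer */
         (void)first;
         return NULL;
     }

     int main(void) {
         pthread_t consumer, prod;
         pthread_create(&consumer, NULL, consumer_fn, NULL);
         pthread_create(&prod, NULL, producer, &consumer);
         pthread_join(prod, NULL);
         pthread_join(consumer, NULL);
         return 0;
     }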

  22. Outline • Introduction • GMAC Memory Model • Asymmetric Memory • Global Memory • GMAC Execution Model • Multi-threading • Inter-thread communication • Conclusions

  23. Conclusions • Single virtual address space for CPUs and GPUs • Use advanced CUDA features for free • Automatic overlap of data communication and computation • Access any GPU from any CPU thread • Get more performance from your application, more easily • Go: http://adsm.googlecode.com

  24. GMAC: Global Memory for Accelerators. Isaac Gelado, PUMPS Summer School, Barcelona

  25. Backup Slides

  26. GMAC Unified Address Space • When allocating memory: • Allocate accelerator memory • Allocate CPU memory at the same virtual address [Figure: system memory and accelerator memory mapped at the same virtual address]

  27. GMAC Unified Address Space • Use fixed-size segments to map accelerator memory • Implement and export Accelerator Virtual Memory [Figure: CPU system-memory segments at 0x100100000 and 0x200100000 mapping accelerator memory at 0x00100000 on two different accelerators]

  28. GMAC Data Transfers • Avoid unnecessary data copies • Lazy-update: • Call: transfer modified data • Return: transfer when needed [Figure: data moving between system memory and accelerator memory]

  29. GMAC Data Transfers • Overlap CPU execution and data transfers • Minimal transfer on-demand • Rolling-update: • Memory-block size granularity [Figure: data moving between system memory and accelerator memory in memory-block sized chunks]

  30. GMAC: Global Memory for Accelerators. Isaac Gelado, PUMPS Summer School, Barcelona
