
CUDA Lecture 5: CUDA at the University of Akron


Presentation Transcript


  1. CUDA Lecture 5: CUDA at the University of Akron. Prepared 6/23/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

  2. Overview: CUDA Equipment
  • Your own PCs running G80 emulators
  • Better debugging environment
  • Sufficient for the first couple of weeks
  • Your own PCs with a CUDA-enabled GPU
  • NVIDIA boards in department
  • GeForce family of processors for high-performance gaming
  • Tesla C2070 for high-performance computing – no graphics output (?) and more memory

  3. Summary: NVIDIA Technology

  4. Hardware View, Consumer Procs.
  • Basic building block is a “streaming multiprocessor”
  • different chips have different numbers of these SMs.

  5. Hardware View, 2nd Generation
  • Basic building block is a “streaming multiprocessor” with
  • 8 cores, each with 2048 registers
  • up to 128 threads per core
  • 16KB of shared memory
  • 8KB cache for constants held in device memory
  • different chips have different numbers of these SMs.

  6. Hardware View, Fermi
  • each streaming multiprocessor has
  • 32 cores, each with 1024 registers
  • up to 48 threads per core
  • 64KB of shared memory / L1 cache
  • 8KB cache for constants held in device memory
  • there’s also a unified 384KB L2 cache
  • different chips again have different numbers of SMs.

  7. Different Compute Capabilities

  8. Different Compute Capabilities

  9. Common Technical Specifications

  10. Different Technical Specifications

  11. Different Technical Specifications

  12. Different Technical Specifications

  13. Overview: CUDA Components
  • CUDA (Compute Unified Device Architecture) is NVIDIA’s program development environment:
  • based on C with some extensions (see the sketch after this slide)
  • C++ support increasing steadily
  • Fortran support provided by the PGI compiler
  • lots of example code and good documentation – 2-4 week learning curve for those with experience of OpenMP and MPI programming
  • large user community on NVIDIA forums
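  To make “based on C with some extensions” concrete, here is a minimal vector-addition sketch (not from the original slides; the names are illustrative): __global__ marks a function that runs on the device, the built-in blockIdx/blockDim/threadIdx variables identify each thread, and the <<<...>>> syntax is the kernel-launch extension.

      #include <cstdio>
      #include <cuda_runtime.h>

      // __global__ is a CUDA extension: this function runs on the GPU.
      __global__ void vecAdd(const float *a, const float *b, float *c, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;  // built-in thread index variables
          if (i < n) c[i] = a[i] + b[i];
      }

      int main()
      {
          const int n = 256;
          float ha[n], hb[n], hc[n];
          for (int i = 0; i < n; i++) { ha[i] = i; hb[i] = 2.0f * i; }

          float *da, *db, *dc;
          cudaMalloc(&da, n * sizeof(float));              // allocate device memory
          cudaMalloc(&db, n * sizeof(float));
          cudaMalloc(&dc, n * sizeof(float));
          cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
          cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

          vecAdd<<<(n + 127) / 128, 128>>>(da, db, dc, n); // <<<blocks, threads>>> launch

          cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
          printf("c[100] = %f (expect 300.0)\n", hc[100]);
          cudaFree(da); cudaFree(db); cudaFree(dc);
          return 0;
      }

  Everything outside the kernel and the launch is ordinary C, which is what keeps the learning curve short for OpenMP/MPI programmers.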

  14. Overview: CUDA Components
  • When installing CUDA on a system, there are 3 components:
  • driver
  • low-level software that controls the graphics card
  • usually installed by sys-admin
  • toolkit
  • nvcc CUDA compiler
  • some profiling and debugging tools
  • various libraries
  • usually installed by sys-admin in /usr/local/cuda

  15. Overview: CUDA Components
  • SDK
  • lots of demonstration examples
  • a convenient Makefile for building applications
  • some error-checking utilities
  • not supported by NVIDIA
  • almost no documentation
  • often installed by user in own directory

  16. Accessing the Tesla Card
  • Remotely access the front end: ssh tesla.cs.uakron.edu
  • ssh sends your commands over an encrypted stream so your passwords, etc., can’t be sniffed over the network

  17. Accessing the Tesla Card
  • The first time you do this:
  • After login, run /root/gpucomputingsdk_3.2.16_linux.run and just take the default answers to get your own personal copy of the SDK.
  • Then: cd ~/NVIDIA_GPU_Computing_SDK/C and make -j12 -k will build all that can be built.

  18. Accessing the Tesla Card
  • The first time you do this:
  • Binaries end up in ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release
  • In particular the header file <cutil_inline.h> is in ~/NVIDIA_GPU_Computing_SDK/C/common/inc
  • You can then get a summary of technical specs and compute capabilities by executing ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery (the sketch after this slide reads the same specs through the runtime API)
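  The specs that deviceQuery prints can also be read programmatically; the short sketch below (an illustration, not part of the SDK sample itself) uses the runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties.

      #include <cstdio>
      #include <cuda_runtime.h>

      int main()
      {
          int count = 0;
          cudaGetDeviceCount(&count);             // how many CUDA devices are visible
          for (int d = 0; d < count; d++) {
              cudaDeviceProp p;
              cudaGetDeviceProperties(&p, d);     // fill in the property struct
              printf("Device %d: %s\n", d, p.name);
              printf("  compute capability: %d.%d\n", p.major, p.minor);
              printf("  multiprocessors:    %d\n", p.multiProcessorCount);
              printf("  shared mem/block:   %zu bytes\n", p.sharedMemPerBlock);
              printf("  registers/block:    %d\n", p.regsPerBlock);
          }
          return 0;
      }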

  19. CUDA Makefile
  • Two choices:
  • use nvcc within a standard Makefile
  • use the special Makefile template provided in the SDK
  • The SDK Makefile provides some useful options:
  • make emu=1
  • uses an emulation library for debugging on a CPU
  • make dbg=1
  • activates run-time error checking
  • In general just use a standard Makefile

  20. Sample Tesla Makefile (listing shown on the original slide)

  21. Compiling a CUDA Program
  • Parallel Thread Execution (PTX)
  • Virtual machine and ISA
  • Programming model
  • Execution resources and state

  22. Compilation
  • Any source file containing CUDA extensions must be compiled with NVCC
  • NVCC is a compiler driver
  • Works by invoking all the necessary tools and compilers like cudacc, g++, cl, …
  • NVCC outputs (see the annotated sketch after this slide)
  • C code (host CPU code)
  • Must then be compiled with the rest of the application using another tool
  • PTX
  • Object code directly, or PTX source interpreted at runtime
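  As a hedged illustration of this split (the file and kernel names are invented), the comments below mark which parts of a .cu file NVCC turns into PTX and which it hands to the host compiler; nvcc -ptx separate.cu lets you inspect the generated PTX directly.

      // separate.cu
      // Device part: NVCC compiles this function to PTX (and/or object code).
      __global__ void scale(float *x, float f)
      {
          x[threadIdx.x] *= f;
      }

      // Host part: NVCC rewrites the <<< >>> launch into runtime API calls,
      // then passes the remaining C code to g++ (or cl on Windows).
      int main()
      {
          float *d;
          cudaMalloc(&d, 64 * sizeof(float));
          scale<<<1, 64>>>(d, 2.0f);
          cudaFree(d);
          return 0;
      }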

  23. Linking
  • Any executable with CUDA code requires two dynamic libraries
  • The CUDA runtime library (cudart)
  • The CUDA core library (cuda)

  24. Debugging Using the Device Emulation Mode
  • An executable compiled in device emulation mode (nvcc -deviceemu) runs completely on the host using the CUDA runtime
  • No need for any device or CUDA driver
  • Each device thread is emulated with a host thread

  25. Debugging Using the Device Emulation Mode
  • Running in device emulation mode, one can
  • Use host native debug support (breakpoints, inspection, etc.)
  • Access any device-specific data from host code and vice versa
  • Call any host function from device code (e.g. printf) and vice versa (see the sketch after this slide)
  • Detect deadlock situations caused by improper usage of __syncthreads
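  For example, a kernel like this minimal sketch can call printf when built with nvcc -deviceemu, because each emulated device thread is an ordinary host thread (the sketch is illustrative; on real hardware, device-side printf requires compute capability 2.0 or later).

      #include <cstdio>

      // Under -deviceemu this is a plain host printf executed by the
      // host thread that emulates each device thread.
      __global__ void hello()
      {
          printf("hello from thread %d\n", threadIdx.x);
      }

      int main()
      {
          hello<<<1, 4>>>();
          cudaThreadSynchronize();   // wait for the kernel before exiting
          return 0;
      }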

  26. Device Emulation Mode Pitfalls
  • Emulated device threads execute sequentially, so simultaneous access of the same memory location by multiple threads could produce different results (a race of this kind is sketched after this slide)
  • Dereferencing device pointers on the host or host pointers on the device can produce correct results in device emulation mode, but will generate an error in device execution mode
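  A minimal sketch of the first pitfall (names invented for illustration): every thread stores its own index to the same word, a write-write race. Sequential emulation always produces the same “winner”, while on the device the surviving value depends on scheduling.

      #include <cstdio>

      // All 256 threads write to the same location: a data race.
      __global__ void race(int *out)
      {
          *out = threadIdx.x;
      }

      int main()
      {
          int *d_out, h_out;
          cudaMalloc(&d_out, sizeof(int));
          race<<<1, 256>>>(d_out);
          cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
          printf("winner: %d\n", h_out);   // repeatable under emulation, not on hardware
          cudaFree(d_out);
          return 0;
      }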

  27. Floating Point
  • Results of floating-point computations will differ slightly because of (see the sketch after this slide)
  • Different compiler outputs, instruction sets
  • Use of extended precision for intermediate results
  • There are various options to force strict single precision on the host
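  A hedged sketch of why results can drift (nothing here is from the original slides): both loops below sum the same squares in the same order in single precision, yet the host compiler may keep intermediates in extended-precision x87 registers while the device may contract each multiply-add into an FMA, so the printed values can differ in the last bits.

      #include <cstdio>
      #include <cuda_runtime.h>

      #define N 1024

      // One device thread sums in the same order as the host loop, so any
      // difference comes from intermediate precision / instruction selection.
      __global__ void sumSquares(const float *x, float *result)
      {
          float s = 0.0f;
          for (int i = 0; i < N; i++) s += x[i] * x[i];    // may compile to an FMA
          *result = s;
      }

      int main()
      {
          float h[N], hostSum = 0.0f;
          for (int i = 0; i < N; i++) h[i] = 1.0f / (i + 1);
          for (int i = 0; i < N; i++) hostSum += h[i] * h[i];  // host may widen intermediates

          float *d_x, *d_r, devSum;
          cudaMalloc(&d_x, N * sizeof(float));
          cudaMalloc(&d_r, sizeof(float));
          cudaMemcpy(d_x, h, N * sizeof(float), cudaMemcpyHostToDevice);
          sumSquares<<<1, 1>>>(d_x, d_r);
          cudaMemcpy(&devSum, d_r, sizeof(float), cudaMemcpyDeviceToHost);

          printf("host %.9f  device %.9f  diff %g\n", hostSum, devSum, hostSum - devSum);
          cudaFree(d_x); cudaFree(d_r);
          return 0;
      }

  Host compiler flags such as gcc’s -ffloat-store (or building for SSE arithmetic) are the kind of option the slide alludes to for forcing strict single precision.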

  28. Nexus
  • New Visual Studio-based GPU integrated development environment
  • http://developer.nvidia.com/object/nexus.html
  • Available in beta (as of October 2009)

  29. End Credits
  • Based on original material from
  • http://en.wikipedia.org/wiki/CUDA, accessed 6/22/2011
  • The University of Akron: Charles Van Tilburg
  • The University of Illinois at Urbana-Champaign: David Kirk, Wen-mei W. Hwu
  • Oxford University: Mike Giles
  • Stanford University: Jared Hoberock, David Tarjan
  • Revision history: last updated 6/23/2011.
