
Utilization of GPUs for General Computing


Presentation Transcript


  1. Utilization of GPUs for General Computing Presenter: Charlene DiMeglio Paper: Aspects of GPU for General Purpose High Performance Computing Suda, Reiji, et al.

  2. Overview • Problem: • Want to use the GPU for things other than graphics, but the costs can be high • Solution: • Improve the CUDA drivers • Results: • Compared to a node of a supercomputer, the GPU is worth it • Conclusion: • These improvements make using GPGPUs more feasible

  3. Problem: Need for computation power • Why GPUs? • GPUs are not being fully realized as a resource, often sitting idle when not being used for graphics • Better performance for less power as compared to CPUs • What’s the issue? Cost. • Efficient scheduling – timing data loads with their uses • Memory management – using the small amount of memory available effectively • Loads and stores – waiting for memory transfers, taking hundreds of cycles

  4. Solutions • Brook+ by AMD, Larrabee by Intel • CUDA by NVIDIA • Greatest technological maturity at the time • The paper investigates the existing technology and suggests improvements (Diagram: 30 multiprocessors, each with 8 streaming processors and 16 KB of shared memory.)

  5. NVIDIA’s Tesla C1060 GPU vs. Hitachi HA8000-tc/RS425 (T2K) Super Computer • T2K – fastest supercomputer in Japan

  6. Issues to Overcome • High SIMD vector length • Small main memory size • High register spill cost • No L2 cache, but rather read-only texture caches
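The register-pressure issue above can be made concrete with a back-of-the-envelope occupancy calculation. This is my own illustration, not from the paper; the hardware constants are assumed for a GT200-class part such as the Tesla C1060 (16,384 registers per multiprocessor, 32 threads per warp, at most 32 resident warps per multiprocessor):

```python
# Illustrative occupancy arithmetic (hardware numbers assumed for a
# GT200-class GPU such as the Tesla C1060).
REGISTERS_PER_SM = 16384  # 32-bit registers per multiprocessor
THREADS_PER_WARP = 32
MAX_WARPS_PER_SM = 32

def resident_warps(regs_per_thread: int) -> int:
    """Warps that fit on one multiprocessor for a per-thread register budget."""
    fit = REGISTERS_PER_SM // (regs_per_thread * THREADS_PER_WARP)
    return min(fit, MAX_WARPS_PER_SM)

# Halving the per-thread register budget doubles the warps available
# to hide memory latency -- at the risk of register spills:
print(resident_warps(32))  # 16 warps resident
print(resident_warps(16))  # 32 warps resident (full occupancy)
```

This is the trade-off behind the compiler option discussed on the next slide: fewer registers per thread means more warps in flight, but possibly costly spills.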

  7. Methods to Hide Away Latency • CUDA compiler option limits the number of registers used per warp • 1 warp = a group of 32 threads executing in lockstep (SIMD) • Maximizes the number of warps that can run at a time • Could cause spills • Variable-sized multi-round data transfer scheduling with PCI Express • PCI Express allows data transfer, GPU computation, and CPU computation to occur in parallel • Allows for a constant flow of information: • Allows overhead of up to O(log x/x), as compared to uniform scheduling’s O(1/√x)
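The multi-round transfer idea can be sketched with a toy timing model (my own illustration, not the paper's implementation): each round pays a fixed PCI Express latency plus a bandwidth term, and the GPU computes on chunk i while chunk i+1 is still in flight. All constants are assumed, and only uniform round sizes are modeled here; the paper's variable-sized rounds refine this trade-off further.

```python
# Toy model of overlapped host-to-GPU transfer (constants are assumed).
LATENCY = 10.0      # fixed per-transfer overhead, in arbitrary time units
BANDWIDTH = 1.0     # bytes transferred per time unit
COMPUTE_RATE = 1.0  # compute time per byte

def pipelined_time(chunks):
    """Finish time when each chunk's compute overlaps the next transfer."""
    t_xfer = 0.0  # when the bus becomes free
    t_comp = 0.0  # when the GPU becomes free
    for size in chunks:
        t_xfer += LATENCY + size / BANDWIDTH           # chunk arrives
        t_comp = max(t_comp, t_xfer) + size * COMPUTE_RATE
    return t_comp

total = 4096.0
for n in (1, 4, 20, 256):
    # Too few rounds -> no overlap; too many -> per-round latency dominates.
    print(n, round(pipelined_time([total / n] * n), 1))
```

The sweet spot in between is exactly what the scheduling analysis on this slide is about: round sizes (and the round count) must be chosen so neither the fixed latency nor the lost overlap dominates.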

  8. Methods to Hide Away Latency • Computation time between communications > communication latency • Worth sending the data over to the GPU • Increasing the bandwidth and the size of messages makes the constant term in the overhead latency seem smaller • Efficient use of registers to prevent spills • Deciding what work to do where, GPU vs. CPU (work sharing) • Minimizing divergent warps using the atomic operations found in CUDA • Divergent warps occur when threads within a warp must follow both branch paths
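Why divergence hurts, and why atomic-based compaction helps, can be shown with a toy SIMD cost model (my own sketch, not the paper's code, with assumed path costs): a warp pays for every distinct path any of its threads takes, whereas atomically appending work items to per-branch queues lets each path later run in fully convergent warps.

```python
import random

# Toy SIMD cost model: per-path instruction counts are assumed.
WARP = 32
COST = {"A": 5, "B": 7}

def warp_cost(branches):
    """A warp pays for every distinct path taken by any of its threads."""
    return sum(COST[b] for b in set(branches))

def divergent_cost(data):
    """Cost when threads are grouped into warps in their original order."""
    warps = [data[i:i + WARP] for i in range(0, len(data), WARP)]
    return sum(warp_cost(w) for w in warps)

def compacted_cost(data):
    """Mimics atomicAdd-style compaction: each thread appends its work item
    to a per-branch queue; each queue then runs in convergent warps."""
    total = 0
    for path in ("A", "B"):
        n = sum(1 for b in data if b == path)  # queue length
        n_warps = -(-n // WARP)                # ceiling division
        total += n_warps * COST[path]
    return total

random.seed(0)
data = [random.choice("AB") for _ in range(32 * 64)]  # 64 mixed warps
print(divergent_cost(data), compacted_cost(data))  # compacted is cheaper
```

With randomly mixed branches, nearly every warp diverges and pays both path costs; after compaction each warp pays only one, which is the effect the slide attributes to CUDA's atomic operations.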

  9. Results • Variable-sized multi-round data transfer scheduling (Plot: transfer overhead vs. number of rounds.)

  10. Results • Use of atomic instructions in CUDA to minimize latency

  11. Conclusion • CUDA gives programmers the ability to harness the power of the GPU for general uses. • The improvements presented make this option more feasible. • Strategic use of GPGPUs as a resource will improve speed and efficiency. • However, the presented material is mainly theoretical, with little strong data to back it up • More suggestions than implementations, promoting GPGPU use
