
Heterogeneous CPU/GPU co-processor clusters

Presentation Transcript


  1. Heterogeneous CPU/GPU co-processor clusters

    Michael Fruchtman
  2. Current State
    Eight of the top ten most power-efficient clusters on the November 2010 Green500 list are heterogeneous [1].
    (Figure: power law of efficiency)
  3. Current State
    At today's efficiencies, an exascale (10^18 FLOPS) cluster will require 200 MW [2].
    To keep up with Moore's Law, cluster efficiency must grow by 66% per year.
    The most efficient cluster has improved by a normalized average of 61.4% per year.
    This gap represents the growth in power requirements when scaling from petascale to exascale.
  4. Power-Efficient Amdahl's Law [3]
    Three transitions from a single large core P: P to P*, P to c*, and P to P+c* (c is a smaller, power-efficient core).
    Metric: speedup per watt.
    f: fraction of execution that is parallel
    N: total number of cores
    Wc: power draw of a c core relative to P
    Kc: power draw of an idle c core relative to an active c core
    K: power draw of P
    Sc: performance of a c core relative to P
  5. Power-Efficient Amdahl's Law [3]
    Given Wc = 0.25, Sc = 0.5, Kc = 0.60, and K = 1, with N varied according to the power budget.
    Top chart: f = 0.3. Bottom chart: f = 0.9.
    P+c* is superior as parallelization increases.
  6. GPU Architecture [4]
  7. Power-Efficient Amdahl's Law and the GPU
    Wc = 0.00417: 0.5 W per GPU core against K = 120 W for the Intel i7 980 XE.
    Kc = 0.115; turning on a GPU accounts for 71% of the power draw [5].
    Sc is harder to measure: is the workload memory-bound or compute-bound? The GPU memory architecture makes this difficult to determine.
    Sc = 0.172, assuming a compute-bound workload on the GTX 580.
  8. Threads, Blocks and Performance [5]
  9. Formal Power Modeling [6]
    Average geometric error of power prediction: 9.18%.
  10. Temperature Model [6]
    RC_Rise = 35 and RC_Decay = 65 are GPU-dependent constants.
  11. Conditions for GPU Use
    The GTX 580 draws 244 W under load.
    The speedup must be greater than 2, or greater than 3 to be safe.
    f must be very high, preferably 0.9 or higher.
    Improved energy efficiency depends on improved performance.
    Example, GPUDB SQL queries: speedups of 20x or more without joins [7], but only 2x to 7x with joins [8].
  12. Reducing GPU Power Usage
    Power gating.
    Improved memory coalescing (memory coalescing models).
    Avoiding incoherent branching (incoherent branching models).
    NVIDIA Optimus reduces idle power to near zero.
  13. References
    [1] W. Feng and K. W. Cameron. "The Green500 List - November 2010." The Green500. Virginia Tech. November 2010. Web. March 15, 2011.
    [2] T. Agerwala. "Challenges on the Road to Exascale Computing." Proceedings of the 22nd Annual International Conference on Supercomputing (ICS '08). ACM, New York, NY, USA. p. 2. 2008.
    [3] D. Woo and H.-H. Lee. "Extending Amdahl's Law for Energy-Efficient Computing in the Multi-Core Era." IEEE Xplore. IEEE Computer Society. December 2008. Web. March 15, 2011.
    [4] R. Smith. "NVIDIA's GeForce GTX 580: Fermi Redefined." AnandTech. November 9, 2010. Web. March 16, 2011. http://www.anandtech.com/show/4008/nvidias-geforce-gtx-580
    [5] R. Suda and D. Ren. "Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels towards Power Optimized High Performance Computing." International Conference on Parallel and Distributed Computing, Applications and Technologies. IEEE Computer Society. pp. 432-438. 2009.
    [6] S. Hong and H. Kim. "An Integrated GPU Power and Performance Model." ISCA '10: Proceedings of the 37th Annual International Symposium on Computer Architecture. ACM, New York, NY, USA. pp. 280-289. 2010.
    [7] P. Bakkum and K. Skadron. "Accelerating SQL Database Operations on a GPU with CUDA." GPGPU '10: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, New York, NY, USA. pp. 94-103. 2010.
    [8] B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. "Relational Joins on Graphics Processors." SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, New York, NY, USA. pp. 511-524. 2008.
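
Slide 3's figures lend themselves to a quick arithmetic check. The sketch below assumes that exascale means 10^18 floating-point operations per second and that the 66% and 61.4% annual rates compound year over year; the function names are hypothetical. It computes the efficiency implied by running 10^18 FLOPS within 200 MW, and the factor by which a 66%-per-year trend outgrows a 61.4%-per-year trend after a given number of years.

```python
EXAFLOPS = 1e18         # exascale: 10^18 floating-point operations per second
POWER_BUDGET_W = 200e6  # 200 MW, the figure quoted on slide 3 [2]


def implied_efficiency_gflops_per_watt(flops: float, watts: float) -> float:
    """Efficiency implied by delivering `flops` within `watts`, in GFLOPS/W."""
    return flops / watts / 1e9


def compounded_gap(target_rate: float, actual_rate: float, years: int) -> float:
    """Factor by which the target trend has outgrown the actual trend after
    `years`, assuming both rates compound annually (an assumption)."""
    return ((1 + target_rate) / (1 + actual_rate)) ** years


if __name__ == "__main__":
    print(implied_efficiency_gflops_per_watt(EXAFLOPS, POWER_BUDGET_W))  # 5.0
    for years in (1, 5, 10):
        print(years, round(compounded_gap(0.66, 0.614, years), 3))
```

Under these assumptions the gap is under 3% in any single year, but it compounds, which is the point slide 3 makes about growing power requirements.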
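
Slides 4, 5 and 7 rely on the power-efficient Amdahl's law of [3], whose equations appear only as images in this transcript. The sketch below is not a transcription of those equations; it is a simple energy accounting, under assumptions stated here, that reuses the slides' symbols. Performance and power are normalized to one active P core (performance 1, power 1, i.e. K watts). The assumptions: the serial fraction 1 - f runs on one core while the others idle; the parallel fraction f runs on every core of a symmetric design; in P+c* the serial part runs on P and the parallel part on the c cores while P idles at a hypothetical fraction `kp`. Because performance per watt equals (1/time)/(energy/time), it is the reciprocal of normalized energy. The budget-sizing rule and the values of `KP` and `BUDGET` are hypothetical.

```python
"""Perf/W for P*, c* and P+c* under a simple, assumed energy model.

Units are normalized to one active P core: performance 1, power 1 (= K watts).
These formulas are an illustrative accounting, not the equations of [3].
"""


def perf_per_watt_symmetric(f: float, n: int, s: float, w: float, k: float) -> float:
    """n identical cores with relative performance s, active power w, idle fraction k.

    P* uses s = w = 1; c* uses s = Sc, w = Wc, k = Kc.
    """
    t_serial = (1 - f) / s                # one core busy
    t_parallel = f / (n * s)              # all n cores busy
    p_serial = w + (n - 1) * k * w        # one active core plus n - 1 idle cores
    p_parallel = n * w                    # all cores active
    energy = t_serial * p_serial + t_parallel * p_parallel
    return 1.0 / energy                   # (1 / time) / (energy / time)


def perf_per_watt_asymmetric(f: float, n_c: int, sc: float, wc: float,
                             kc: float, kp: float) -> float:
    """One P core plus n_c c cores (P+c*)."""
    t_serial = 1 - f                      # serial part on P (performance 1)
    t_parallel = f / (n_c * sc)           # parallel part on the c cores only
    p_serial = 1 + n_c * kc * wc          # P active, c cores idle
    p_parallel = kp + n_c * wc            # P idle (assumed fraction kp), c cores active
    energy = t_serial * p_serial + t_parallel * p_parallel
    return 1.0 / energy


def cores_within_budget(budget: float, core_power: float, reserved: float = 0.0) -> int:
    """Hypothetical sizing rule: cores that fit in `budget` at full load."""
    return int((budget - reserved) // core_power)


if __name__ == "__main__":
    WC, SC, KC = 0.25, 0.5, 0.60  # slide 5
    KP = 0.60                     # hypothetical idle fraction for a P core
    BUDGET = 16.0                 # hypothetical budget, in units of one active P core
    n_p = cores_within_budget(BUDGET, 1.0)
    n_c_sym = cores_within_budget(BUDGET, WC)
    n_c_asym = cores_within_budget(BUDGET, WC, reserved=1.0)  # leave room for P
    for f in (0.3, 0.9):
        print(f,
              round(perf_per_watt_symmetric(f, n_p, 1.0, 1.0, KP), 3),        # P*
              round(perf_per_watt_symmetric(f, n_c_sym, SC, WC, KC), 3),      # c*
              round(perf_per_watt_asymmetric(f, n_c_asym, SC, WC, KC, KP), 3))  # P+c*
    print(0.5 / 120)  # slide 7: Wc for a 0.5 W GPU core against a 120 W CPU
```

For P* (s = w = 1) the energy term reduces to 1 + (N - 1)k(1 - f), so every idle core taxes the serial phase; c* and P+c* trade that tax against the lower Wc of the small cores. The last line reproduces slide 7's Wc = 0.5/120 ≈ 0.00417; for comparison, 244 W spread over the GTX 580's 512 CUDA cores is about 0.48 W per core.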
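
Slide 10 gives only the constants RC_Rise = 35 and RC_Decay = 65. The sketch below assumes a first-order RC (exponential) response: under load the temperature approaches a maximum with time constant RC_Rise, and at idle it relaxes toward an idle temperature with time constant RC_Decay. The exact expressions in [6] may differ, `max_temp` and `idle_temp` are hypothetical parameters, and time is in whatever unit the constants use, which the slide does not state.

```python
import math

RC_RISE = 35.0   # slide 10
RC_DECAY = 65.0  # slide 10


def heating(t: float, start_temp: float, max_temp: float, rc: float = RC_RISE) -> float:
    """Temperature after t time units under load (assumed first-order approach to max_temp)."""
    return max_temp - (max_temp - start_temp) * math.exp(-t / rc)


def cooling(t: float, start_temp: float, idle_temp: float, rc: float = RC_DECAY) -> float:
    """Temperature after t time units at idle (assumed first-order decay toward idle_temp)."""
    return idle_temp + (start_temp - idle_temp) * math.exp(-t / rc)


if __name__ == "__main__":
    # Hypothetical temperatures in degrees C, not measurements from [6].
    IDLE, MAX = 40.0, 90.0
    for t in (0, 35, 70, 140):
        print(t, round(heating(t, IDLE, MAX), 1), round(cooling(t, MAX, IDLE), 1))
```

Because RC_Decay is larger than RC_Rise, this model cools more slowly than it heats: after 35 time units a heating GPU has closed about 63% of the gap to its maximum, while a cooling one has closed only about 42% of the gap to idle.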
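
Slide 11's rule that the speedup must exceed 2 (or 3, to be safe) is consistent with a simple energy comparison, under assumptions the slide does not state: energy is average power times runtime, the CPU-only run draws K = 120 W (slide 7), and the accelerated run draws either the GTX 580's 244 W alone or 244 W on top of a fully loaded CPU. The break-even speedups are then 244/120 ≈ 2.03 and 364/120 ≈ 3.03. The sketch also applies Amdahl's law to show why f must be high. Function names are hypothetical, and the kernel speedup of 20 is borrowed from the query speedups in [7] only for illustration.

```python
CPU_WATTS = 120.0  # K, the i7 980 XE figure from slide 7
GPU_WATTS = 244.0  # GTX 580 under load, slide 11


def break_even_speedup(baseline_watts: float, accelerated_watts: float) -> float:
    """Speedup at which the accelerated run uses exactly the baseline's energy.

    Assuming energy = power * time: accelerated_watts * (T / S) = baseline_watts * T,
    so S = accelerated_watts / baseline_watts.
    """
    return accelerated_watts / baseline_watts


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the runtime is accelerated by s."""
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    gpu_only = break_even_speedup(CPU_WATTS, GPU_WATTS)
    gpu_plus_cpu = break_even_speedup(CPU_WATTS, CPU_WATTS + GPU_WATTS)
    print(round(gpu_only, 2), round(gpu_plus_cpu, 2))  # 2.03 3.03
    for f in (0.5, 0.9, 0.99):
        overall = amdahl_speedup(f, 20.0)  # 20x: illustrative kernel speedup
        print(f, round(overall, 2), overall > gpu_plus_cpu)
```

With f = 0.5 even a 20x kernel speedup yields an overall speedup below 2, short of either break-even point, while f = 0.9 gives about 6.9.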
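
Memory coalescing, listed on slide 12, means that when the threads of a warp access neighboring addresses, the hardware can serve them with a few wide memory transactions instead of many narrow ones. The sketch below is a simplified model assumed for illustration: a 32-thread warp, 128-byte aligned segments (the L1 cache-line size on Fermi GPUs such as the GTX 580), and element sizes that divide 128. It counts the segments one warp-wide load touches for a given stride, and the resulting fraction of fetched bytes that are actually used. Real coalescing rules vary across GPU generations and caching modes; the function names are hypothetical.

```python
WARP_SIZE = 32        # threads per warp
SEGMENT_BYTES = 128   # assumed transaction size (Fermi L1 line)


def transactions_per_warp(base_addr: int, stride_elems: int, elem_bytes: int = 4) -> int:
    """Distinct aligned segments touched when lane i loads element base + i * stride.

    Assumes elem_bytes divides SEGMENT_BYTES and base_addr is element-aligned,
    so no element straddles two segments.
    """
    if stride_elems < 1:
        raise ValueError("stride must be at least 1 element")
    segments = {
        (base_addr + lane * stride_elems * elem_bytes) // SEGMENT_BYTES
        for lane in range(WARP_SIZE)
    }
    return len(segments)


def load_efficiency(base_addr: int, stride_elems: int, elem_bytes: int = 4) -> float:
    """Fraction of fetched bytes that the warp actually requested."""
    useful = WARP_SIZE * elem_bytes
    fetched = transactions_per_warp(base_addr, stride_elems, elem_bytes) * SEGMENT_BYTES
    return useful / fetched


if __name__ == "__main__":
    for stride in (1, 2, 4, 32):
        print(stride, transactions_per_warp(0, stride), load_efficiency(0, stride))
    print("misaligned", transactions_per_warp(64, 1))  # unit stride shifted by 64 bytes
```

A unit stride touches one segment and uses every byte fetched; a stride of 32 elements touches 32 segments and uses 1/32 of the traffic, and shifting a unit-stride access by 64 bytes already doubles the transaction count.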
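
Incoherent branching, also on slide 12, wastes work because the threads of a warp share one instruction stream: when lanes disagree on a branch, the warp issues each distinct path in turn with the non-participating lanes masked off. The sketch below is a simplified model (no nesting, reconvergence at the end of the branch) with hypothetical names; path lengths are instruction counts supplied by the caller. It returns the number of instructions the warp issues and the fraction of lane slots doing useful work.

```python
from collections.abc import Hashable, Mapping, Sequence

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def divergence_cost(lane_paths: Sequence[Hashable],
                    path_lengths: Mapping[Hashable, int]) -> tuple[int, float]:
    """Return (instructions issued by the warp, lane utilization)."""
    if len(lane_paths) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} lanes, got {len(lane_paths)}")
    # Each distinct path is issued once, serially, for the whole warp.
    issued = sum(path_lengths[p] for p in set(lane_paths))
    # Each lane does useful work only on the path it actually takes.
    useful = sum(path_lengths[p] for p in lane_paths)
    return issued, useful / (issued * WARP_SIZE)


if __name__ == "__main__":
    print(divergence_cost(["if"] * WARP_SIZE, {"if": 10}))  # (10, 1.0)
    lanes = ["if"] * 31 + ["else"]
    print(divergence_cost(lanes, {"if": 10, "else": 30}))   # (40, 0.265625)
```

A single lane taking a 30-instruction `else` path while the other 31 take a 10-instruction `if` path makes the warp issue 40 instructions at about 27% utilization.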