1 / 21

The Energy Case for Graph Processing on Hybrid Platforms

The Energy Case for Graph Processing on Hybrid Platforms. Abdullah Gharaibeh , Lauro Beltrão Costa, Elizeu Santos-Neto and Matei Ripeanu NetSysLab The University of British Columbia http://netsyslab.ece.ubc.ca. Graphs are Everywhere . 1B users 150B friendships. 1.4B pages, 6.6B links.

berget
Download Presentation

The Energy Case for Graph Processing on Hybrid Platforms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Energy Case for Graph Processing on Hybrid Platforms Abdullah Gharaibeh, Lauro Beltrão Costa, Elizeu Santos-Neto and Matei Ripeanu NetSysLab The University of British Columbia http://netsyslab.ece.ubc.ca

  2. Graphs are Everywhere • 1B users • 150B friendships • 1.4B pages, 6.6B links

  3. Challenges and Opportunities • CPUs • Poor locality • Caches + summary data structures • Data-dependent memory access patterns • Low compute-to-memory access ratio • as large as 1TB • Large memory footprint • Varying degrees of parallelism • (both intra- and inter- stage)

  4. Challenges and Opportunities • GPUs • CPUs • Poor locality • Caches + summary data structures • Caches + summary data structures • Data-dependent memory access patterns • Massive hardware multithreading • Low compute-to-memory access ratio • as large as 1TB • Large memory footprint • up to 12GB! • Assemble a • hybrid platform • Varying degrees of parallelism • (both intra- and inter- stage)

  5. Past Work • Performance Modeling • Predicts speedup • Intuitive Totem • A graph processing engine for hybrid systems • Applies algorithm-agnostic optimizations Partitioning Strategies • Workload to processor matchmaking • A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing, Gharaibeh et al., PACT 2012 • On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest, Gharaibeh et al., IPDPS 2013 Main outcome: hybrid platforms enable significant performance gains • 5

  6. Past Work • Performance Modeling • Predicts speedup • Intuitive Totem • A graph processing engine for hybrid systems • Applies algorithm-agnostic optimizations Partitioning Strategies • Workload to processor matchmaking • A Yoke of Oxen and a Thousand Chickens for Heavy Lifting Graph Processing, Gharaibeh et al., PACT 2012 Focused on time to solution as a success metric • On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest, Gharaibeh et al., IPDPS 2013 Main outcome: hybrid platforms enable significant performance gains • 6

  7. Motivating Question Is it energyefficient to use GPU-acceleratedplatforms for large-scalegraph processing?

  8. Evaluation Platform GPU has double TDP S = CPU Socket G = GPU Power is measured at the wall AC outlet

  9. Evaluation Platform GPUs are power hungry at peak utilization At peak utilization, RAM consumes as much as dual CPUs! The CPU is relatively power efficient at peak utilization! High idle power is due to large DRAM space S = CPU Socket G = GPU Power is measured at the wall AC outlet

  10. Challenges and Opportunities • GPU draws significant amount of power • The workload is Irregular and memory-bound • GPU has low idle power (25W) • Offloading to GPU enables faster “race-to-idle”

  11. Evaluation Study • Workloads • Real and synthetic • Large: can not fit on GPU memory • Benchmarks • Breadth-First Search (BFS) • PageRank • Metrics • Raw performance (TEPS) • Raw power (Watts) • Power normalized by processing rate (Watts/TEPS) • 11

  12. Raw Performance GPU-offloading is useful for large graphs Performance scales with more processors 1S1G > 2S S = CPU Socket G = GPU

  13. Power Consumption BFS 1S1G ≤ 2S!

  14. Power Consumption BFS load imbalance  more variability

  15. Power Consumption BFS PageRank PageRank draws more power

  16. Normalizing by Processing Rate In most cases, 1S1G > 2S

  17. Normalizing by Processing Rate Energy efficiency scales with more processors

  18. Normalizing by Processing Rate Similar results on PageRank

  19. Conclusions • A hybrid configuration is more energy and power efficient than a symmetric one • A “race-to-idle” strategy leads to better energy efficiency • RAM is a major power consuming component

  20. Questions • code@: netsyslab.ece.ubc.ca

  21. Energy-Delay Product (EDP) Higher relative advantage

More Related