
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems

Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems. Tayler H. Hetherington (UBC), Timothy G. Rogers (UBC), Lisa Hsu (AMD), Mike O’Connor (AMD), Tor M. Aamodt (UBC). University of British Columbia.


Presentation Transcript


  1. Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
     Tayler H. Hetherington (UBC), Timothy G. Rogers (UBC), Lisa Hsu (AMD), Mike O’Connor (AMD), Tor M. Aamodt (UBC)
     University of British Columbia
     In Proc. 2012 ACM/IEEE Int’l Symp. on Performance Analysis of Systems and Software (ISPASS)
     Image credit: Rich Miler – www.datacenterknowledge.com

  2. Motivation
     • New types of workloads
       • Non-HPC
       • Server applications
     • Server applications
       • Memcached
     • Programmer’s initial intuition into an application’s behavior
     • Server farms require a lot of power
       • Need for efficient, cost-effective solutions
       • GPU/APUs
     Image credit: Bruno Giussani – www.wired.com
     Slide footer: Tayler Hetherington, Timothy Rogers, Lisa Hsu, Mike O'Connor, Tor M. Aamodt. Memcached Key-value Store on GPU/APU

  3. Background: Memcached
     [Figure: slide from the HPCA-18, 2012 Facebook keynote by Sanjeev Kumar]

  4. Memcached: Compatible with GPU?
     • Irregular control flow
     • Irregular memory access patterns
     • Large memory requirements
     • Highly input data dependent

  5. Porting Memcached: Simple key-value lookup
     • READ (GET) requests on GPU
     • WRITE (SET) requests on CPU
     [Figure: GET lookup flow. Labels: GET, Hash, Memory, Hash chaining, Key Comparison, Hit, Miss, Return Hit/Miss, Server2]
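The lookup flow named on this slide (hash the key, walk the bucket's chain, compare keys, return hit or miss) can be sketched in a few lines. This is a minimal CPU-side illustration, not the authors' GPU code: the class names `Item` and `ChainedTable`, the bucket count, and the FNV-1a style hash are all assumptions made for the example.

```python
from typing import List, Optional


class Item:
    """One stored key-value pair; `next` links items that share a bucket."""

    def __init__(self, key: bytes, value: bytes, next_item: "Optional[Item]" = None):
        self.key = key
        self.value = value
        self.next = next_item


class ChainedTable:
    """Hypothetical hash table with chaining, used only for illustration."""

    def __init__(self, num_buckets: int = 8):
        self.buckets: List[Optional[Item]] = [None] * num_buckets

    def _index(self, key: bytes) -> int:
        # 32-bit FNV-1a hash (an assumption; the slides do not name a hash).
        h = 2166136261
        for b in key:
            h = ((h ^ b) * 16777619) & 0xFFFFFFFF
        return h % len(self.buckets)

    def set(self, key: bytes, value: bytes) -> None:
        # SET: overwrite an existing key, else push a new item on the chain.
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.buckets[i] = Item(key, value, self.buckets[i])

    def get(self, key: bytes) -> Optional[bytes]:
        # GET: hash, then walk the chain comparing keys.
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value  # hit
            node = node.next
        return None  # miss
```

The length of the chain walk in `get` depends on the key, which is one source of the input-dependent behavior the talk discusses.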

  6. Porting Memcached: Batching
     [Figure: the GET lookup flow replicated for several requests processed side by side. Labels: GET, Hash, Memory, Hash chaining, Key Comparison, Hit, Miss, Return Hit/Miss, Server2, Servern]
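Batching, as pictured here, groups many independent GET requests so they can be looked up side by side. The sketch below shows only the grouping step on the CPU; the function names `batches` and `serve_batch` are hypothetical, and a plain dictionary stands in for the key-value store.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def batches(requests: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group incoming keys into lists of at most `batch_size` keys."""
    batch: List[str] = []
    for key in requests:
        batch.append(key)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def serve_batch(store: Dict[str, int], batch: List[str]) -> List[Optional[int]]:
    """Look up every key of one batch; None marks a miss.

    On a GPU each key would be handled by its own work item; here the
    lookups simply run one after another.
    """
    return [store.get(key) for key in batch]
```

A larger `batch_size` exposes more parallelism per launch, but every request in the batch waits for the batch to fill, which is the throughput versus latency trade-off the talk returns to in its results.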

  7. Porting Memcached
     • Main Goals
       • Increase request throughput
       • Keep request latency reasonable
     • Main Challenges
       • Irregular memory access patterns
       • Irregular control flow
       • Data transfer overheads

  8. Methodology
     • Hardware
       • AMD Radeon HD 5870 (Discrete)
       • AMD Llano A8-3850 (Fusion)
       • AMD Zacate E-350 (Fusion)
     • Simulators
       • GPGPU-Sim v3.x
       • In-house GPU control flow simulator
     • Testing and Simulation
       • Traces of Wikipedia accesses

  9. Porting Memcached: Memory Access
     • One request per work item
     • Data accesses for GET requests are input data dependent
     • Data can be anywhere in memory
     • Poor performance on GPU?
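Why scattered, input-dependent addresses may hurt can be illustrated by counting how many distinct memory segments a group of work items touches in one access. This is a rough model only: the helper name `segments_touched`, the 64-byte segment size, and the group of 64 work items are assumptions for the example, not figures from the talk.

```python
from typing import Iterable


def segments_touched(addresses: Iterable[int], segment_bytes: int = 64) -> int:
    """Count the distinct aligned segments covered by a set of byte addresses.

    Fewer segments means the hardware can merge more accesses into one
    memory transaction; the segment size here is an assumed value.
    """
    return len({addr // segment_bytes for addr in addresses})


# 64 work items reading consecutive 4-byte words: the accesses merge well.
contiguous = [4 * i for i in range(64)]

# 64 work items each reading from an unrelated location, as a GET lookup
# with one request per work item might: nothing merges.
scattered = [4096 * i for i in range(64)]

print(segments_touched(contiguous))  # 4
print(segments_touched(scattered))   # 64
```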

  10. Porting Memcached: Memory Divergence

  11. Porting Memcached: Control Flow
     • Recall the control flow graph
     • Many branch outcomes are input data dependent
     [Figure: work item IDs active at successive points of the control flow graph: 1–2–3–4–5; 3–4; 1–2–5; 1–5; 2; 3–4]
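The cost of data-dependent branches can be estimated with a simple lane-occupancy model: every basic block that any work item executes is issued for the whole group, with inactive lanes masked off. The sketch below assumes each block runs at most once per work item (no loops); the function name `simd_efficiency` and the five example paths are hypothetical and do not reproduce the graph on the slide.

```python
from typing import List, Sequence


def simd_efficiency(paths: Sequence[Sequence[str]]) -> float:
    """Fraction of issued lane slots that do useful work.

    paths[i] lists the basic blocks executed by work item i. Each block
    executed by at least one work item costs one slot in every lane.
    """
    lanes = len(paths)
    issued_blocks = set()
    for path in paths:
        issued_blocks.update(path)
    useful = sum(len(set(path)) for path in paths)
    return useful / (lanes * len(issued_blocks))


# Five work items diverging over a made-up graph with blocks A to E.
example: List[List[str]] = [
    ["A", "B", "E"],
    ["A", "B", "D", "E"],
    ["A", "C"],
    ["A", "C"],
    ["A", "B", "E"],
]
print(simd_efficiency(example))  # 0.56
```

In this made-up case 14 of the 25 issued lane slots are useful. If all five work items took the same path, the result would be 1.0.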

  12. Porting Memcached: Control Flow
     [Figure: surviving labels are 29%, 51%, 62%, Overall, 15%, 40%]

  13. Porting Memcached: Data Management
     • Dynamic memory manager
     • Transfer memory regions to device
     • Virtual addresses different on host and device
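When host and device see the same region at different virtual addresses, raw pointers stored inside the region stop being valid after a transfer. One common workaround, shown here as an assumption and not necessarily what the authors did, is to link items by offsets from the start of the region. The names `write_item`, `walk`, and the 8-byte item header are invented for the example.

```python
import struct
from typing import List

NULL = 0xFFFFFFFF               # sentinel offset meaning "no next item"
HEADER = struct.Struct("<II")   # (next_offset, value), little-endian


def write_item(region: bytearray, offset: int, next_offset: int, value: int) -> None:
    """Store an item whose link is an offset into the region, not a pointer."""
    HEADER.pack_into(region, offset, next_offset, value)


def walk(region: bytes, head: int) -> List[int]:
    """Follow offset links from `head` and collect the item values."""
    values = []
    offset = head
    while offset != NULL:
        next_offset, value = HEADER.unpack_from(region, offset)
        values.append(value)
        offset = next_offset
    return values


# Build a three-item chain in a "host" region.
host = bytearray(64)
write_item(host, 0, 16, 10)
write_item(host, 16, 40, 20)
write_item(host, 40, NULL, 30)

# A byte-for-byte copy stands in for the transfer to the device. The copy
# lives at a different address, yet the offset links still resolve.
device = bytes(host)
print(walk(device, 0))  # [10, 20, 30]
```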

  14. Porting Memcached: Data Transfer Reduction
     • Fusion Systems
       • Physical shared memory region between host and device
       • Zero-copy data
     • Discrete Systems
       • Possible transfer reduction techniques
       • Reduction in unnecessary transfers
       • Acyclic data transfers (Overlap comm. with comp.)
       • Automatic data transfer frameworks
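The benefit of overlapping communication with computation can be estimated with a two-stage pipeline model: while batch i is being processed, batch i+1 is already in flight. The sketch assumes transfer and compute can overlap fully and that every batch costs the same; the function names and the time units in the example are hypothetical, not measurements from the talk.

```python
def serial_time(num_batches: int, transfer: int, compute: int) -> int:
    """Total time when each batch is transferred, then processed, in turn."""
    return num_batches * (transfer + compute)


def overlapped_time(num_batches: int, transfer: int, compute: int) -> int:
    """Total time when the next transfer overlaps the current compute.

    The first batch pays both costs; every later batch adds only the
    slower of the two stages.
    """
    if num_batches == 0:
        return 0
    return transfer + compute + (num_batches - 1) * max(transfer, compute)


# Made-up costs: 4 batches, 2 time units to transfer, 3 to compute.
print(serial_time(4, 2, 3))      # 20
print(overlapped_time(4, 2, 3))  # 14
```

Under this model the overlapped schedule approaches the cost of the slower stage alone as the number of batches grows, so overlap helps most when transfer and compute times are similar.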

  15. Porting Memcached

  16. Results: Radeon HD 5870
     • ~8000 requests yields the highest ratio of throughput to latency

  17. Summary
     • Programmer intuition doesn’t always paint the whole picture
     • We exploited the available parallelism on GPUs by batching requests, showing a 7.5X performance increase on the Llano system
     • Data transfer overheads can have a large impact on overall performance
     Thank you – Questions?
     Image credit: Rich Miler – www.datacenterknowledge.com
