
TESTING AND EXPOSING WEAK GPU MEMORY MODELS

MS Thesis Defense by Tyler Sorensen. Advisor: Ganesh Gopalakrishnan. May 30, 2014.


Presentation Transcript


  1. TESTING AND EXPOSING WEAK GPU MEMORY MODELS • MS Thesis Defense by Tyler Sorensen • Advisor: Ganesh Gopalakrishnan • May 30, 2014

  2. Joint Work with: Jade Alglave (University College London), Daniel Poetzl (University of Oxford), Luc Maranget (Inria), Alastair Donaldson and John Wickerson (Imperial College London), Mark Batty (University of Cambridge)

  3. Roadmap • Background and Approach • Prior Work • Testing Framework • Results • CUDA Spin Locks • Bulk Testing • Future Work and Conclusion

  4. Roadmap • Background and Approach • Prior Work • Testing Framework • Results • CUDA Spin Locks • Bulk Testing • Future Work and Conclusion

  5. GPU Background • A GPU is a highly parallel co-processor • Currently found in devices from tablets to top supercomputers (Titan) • Not just used for visualization anymore! Images from Wikipedia [16,17,18]

  6. GPU Programming Explicit Hierarchical concurrency model • Thread Hierarchy: • Thread • Warp • CTA (Cooperative Thread Array) • Kernel (GPU program) • Memory Hierarchy: • Shared Memory • Global Memory

  7. GPU Programming

  8. GPU Programming • GPUs are SIMT (Single Instruction, Multiple Thread) • NVIDIA GPUs may be programmed using CUDA or OpenCL

  9. GPU Programming

  10. Weak Memory Models • Consider the test known as Store Buffering (SB)

  11. Weak Memory Models • Consider the test known as Store Buffering (SB) • Initial State: x and y are memory locations

  12. Weak Memory Models • Consider the test known as Store Buffering (SB) • Thread IDs

  13. Weak Memory Models • Consider the test known as Store Buffering (SB) • Program: for each thread ID

  14. Weak Memory Models • Consider the test known as Store Buffering (SB) • Assertion: question about the final state of registers

  15. Weak Memory Models • Consider the test known as Store Buffering (SB) • Can this assertion be satisfied?

  16. Assertion cannot be satisfied by interleavings This is known as sequential consistency (or SC) [1]
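The claim on slide 16 can be checked mechanically. The sketch below assumes the usual shape of the SB test (T0: `x = 1; r0 = y`, T1: `y = 1; r1 = x`, with `x` and `y` initially 0 and the assertion `r0 == 0 && r1 == 0`), since the slide images are not part of this transcript. It enumerates every interleaving that preserves each thread's program order and collects the final register values. All names (`State`, `step`, `explore_sc`, `sc_outcomes`) are hypothetical and are not taken from the thesis or from the Litmus tool.

```cpp
#include <cassert>
#include <set>
#include <utility>

// One simulated run of the Store Buffering (SB) test.
struct State {
    int x = 0, y = 0;      // shared memory locations, initially 0
    int r0 = -1, r1 = -1;  // registers; -1 means "not yet written"
};

// Execute instruction `pc` (0 or 1) of thread `tid` on state `s`.
// Assumed program: T0: x = 1; r0 = y;    T1: y = 1; r1 = x;
inline void step(State& s, int tid, int pc) {
    assert(pc == 0 || pc == 1);
    if (tid == 0) { if (pc == 0) s.x = 1; else s.r0 = s.y; }
    else          { if (pc == 0) s.y = 1; else s.r1 = s.x; }
}

// Explore all interleavings; each thread's own instructions stay in order.
inline void explore_sc(State s, int pc0, int pc1,
                       std::set<std::pair<int, int>>& out) {
    if (pc0 == 2 && pc1 == 2) {          // both threads have finished
        out.insert(std::make_pair(s.r0, s.r1));
        return;
    }
    if (pc0 < 2) { State n = s; step(n, 0, pc0); explore_sc(n, pc0 + 1, pc1, out); }
    if (pc1 < 2) { State n = s; step(n, 1, pc1); explore_sc(n, pc0, pc1 + 1, out); }
}

// Set of (r0, r1) pairs reachable under sequential consistency.
inline std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> out;
    explore_sc(State{}, 0, 0, out);
    return out;
}
```

Calling `sc_outcomes()` yields the three pairs (0,1), (1,0) and (1,1). The pair (0,0) is absent, which is exactly the statement that no interleaving satisfies the assertion.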

  17. Weak Memory Models • Can we assume assertion will never pass?

  18. Weak Memory Models • Can we assume assertion will never pass? No!

  19. Weak Memory Models • Executing this test with the Litmus tool [2] on an Intel i7 x86 processor for 1,000,000 iterations, we get the following histogram of results:

  20. Weak Memory Models • What Happened? • Architectures implement weak memory models where the hardware is allowed to re-order certain memory instructions. • On x86 architectures, the hardware is allowed to re-order write instructions with program-order later read instructions [3]
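One common way to picture the write-to-read reordering described on slide 20 is a per-thread FIFO store buffer: a store first enters the issuing thread's buffer and reaches memory only later, while a load checks the thread's own buffer and otherwise reads memory. The sketch below is a deliberately simplified model in that spirit, not a description of real x86 hardware and not the method used by the Litmus tool. It assumes the same SB program as before (T0: `x = 1; r0 = y`, T1: `y = 1; r1 = x`), and the names `Tso`, `load`, `explore_tso` and `tso_outcomes` are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <utility>

// Simplified store-buffer machine for two threads and two locations.
struct Tso {
    int mem[2] = {0, 0};                     // mem[0] is x, mem[1] is y
    std::deque<std::pair<int, int>> buf[2];  // per-thread FIFO of (address, value)
    int pc[2] = {0, 0};                      // next instruction of each thread
    int reg[2] = {-1, -1};                   // r0 and r1; -1 means "not yet written"
};

// A load sees the newest buffered store of its own thread, else memory.
inline int load(const Tso& s, int tid, int addr) {
    assert(addr == 0 || addr == 1);
    for (auto it = s.buf[tid].rbegin(); it != s.buf[tid].rend(); ++it)
        if (it->first == addr) return it->second;
    return s.mem[addr];
}

// Explore every execution: at each point a thread may run its next
// instruction, or the oldest entry of a store buffer may drain to memory.
inline void explore_tso(const Tso& s, std::set<std::pair<int, int>>& out) {
    bool done = s.pc[0] == 2 && s.pc[1] == 2 &&
                s.buf[0].empty() && s.buf[1].empty();
    if (done) { out.insert(std::make_pair(s.reg[0], s.reg[1])); return; }
    for (int t = 0; t < 2; ++t) {
        if (s.pc[t] < 2) {
            Tso n = s;
            if (n.pc[t] == 0) n.buf[t].push_back(std::make_pair(t, 1)); // T0 stores x, T1 stores y
            else              n.reg[t] = load(n, t, 1 - t);             // T0 loads y, T1 loads x
            ++n.pc[t];
            explore_tso(n, out);
        }
        if (!s.buf[t].empty()) {
            Tso n = s;
            n.mem[n.buf[t].front().first] = n.buf[t].front().second;
            n.buf[t].pop_front();
            explore_tso(n, out);
        }
    }
}

// Set of (r0, r1) pairs reachable in the store-buffer model.
inline std::set<std::pair<int, int>> tso_outcomes() {
    std::set<std::pair<int, int>> out;
    explore_tso(Tso{}, out);
    return out;
}
```

In this model `tso_outcomes()` contains all four pairs, including (0,0): both stores can sit in their buffers while both loads read the initial values from memory. That is the extra outcome that interleaving alone cannot produce.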

  21. GPU Memory Models • What type of memory model do current GPUs implement? • Documentation is sparse • CUDA has 1 page + 1 example [4] • PTX has 1 page + 0 examples [5] • No specifics about which instructions are allowed to be re-ordered • We need to know if we are to write correct GPU programs!

  22. Our Approach • Empirically explore the memory model implemented on deployed NVIDIA GPUs • Achieved by developing a memory model testing tool for NVIDIA GPUs with specialized heuristics • We analyze classic memory model properties and CUDA applications in this framework with unexpected results • We test large families of tests on GPUs as a basis for modeling and bug hunting

  23. Our Approach • Disclaimer: Testing is not guaranteed to reveal all behaviors

  24. Roadmap • Background and Approach • Prior Work • Testing Framework • Results • CUDA Spin Locks • Bulk Testing • Future Work and Conclusion

  25. Prior Work • Testing Memory Models: • Pioneered by Bill Collier in ARCHTEST in 1992 [6] • TSOTool in 2004 [7] • Litmus in 2011 [2] • We extend this tool

  26. Prior Work (GPU Memory Models) • June 2013: • Hower et al. proposed an SC-for-race-free memory model for GPUs [8] • Sorensen et al. proposed an operational weak GPU memory model based on available documentation [9] • 2014: • Hower et al. proposed two SC-for-race-free memory models for GPUs, HRF-direct and HRF-indirect [10] • It remains unclear what memory model deployed GPUs implement

  27. Roadmap • Background and Approach • Prior Work • Testing Framework • Results • CUDA Spin Locks • Bulk Testing • Future Work and Conclusion

  28. Testing Framework • GPU litmus test

  29. Testing Framework • GPU litmus test • PTX instructions

  30. Testing Framework • GPU litmus test • What memory region (shared or global) are x and y in?

  31. Testing Framework • GPU litmus test • Are T0 and T1 in the same CTA? Or different CTAs?

  32. Testing Framework • We consider three different GPU configurations for tests: • D-warp:S-cta-Shared: Different warp, Same CTA, targeting shared memory • D-warp:S-cta-Global: Different warp, Same CTA, targeting global memory • D-cta:S-ker-Global: Different CTA, Same kernel, targeting global memory
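The three configurations on slide 32 can be read as constraints on where the two test threads sit in the thread hierarchy and which memory region holds the test locations. The host-side sketch below encodes one placement that satisfies each constraint. The concrete indices are illustrative assumptions and may differ from what the testing tool actually generates; the warp size of 32 is the value used on NVIDIA GPUs. `Region`, `Placement`, `placement_for` and `warp_of` are hypothetical names.

```cpp
#include <cassert>
#include <string>

constexpr int kWarpSize = 32;  // threads per warp on NVIDIA GPUs

enum class Region { Shared, Global };

// Where the two testing threads T0 and T1 are placed.
struct Placement {
    int cta0, tid0;  // CTA index and thread index within that CTA for T0
    int cta1, tid1;  // the same for T1
    Region region;   // memory region holding the test locations
};

// Warp index of a thread within its CTA.
inline int warp_of(int tid) { return tid / kWarpSize; }

// Map a configuration name from the slide to one satisfying placement.
inline Placement placement_for(const std::string& config) {
    // Same CTA; T1 starts the second warp so the warps differ.
    if (config == "D-warp:S-cta-Shared") return {0, 0, 0, kWarpSize, Region::Shared};
    if (config == "D-warp:S-cta-Global") return {0, 0, 0, kWarpSize, Region::Global};
    // Different CTAs of the same kernel launch.
    if (config == "D-cta:S-ker-Global")  return {0, 0, 1, 0, Region::Global};
    assert(false && "unknown configuration name");
    return {0, 0, 0, 0, Region::Global};
}
```

Note that none of the three configurations pairs different CTAs with shared memory, which fits the hierarchy on slide 6: shared memory is private to a CTA, so threads in different CTAs can only communicate through global memory.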

  33. Testing Framework • Given a GPU litmus test, produce an executable in: • CUDA or • OpenCL

  34. Testing Framework • Host (CPU) generated code

  35. Testing Framework • Host (CPU) generated code

  36. Testing Framework • Host (CPU) generated code

  37. Testing Framework • Host (CPU) generated code

  38. Testing Framework • Host (CPU) generated code

  39. Testing Framework • Host (CPU) generated code

  40. Testing Framework • Host (CPU) generated code

  41. Testing Framework • Kernel generated code

  42. Testing Framework • Kernel generated code

  43. Testing Framework • Kernel generated code

  44. Testing Framework • Kernel generated code

  45. Testing Framework • Kernel generated code
