
KFusion: Simple Annotations for Optimized Data Flow



Presentation Transcript


  1. KFusion: Simple Annotations for Optimized Data Flow • Liam Kiemele, Celina Berg, Aaron Gulliver, Yvonne Coady, University of Victoria • with thanks to Tim Mattson, Andrew Brownsword (Intel)

  2. Road Map • KFusion at work • Motivation • KFusion • Costs and benefits • annotations, lines of code • modularity, performance • Future work and conclusion • explicit composition of computation around data flow IWOCL 2013 Kiemele

  3. Parallel Hardware

  4. Good News and Bad News… • Parallelism • Added complexity • Optimization • Memory and Bandwidth • Modularity: Let’s talk Libraries • Details behind an API • Optimize data access (prefetching, caching…) • Better separation of concerns

  5. OpenCL Libraries • OpenCL (Open Computing Language), for CPUs and GPUs • At the heart of any given library will be kernels • Suppose we build an OpenCL Linear Algebra Library • __kernel void add_vectors(__global float* sum, __global float* v1, __global float* v2) { int i = get_global_id(0); sum[i] = v1[i] + v2[i]; }

  6. What you get… c = sqrt(add(square(x), square(y))); (diagram: square, square, add, sqrt)

  7. What you get… c = sqrt(add(square(x), square(y)));

  8. What you WANT! c = sqrt(add(square(x), square(y))); (diagram: x, y, add, sqrt)

  9. What you WANT! c = sqrt(add(square(x), square(y)));

  10. Two Choices • Modular Implementation • Reusable • Easy to maintain and develop • Individual kernel optimization • Monolithic Implementation • Performance • Allows optimizations across what would otherwise be module boundaries • Can we do both?

  11. Introducing KFusion • Application File: square(…), square(…), add(…), sqrt(…) • Library File: float* square, float* add …, float* sqrt … • Kernel File: kernel square, kernel add …, kernel sqrt …

  12. After KFusion… • Application File: square(…), square(…), add(…), sqrt(…) • Library File: void square …, void add …, void sqrt … • Kernel File: kernel square, kernel add …, kernel sqrt … • New Call: c = fu(…); • New Function: float* fu(…) • New Kernel: kernel fu(…)

  13. It works!

  14. Road Map • KFusion at work • what and how • …why! • Costs and benefits • annotations, lines of code • modularity, performance • Future work and conclusion • explicit composition of computation around data flow

  15. Costs • Annotations • application hints • library synchronization • kernel data flow for compositions • Preprocessor • build dependency graph • source-to-source transformation • loop fusion • deforestation

  16. Annotations • Application: #pragma start fuse square(x,x) square(y,y) add(c,x,y) sqrt(c,c) c = sqrt(add(square(x), square(y))); #pragma end fuse • Library: #pragma sync out public void dot_product(double result, vector x); #pragma sync in public void matrix_vector_mult(vector b, Matrix A, vector x)

  17. Annotations • kernel add: __kernel void add_vectors(__global float* sum, __global float* v1, __global float* v2) { #pragma kload { int i = get_global_id(0); float arg1 = v1[i]; float arg2 = v2[i]; float s; } s = arg1 + arg2; #pragma kstore { sum[i] = s; } }
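
One way to read the load / compute / store split is that it tells a preprocessor which parts of a kernel can be dropped when kernels are joined. The plain C sketch below illustrates that idea under stated assumptions: each `*_item` function stands in for one work-item of a kernel, the `run_*` loops stand in for enqueueing a kernel over `n` work-items, and all names are hypothetical rather than taken from KFusion.

```c
#include <assert.h>
#include <stddef.h>

/* One work-item of a square kernel, split into the three phases. */
static void square_item(float *out, const float *in, size_t i) {
    float a = in[i];      /* load */
    float s = a * a;      /* compute */
    out[i] = s;           /* store */
}

/* One work-item of the add kernel from the slide. */
static void add_item(float *sum, const float *v1, const float *v2, size_t i) {
    float arg1 = v1[i];   /* load */
    float arg2 = v2[i];
    float s = arg1 + arg2; /* compute */
    sum[i] = s;           /* store */
}

/* Fused square(x), square(y), add: the loads of the first kernels and the
   store of the last kernel are kept. The stores of the squares and the
   loads of add are dropped, because the values are already in locals. */
static void fused_item(float *sum, const float *x, const float *y, size_t i) {
    float a = x[i];       /* load (kept) */
    float b = y[i];
    float sx = a * a;     /* compute of square(x) */
    float sy = b * b;     /* compute of square(y) */
    float s = sx + sy;    /* compute of add */
    sum[i] = s;           /* store (kept) */
}

/* Three passes over memory; x and y are overwritten with their squares. */
void run_unfused(float *sum, float *x, float *y, size_t n) {
    assert(sum != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) square_item(x, x, i);
    for (size_t i = 0; i < n; ++i) square_item(y, y, i);
    for (size_t i = 0; i < n; ++i) add_item(sum, x, y, i);
}

/* One pass over memory; x and y are left unchanged. */
void run_fused(float *sum, const float *x, const float *y, size_t n) {
    assert(sum != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) fused_item(sum, x, y, i);
}
```

Per element, the unfused path performs four loads and three stores, while the fused path performs two loads and one store.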

  18. Dependency Graph • x → square(x) • y → square(y) • square(x), square(y) → add(c,x,y) → sqrt(c) → c
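
A dependency graph like the one on this slide can be derived from the annotated call list by matching each input to the most recent earlier call that wrote it. The sketch below assumes the first argument of each call is its output, as in `add(c,x,y)`; the `Call` struct and `build_deps` function are hypothetical names for illustration, not part of KFusion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INPUTS 2

/* One call inside a fuse region: an output buffer and up to two inputs. */
typedef struct {
    const char *name;
    const char *out;
    const char *in[MAX_INPUTS];   /* NULL marks an unused input slot */
} Call;

/* For each call j and input slot k, dep[j][k] receives the index of the
   most recent earlier call that wrote that input, or -1 if the input
   comes from outside the fuse region (or the slot is unused). */
void build_deps(const Call *calls, size_t n, int dep[][MAX_INPUTS]) {
    assert(calls != NULL && dep != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t k = 0; k < MAX_INPUTS; ++k) {
            dep[j][k] = -1;
            if (calls[j].in[k] == NULL) continue;
            /* Scan backwards from j-1 down to 0 for the last writer. */
            for (size_t i = j; i-- > 0; ) {
                if (strcmp(calls[i].out, calls[j].in[k]) == 0) {
                    dep[j][k] = (int)i;
                    break;
                }
            }
        }
    }
}
```

For the list `square(x,x)`, `square(y,y)`, `add(c,x,y)`, `sqrt(c,c)`, the two squares depend only on external data, `add` depends on both squares, and `sqrt` depends on `add`, which matches the edges in the slide.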

  19. Transformation… • x → square(x) • y → square(y) • square(x), square(y) → add_sqrt(c,x,y) → c

  20. Replacement Kernel! • x, y → fu(c,x,y) → c

  21. Annotations AOSD 2013 Kiemele

  22. Benefits

  23. Performance

  24. Performance

  25. Roofline Analysis of Performance • Peak Actual GFlops = minimum(Bandwidth × flops/byte, Peak Performance) • Three Linear Algebra Scenarios • c = sqrt(a² + b²) • d = sqrt((x1 - x2)² + (y1 - y2)²) • Start of conjugate gradient • r = Ax - b • p = r • R2 = r*r
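
The roofline bound on this slide is easy to evaluate by hand or in code. The sketch below is a worked example under stated assumptions: the function names are hypothetical, the bandwidth and peak figures in the usage note are made-up round numbers rather than measurements from the talk, and the byte counts assume 4-byte floats with no cache reuse between kernels.

```c
#include <assert.h>

/* Arithmetic intensity: floating-point operations per byte moved. */
double intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

/* Roofline bound: min(bandwidth * flops/byte, peak performance).
   bandwidth_gbs is in GB/s; the result and peak_gflops are in GFLOP/s. */
double roofline_gflops(double bandwidth_gbs, double flops_per_byte,
                       double peak_gflops) {
    double memory_bound = bandwidth_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

For c = sqrt(a² + b²), each element costs 4 flops (two multiplies, one add, one square root). Run as four separate kernels, that is 8 float accesses, or 32 bytes, giving 0.125 flops/byte; fused into one kernel it is 2 loads and 1 store, or 12 bytes, giving about 0.33 flops/byte. On a hypothetical device with 100 GB/s of bandwidth and a 1000 GFLOP/s peak, `roofline_gflops(100.0, intensity(4.0, 32.0), 1000.0)` gives 12.5, while the fused intensity raises the bound to roughly 33.3. Both cases stay memory bound, so the gain comes entirely from moving fewer bytes.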

  26. c = sqrt(a² + b²)

  27. d = sqrt((x1 - x2)² + (y1 - y2)²)

  28. Conjugate Gradient

  29. Road Map • KFusion at work • what and how • …why! • Costs and benefits • annotations, lines of code • modularity, performance • Future work and conclusion • explicit composition of computation around data flow

  30. Future Work • kfuse { calls } • __kernel void k(…) { kload { … } computation kstore { … } } • Tools • comprehension and visualization • emulation • performance testing • Combine with other approaches • Optimizing compilers • Code generators

  31. Conclusion • KFusion is a first step towards • explicit, flexible control • Allowing optimizations between modules • separation of concerns • github.com/4Liamk/KFusion/wiki
