
Zhiduo Liu. Supervisor: Guy Lemieux. Sep. 28th, 2012

Accelerator Compiler for the VENICE Vector Processor. Zhiduo Liu. Supervisor: Guy Lemieux. Sep. 28th, 2012. Outline: Motivation, Background, Implementation, Results, Conclusion.


Presentation Transcript


  1. Accelerator Compiler for the VENICE Vector Processor. Zhiduo Liu. Supervisor: Guy Lemieux. Sep. 28th, 2012

  2. Outline: Motivation, Background, Implementation, Results, Conclusion


  4. Motivation: FPGA, VHDL, Multi-core, ParC, Cilk, Erlang, SystemVerilog, Verilog, OpenMP, OpenCL, aJava, SSE, MPI, Bluespec, OpenGL, Pthread, GPU, X10, CUDA, StreamIt, Sh, OpenHMPP, Many-core, Fortress, Sponge, Chapel, …, Computer clusters, Vector Processor

  5. Motivation: Simplification. FPGA, VHDL, Multi-core, ParC, Cilk, Erlang, SystemVerilog, Verilog, OpenMP, OpenCL, aJava, SSE, MPI, Bluespec, OpenGL, Pthread, GPU, X10, CUDA, StreamIt, Sh, OpenHMPP, Many-core, Fortress, Sponge, Chapel, …, Computer clusters, Vector Processor

  6. Motivation: Single Description …

  7. Contributions
      • The compiler serves as a new back-end of a single-description, multiple-device language.
      • The compiler makes VENICE easier to program and debug.
      • The compiler provides auto-parallelization and optimization.
      [1] Z. Liu, A. Severance, S. Singh, and G. Lemieux, “Accelerator Compiler for the VENICE Vector Processor,” in FPGA 2012.
      [2] C. Chou, A. Severance, A. Brant, Z. Liu, S. Sant, and G. Lemieux, “VEGAS: Soft Vector Processor with Scratchpad Memory,” in FPGA 2011.

  8. Outline: Motivation, Background, Implementation, Results, Conclusion

  9. Complicated [Figure: architecture diagram; block labels: ALIGN, WR, RD, ALIGN, EX1, EX2, ACCUM]

  10. Program in VENICE assembly
      • Allocate vectors in scratchpad
      • Move data from main memory to scratchpad
      • Wait for DMA transaction to be completed
      • Set up for vector instructions
      • Perform vector computations
      • Wait for vector operations to be completed
      • Move data from scratchpad to main memory
      • Wait for DMA transaction to be completed
      • Deallocate memory from scratchpad

      #include "vector.h"

      int main()
      {
          int A[] = {1,2,3,4,5,6,7,8};
          const int data_len = sizeof(A);             /* length in bytes */
          int *va = (int *) vector_malloc(data_len);
          vector_dma_to_vector(va, A, data_len);
          vector_wait_for_dma();
          vector_set_vl(data_len / sizeof(int));      /* length in elements */
          vector(SVW, VADD, va, 42, va);
          vector_instr_sync();
          vector_dma_to_host(A, va, data_len);
          vector_wait_for_dma();
          vector_free();
      }
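The nine steps above can be traced on a host machine without any VENICE hardware. The sketch below is only a model: `add_scalar_via_scratchpad` is a hypothetical name, a `std::vector` stands in for the scratchpad, and plain copies stand in for DMA transfers. It is not the VENICE API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only model of the slide's flow; not the VENICE API.
// A std::vector stands in for the scratchpad, plain copies stand in for DMA.
std::vector<int> add_scalar_via_scratchpad(const std::vector<int>& host, int scalar) {
    // Steps 1-3: allocate a scratchpad buffer and copy the input in ("DMA in").
    std::vector<int> scratchpad(host.begin(), host.end());

    // Step 4: the vector length is an element count, not a byte count.
    const std::size_t vl = scratchpad.size();

    // Steps 5-6: one scalar-vector add over vl elements, written in place.
    for (std::size_t i = 0; i < vl; ++i) {
        scratchpad[i] += scalar;
    }

    // Steps 7-9: copy the result back ("DMA out"); the buffer is released
    // when the function returns.
    return scratchpad;
}
```

Calling `add_scalar_via_scratchpad({1,2,3,4,5,6,7,8}, 42)` mirrors the slide's program, which adds 42 to every element of `A`.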

  11. Program in Accelerator
      • Create a Target
      • Create Parallel Array objects
      • Write expressions
      • Call ToArray to evaluate expressions
      • Delete Target object

      #include "Accelerator.h"
      using namespace ParallelArrays;
      using namespace MicrosoftTargets;

      int main()
      {
          int A[] = {1,2,3,4,5,6,7,8};
          Target *tgt = CreateVectorTarget();
          IPA b = IPA(A, sizeof(A)/sizeof(int));
          IPA c = b + 42;
          tgt->ToArray(c, A, sizeof(A)/sizeof(int));
          tgt->Delete();
      }

      Other targets:
          Target *tgt = CreateMulticoreTarget();
          Target *tgt = CreateDX9Target();
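The key idea on this slide is that writing `b + 42` computes nothing; work happens only when `ToArray` is called. A minimal sketch of that deferred-evaluation pattern follows. `LazyIntArray` and `to_array` are hypothetical names chosen for illustration and are not part of the Accelerator library, which builds an expression graph rather than the closures used here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parallel array: it records how to compute
// element i instead of computing anything when the expression is written.
class LazyIntArray {
public:
    explicit LazyIntArray(std::vector<int> data)
        : size_(data.size()),
          at_([d = std::move(data)](std::size_t i) { return d[i]; }) {}

    // Adding a scalar only wraps the existing recipe; no element is touched.
    LazyIntArray operator+(int scalar) const {
        auto prev = at_;
        return LazyIntArray(size_, [prev, scalar](std::size_t i) { return prev(i) + scalar; });
    }

    // Evaluation happens here, mirroring the role ToArray plays on the slide.
    std::vector<int> to_array() const {
        std::vector<int> out(size_);
        for (std::size_t i = 0; i < size_; ++i) out[i] = at_(i);
        return out;
    }

private:
    LazyIntArray(std::size_t n, std::function<int(std::size_t)> at)
        : size_(n), at_(std::move(at)) {}

    std::size_t size_;
    std::function<int(std::size_t)> at_;
};
```

Because the whole expression is known before anything runs, a back-end is free to choose how to execute it, which is what lets one description target several devices.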

  12. Assembly Programming: Write Assembly → Compile with Gcc → Doesn’t compile? → Download to board → Get Result → Result Incorrect?
      Accelerator Programming: Write in Accelerator → Compile with Microsoft Visual Studio → Doesn’t compile? Or result incorrect? → Compile with Gcc → Download to board → Get Result

  13. Assembly Programming:
      • Hard to program
      • Long debug cycle
      • Not portable
      • Manual: not always optimal or correct (wysiwyg)
      Accelerator Programming:
      • Easy to program
      • Easy to debug
      • Can also target other devices
      • Automated compiler optimizations

  14. Outline: Motivation, Background, Implementation, Results, Conclusion

  15. #include "Accelerator.h"
      using namespace ParallelArrays;
      using namespace MicrosoftTargets;

      int main()
      {
          Target *tgtVector = CreateVectorTarget();
          const int length = 8192;
          int a[] = {1,2,3,4, … , 8192};
          int d[length];
          IPA A = IPA(a, length);
          IPA B = Evaluate( Rotate(A, [1]) + 1 );
          IPA C = Evaluate( Abs( A + 2 ));
          IPA D = ( A + B ) * C;
          tgtVector->ToArray(D, d, length * sizeof(int));
          tgtVector->Delete();
      }

      [Figure: expression graph for D = (A + (Rot A + 1)) × Abs(A + 2)]

  16. [Figure: expression graph for D = (A + (Rot A + 1)) × Abs(A + 2)]
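The graph on this slide can be written down as a small data structure. The following is a sketch under stated assumptions: `Node`, `mk`, `show`, and `build_d` are hypothetical names, and the representation is a generic expression DAG, not the compiler's internal one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// One graph node: an operator or a leaf name, plus its operands.
struct Node {
    std::string op;              // "A", "1", "+", "*", "Abs", "Rot"
    std::vector<NodePtr> kids;   // operands, left to right
};

NodePtr mk(std::string op, std::vector<NodePtr> kids = {}) {
    return std::make_shared<Node>(Node{std::move(op), std::move(kids)});
}

// Renders a graph as text so its shape can be inspected.
std::string show(const NodePtr& n) {
    if (n->kids.empty()) return n->op;
    if (n->kids.size() == 1) return n->op + "(" + show(n->kids[0]) + ")";
    return "(" + show(n->kids[0]) + " " + n->op + " " + show(n->kids[1]) + ")";
}

// Builds the graph for D from the slide's program.
NodePtr build_d() {
    NodePtr a = mk("A");   // one node shared by all three uses, hence a DAG
    NodePtr b = mk("+", {mk("Rot", {a}), mk("1")});
    NodePtr c = mk("Abs", {mk("+", {a, mk("2")})});
    return mk("*", {mk("+", {a, b}), c});
}
```

`show(build_d())` yields `((A + (Rot(A) + 1)) * Abs((A + 2)))`, matching the three `+` nodes, one `×`, one `Abs`, and one `Rot` in the figure.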

  17. [Figure: the same graph with the Rot node replaced by a leaf, A (rot): D = (A + (A (rot) + 1)) × Abs(A + 2)]

  18. [Figure: the graph split into three sub-graphs: B = A (rot) + 1, C = Abs(A + 2), D = (A + B) × C]

  19. Combine Operations [Figure: sub-graphs C = Abs(A + 2), B = A (rot) + 1, D = (A + B) × C]

  20. Combine Operations [Figure: Abs and + in C merged into a single |+| node: C = |+|(A, 2), B = A (rot) + 1, D = (A + B) × C]
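Merging `Abs` and `+` into one `|+|` node is a pattern rewrite on the expression graph. A minimal sketch of such a rewrite is shown below; `Node`, `mk`, and `fuse_abs_add` are hypothetical names, and the pass rebuilds the graph as a tree for simplicity, which is an assumption of this sketch rather than something the slides state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    std::string op;              // operator or leaf name
    std::vector<NodePtr> kids;   // operands, left to right
};

NodePtr mk(std::string op, std::vector<NodePtr> kids = {}) {
    return std::make_shared<Node>(Node{std::move(op), std::move(kids)});
}

// Rewrites Abs(x + y) into a single fused node "|+|"(x, y), bottom-up.
// Note: in a real DAG the inner "+" must have no other users for this to
// save work; this sketch does not check that.
NodePtr fuse_abs_add(const NodePtr& n) {
    std::vector<NodePtr> kids;
    for (const auto& k : n->kids) kids.push_back(fuse_abs_add(k));
    if (n->op == "Abs" && kids.size() == 1 && kids[0]->op == "+") {
        return mk("|+|", kids[0]->kids);
    }
    return mk(n->op, kids);
}
```

Applied to `Abs(A + 2)`, the pass returns one `|+|` node with operands `A` and `2`, so the intermediate sum never needs its own storage.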

  21. Scratchpad Memory: “Virtual Vector Register File”

  22. “Virtual Vector Register File”

  23. “Virtual Vector Register File” Number of vector registers = ? Vector register size = ?
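The two questions on this slide are linked: once the number of registers is known, the register size follows from the scratchpad capacity. The sketch below assumes the simplest possible policy, an even split rounded down to whole 4-byte words. The function names, the even-split policy, and the capacities used in the example are all assumptions made for illustration, not details taken from the slides.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the scratchpad is split evenly among the registers an
// expression needs, rounded down to a whole number of 4-byte words.
std::size_t vector_register_bytes(std::size_t scratchpad_bytes, std::size_t num_registers) {
    assert(num_registers > 0);
    const std::size_t word = 4;
    return (scratchpad_bytes / num_registers) / word * word;
}

// Number of passes needed to stream total_elems 4-byte elements through
// registers of the given size (ceiling division).
std::size_t num_chunks(std::size_t total_elems, std::size_t register_bytes) {
    const std::size_t elems_per_reg = register_bytes / 4;
    assert(elems_per_reg > 0);
    return (total_elems + elems_per_reg - 1) / elems_per_reg;
}
```

For a hypothetical 4096-byte scratchpad and 3 registers, each register holds 1364 bytes (341 words), so an 8192-element array would be processed in 25 chunks. Fewer registers mean larger registers and fewer chunks, which is why the register count matters.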


  25. Evaluation Order [Figure: expression graphs for B, C, and D, with numeric labels on the nodes]
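The node labels on this slide did not survive extraction, so the exact scheme cannot be read from the transcript. One classic way to pick an evaluation order that keeps temporary storage low is Sethi-Ullman style labeling: label each subtree with the storage it needs, then evaluate the more demanding operand first. The sketch below shows that textbook technique in simplified form (every leaf costs one register); it is offered as an illustration, not as the compiler's actual algorithm. `Node`, `mk`, `label`, and `order` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    std::string op;              // operator or leaf name
    std::vector<NodePtr> kids;   // operands, left to right
};

NodePtr mk(std::string op, std::vector<NodePtr> kids = {}) {
    return std::make_shared<Node>(Node{std::move(op), std::move(kids)});
}

// Registers needed to evaluate the subtree rooted at n.
int label(const NodePtr& n) {
    if (n->kids.empty()) return 1;                     // a leaf occupies one register
    if (n->kids.size() == 1) return label(n->kids[0]); // unary ops work in place
    const int l = label(n->kids[0]);
    const int r = label(n->kids[1]);
    return l == r ? l + 1 : std::max(l, r);
}

// Appends operators to out in evaluation order, heavier operand first.
void order(const NodePtr& n, std::vector<std::string>& out) {
    if (n->kids.empty()) return;
    if (n->kids.size() == 2 && label(n->kids[1]) > label(n->kids[0])) {
        order(n->kids[1], out);
        order(n->kids[0], out);
    } else {
        for (const auto& k : n->kids) order(k, out);
    }
    out.push_back(n->op);
}
```

For `A * (B + C)`, the right operand has the larger label, so `+` is evaluated before `*` and the whole expression needs two registers under this model.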

  26. Count number of virtual vector registers [Figure: expression graphs for B, C, and D]
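Once an evaluation order is fixed, the number of registers needed is the largest number of values that are alive at the same time. The sketch below counts that peak over a linear list of operations. `Instr` and `count_registers` are hypothetical names, values are identified by integer ids, and the model assumes a destination may reuse the register of an operand that is not needed afterwards. How the compiler in the talk performs this count is not recoverable from the transcript.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// One operation: dst = op(srcs...). Values are identified by integer ids.
struct Instr {
    int dst;
    std::vector<int> srcs;
};

// Peak number of simultaneously live values over the whole sequence.
std::size_t count_registers(const std::vector<Instr>& prog) {
    // Record the last instruction that reads each value.
    std::map<int, std::size_t> last_use;
    for (std::size_t i = 0; i < prog.size(); ++i) {
        for (int s : prog[i].srcs) last_use[s] = i;
    }

    std::set<int> live;
    std::size_t peak = 0;
    for (std::size_t i = 0; i < prog.size(); ++i) {
        // Inputs become live the first time they are read.
        for (int s : prog[i].srcs) live.insert(s);
        peak = std::max(peak, live.size());

        // Operands read for the last time free their register, so the
        // destination may be written in place.
        for (int s : prog[i].srcs) {
            if (last_use[s] == i) live.erase(s);
        }
        live.insert(prog[i].dst);
        peak = std::max(peak, live.size());
    }
    return peak;
}
```

As a usage example, `t = A + B; D = t * C` with ids A=0, B=1, C=2, t=3, D=4 is the program `{{3, {0, 1}}, {4, {3, 2}}}` and needs two registers under this model.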

