
VIPERS II: A Soft-core Vector Processor with Single-copy Data Scratchpad Memory

Christopher Han-Yu Chou. Supervisor: Dr. Guy Lemieux. Outline: Motivation, New Pipeline Structure, VIPERS II Architecture, Results, Conclusion.

Presentation Transcript


  1. VIPERS II: A Soft-core Vector Processor with Single-copy Data Scratchpad Memory Christopher Han-Yu Chou Supervisor: Dr. Guy Lemieux

  2. Outline • Motivation • New Pipeline Structure • VIPERS II Architecture • Results • Conclusion

  3. Motivation • VIPERS soft vector processor provides scalable performance for data-parallel applications on FPGAs • Original VIPERS has a few shortcomings: • High latency for copying data from memory to register file • Duplicate copies of data in precious on-chip memory • Scalar core not pipelined, and has no debug-core

  4. Duplicate Copies of Data • VIPERS uses dual read-port vector register file • 2 identical copies of the register file • Plus an original copy of data in on-chip memory • These data duplicates are wasteful • Limited on-chip memory capacity • Today’s FPGAs offer fast on-chip memories. Why not access the memory directly?

  5. Contribution • Use address registers and scratchpad memory to replace vector register file • Eliminate slow load/store operations • More efficient on-chip memory usage • Auto-increment/decrement and circular buffer features • Reduce need for loop unrolling • Lower loop overhead

  6. Outline • Motivation • New Pipeline Structure • VIPERS II Architecture • Results • Conclusion

  7. New Pipeline Structure • Classic 5-stage pipeline • Swap the execution stage with the memory access stage

  8. Implementation • The “data” register file is replaced by address registers and a scratchpad memory • Eliminates load/store when the data set fits in scratchpad memory

  9. VIPERS II ISA

  10. Outline • Motivation • New Pipeline Structure • VIPERS II Architecture • Results • Conclusion

  11. VIPERS II Architecture

  12. Architectural Changes • Vector address registers • Vector scratchpad memory • Data alignment crossbar network (DACN) • Fracturable ALUs

  13. Vector Address Registers • Features auto post-increment, pre-decrement, and circular buffer modes • Reduce loop overheads • Require fewer address registers than data registers to implement an application
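
The three addressing modes on slide 13 can be sketched in a few lines of Python. This is only an illustration: the class name `AddressRegister`, its fields, the byte-granular `step`, and the wrap-around rule are assumptions made for the example, not the VIPERS II hardware definition.

```python
class AddressRegister:
    """Hypothetical model of a vector address register pointing into scratchpad memory."""

    def __init__(self, addr, step, base=None, size=None):
        self.addr = addr  # current scratchpad address
        self.step = step  # amount to advance per access (assumed: one vector)
        self.base = base  # start of the circular buffer, or None if not circular
        self.size = size  # length of the circular buffer

    def _wrap(self, addr):
        # In circular mode, fold the address back into [base, base + size).
        if self.base is None:
            return addr
        return self.base + (addr - self.base) % self.size

    def post_increment(self):
        """Return the current address, then advance the register."""
        used = self.addr
        self.addr = self._wrap(self.addr + self.step)
        return used

    def pre_decrement(self):
        """Move the register back first, then return the new address."""
        self.addr = self._wrap(self.addr - self.step)
        return self.addr


def walk(reg, count):
    """Addresses touched by `count` consecutive post-increment accesses."""
    return [reg.post_increment() for _ in range(count)]
```

Because the register advances itself on every access, a loop body needs no separate pointer-update instruction, which is one way to read the "reduce loop overheads" bullet.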

  14. Vector Address Register

  15. Vector Scratchpad Memory • Reduced load/store latencies with simpler memory interface • Operate at 2X clock

  16. Vector Scratchpad Memory • Efficient data storage • Flexible data set size restriction • e.g. Median filter benchmark with byte-size data:

  17. Data Alignment Crossbar Network • With vector lanes coupled directly to memory, input vectors must be aligned • For misaligned operands, vector move instruction (vmov) is used to move data into alignment
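
A small Python sketch of why alignment matters and what `vmov` accomplishes. The lane mapping used here (element address modulo the lane count), the constant `NUM_LANES`, and the helper names are assumptions chosen for illustration; the slides do not spell out the actual mapping.

```python
NUM_LANES = 4  # assumed lane count for this illustration


def lane_of(addr):
    """Lane that serves a given element address (assumed mapping: addr mod lanes)."""
    return addr % NUM_LANES


def is_aligned(src_a, src_b):
    """Two vectors line up if their first elements fall in the same lane."""
    return lane_of(src_a) == lane_of(src_b)


def vmov(mem, dst, src, length):
    """Model of a vector move: copy `length` elements from src to dst in the scratchpad."""
    mem[dst:dst + length] = mem[src:src + length]


# Usage: an operand starting at address 5 is misaligned with one at address 0,
# so it is first moved to address 8, which maps to the same lane as address 0.
mem = list(range(16))
misaligned_before = not is_aligned(0, 5)
vmov(mem, 8, 5, 4)
aligned_after = is_aligned(0, 8)
```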

  18. Example

  19. Data Alignment Crossbar Network • Implemented with multistage switching network to trade off performance for area

  20. Fracturable ALUs • Data elements are stored in their natural length • Fracturable ALUs are used to execute on operands with varying widths

  21. Fracturable ALUs

  22. Fracturable ALUs • Increased processing power • 4-Lane VIPERS II operating on byte-size data is equivalent to having 16 lanes
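
The 4-lane to 16-lane equivalence can be sketched as follows, assuming each lane is 32 bits wide (inferred from the 4x ratio on the slide, not stated in it). The function names are hypothetical; the idea shown is that a fractured add blocks carries at sub-word boundaries so each byte or halfword behaves as its own lane.

```python
def fractured_add(a, b, width):
    """Add two 32-bit words as independent sub-lanes of `width` bits (8, 16, or 32)."""
    mask = (1 << width) - 1
    result = 0
    for shift in range(0, 32, width):
        # Add one sub-lane and drop its carry so it cannot spill into the next one.
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result


def effective_lanes(num_lanes, width):
    """Number of elements processed per cycle when 32-bit lanes are fractured."""
    return num_lanes * (32 // width)
```

With `width=8`, the carry out of the low byte is discarded rather than propagated, which is the difference between one 32-bit add and four independent byte adds.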

  23. Outline • Motivation • New Pipeline Structure • VIPERS II Architecture • Results • Conclusion

  24. Resource Usage

  25. Simulated Performance

  26. Hardware Performance

  27. Future Work • Increase operating frequency • Implement strided and indexed moves • Implement DACN with Omega network • Alternative implementation of address register

  28. Related Work • VESPA (Rose, CASES08) and VIPERS (Lemieux, FPGA08) are two previous soft-core vector processors • VIPERS II uses vector scratchpad memory instead of register file • IBM’s CELL processor (Pham, ISSCC05) features SRAM scratchpad memory populated by DMA • VIPERS II does not require load/store operations • Register pointer architecture (Dally, DATE07) reduces need for loop unrolling by dynamically changing the register pointer • VIPERS II is the first vector processor to utilize this technique

  29. Conclusion • VIPERS II architecture provides many advantages: • Improve performance by eliminating slow load/store operations • Achieve unrolled performance without unrolling • Efficient usage of on-chip memory • Increased processing power when executing smaller operands

  30. Thank you

  31. Vector Scratchpad Memory • e.g. Largest median filter that can be realized given a 64kb memory budget

  32. Implementation

  33. Strided/Indexed Access • Strided/indexed loads are replaced by strided/indexed move operations. • Similar to ‘vmov’, strided move ‘vmovs’ simply moves scattered elements to contiguous locations in the memory. • e.g. vmovs vA1, vA0, vstride0;
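
A rough Python model of the strided move described on slide 33. The operand order (destination, source, stride) mirrors the `vmovs vA1, vA0, vstride0` example but is an assumption, as are the explicit `length` argument and the flat-list view of the scratchpad.

```python
def vmovs(mem, dst, src, stride, length):
    """Gather `length` elements spaced `stride` apart, starting at src,
    and store them contiguously starting at dst."""
    gathered = [mem[src + i * stride] for i in range(length)]
    mem[dst:dst + length] = gathered


# Usage: pull every 4th element of a 16-element region into 4 contiguous slots.
mem = list(range(32))
vmovs(mem, 16, 0, 4, 4)
```

After the move, the scattered elements sit next to each other, so subsequent vector instructions can operate on them without any strided access.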

  34. Permutation Requirement
