
REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs

Caiwen Ding², Shuo Wang¹, Ning Liu², Kaidi Xu², Yanzhi Wang², and Yun Liang¹. ¹CECA, Peking University, China; ²Northeastern University, USA.


Presentation Transcript


  1. REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs. Caiwen Ding², Shuo Wang¹, Ning Liu², Kaidi Xu², Yanzhi Wang², and Yun Liang¹. ¹CECA, Peking University, China; ²Northeastern University, USA

  2. FPGA Accelerated DNNs

  3. YOLO based Object Detection

  4. YOLO Model for FPGAs
     • Heterogeneous Resources: Logic Blocks, DSP Blocks, Block RAMs
     • Large Model Size: YOLO model size (32 MB)

  5. Parameter Pruning: Hardware Unfriendly!
     [Figure: a pruned sparse matrix stored in CSR format (data and indices), with its workload partitioned]
     • Unbalanced Workload: partitions receive work in the ratio 0 : 2 : 1 : 1
     • Extra Storage Footprint: indices
     • Irregular Memory Access: random access is slow
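The three drawbacks on this slide can be seen in a small sketch of CSR storage. This is only an illustration: the 4 x 4 matrix and the helper names `to_csr` and `csr_matvec` are hypothetical and do not come from the slides. The row nonzero counts are chosen to reproduce the 0 : 2 : 1 : 1 split.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) into CSR arrays."""
    data, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)      # surviving weight
                col_idx.append(j)   # extra storage: one index per weight
        row_ptr.append(len(data))
    return data, col_idx, row_ptr


def csr_matvec(data, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += data[p] * x[col_idx[p]]  # indirect, irregular read of x
        y.append(acc)
    return y


# Hypothetical pruned weights: rows keep 0, 2, 1 and 1 nonzeros.
pruned = [
    [0, 0, 0, 0],
    [3, 0, 0, 5],
    [0, 0, 2, 0],
    [0, 7, 0, 0],
]
data, col_idx, row_ptr = to_csr(pruned)
work_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(len(pruned))]
y = csr_matvec(data, col_idx, row_ptr, [1, 2, 3, 4])
```

If one processing element handled each row, `work_per_row` shows that the first would sit idle while the second does twice the work of the others, and `col_idx` plus `row_ptr` are storage that a dense layout would not need.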

  6. Structured Matrix
     • Circulant Matrix: a 4 x 4 original matrix is structurally compressed into a 4 x 4 circulant matrix, which is stored as a 1 x 4 dense vector
     • Block-Circulant Matrix: a 6 x 9 original matrix is compressed by circulant projection into a 2 x 9 dense matrix
     [Figure: weight entries w00 ... w35 of the original, circulant, and compressed matrices]

  7. Circulant Convolution Acceleration
     • Fast Fourier Transformation: FFT-Accelerated Circulant Convolution
     [Figure: inputs x0 ... x5 and the weight vectors go through FFT, are multiplied element-wise, summed (∑), and passed through IFFT to give outputs y0 ... y5]
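A minimal sketch of the idea behind FFT-accelerated circulant convolution, for a single k x k circulant block. It assumes the convention that the block's first column is the stored vector `w`, so entry (i, j) equals `w[(i - j) mod k]`; the slides do not state which convention the authors use. The radix-2 `fft` helper is a plain textbook version that needs k to be a power of two, and all function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for t in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * t / n) * odd[t]
        out[t] = even[t] + tw
        out[t + n // 2] = even[t] - tw
    return out


def ifft(a):
    """Inverse FFT, including the 1/n scaling."""
    n = len(a)
    return [v / n for v in fft(a, invert=True)]


def circulant_matvec_direct(w, x):
    """O(k^2) reference: y[i] = sum_j w[(i - j) mod k] * x[j]."""
    k = len(w)
    return [sum(w[(i - j) % k] * x[j] for j in range(k)) for i in range(k)]


def circulant_matvec_fft(w, x):
    """O(k log k): y = IFFT(FFT(w) * FFT(x)), element-wise product."""
    prod = [a * b for a, b in zip(fft(w), fft(x))]
    return [v.real for v in ifft(prod)]


w = [1, 2, 3, 4]    # the only stored weights of a 4 x 4 circulant block
x = [1, 0, -1, 2]
y_direct = circulant_matvec_direct(w, x)
y_fft = circulant_matvec_fft(w, x)
```

Both routines give the same result up to floating-point rounding. In a block-circulant layer the per-block products in the frequency domain would be summed across a block row before a single IFFT, which is presumably what the ∑ in the figure denotes.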

  8. Circulant Convolution Complexity Analysis: Hardware Friendly!
     [Figure: an m x n matrix made of k x k circulant sub-matrices is structurally compressed into an m/k x n dense matrix]
     • Storage Complexity: reduced from O(m·n) to O(m·n/k)
     • Computational Complexity: reduced from O(m·n) to O(m·n·log k/k)
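The two bounds can be turned into concrete numbers with a short calculation. This is a rough sketch that drops all constant factors, so the operation counts are order-of-magnitude estimates only; the function name and the 1024 x 1024 layer size are hypothetical. The 6 x 9 case uses k = 3, which is inferred from the 6 x 9 to 2 x 9 compression shown on slide 6.

```python
import math


def storage_and_ops(m, n, k):
    """Return (dense_params, circulant_params, dense_ops, circulant_ops)."""
    dense_params = m * n
    circulant_params = m * n // k              # one length-k vector per k x k block
    dense_ops = m * n                          # O(m*n) multiply-accumulates
    circulant_ops = m * n * math.log2(k) / k   # O(m*n*log k / k), constants dropped
    return dense_params, circulant_params, dense_ops, circulant_ops


small = storage_and_ops(6, 9, 3)         # the 6 x 9 example, assuming k = 3
large = storage_and_ops(1024, 1024, 16)  # hypothetical layer size and block size
```

For the 6 x 9 example the stored parameters drop from 54 to 18, which matches the 2 x 9 dense matrix on slide 6.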

  9. Quantization Techniques Overview
     • Power of Two (ICCV'15)
     • Ternary Bitwidth (NIPS'16)
     • Binary Bitwidth (ECCV'16)
     • Fixed Bitwidth (ICLR'16)
     • Our Work: Req-YOLO (FPGA'19)
     • Categories shown: Equal Distance Quantization Techniques, Non-Equal Distance Quantization Techniques

  10. REQ-YOLO Framework
     • YOLO Architecture Specification
     • ADMM based Training
     • Structured Compression
     • Mixed Distance Quantization
     • Hardware Optimization
     • Automatic Synthesis Toolchain
     • FPGA-friendly Inference Acceleration
     • Optimized FPGA Implementation

  11. Data Quantization Approaches
     • Equal Distance: equal distances between levels (001, 010, 011, 100, 101); complex multiplication; high accuracy
     • Power of Two: exponential distances between levels (0001, 0010, 0100, 1000); simple multiplication (shift); low accuracy
     • We propose Mixed Distance quantization
       • combine equal + exponential
       • resource-aware
       • better hardware utilization, decent accuracy
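The contrast between the two existing approaches can be sketched with two small quantizers. The bit widths, the clipping range `w_max`, and the exponent range are illustrative assumptions, and both function names are hypothetical; none of these values come from the slides.

```python
def quantize_equal(w, bits, w_max=1.0):
    """Equal distance: snap w to a uniform grid in [-w_max, w_max]."""
    levels = 2 ** (bits - 1) - 1      # number of positive levels
    step = w_max / levels             # constant gap between neighbouring levels
    q = max(-levels, min(levels, round(w / step)))
    return q * step


def quantize_pow2(w, min_exp=-4, max_exp=0):
    """Power of two: snap |w| to the nearest 2**e, keeping the sign.

    Zero is not representable in this simplified sketch.
    """
    sign = -1.0 if w < 0 else 1.0
    candidates = [2.0 ** e for e in range(min_exp, max_exp + 1)]
    return sign * min(candidates, key=lambda c: abs(abs(w) - c))


eq = quantize_equal(0.3, bits=3)   # grid step is 1/3
p2_small = quantize_pow2(0.3)      # levels are dense near zero
p2_large = quantize_pow2(0.8)      # levels are sparse near one
```

Power-of-two levels let a multiplier be replaced by a shifter, but the gap between levels doubles toward large magnitudes, which is the accuracy cost that the slide attributes to exponential distances.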

  12. Mixed Distance Quantization
     • Mixed Distance Encoding: sign | primary | secondary
       • sign bit
       • primary bits for coarse-grained offsets
       • secondary bits for fine-grained offsets
     • Simpler hardware: a multiplication becomes shifts (e.g. shift 2 bits, shift 1 bit) followed by an addition
     • Mixed Distance: more balanced! (mixed distances vs. exponential distances)
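One way to read the sign / primary / secondary encoding is as a sum of two powers of two, so that a multiplication costs two shifts and one addition. The sketch below assumes exactly that, with integer weights and left shifts for simplicity. The field widths, the `(sign, primary, secondary)` tuple layout, and both function names are hypothetical and are not the authors' encoding.

```python
def encode_mixed(w, max_shift=3):
    """Encode integer w as (sign, primary, secondary).

    The magnitude is 2**primary + 2**secondary with secondary < primary;
    secondary is None when no fine-grained term is used.
    """
    sign = 1 if w < 0 else 0
    best, best_err = None, None
    for p in range(max_shift + 1):
        for s in [None] + list(range(p)):
            value = (1 << p) + (0 if s is None else (1 << s))
            err = abs(abs(w) - value)
            if best_err is None or err < best_err:
                best, best_err = (sign, p, s), err
    return best


def mixed_multiply(x, code):
    """Multiply integer x by an encoded weight using shifts and one add."""
    sign, p, s = code
    acc = x << p             # coarse-grained term from the primary bits
    if s is not None:
        acc += x << s        # fine-grained term from the secondary bits
    return -acc if sign else acc


code_six = encode_mixed(6)               # 6 = 2**2 + 2**1
product = mixed_multiply(5, code_six)    # (5 << 2) + (5 << 1)
code_neg = encode_mixed(-5)              # -5 = -(2**2 + 2**0)
```

With `max_shift=3` the representable magnitudes are 1, 2, 3, 4, 5, 6, 8, 9, 10 and 12. They are evenly spaced at the low end and spread out at the high end, which is one way to picture "combine equal + exponential".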

  13. Resource-Aware Quantization
     • Equal Distance
     • Mixed Distance
     • Layer-by-Layer Resource-Aware Quantization
     [Figure: panels labeled "mixed distance" and "equal distance", each marking a bottleneck]

  14. Resource & Accuracy Aware Quantization

  15. Training Approaches: ADMM based Training Framework
     • Alternating Direction Method of Multipliers
     • Consider the optimization problem
     • Rewrite it and decompose it into two subproblems
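The two-subproblem structure of ADMM can be shown on a toy problem. This sketch swaps the network training loss for the quadratic f(w) = 0.5 * ||w - a||^2 so that the first subproblem has a closed form; in real training that step would be a few epochs of gradient descent on the loss plus the same proximal term. The constraint set (signed powers of two), `rho`, the iteration count, and the function names are all illustrative assumptions.

```python
def project_pow2(v, min_exp=-4, max_exp=0):
    """Project a scalar onto the set {+/- 2**e : min_exp <= e <= max_exp}."""
    sign = -1.0 if v < 0 else 1.0
    levels = [2.0 ** e for e in range(min_exp, max_exp + 1)]
    return sign * min(levels, key=lambda c: abs(abs(v) - c))


def admm_quantize(a, rho=1.0, iters=50):
    """Toy ADMM: minimise 0.5*||w - a||^2 subject to w being quantized."""
    w = list(a)
    z = [project_pow2(v) for v in w]   # auxiliary copy that lives in the set
    u = [0.0] * len(a)                 # scaled dual variable
    for _ in range(iters):
        # Subproblem 1: minimise f(w) + (rho/2)*||w - z + u||^2 (closed form here).
        w = [(ai + rho * (zi - ui)) / (1.0 + rho) for ai, zi, ui in zip(a, z, u)]
        # Subproblem 2: project w + u onto the quantization set.
        z = [project_pow2(wi + ui) for wi, ui in zip(w, u)]
        # Dual update accumulates the remaining disagreement between w and z.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z


w_final, z_final = admm_quantize([0.3, -0.8, 0.1])
```

The continuous weights `w_final` are pulled step by step toward the quantized copy `z_final`, so the final hard quantization changes them very little.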

  16. ADMM for Weight Quantization
     • ADMM based Quantization for FFT based Acceleration
       • perform weight mapping in the weight domain
       • higher compression ratio and lower accuracy degradation

  17. Experimental Setup
     • FPGA Platforms
     • YOLO Architecture: Tiny YOLO
     • Benchmark Suite: DJI benchmark (IoU), Pascal (IoU)
     • Software Tools: SDAccel 2017.1

  18. Experimental Results: Summary
     • at least 7X higher throughput over GPU implementation
     • at least 3X higher energy efficiency over GPU implementation
     • at least 15X higher throughput over previous FPGA implementation
     • at least 4X higher energy efficiency over previous FPGA implementation
     [Figure: performance and energy efficiency of FPGA Req-YOLO vs. GPU]

  19. Experimental Results
     • Resource Utilization: consistently improved utilization across different FPGA resources

  20. Experimental Results
     • Accuracy Degradation: accuracy degradations are within 6%

  21. Conclusion
     • Resource and Accuracy Aware Quantization
       • reduces both storage and computational complexity
       • resource utilization is improved
       • accuracy degradation is considered
     • YOLO Inference Engine Created by Req-YOLO
       • higher throughput
       • higher energy efficiency
       • < 6% accuracy degradation

  22. Thank you !
