A COMPARATIVE STUDY OF MULTIPLY ACCUMULATE IMPLEMENTATIONS ON FPGAS

halle
Presentation Transcript


  1. A COMPARATIVE STUDY OF MULTIPLY ACCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System

  2. Project Scope • To compare the implementation efficiencies (area times delay) of Distributed Arithmetic (DA), RNS and DA-RNS based parallel multiply accumulate architectures on FPGAs

  3. Background and Context • FPGAs are increasingly used for DSP computations • FPGAs have potential for parallelism • The LUT-based FPGA architecture can be exploited • Novel MAC architectures are especially suitable for FPGAs

  4. Some More Background • In DSP, MACs use constant coefficients (Fixed Multiplicand) • Full Multiplier Implementation Not Required • Not All Multiplier Architectures Are Efficient for FPGAs

  5. Motivation • Distributed Arithmetic and Residue Arithmetic techniques are LUT based techniques • Explore the “synergy” between FPGA architecture and above mentioned techniques

  6. Distributed Arithmetic Overview

  7. Basic Serial Architecture
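The transcript keeps only the titles of the two Distributed Arithmetic slides, so the following is a minimal Python sketch of the textbook bit-serial DA scheme, not a reproduction of the slide's figure. It assumes signed two's complement samples and shifts each LUT output left by the bit position, whereas a hardware scaling accumulator would typically shift the running sum right instead. The names `build_da_lut` and `da_mac` are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute all 2^K partial sums of the K constant coefficients."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_mac(coeffs, samples, nbits):
    """Bit-serial DA inner product of constant coeffs with nbits-wide signed samples."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(nbits):
        # Gather bit b of every sample into a single LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # In two's complement the most significant bit has negative weight.
        acc += -term if b == nbits - 1 else term
    return acc
```

The multiplications disappear entirely: each clock step costs one LUT read and one add (or subtract on the sign bit), which is why the technique maps well onto LUT-based fabrics.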

  8. Residue Arithmetic Overview • (z1, z2, ..., zn) = (x1, x2, ..., xn) ∘ (y1, y2, ..., yn) • zi = (xi ∘ yi) mod mi • ∘ denotes any of the modulo operations of addition, subtraction or multiplication
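The component-wise rule on this slide can be illustrated with a short Python sketch. The moduli set (7, 11, 13) is an assumption chosen for illustration and does not come from the presentation; `to_rns` and `rns_op` are hypothetical helper names.

```python
import operator

MODULI = (7, 11, 13)  # assumed pairwise coprime moduli, dynamic range M = 1001


def to_rns(x, moduli=MODULI):
    """Represent an integer by its residues with respect to each modulus."""
    return tuple(x % m for m in moduli)


def rns_op(op, xs, ys, moduli=MODULI):
    """Apply add, sub or mul independently in every residue channel."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))
```

For example, `rns_op(operator.mul, to_rns(25), to_rns(17))` equals `to_rns(425)`, and no carry ever crosses from one channel to another.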

  9. Modulo Adder
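The slide's figure is not preserved in the transcript. One common modulo adder structure, which may or may not match the one presented, computes both a + b and a + b - m and selects whichever lies in range. A behavioural Python sketch under that assumption (the name `mod_add` is hypothetical):

```python
def mod_add(a, b, m):
    """Add two residues in [0, m) and return the result reduced to [0, m)."""
    assert 0 <= a < m and 0 <= b < m
    s = a + b   # first adder: plain binary sum
    t = s - m   # second adder: sum with the modulus subtracted
    # The sign of t plays the role of the multiplexer select line.
    return t if t >= 0 else s
```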

  10. Modulo Constant Multiplier • Due to the small sizes of residues and a constant multiplicand, a direct LUT based implementation is very efficient • [Figure: 4-bit Constant Modulo Multiplier (labels X[3:0], A0 to A3) and 5-bit Constant Modulo Multiplier (labels X[4:0], A0 to A4)]
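Because the multiplicand is fixed and the residue is only a few bits wide, the whole product table can be precomputed. The sketch below builds such a table in Python; the constant 9 and the modulus 13 are arbitrary illustrative values, and `build_const_mod_mult_lut` is a hypothetical name.

```python
def build_const_mod_mult_lut(constant, modulus, width):
    """Table of (constant * x) mod modulus for every width-bit input x."""
    return [(constant * x) % modulus for x in range(1 << width)]


# A 4-bit example: 16 entries, indexed directly by the input residue.
LUT_4BIT = build_const_mod_mult_lut(constant=9, modulus=13, width=4)
```

A multiplication then reduces to a single lookup, `LUT_4BIT[x]`. Inputs at or above the modulus never occur for valid residues, so those entries are don't-cares in practice.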

  11. RNS MAC Architecture
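Only the title of this slide survives, so the following is a generic sketch of how an RNS multiply accumulate can be assembled from the two building blocks above: one constant-multiplier LUT per tap and one modulo accumulator per channel. The moduli (7, 11, 13) and the name `rns_mac` are assumptions for illustration.

```python
MODULI = (7, 11, 13)  # assumed pairwise coprime moduli


def rns_mac(coeffs, samples, moduli=MODULI):
    """Return the residues of sum(c * x) using per-channel LUTs and modulo adds."""
    channels = []
    for m in moduli:
        # One small LUT per constant coefficient: (c * r) mod m for each residue r.
        luts = [[(c * r) % m for r in range(m)] for c in coeffs]
        acc = 0
        for lut, x in zip(luts, samples):
            acc = (acc + lut[x % m]) % m  # modulo accumulation
        channels.append(acc)
    return tuple(channels)
```

Each channel runs independently of the others, so in hardware they can operate in parallel on narrow datapaths.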

  12. Conversion Issues in RNS • Binary to RNS and RNS to Binary Conversion are significant overheads • Binary to RNS relatively simple • RNS to Binary Using a Direct CRT Implementation Requires Modulo M adders
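The asymmetry between the two conversions can be seen in a small Python sketch. Forward conversion is one reduction per modulus, while a direct Chinese Remainder Theorem (CRT) reconstruction needs wide additions modulo M, the product of all moduli. The moduli (7, 11, 13) and the function names are assumptions; the code needs Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod

MODULI = (7, 11, 13)  # assumed pairwise coprime moduli, M = 1001


def forward(x, moduli=MODULI):
    """Binary to RNS: one independent reduction per modulus."""
    return tuple(x % m for m in moduli)


def reverse_crt(residues, moduli=MODULI):
    """RNS to binary by direct CRT, accumulating with modulo-M additions."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        inv = pow(Mi, -1, m)                 # multiplicative inverse of Mi mod m
        term = ((r * inv) % m) * Mi          # channel contribution, below M
        total = (total + term) % M           # the modulo-M adder named on the slide
    return total
```

Every value in the range 0 to M - 1 survives the round trip, for example `reverse_crt(forward(425)) == 425`.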

  13. Forward Conversion

  14. Reverse Conversion

  15. DA-RNS Coupling

  16. Scaling Accumulator Design

  17. DA 8-bits 8 Taps 12-bits Coefficients Implementation

  18. Critical Path Results • Source: PSC8_0_PSC_0/I_Q7 (FF) • Destination: SACC24_REG2/I_Q3 (FF) • Data Path: PSC8_0_PSC_0/I_Q7 to SACC24_REG2/I_Q3
