
Reconfigurable Computing - Performance Issues

Reconfigurable Computing - Performance Issues. John Morris Chung-Ang University The University of Auckland. ‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia. FPGA Architectures. Design Flow Good engineering practice requires that design exercises should follow a defined procedure


Presentation Transcript


  1. Reconfigurable Computing - Performance Issues • John Morris • Chung-Ang University • The University of Auckland • ‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia

  2. FPGA Architectures • Design Flow • Good engineering practice requires that design exercises should follow a defined procedure • User’s specification • This is your starting point • It may take several forms • Informal requirements given to you by your user / client / … • Formal written requirements • All functional and non-functional requirements are precisely stated • Sometimes resulting in a very large (and dull) document! • Something in between • Your tutorial assignment was in this category • Mostly formal, but with some gaps you would need to fill in • Using research / further discussion with client / … etc

  3. Typical FPGA Architecture • Logic blocks embedded in a ‘sea’ of connection resources • CLB = logic block • IOB = I/O buffer • PSM = programmable switch matrix • Interconnections critical • Transmission gates on paths • Flexibility • Connect any LB to any other • but • Much slower than connections within a logic block • Much slower than long lines on an ASIC • Aside: • This is a ‘universal’ problem - not restricted to FPGAs! • Applies to • custom VLSI, • ASICs, • systems, • parallel processors • Small transistors → high speed → high density → long, wide datapaths

  4. Logic Blocks • Combination of • AND-OR array or Look-Up Table (LUT) • Flip-flops • Multiplexors • General aim • Arbitrary boolean function of several variables • Storage • Adders are critical • All modern FPGAs have ‘fast carry logic’ • High speed lines connecting LBs directly • Very fast ripple carry adders
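The idea that a look-up table can realise an arbitrary boolean function of several variables can be sketched in a few lines of Python. This is only an illustrative software model, not a description of any particular FPGA: the names `make_lut` and `lut_read`, and the choice of a 3-input table, are assumptions made for the example.

```python
def make_lut(func, n_inputs):
    """Build a truth table for func over all 2**n_inputs input combinations.

    Hypothetical helper: entry i holds the output for the inputs whose
    bits spell out i (input 0 is the least significant bit).
    """
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *bits):
    """Evaluate the function by indexing the table: no logic is computed."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return table[index]


# A full adder expressed as two 3-input tables: one for sum, one for carry.
sum_lut = make_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = make_lut(lambda a, b, c: (a & b) | (c & (a ^ b)), 3)

print(lut_read(sum_lut, 1, 1, 0), lut_read(carry_lut, 1, 1, 0))
```

Changing the function only changes the stored table, which is the sense in which the same block can implement any function of its inputs.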

  5. Ripple Carry Adder • [Figure: n-bit ripple carry adder, a chain of full adders (FA) with inputs a0 … an-1 and b0 … bn-1, sum outputs s0 … sn-1, each stage's cout feeding the next stage's cin, and a final carry out] • The simplest and best-known adder • Time to complete • n × propagation delay (FA: (a or b) → carry) • We can do better than this - using one of many known better structures • but • What are the advantages of a ripple carry adder? • Small • Regular • Fits easily into a 2-D layout! • Very important in packing circuitry into the fixed 2-D layout of an FPGA!
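A bit-level software model may make the ripple carry structure and its n × propagation delay easier to see. The sketch below is an assumption-laden illustration: the function names, the LSB-first bit lists, and the unit delay `t_pd` are all choices made for the example rather than anything taken from the slides.

```python
def full_adder(a, b, cin):
    """One FA stage: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    The carry variable is passed from stage to stage, which is why the
    last sum bit cannot settle until every earlier stage has finished.
    """
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Hypothetical helper: n-bit list, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_delay(n, t_pd=1.0):
    """Worst-case delay under the slide's model: n stages in series."""
    return n * t_pd


bits, carry_out = ripple_carry_add(to_bits(5, 4), to_bits(6, 4))
print(from_bits(bits), carry_out)  # 5 + 6 = 11 with no carry out of 4 bits
```

Under this simple model an 8-bit adder takes `ripple_delay(8)`, i.e. eight full-adder delays, in the worst case.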

  6. Ripple Carry Adders • [Figure: ripple carry adder chain of full adders (FA) grouped two per logic block (LB), with the carry passing from each LB to the next] • Ripple carry adder performance is limited by propagation of carries • A 2-bit adder fits in a Xilinx CLB (enough logic for 5 inputs and 2 outputs) • But the carry signals would need to be carried by the general routing resources (slow!) • (In fact, you can’t fit a 2-bit adder with carry out in a CLB because there aren’t enough outputs!) • The fast carry logic provides special (low R) lines for carry-in and carry-out → fast adder with 2 bits/CLB

  7. ‘Fast Carry’ Logic • Critical delay • Transmission of carry out from one logic block to the next • Solution (most modern FPGAs) • ‘Fast carry’ logic • Special paths between logic blocks used specifically for carry out • Very fast ripple carry adders! • More sophisticated adders? • Carry select • Uses ripple carry blocks - so can use fast carry logic • Should be faster for wide datapaths? • Carry lookahead • Uses large amounts of logic and multiple logic blocks • Hard to make it faster for small adders!

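  8. Carry Select Adder • [Figure: 8-bit carry select adder built from three n-bit ripple carry adder blocks. The low-order block adds a0-3 and b0-3 with cin, producing sum0-3 and cout3. Two high-order blocks each add a4-7 and b4-7, one with carry in 0 (outputs sum⁰4-7, cout⁰7) and one with carry in 1 (outputs sum¹4-7, cout¹7). Multiplexors select the final sum4-7 and carry.] • ‘Standard’ n-bit ripple carry adders • n = any suitable value • Here we build an 8-bit adder from 4-bit blocks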

  9. Carry Select Adder • [Figure: 8-bit carry select adder built from three 4-bit ripple carry adder blocks, with annotations] • The low-order block adds the 4 low order bits • After 4*tpd it will produce a carry out • The two high-order blocks ‘speculate’ on the value of cout3 • One assumes it will be 0, the other assumes 1

  10. Carry Select Adder • [Figure: 8-bit carry select adder built from three 4-bit ripple carry adder blocks, with annotations] • The low-order block adds the 4 low order bits; after 4*tpd it will produce a carry out • After 4*tpd we will have: • sum0-3 (final sum bits) • cout3 (from low order block) • sum⁰4-7 • cout⁰7 (from block assuming 0 cin) • sum¹4-7 • cout¹7 (from block assuming 1 cin)

  11. Carry Select Adder • [Figure: 8-bit carry select adder built from three 4-bit ripple carry adder blocks, with output multiplexors] • cout3 selects the correct sum4-7 and carry out • All 8 bits + carry are available after 4*tpd(FA) + tpd(multiplexor)

  12. Carry Select Adder • This scheme can be generalized to any number of bits • Select a suitable block size (eg 4, 8) • Replicate all blocks except the first • One with cin = 0 • One with cin = 1 • Use final cout from preceding block to select correct set of outputs for current block
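The generalised scheme can be sketched as a small Python model: the first block is a plain ripple carry adder, every later block is computed twice (once per assumed carry in), and the carry out of the preceding block picks the result. The function names, the LSB-first bit lists, and the delay model (equal-sized blocks, one multiplexor delay per replicated block, unit delays by default) are assumptions for illustration only.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple carry adder over one block (least significant bit first)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry select addition with a chosen block size."""
    # The first block is not replicated: its carry in is already known.
    sum_bits, carry = ripple_block(a_bits[:block], b_bits[:block], cin)
    for lo in range(block, len(a_bits), block):
        a_blk, b_blk = a_bits[lo:lo + block], b_bits[lo:lo + block]
        # In hardware these two additions run in parallel with the first block.
        sum0, cout0 = ripple_block(a_blk, b_blk, 0)
        sum1, cout1 = ripple_block(a_blk, b_blk, 1)
        # The preceding block's carry out acts as the multiplexor select.
        sum_bits += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    return sum_bits, carry


def carry_select_delay(n, block, t_fa=1.0, t_mux=1.0):
    """Assumed model: one block of ripple delay, then one mux per later block."""
    n_blocks = -(-n // block)  # ceiling division
    return block * t_fa + (n_blocks - 1) * t_mux


def to_bits(value, n):
    """Hypothetical helper: n-bit list, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


bits, carry_out = carry_select_add(to_bits(200, 8), to_bits(100, 8), block=4)
print(from_bits(bits), carry_out)
print(carry_select_delay(8, 4))  # 4 FA delays plus 1 multiplexor delay
```

With `n = 8` and `block = 4` the delay model reproduces the 4*tpd(FA) + tpd(multiplexor) figure quoted for the 8-bit example, compared with eight full-adder delays for a plain ripple carry adder under the same assumptions.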
