A Survey of Logic Block Architectures

For Digital Signal Processing Applications


Presentation Outline

  • Considerations in Logic Block Design

    • Computation Requirements

    • Why Inefficiencies?

  • Representative Logic Block Architectures

    • Proposed

    • Commercial

  • Conclusions: What is suitable Where?


Why DSP??? The Context

  • Representative of a computationally intensive class of applications: datapath-oriented and arithmetic-oriented

  • Increasingly large use of FPGAs for DSP: multimedia signal processing, communications, and much more

  • To study the “issues” in reconfigurable fabric design for compute-intensive applications: what is involved in making a fabric that accelerates multimedia reconfigurable computing?


Elements of a Reconfigurable Architecture

  • Logic Block/Processing Element

    • Differing Grains: Fine → Coarse → ALUs

  • Routing

  • Dynamic Reconfiguration


So what’s wrong with the typical FPGA?

  • Meant to be general purpose: lower risk

  • Too flexible! Result: an efficiency gap

  • Higher implementation cost, larger delay, and larger power consumption than ASICs

  • Performance vs. Flexibility Tradeoff: flexibility allows postponing mapping and reusing silicon


Solution? See How FPGAs Are Used

  • FPGAs are being used for “classes” of applications: encryption, DSP, multimedia, etc.

  • Here lies the key: design FPGAs for a class of applications

  • Application Domain Characterization → Application Domain Tuning


Domain Specialization

COMPUTATION  defines  ARCHITECTURE

  • Target Application Characteristics known beforehand? Yes

  • Characterize the application domain

  • Determine a balance between flexibility and efficiency

  • Tune the architecture accordingly


Categorizing the “Computation”

  • Control  Random Logic Implementation

  • Datapath  Processing of Multi-bit Data

  • Conflicting Requirements???


Datapath Element Requirements

  • Operates on Word Slices or Bit Slices

  • Produces multi-bit outputs

  • Requires many smaller elements, one per output bit, i.e., multiple small LUTs


Control Logic Requirements

  • Produces a single output from many single bit inputs

  • Benefits from large-grain LUTs, since they reduce the number of logic levels
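The logic-level argument can be made concrete with a small calculation. This is a sketch under a stated assumption: the wide control function decomposes into a balanced tree of k-input LUTs (as a wide AND or OR does), and routing delay is ignored. The function name `lut_tree_levels` is hypothetical.

```python
def lut_tree_levels(n_inputs: int, k: int) -> int:
    """Levels of k-input LUTs needed to reduce n_inputs signals to one output.

    Assumes a decomposable function (e.g. a wide AND) mapped as a balanced
    tree; real functions may need more levels.
    """
    if k < 2:
        raise ValueError("a LUT needs at least 2 inputs to reduce signals")
    levels = 0
    signals = n_inputs
    while signals > 1:
        # each LUT at this level absorbs up to k signals and produces one
        signals = -(-signals // k)  # ceiling division
        levels += 1
    return levels


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(f"32-input AND with {k}-input LUTs: {lut_tree_levels(32, k)} levels")
```

Under this model a 32-input AND needs 5 levels of 2-input LUTs, 3 levels of 4-input LUTs, and 2 levels of 6-input LUTs, which is the sense in which larger LUTs favour control logic.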


Logic Block Design: Considerations

  • “How much” of “what kinds” of computations to support?

  • Tradeoff: Generality vs Specialization


How much of What? Applications benchmarking


So what do we have to support?

  • Datapath functionality, in particular arithmetic, is dominant in DSP.

  • The datapath functions have different bit-widths.

  • DSP designs heavily use multiplexers of various sizes. Thus, efficient mapping of multiplexers should be supported.

  • DSP functions do contain random logic. The amount of random logic varies per design.

  • Some DSP designs use wide boolean functions.
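Multiplexer mapping deserves attention because an m:1 multiplexer needs m data inputs plus ⌈log2 m⌉ select inputs, so even a 4:1 mux (6 inputs) does not fit a single 4-input LUT. A minimal sketch of that count, and of the truth table a 6-input LUT would hold to implement a 4:1 mux; the helper names and the address bit ordering (data bits low, select bits high) are assumptions.

```python
def mux_input_count(m: int) -> int:
    """Data inputs plus select inputs of an m:1 multiplexer."""
    select_bits = (m - 1).bit_length()  # equals ceil(log2(m)) for m >= 1
    return m + select_bits


def mux_truth_table(m: int) -> list[int]:
    """Truth table of an m:1 mux (m a power of two) for a LUT whose address
    holds the m data bits in the low positions and the select bits above them."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two >= 2")
    s = (m - 1).bit_length()
    table = []
    for addr in range(1 << (m + s)):
        data = addr & ((1 << m) - 1)
        sel = addr >> m
        table.append((data >> sel) & 1)  # output = data bit chosen by select
    return table
```

For example, `mux_input_count(8)` is 11, so an 8:1 mux spans several 4-input LUTs plus routing unless the logic block offers dedicated multiplexers.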


DSP Building Blocks

  • Techniques widely used to achieve area- and speed-efficient DSP implementations:

  • Bit-Serial Computation

    • Routing efficient

    • Bit-level pipelining increases throughput even more

  • Digit-Serial Computation

    • Combines the “area efficiency” of bit-serial with the “time efficiency” of bit-parallel computation


Classes of DSP-optimized FPGA Architectures

  • Architectures with Dedicated DSP Logic

    • Homogeneous

    • Heterogeneous

    • Globally Homogeneous, Locally Heterogeneous

  • Architectures of Coarser Granularity

  • With DSP Specific Improvements (e.g. Carry Chains, Input Sharing, CBS)


Some Representative Architectures


Bit-Serial FPGA with SR LUT

  • The bit-serial paradigm suits existing FPGAs, so why not optimize the FPGA for it?

  • Logic block to support efficient implementation of bit-serial data path and bit-level pipelining

  • LUTs can be used for combinational logic as well as for Shift Registers
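A behavioural sketch of a LUT that doubles as a shift register follows. It assumes a 4-input LUT (16 configuration bits) whose bits can be shifted serially and whose read address selects the tap, similar in spirit to the SRL16 primitive in Xilinx devices; the class and method names are hypothetical, and the proposed block's actual circuit may differ.

```python
class ShiftRegisterLUT:
    """A 4-input LUT (16 configuration bits) usable in two modes:
    combinational lookup, or a 16-stage shift register with an addressable tap."""

    SIZE = 16

    def __init__(self, truth_table=None):
        self.bits = list(truth_table) if truth_table is not None else [0] * self.SIZE
        if len(self.bits) != self.SIZE:
            raise ValueError("a 4-input LUT holds exactly 16 bits")

    def read(self, addr: int) -> int:
        """Output the stored bit at the 4-bit address.

        In combinational mode this is the truth-table lookup. In shift-register
        mode read(k) returns the bit shifted in k edges before the most recent
        one, i.e. a delay of k + 1 cycles.
        """
        return self.bits[addr & 0xF]

    def clock(self, din: int) -> None:
        """Shift-register mode: on a clock edge, shift din in at position 0."""
        self.bits = [din & 1] + self.bits[:-1]
```

The same 16 storage bits serve both roles, which is why a bit-serial datapath needing many short delay lines can use LUTs instead of chains of flip-flops.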


A Bit-Serial Adder

A Bit-Serial Adder which processes two bits at a time

Interface Block Diagram
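The behaviour of a bit-serial adder can be modelled cycle by cycle. In this model, operands arrive least significant bit first, one bit of each operand per clock, and a single flip-flop holds the carry between cycles. This is a minimal software model of that basic scheme, not a transcription of the slide's figure; the helper names are hypothetical.

```python
def to_bits_lsb_first(value: int, width: int) -> list[int]:
    """Split an unsigned value into width bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits_lsb_first(bits: list[int]) -> int:
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Add two LSB-first bit streams with one full adder and a carry flip-flop."""
    carry = 0  # carry flip-flop, cleared at the start of each word
    sum_bits = []
    for a, b in zip(a_bits, b_bits):  # one iteration = one clock cycle
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sum_bits  # the final carry is dropped: result is modulo 2**width
```

For example, adding `to_bits_lsb_first(13, 8)` and `to_bits_lsb_first(7, 8)` takes 8 cycles and reassembles to 20. The hardware cost is one full adder and one flip-flop regardless of word length, which is where the area and routing efficiency comes from.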


A Bit-Serial Multiplier Cell
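One common way to use such cells is a serial-parallel multiplier. The multiplicand is held in parallel, the multiplier streams in LSB first, and each cycle conditionally adds the multiplicand into a running sum that then shifts right, emitting one product bit. The sketch below models that generic scheme for unsigned operands. It is an assumed organisation, not necessarily the cell drawn in the slide, and the function name is hypothetical.

```python
def serial_parallel_multiply(x: int, y: int, width: int) -> int:
    """Unsigned x * y with x held in parallel and y streamed LSB first.

    Each cycle emits one product bit; after 2 * width cycles the full
    2 * width-bit product has left the array.
    """
    acc = 0  # running partial sum held inside the cell array
    product_bits = []
    for cycle in range(2 * width):
        y_bit = (y >> cycle) & 1 if cycle < width else 0  # zeros flushed in after y
        acc += x * y_bit              # AND gates form the partial product, adders accumulate it
        product_bits.append(acc & 1)  # least significant bit leaves the array
        acc >>= 1                     # remaining sum shifts down one position
    return sum(bit << i for i, bit in enumerate(product_bits))
```

For example, `serial_parallel_multiply(13, 11, 4)` produces the 8-bit product 143 over 8 cycles.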


The Proposed Bit Serial Logic Block Architecture

  • Four 4-input LUTs and 6 flip-flops.

  • The two multiplexers in front of the LUTs are targeted mainly for carry-save operations which are frequently used in bit-serial computations.

  • There are 18 signal inputs and 6 signal outputs, plus a clock input.

  • Feedback inputs c2, c3, c4, c5 can be connected to GND, to VDD, or to one of the 4 outputs d0, d1, d2, d3. Therefore, each LUT can implement any 4-input function of inputs a0, a1, a2, a3 or b0, b1, b2, b3.

  • Programmable switches connected to inputs a4 and b4 control the four multiplexers at the outputs of the LUTs. As a result, 2 LUTs can implement any 5-input function.

  • The final outputs d0, d1, d2, d3 can be either the direct outputs of the multiplexers or the outputs of the flip-flops. All bit-serial operators use the flip-flop outputs, so the attached programmable switches are not needed for them; they are present only to implement logic functions other than bit-serial datapath circuits.

  • Two flip-flops are added (inputs c0 and c1) to implement shift registers which are frequently used in bit-serial operations.
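The claim that two 4-input LUTs plus a multiplexer implement any 5-input function is Shannon expansion on the fifth input: f = x4' · f(x4=0) + x4 · f(x4=1), where each cofactor is a 4-input function that fits one LUT and x4 (a4 or b4 in the block) drives the multiplexer select. A sketch that splits an arbitrary 32-entry truth table into two 16-entry LUT contents and recombines them; the function names and the bit ordering (x0 as the least significant address bit) are assumptions.

```python
def split_5input(truth_table: list[int]) -> tuple[list[int], list[int]]:
    """Shannon-expand a 5-input function on its most significant input x4.

    Returns the contents of two 4-input LUTs: the cofactors for x4 = 0 and x4 = 1.
    """
    if len(truth_table) != 32:
        raise ValueError("a 5-input function has 32 truth-table entries")
    return truth_table[:16], truth_table[16:]


def eval_two_lut_mux(lut0: list[int], lut1: list[int], x: int) -> int:
    """Evaluate the 2-LUT + 2:1 mux structure for a 5-bit input x."""
    low4 = x & 0xF     # x0..x3 drive both LUTs in parallel
    x4 = (x >> 4) & 1  # x4 drives the multiplexer select
    return lut1[low4] if x4 else lut0[low4]
```

Because the split works for any truth table, the structure covers all 2^32 five-input functions, not just those arising in bit-serial datapaths.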


The Modified LUT Implementing a Shift Register


Performance Results


Digit-Serial Logic Block Architecture

  • Digit-serial architectures process one digit (N = 4 bits) at a time

  • They offer area efficiency similar to bit-serial architectures and time efficiency close to bit-parallel architectures

  • N = 4 bits can serve as an optimal granularity for building larger digit sizes (N = 8, 16, etc.)
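A digit-serial adder generalises the bit-serial one: each clock consumes one N-bit digit of each operand, and the carry out of the digit is stored for the next cycle. A minimal model, assuming N = 4 as on the slide and least-significant-digit-first order, that also counts cycles to show the W/N latency against W cycles for bit-serial operation; the function name is hypothetical.

```python
def digit_serial_add(a: int, b: int, width: int, digit: int = 4) -> tuple[int, int]:
    """Add two unsigned width-bit words one digit (digit bits) per cycle.

    Returns (sum modulo 2**width, number of clock cycles used).
    """
    if width % digit:
        raise ValueError("width must be a multiple of the digit size")
    mask = (1 << digit) - 1
    carry = 0  # carry register between digit cycles
    result = 0
    cycles = 0
    for pos in range(0, width, digit):
        da = (a >> pos) & mask
        db = (b >> pos) & mask
        s = da + db + carry  # an N-bit adder inside the logic block
        result |= (s & mask) << pos
        carry = s >> digit
        cycles += 1
    return result, cycles
```

For a 16-bit addition such as `digit_serial_add(0xBEEF, 0x1234, 16)`, the model returns `(0xD123, 4)`. Setting `digit=1` gives the bit-serial case, which needs 16 cycles for the same result.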


Digit-Serial Building Blocks

A Digit-Serial Adder

A Digit-Serial Unsigned Multiplier


Digit-Serial Building Blocks

A Pipelined Digit-Serial Unsigned Multiplier For Y=8 bits


Digit-Serial Signed Multiplier Blocks

First Stage Module

Middle Stages Module

Last Stage Module


Signed Digit-Serial Multiplier

A Digit-Serial Signed Booth’s Pipelined Multiplier with Y=8
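Booth recoding lets a signed multiplier treat a two's-complement multiplier uniformly. The slide does not state the radix. The sketch below assumes radix-4 (modified) Booth recoding, in which each pair of multiplier bits becomes a digit in {-2, -1, 0, 1, 2}, so an 8-bit multiplier (Y = 8) yields 4 partial products instead of 8. The function names are hypothetical.

```python
def booth_radix4_digits(y: int, width: int = 8) -> list[int]:
    """Recode a signed width-bit multiplier y into radix-4 Booth digits,
    least significant first; each digit is in {-2, -1, 0, 1, 2}."""
    if width % 2:
        raise ValueError("radix-4 recoding assumes an even width")
    if not -(1 << (width - 1)) <= y < (1 << (width - 1)):
        raise ValueError("y does not fit in width bits (two's complement)")
    pattern = y & ((1 << width) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # y[-1] = 0
    for i in range(0, width, 2):
        lo = (pattern >> i) & 1
        hi = (pattern >> (i + 1)) & 1
        digits.append(-2 * hi + lo + prev)  # digit = -2*y[i+1] + y[i] + y[i-1]
        prev = hi
    return digits


def booth_radix4_multiply(x: int, y: int, width: int = 8) -> int:
    """Signed x * y as a sum of width/2 partial products d_i * x * 4**i."""
    return sum(d * x * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, width)))
```

For example, `booth_radix4_digits(13)` gives `[1, -1, 1, 0]`, i.e. 13 = 1 - 4 + 16. Each partial product needs only a shift and an optional negation of x, which maps naturally onto adder/subtractor hardware.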


Proposed Digit-Serial Logic Block


Detailed Structure of Digit-Serial Logic Block


The Basic Logic Module (LM)

Table of Functions Implemented

The Structure of the LM


Examples of Implementations

N=4 Unsigned Multiplier

N=4 Signed Multiplier

Two N=2 Multipliers

Bit-Level Pipelined


Area Comparison with Xilinx 4000 Series


Mixed-Grain Logic Block Architecture

  • Exploits the adder inverting property

  • Efficiently implements both datapath and random logic in the same logic block design


Adder Inverting Property

Full Adder and Equations Showing the Inverting Property

An Optimal Structure Derived from the Property
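The inverting property states that complementing all three inputs of a full adder complements both outputs: S(a', b', c') = S(a, b, c)' and Cout(a', b', c') = Cout(a, b, c)'. The exhaustive check below covers the eight input combinations. It verifies only the property itself; how the mixed-grain block exploits it in hardware is shown in the slide figures.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a one-bit full adder."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)
    return s, cout


def inverting_property_holds() -> bool:
    """Check that inverting every input inverts both outputs, for all 8 cases."""
    for a, b, c in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, c)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - c)
        if (s_inv, cout_inv) != (1 - s, 1 - cout):
            return False
    return True
```

The sum holds because three inversions cancel to a single one in an XOR, and the carry holds because the majority function is self-dual.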


LUT Bits Utilization in Datapath and Logic Modes


Structure of a Single Slice


Complete Logic Block


Modified ALU-Like Functionality


Comparison Results


Comparison Results (Cont…)


Comparison Results (Cont…)


Coarser ALU-Like Architectures


CHESS Architecture


CHESS ALU Based Logic Block


Structure of a Switch Box


Comparison Results


Computation Field Programmable Architecture

  • A heterogeneous architecture with clusters of datapath logic blocks

  • Separate LUT-based logic blocks support random logic mapping

  • The basic logic block is called a Partial Adder Subtraction Multiplier (PASM) module


PASM Logic Block of CFPA


Cluster of PASM Logic Blocks


Comparison Results


Some Industry Architecture Designs


Altera APEX II Logic Element


Altera MAX II Logic Element


LE Configuration in Arithmetic Mode


LE in Random Logic Implementation


Altera Stratix Logic Element


Altera Stratix II Architecture


Stratix II Adaptive Logic Module


Stratix II ALM in Arithmetic Mode


Various Configurations in an ALM of Stratix II


Multiplier Resources in Stratix II


Structure of a DSP Block in Stratix II


XILINX Virtex II Pro Architecture


Basic Logic Element of Virtex II Pro


Dedicated Multipliers in Virtex II Pro


Processor-Programmable Logic Coupled Architecture


PiCoGA Architecture Coupled with a VLIW processor


PiCoGA Logic Block


Conclusions

  • Traditional general-purpose FPGAs are inefficient for datapath mapping

  • Logic blocks with DSP-specific enhancements seem a promising solution

  • Coarse-grained logic can achieve better datapath mapping but sacrifices flexibility

  • Dedicated blocks (multipliers) increase performance but also increase cost significantly


Conclusions

  • PDSPs with embedded FPGA can achieve a good balance between performance and power consumption

  • So… which approach is best? No single best approach exists


Suitability of Approaches

  • Highly computationally intensive applications with large amounts of parallelism can use platform FPGAs, where large resources are often required and power consumption is not a concern.

  • Here cost per function will be lowest


Suitability of Approaches

  • Field Programmable Logic based coprocessors can benefit from coarse grained blocks where most control functions are implemented by the PDSP itself


Suitability of Approaches

  • Higher flexibility and lower cost can be achieved with logic blocks that have DSP-specific enhancements while retaining the flexibility to implement control logic efficiently.

