A Survey of Logic Block Architectures

A Survey of Logic Block Architectures For Digital Signal Processing Applications

Presentation Outline
• Considerations in Logic Block Design
• Computation Requirements
• Why Inefficiencies?
• Representative Logic Block Architectures
  • Proposed
  • Commercial
• Conclusions: What Is Suitable Where?

Why DSP? The Context
• Representative of the computationally intensive class of applications → datapath-oriented and arithmetic-oriented
• Increasingly large use of FPGAs for DSP → multimedia signal processing, communications, and much more
• To study the “issues” in reconfigurable fabric design for compute-intensive applications → What is involved in making a fabric that accelerates multimedia reconfigurable computing?

Elements of a Reconfigurable Architecture
• Logic Block/Processing Element
  • Differing grains: Fine >> Coarse >> ALUs
• Routing
• Dynamic Reconfiguration

So what’s wrong with the typical FPGA?
• Meant to be general purpose → lower risks
• Too flexible! → Result: efficiency gap
• Higher implementation cost, larger delay, and larger power consumption than ASICs
• Performance vs. flexibility tradeoff → postponing mapping and silicon re-use

Solution? See How FPGAs Are Used
• FPGAs are being used for “classes” of applications → encryption, DSP, multimedia, etc.
• Here lies the key → design FPGAs for a class of applications
• Application domain characterization → application domain tuning

Domain Specialization
COMPUTATION → defines → ARCHITECTURE
• Target application characteristics known beforehand? Yes
• Characterize the application domain
• Determine a balance between flexibility and efficiency
• Tune the architecture accordingly

Categorizing the “Computation”
• Control → random logic implementation
• Datapath → processing of multi-bit data
• Conflicting requirements?

Datapath Element Requirements
• Operates on word slices or bit slices
• Produces multi-bit outputs
• Requires many smaller elements to produce each output bit, i.e. multiple small LUTs

Control Logic Requirements
• Produces a single output from many single-bit inputs
• Benefits from large-grain LUTs, since the number of logic levels is reduced
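The effect of LUT size on logic depth can be illustrated with a rough sketch. It assumes a function that decomposes into a balanced tree (a wide AND, OR, or XOR); arbitrary functions may need more levels. The helper name `lut_levels` is hypothetical and is not taken from the slides.

```python
def lut_levels(num_inputs: int, k: int) -> int:
    """Levels in a balanced tree of k-input LUTs that reduces
    num_inputs signals to a single output.

    Assumption: the function is tree-decomposable (e.g. a wide AND).
    """
    if k < 2:
        raise ValueError("a LUT needs at least 2 inputs to reduce signals")
    levels = 0
    signals = num_inputs
    while signals > 1:
        signals = -(-signals // k)  # ceiling division: LUTs needed at this level
        levels += 1
    return levels


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(f"16-input AND with {k}-input LUTs: {lut_levels(16, k)} levels")
```

For a 16-input function of this kind, moving from 2-input to 4-input LUTs halves the depth in this model, which is the sense in which control logic benefits from a larger grain.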

Logic Block Design: Considerations
• “How much” of “what kinds” of computation to support?
• Tradeoff: generality vs. specialization

How much of What? Applications benchmarking

So what do we have to support?
• Datapath functionality, in particular arithmetic, is dominant in DSP.
• The datapath functions have different bit-widths.
• DSP designs make heavy use of multiplexers of various sizes, so efficient mapping of multiplexers should be supported.
• DSP functions do contain random logic. The amount of random logic varies per design.
• Some DSP designs use wide Boolean functions.

DSP Building Blocks
• Some techniques widely used to achieve area- and speed-efficient DSP implementations:
• Bit-serial computation
  • Routing efficient
  • Bit-level pipelining increases throughput even more
• Digit-serial computation
  • Combines the “area efficiency” of bit-serial with the “time efficiency” of bit-parallel

Classes of DSP-Optimized FPGA Architectures
• Architectures with dedicated DSP logic
  • Homogeneous
  • Heterogeneous
  • Globally homogeneous, locally heterogeneous
• Architectures of coarser granularity
• With DSP-specific improvements (e.g. carry chains, input sharing, CBS)

Some Representative Architectures

Bit-Serial FPGA with SR LUT
• The bit-serial paradigm suits the existing FPGA, so why not optimize the FPGA for it?
• Logic block supports efficient implementation of bit-serial datapaths and bit-level pipelining
• LUTs can be used for combinational logic as well as for shift registers

A Bit-Serial Adder
Figures: a bit-serial adder which processes two bits at a time; interface block diagram.
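Since the slide's diagrams are not reproduced here, the following is a minimal behavioral sketch of a plain bit-serial adder: one full adder plus a carry flip-flop, consuming one bit of each operand per clock cycle, least significant bit first. It models the one-bit-per-cycle case rather than the two-bits-per-cycle variant named on the slide, and the function names are illustrative only.

```python
def to_bits(value: int, width: int) -> list[int]:
    """LSB-first list of bits for a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Add two LSB-first bit streams, one bit pair per clock cycle."""
    carry = 0  # models the carry flip-flop, cleared before the first cycle
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)               # full-adder sum output
        carry = (a & b) | (a & carry) | (b & carry)  # stored for the next cycle
    return sum_bits, carry


if __name__ == "__main__":
    bits, carry_out = bit_serial_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(bits) + (carry_out << 4))
```

The hardware cost is independent of the word length; only the number of cycles grows with it, which is why the technique is area and routing efficient.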

A Bit-Serial Multiplier Cell

The Proposed Bit-Serial Logic Block Architecture
• Four 4-input LUTs and 6 flip-flops.
• The two multiplexers in front of the LUTs are targeted mainly at carry-save operations, which are frequently used in bit-serial computations.
• There are 18 signal inputs and 6 signal outputs, plus a clock input.
• Feedback inputs c2, c3, c4, c5 can be connected to GND, to VDD, or to one of the 4 outputs d0, d1, d2, d3. Therefore, each LUT can implement any 4-input function controlled by inputs a0, a1, a2, a3 or b0, b1, b2, b3.
• Programmable switches connected to inputs a4 and b4 control the functionality of the four multiplexers at the outputs of the LUTs. As a result, 2 LUTs can implement any 5-input function.
• The final outputs d0, d1, d2, d3 can be either the direct outputs of the multiplexers or the outputs of the flip-flops. All bit-serial operators use the flip-flop outputs, so the attached programmable switches are not needed for them; they are present only so that logic functions other than bit-serial datapath circuits can be implemented.
• Two flip-flops are added (inputs c0 and c1) to implement shift registers, which are frequently used in bit-serial operations.
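The claim that two 4-input LUTs plus an output multiplexer cover any 5-input function follows from Shannon expansion on the fifth input. The sketch below is a generic model of that idea, not a model of the slide's exact wiring; `make_lut`, `read_lut`, `map_five_input`, and `matches_everywhere` are hypothetical names.

```python
from itertools import product


def make_lut(func, k: int) -> list[int]:
    """Truth table of 2**k bits; input i is bit i of the table address."""
    return [func(*[(addr >> i) & 1 for i in range(k)]) & 1
            for addr in range(2 ** k)]


def read_lut(table: list[int], inputs) -> int:
    """Look up the stored bit addressed by the given input bits."""
    return table[sum(bit << i for i, bit in enumerate(inputs))]


def map_five_input(func):
    """Split a 5-input function into two 4-input LUTs selected by input e."""
    lut_e0 = make_lut(lambda a, b, c, d: func(a, b, c, d, 0), 4)  # cofactor e=0
    lut_e1 = make_lut(lambda a, b, c, d: func(a, b, c, d, 1), 4)  # cofactor e=1

    def evaluate(a, b, c, d, e):
        # The 2:1 multiplexer after the LUTs, steered by the fifth input.
        return read_lut(lut_e1 if e else lut_e0, (a, b, c, d))

    return evaluate


def matches_everywhere(func) -> bool:
    """Check the mapped circuit against func on all 32 input combinations."""
    mapped = map_five_input(func)
    return all(mapped(*bits) == func(*bits) & 1
               for bits in product((0, 1), repeat=5))


if __name__ == "__main__":
    parity5 = lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e
    print(matches_everywhere(parity5))
```

Because each cofactor is an arbitrary 4-input function, the construction works for every 5-input function, not only for the parity example.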

The Modified LUT Implementing a Shift Register
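The slide's figure is not available in this transcript, so the following is only a behavioral sketch of the general idea: the LUT's configuration cells are chained so that they can also act as a shift register, while the normal address inputs still select one cell for readout. The class and method names are hypothetical, and the real circuit may differ in shift direction and tap selection.

```python
class ShiftRegisterLUT:
    """A k-input LUT whose 2**k storage cells can also shift serially."""

    def __init__(self, k: int = 4):
        self.cells = [0] * (2 ** k)

    def load(self, table: list[int]) -> None:
        """Configure the cells as a truth table (combinational mode)."""
        if len(table) != len(self.cells):
            raise ValueError("truth table size must match the LUT size")
        self.cells = [bit & 1 for bit in table]

    def read(self, addr: int) -> int:
        """LUT readout: the address inputs select one stored bit."""
        return self.cells[addr]

    def shift(self, bit_in: int) -> int:
        """One clock in shift mode: bit_in enters cell 0, last cell falls out."""
        bit_out = self.cells[-1]
        self.cells = [bit_in & 1] + self.cells[:-1]
        return bit_out


if __name__ == "__main__":
    lut = ShiftRegisterLUT()
    for bit in (1, 0, 0):
        lut.shift(bit)
    print(lut.read(2))  # the bit shifted in three cycles ago
```

In this model, reading address d-1 after each shift gives a programmable delay of d cycles, which is the kind of short delay line bit-serial datapaths need.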

Performance Results

Digit-Serial Logic Block Architecture
• Digit-serial architectures process one digit (N=4 bits) at a time
• They offer area efficiency similar to bit-serial architectures and time efficiency close to bit-parallel architectures
• N=4 bits can serve as an optimal granularity for processing larger digit sizes (N=8, 16, etc.)

Digit-Serial Building Blocks
Figures: a digit-serial adder; a digit-serial unsigned multiplier.
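As a rough behavioral sketch of digit-serial addition (not the slide's circuit), the function below adds two words one N-bit digit per cycle, carrying between digits. It assumes the word width is a multiple of the digit size; the function name and return layout are illustrative.

```python
def digit_serial_add(a: int, b: int, width: int, n: int = 4) -> tuple[int, int, int]:
    """Add two width-bit words n bits per cycle.

    Returns (sum modulo 2**width, carry out, cycles used).
    Assumption: width is a multiple of n.
    """
    if width % n != 0:
        raise ValueError("width must be a multiple of the digit size n")
    mask = (1 << n) - 1
    carry = 0
    result = 0
    cycles = 0
    for shift in range(0, width, n):
        # One clock cycle: an n-bit adder handles the current digit pair.
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (total & mask) << shift
        carry = total >> n  # held in a flip-flop for the next digit
        cycles += 1
    return result, carry, cycles


if __name__ == "__main__":
    for n in (1, 4, 16):
        total, carry_out, cycles = digit_serial_add(0xABCD, 0x1234, 16, n)
        print(f"N={n}: sum={total:#06x} carry={carry_out} cycles={cycles}")
```

With N=1 the model degenerates to bit-serial operation (16 cycles for a 16-bit word) and with N=16 to bit-parallel operation (1 cycle); N=4 sits between the two at 4 cycles.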

Digit-Serial Building Blocks
Figure: a pipelined digit-serial unsigned multiplier for Y=8 bits.

Digit-Serial Signed Multiplier Blocks
Figures: first-stage module; middle-stages module; last-stage module.

Signed Digit-Serial Multiplier
Figure: a digit-serial signed Booth’s pipelined multiplier with Y=8.

Proposed Digit-Serial Logic Block

Detailed Structure of Digit-Serial Logic Block

The Basic Logic Module (LM)
Figures: table of functions implemented; the structure of the LM.

Examples of Implementations N=4 Unsigned Multiplier N=4 Signed Multiplier Two N=2 Multipliers Bit-Level Pipelined

Area Comparison with Xilinx 4000 Series

Mixed-Grain Logic Block Architecture
• Exploits the adder inverting property
• Efficiently implements both datapath and random logic in the same logic block design

Adder Inverting Property
Figures: full adder and equations showing the inverting property; an optimal structure derived from the property.
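The slide's equations are not reproduced in this transcript. Assuming the property meant is the standard one for a full adder (inverting all three inputs inverts both the sum and the carry-out), it can be checked exhaustively with a short sketch; the function names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a one-bit full adder."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def inverting_property_holds() -> bool:
    """True if inverting every input inverts both outputs, for all 8 cases."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if (s_inv, cout_inv) != (1 - s, 1 - cout):
            return False
    return True


if __name__ == "__main__":
    print(inverting_property_holds())
```

Under that assumption, an adder stage can work on inverted operands and produce inverted results, so inverters between stages can be dropped, which is the kind of structural saving a logic block design can exploit.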

LUT Bits Utilization in Datapath and Logic Modes

Structure of a Single Slice

Complete Logic Block

Modified ALU-Like Functionality

Comparison Results

Comparison Results (Cont…)

Comparison Results (Cont…)

Coarser ALU-Like Architectures

CHESS Architecture

CHESS ALU-Based Logic Block

Structure of a Switch Box

Computation Field Programmable Architecture
• A heterogeneous architecture with clusters of datapath logic blocks
• Separate LUT-based logic blocks support random logic mapping
• The basic logic block is called a Partial Adder Subtraction Multiplier (PASM) module

PASM Logic Block of CFPA

Cluster of PASM Logic Blocks
