
ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs





Overview

  • Logic synthesis

  • LUT Clustering

  • LUT capacity

  • Chortle – example technology mapper

  • Architecture-specific optimization


Boolean network

  • A Boolean network is the main representation of the logic functions for technology independent optimizations.

  • Each node can be represented as sum-of-products (or product-of-sums).

  • Provides multi-level structure, but functions in the network need not correspond to logic gates.


Boolean network example

  • Primary inputs: x1, x2, x3, x4

  • Internal nodes:

    k1 = x2 + x3

    k2 = x1’ x2 x4 + k1

    k3 = k1 x4’

  • Primary outputs:

    out1 = k2 + x2’

    out2 = k3 + x1


Terms

  • Support: set of variables used by a function.

  • Transitive fanout: all the primary outputs and intermediate variables that depend on a function.

  • Transitive fanin: all the primary inputs and intermediate variables used by a function. Transitive fanin determines a cone of logic.

[Figure: a cone of logic, from the primary inputs to a single output]
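These terms can be made concrete on the Boolean network example (out1, out2, k1, k2, k3 over x1 to x4). The sketch below is only an illustration: the dictionary encoding and the function names are assumptions, not part of the lecture.

```python
# Each node maps to the set of variables it reads directly (its support).
# Primary inputs x1..x4 have no entry because nothing drives them.
NETWORK = {
    "k1": {"x2", "x3"},
    "k2": {"x1", "x2", "x4", "k1"},
    "k3": {"k1", "x4"},
    "out1": {"k2", "x2"},
    "out2": {"k3", "x1"},
}


def support(node):
    """Variables used directly by the node's function."""
    return set(NETWORK.get(node, ()))


def transitive_fanin(node):
    """All primary inputs and intermediate variables the node depends on."""
    seen = set()
    stack = list(NETWORK.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NETWORK.get(current, ()))
    return seen


def transitive_fanout(node):
    """All intermediate variables and primary outputs that depend on the node."""
    return {other for other in NETWORK if node in transitive_fanin(other)}
```

Here the transitive fanin of out2 is the cone {k3, k1, x1, x2, x3, x4}, and the transitive fanout of k1 is {k2, k3, out1, out2}.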


Partially-specified function

[Figure: Karnaugh map over x1, x2, x3 with entries 0, 1, and don’t care]


Optimizations

  • Simplification.

    • Changing the way a function is represented.

  • Network restructuring.

    • Adding and removing nodes.

  • Delay restructuring.

    • Optimizations that reduce the height of critical paths.


Partial collapsing

[Figure: a network with nodes f1, f2, f3, f4 before partial collapsing, and with nodes F, f3, f4 after]


Technology mapping

  • Cover the function:


FPGA tech mapping

  • Cost (number of inputs) doesn’t always increase with added functions:


FPGAs vs. custom logic

  • Cost metric for static gates is literal:

    • ax + bx’ has four literals, requires 8 transistors.

  • Cost metric for FPGAs is the logic element (LE):

    • All functions that fit in an LE have the same cost.


LUT-based logic synthesis

  • Find the largest logic cone that will fit into the LUT:

r = q + s’

s = d’

q = g’ + h

d = a + b
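As a rough check that this cone fits in a four-input LUT, the sketch below collapses r down to its primary inputs a, b, g, h and tabulates all 16 input combinations, which is what the LUT would store. The function names are illustrative assumptions.

```python
from itertools import product


def cone_output(a, b, g, h):
    """Evaluate r by walking the cone from the primary inputs to the root."""
    d = a or b          # d = a + b
    s = not d           # s = d'
    q = (not g) or h    # q = g' + h
    return q or (not s)  # r = q + s'


def lut_contents():
    """Truth table of the collapsed cone: one entry per 4-bit input pattern."""
    return {bits: cone_output(*bits) for bits in product((False, True), repeat=4)}
```

The cone has four distinct primary inputs, so the whole function r = g’ + h + a + b occupies one 4-LUT instead of four separate gates.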



How much fits in a LUT?

  • One 2-input NAND gate frequently used for comparison.

  • Approximately 12 to 15 gates per four-input LUT.

  • 2^16 functions -> 80 after IO swapping, 14 after IO inversion

  • 4-input determined to be optimal [Rose 1990]
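The idea of collapsing functions that differ only by input swapping or inversion can be checked by brute force. The sketch below does this for three-input functions (2^8 = 256 truth tables), where the enumeration gives 80 classes under input permutation and 14 once input and output inversion are also allowed; the counts for four inputs are larger. The truth-table encoding and function names are assumptions made for illustration.

```python
from itertools import permutations, product

N = 3            # number of LUT inputs considered in this sketch
ROWS = 1 << N    # truth-table rows


def transform(tt, perm, in_neg, out_neg):
    """Truth table of g(x) = out_neg XOR f(y), with y[perm[j]] = x[j] XOR in_neg[j]."""
    result = 0
    for row in range(ROWS):
        src = 0
        for j in range(N):
            bit = ((row >> j) & 1) ^ in_neg[j]
            src |= bit << perm[j]
        value = ((tt >> src) & 1) ^ out_neg
        result |= value << row
    return result


def count_classes(allow_inversion):
    """Count equivalence classes of N-input functions under the chosen transforms."""
    negations = list(product((0, 1), repeat=N)) if allow_inversion else [(0,) * N]
    outputs = (0, 1) if allow_inversion else (0,)
    seen = set()
    classes = 0
    for tt in range(1 << ROWS):
        if tt in seen:
            continue
        classes += 1  # tt starts a class not reached from any earlier function
        for perm in permutations(range(N)):
            for in_neg in negations:
                for out_neg in outputs:
                    seen.add(transform(tt, perm, in_neg, out_neg))
    return classes
```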


Technology-Independent Logic Optimization

  • Improve circuit based on cost

    • Keep same functionality

  • Boolean Evaluation/decomposition

  • Simple factoring -> minimizing literals

    Before: f = ac + ad + bc + bd, g = a + b + c

    After: e = a + b, g = e + c, f = e(c + d)
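A quick way to confirm that the factored form keeps the same functionality is an exhaustive comparison over all input values, together with a literal count. This is a small sketch; the helper names are assumptions.

```python
from itertools import product


def before(a, b, c, d):
    f = (a and c) or (a and d) or (b and c) or (b and d)
    g = a or b or c
    return f, g


def after(a, b, c, d):
    e = a or b            # shared intermediate function
    g = e or c
    f = e and (c or d)
    return f, g


def count_literals(expressions):
    """Count variable occurrences on the right-hand sides."""
    return sum(ch.isalpha() for expr in expressions for ch in expr)


EQUIVALENT = all(before(*v) == after(*v) for v in product((False, True), repeat=4))
LITERALS_BEFORE = count_literals(["ac + ad + bc + bd", "a + b + c"])
LITERALS_AFTER = count_literals(["a + b", "e + c", "e(c + d)"])
```

The two versions agree on all 16 input combinations, while the literal count drops from 11 to 7.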


Factorization

  • Based on division:

    • formulate candidate divisor;

    • test how it divides into the function;

    • if g = f/c, we can use c as an intermediate function for f.

  • Algebraic division: does not take Boolean simplification into account. Less expensive than Boolean division.
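A minimal sketch of algebraic division, assuming a sum-of-products is stored as a set of cubes and each cube as a frozenset of literal names (a complemented literal would simply be a distinct name such as "x'"). The representation and the function name are illustrative, not taken from the lecture.

```python
def algebraic_divide(f, divisor):
    """Return (quotient, remainder) such that f = quotient * divisor + remainder."""
    quotient = None
    for d_cube in divisor:
        # Cubes of f that contain this divisor cube, with the divisor cube removed.
        partial = {cube - d_cube for cube in f if d_cube <= cube}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    # Multiply back to find which cubes of f the product accounts for.
    covered = {q | d_cube for q in quotient for d_cube in divisor}
    remainder = f - covered
    return quotient, remainder


def cube(literals):
    """Helper: build a cube from a string of single-letter literals."""
    return frozenset(literals)
```

For example, dividing f = ac + ad + bc + bd by the candidate divisor a + b yields the quotient c + d with an empty remainder, so a + b can serve as an intermediate function for f.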


Library-based Technology Mapping – MIS II

[Figure: example library cells: Inv, cost 2; NAND2, cost 3; AOI-21, cost 4]

  • Three steps: decomposition, matching, covering

  • Circuit first decomposed into NAND representations

  • Different collections of NANDs can be implemented differently in VLSI



MIS II

  • Decompose into NAND-2 using Boolean techniques

  • Use dynamic programming to match subtrees with libraries

  • Choose lowest cost implementation that covers all primitives.
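The matching and covering steps can be sketched as a small dynamic program over a tree of NAND2 and inverter nodes. This is not MIS II itself: the tuple encoding of trees, the pattern used for AOI-21 (an inverter over NAND(NAND(a, b), INV(c))), and the function names are assumptions; only the three cell costs come from the library shown earlier.

```python
from functools import lru_cache

# (cell name, cost, pattern); "?" marks a pattern input.
LIBRARY = [
    ("INV", 2, ("inv", "?")),
    ("NAND2", 3, ("nand", "?", "?")),
    ("AOI21", 4, ("inv", ("nand", ("nand", "?", "?"), ("inv", "?")))),
]


def match(pattern, node):
    """Return the subject subtrees bound to the pattern inputs, or None."""
    if pattern == "?":
        return [node]
    if isinstance(node, str) or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    if pattern[0] == "inv":
        return match(pattern[1], node[1])
    # NAND is commutative, so try both input orders.
    for first, second in ((node[1], node[2]), (node[2], node[1])):
        left = match(pattern[1], first)
        right = match(pattern[2], second) if left is not None else None
        if right is not None:
            return left + right
    return None


@lru_cache(maxsize=None)
def best_cost(node):
    """Lowest total cell cost that covers the subtree rooted at node."""
    if isinstance(node, str):  # primary input: nothing to implement
        return 0
    best = None
    for _name, cost, pattern in LIBRARY:
        inputs = match(pattern, node)
        if inputs is None:
            continue
        total = cost + sum(best_cost(sub) for sub in inputs)
        if best is None or total < best:
            best = total
    return best
```

For the subject tree ("inv", ("nand", ("nand", "a", "b"), ("inv", "c"))), covering gate by gate costs 2 + 3 + 3 + 2 = 10, while a single AOI-21 covers all four primitives for a cost of 4, so the dynamic program selects the latter.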


Tech Mapping for LUTs

  • Minimize total number of LUTs

  • Minimize the number of levels of LUTs

  • Many different approaches

    • Partitioning -> Flowmap

    • BDDs -> XMAP

    • Chortle -> Covering

  • Basic Xilinx tech mapping follows Chortle with modification to handle registers.


Chortle-crf

[Figure: example circuit with nodes A through M and signals w, x, y, z]

  • Dynamic programming approach

  • Minimize # LUTs – primary goal

  • Minimize # inputs used by the root LUT of the circuit

    • Secondary goal

  • Operates on AND-OR circuits.

Locate boundaries


Chortle-crf

[Figure: mappings with and without decomposition; labels "2-LUTs" and "4-LUTs"]

  • Major innovation is bin packing

  • Simultaneously addresses decomposition and matching

  • Goal: Find decomposition of every node in the network that minimizes # LUTs in final circuit


Mapping Each Tree

  • Dynamically visit each node in the graph

    • Fanin nodes drive the node under evaluation

      Boxes -> fanin LUTs, cost is number of inputs

      Bins -> N input LUT (in this case 5)

      First Fit Decreasing /* construct 2-level decomp */

      box list <- fanin LUTs sorted by decreasing size

      bin list <- empty

      while (box list is not empty) {

        box <- largest LUT remaining in box list /* remove it from the list */

        find first bin that has room for box

        if no such bin exists

          create new bin containing box /* add it to bin list */

        else

          pack box into that existing bin

      }
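The First Fit Decreasing loop above can be written out as a short runnable sketch. It tracks only the number of inputs each fanin LUT (box) uses and ignores shared inputs; the function name and list-based representation are assumptions.

```python
def first_fit_decreasing(box_sizes, capacity):
    """Pack boxes (fanin LUTs, sized by input count) into bins of the given capacity.

    Returns a list of bins, each a list of the box sizes packed into it.
    """
    bins = []
    for size in sorted(box_sizes, reverse=True):
        if size > capacity:
            raise ValueError("box does not fit in an empty bin")
        for current in bins:
            if sum(current) + size <= capacity:
                current.append(size)  # pack into the first bin with room
                break
        else:
            bins.append([size])       # no bin had room: create a new one
    return bins
```

With 5-input bins, fanin LUTs of sizes 2, 3, 2, 1, 2 pack into two bins, [3, 2] and [2, 2, 1], so the two-level decomposition needs two second-level LUTs.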


Multi-Level Decomposition

  • Chain LUTs together

  • Output of largest second level LUT connected to LUT with unused input

  • May need to add a new LUT

  • Leads to the minimum number of LUTs and a fanout LUT with the smallest number of inputs

  • This fanout LUT used as input to next stage
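The chaining step can be sketched on fill counts alone. This is a simplified reading of the bullets above, assuming each second-level LUT is represented only by how many of its inputs are in use and that ties are broken by sort order; the function name is hypothetical.

```python
def multi_level(fills, capacity):
    """Chain second-level LUTs into a tree.

    fills: number of used inputs of each second-level LUT (non-empty list).
    Returns (total LUT count, number of inputs used by the root LUT).
    """
    if not fills:
        raise ValueError("need at least one LUT")
    pending = sorted(fills, reverse=True)
    lut_count = 0
    while len(pending) > 1:
        pending.pop(0)          # the most-filled LUT is finished
        lut_count += 1
        for index, fill in enumerate(pending):
            if fill < capacity:
                pending[index] += 1   # its output takes an unused input
                break
        else:
            pending.append(1)         # no unused input anywhere: add a new LUT
        pending.sort(reverse=True)
    return lut_count + 1, pending[0]
```

For example, two full 5-input LUTs cannot absorb each other's output, so a third LUT is added and the root ends up using only 2 inputs, leaving room for packing at the next stage.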


Examples

[Figure: three panels built from signals u, v, w, x, y and intermediate signals z.1, z.2]

a) Fanin LUTs

b) Two-level Decomposition

c) Multi-level Decomposition


Optimality

  • For LUTs with fewer than 6 inputs, Chortle will create an optimal result for each subtree

  • Combination of sub-trees is not optimized.

  • Local optimizations needed to ensure global optimality.

    Reconvergent paths -> net drives multiple gates.

    Replicating logic -> creating additional fanout


Translating a Design to an FPGA

  • Improve 2-level decomposition to take fanout into account

  • Replace FFD with an exhaustive search that repeatedly invokes FFD.

  • Try both with and without reconvergent path and select best mapping (forced merging)

  • Inputs must reconverge at node being decomposed.


Reconvergent Paths

  • Frequently, more than one pair of fan-in LUTs share inputs

  • For each combination of pairs that share inputs, perform FFD.

  • Two-level decomp with fewest bins and smallest least filled bin retained

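    Reconverge

    pair list <- all pairs of fanin LUTs with shared inputs

    best LUTs <- FFD(fanin LUTs) /* no forced merge */

    for all possible combinations of pairs from pair list {

      merged LUTs <- copy of fanin LUTs with forced merge

      candidate <- FFD(merged LUTs)

      if candidate is better than best LUTs /* fewest bins, then smallest least filled bin */

        best LUTs <- candidate

    }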
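A compact sketch of this exhaustive search follows, assuming each fanin LUT is given as a set of input names, a forced merge is the union of the two input sets (shared inputs counted once), and candidates are ranked by bin count and then by the least filled bin. All names are illustrative.

```python
from itertools import combinations


def ffd(sizes, capacity):
    """First Fit Decreasing on sizes only; returns the fill of each bin."""
    bins = []
    for size in sorted(sizes, reverse=True):
        for index, fill in enumerate(bins):
            if fill + size <= capacity:
                bins[index] += size
                break
        else:
            bins.append(size)
    return bins


def merge_boxes(boxes, forced_pairs):
    """Union the input sets of every group of boxes linked by forced pairs."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in forced_pairs:
        parent[find(a)] = find(b)
    merged = {}
    for index, inputs in enumerate(boxes):
        merged.setdefault(find(index), set()).update(inputs)
    return list(merged.values())


def reconverge(boxes, capacity):
    """Try every combination of forced merges (including none); keep the best."""
    pairs = [(a, b) for a, b in combinations(range(len(boxes)), 2)
             if boxes[a] & boxes[b]]
    best = None
    for count in range(len(pairs) + 1):
        for combo in combinations(pairs, count):
            merged = merge_boxes(boxes, combo)
            if any(len(m) > capacity for m in merged):
                continue  # a merged LUT would not fit in one bin
            bins = ffd([len(m) for m in merged], capacity)
            key = (len(bins), min(bins))
            if best is None or key < best[0]:
                best = (key, bins)
    return best[1]
```

With 5-input bins and fanin LUTs {a, b, c}, {a, b, d}, {e}, plain FFD needs two bins, while forcing the first two LUTs to merge gives a 4-input box that packs with {e} into a single bin.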


Maximum Share Decreasing

  • Exhaustive search prohibitive

  • Select box using following criteria

    • Greatest # inputs

    • Shares greatest # inputs with any existing bin

    • Shares greatest # of inputs with existing (remaining) boxes

  • Reduces to FFD for no input sharing

  • Points 2 and 3 optimize network sharing
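The box-selection rule can be sketched as a lexicographic comparison, under the assumption that the three criteria are applied in the order listed, each breaking ties in the one before it. Boxes and bins are modeled as sets of input names; the function name is hypothetical.

```python
def select_box(boxes, bins):
    """Return the index of the next box to place under the MSD criteria."""

    def priority(index):
        box = boxes[index]
        # Criterion 2: most inputs shared with any existing bin.
        shared_with_bin = max((len(box & b) for b in bins), default=0)
        # Criterion 3: most inputs shared with any other remaining box.
        shared_with_box = max(
            (len(box & other) for j, other in enumerate(boxes) if j != index),
            default=0,
        )
        # Criterion 1 (number of inputs) comes first in the tuple.
        return (len(box), shared_with_bin, shared_with_box)

    return max(range(len(boxes)), key=priority)
```

When no inputs are shared, the second and third terms are all zero and the choice falls back to the largest box, which is the FFD ordering.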


Node Replication

[Figure: mapping without replication vs. with replication]

  • Apply replication to fanout nodes

  • Map without replication first

  • Locally decompose fanout nodes to determine savings

  • Ordering important


Results – Chortle-crf

  • 20 netlists mapped to 5-input LUTs

  • Reconvergence reduced LUTs by 2.7%

  • Replication reduced LUTs by 3.7%

  • Combined 14% reduction achieved

  • Replication exposes reconvergent paths creating additional opportunities for optimization.


Chortle-d

  • Minimize delay through circuit

  • Generally increases hardware required

    • Reduced logic levels by 38%

    • Increased # LUTs by 79%

  • Note: most delay in an FPGA is in the interconnect


Other Approaches

  • MIS-PGA

    • Groups inputs into LUTs

    • Decompose into 4-LUTs (Roth-Karp)

    • 47 times slower than Chortle

    • 14% fewer LUTs

  • XMAP

    • Represent circuit as BDDs

    • Effective for multiplexer based devices.

    • Also, BDS-PGA


Flowmap

1. Use network flow to partition circuit.

2. Determine point where minimum flow achieved for minimum cut

3. Cut until LUTs of size N achieved.


Taking Flip-Flops into Account

  • FPGA devices contain fixed resources – FFs

  • Technology mapping should take these into account

  • Consider fanout nodes.


LUT Packing - VPACK

  • Seed BLE – choose BLE with most inputs.

  • Select next BLE -> BLE which shares most inputs and outputs with cluster

  • Continue until the cluster is full or adding any BLE would exceed I, the number of cluster inputs

  • Hill Climbing – exceed I limit temporarily to find better minimum.
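The greedy part of this procedure (without the hill-climbing step) can be sketched as follows. The data layout is an assumption: each BLE is a pair of (set of input nets, output net), a cluster holds at most n BLEs, and nets produced inside a cluster do not count toward its I inputs. This is not the VPACK source code.

```python
def cluster_inputs(members, bles):
    """Nets a cluster must receive from outside."""
    used = set().union(*(bles[m][0] for m in members))
    produced = {bles[m][1] for m in members}
    return used - produced


def vpack(bles, n, i_max):
    """Greedily group BLEs into clusters of at most n BLEs and i_max inputs."""
    unclustered = set(bles)
    clusters = []
    while unclustered:
        # Seed: the unclustered BLE with the most inputs (name order breaks ties).
        seed = max(sorted(unclustered), key=lambda b: len(bles[b][0]))
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < n:
            nets = set()
            for member in cluster:
                nets |= bles[member][0] | {bles[member][1]}
            candidates = [b for b in sorted(unclustered)
                          if len(cluster_inputs(cluster + [b], bles)) <= i_max]
            if not candidates:
                break  # every remaining BLE would overflow the input limit
            # Next BLE: shares the most inputs and outputs with the cluster.
            best = max(candidates,
                       key=lambda b: len((bles[b][0] | {bles[b][1]}) & nets))
            cluster.append(best)
            unclustered.remove(best)
        clusters.append(cluster)
    return clusters
```

For instance, with n = 2 and I = 4, BLEs A (inputs a, b, c, output n1), B (inputs n1, d), C (inputs e, f, output n3) and D (inputs n3, g) end up as the clusters [A, B] and [C, D], since each pair shares a net.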


Summary

  • Many tech mapping algorithms exist to minimize delay/area

  • Chortle uses a dynamic programming heuristic to perform mapping

  • Largely a solved problem

  • More sophisticated techniques evaluated recently

