
# ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs


### ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

• Logic synthesis

• LUT Clustering

• LUT capacity

• Chortle – example technology mapper

• Architecture-specific optimization

• A Boolean network is the main representation of the logic functions for technology independent optimizations.

• Each node can be represented as sum-of-products (or product-of-sums).

• Provides multi-level structure, but functions in the network need not correspond to logic gates.

out1 = k2 + x2’

out2 = k3 + x1

k2 = x1’ x2 x4 + k1

k3 = k1 x4’

k1 = x2 + x3

Boolean network example (primary inputs: x1, x2, x3, x4)

• Support: set of variables used by a function.

• Transitive fanout: all the primary outputs and intermediate variables that depend on a function.

• Transitive fanin: all the primary inputs and intermediate variables used by a function. Transitive fanin determines a cone of logic.
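The definitions above can be tried out on the example network from the earlier slide. This is a minimal sketch: the dictionary layout and the function names `transitive_fanin` and `transitive_fanout` are illustrative choices, not part of the lecture.

```python
# Each node maps to its support: the variables its local function reads.
# Node names and supports are taken from the example network
# (out1 = k2 + x2', out2 = k3 + x1, k2 = x1' x2 x4 + k1, k3 = k1 x4', k1 = x2 + x3).
NETWORK = {
    "k1": {"x2", "x3"},
    "k2": {"x1", "x2", "x4", "k1"},
    "k3": {"k1", "x4"},
    "out1": {"k2", "x2"},
    "out2": {"k3", "x1"},
}

def transitive_fanin(node):
    """All primary inputs and intermediate variables that `node` depends on."""
    seen = set()
    stack = list(NETWORK.get(node, ()))
    while stack:
        var = stack.pop()
        if var not in seen:
            seen.add(var)
            # Primary inputs have no entry in NETWORK, so the walk stops there.
            stack.extend(NETWORK.get(var, ()))
    return seen

def transitive_fanout(var):
    """All intermediate variables and outputs that depend on `var`."""
    return {node for node in NETWORK if var in transitive_fanin(node)}
```

For instance, the cone of `out2` reaches x1, x2, x3 and x4 through k3 and k1, while x1 only influences k2, out1 and out2.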

Figure: cone of logic between the primary inputs and an output

Figure: partially-specified function of x1, x2, x3 (entries 0, 1, and don’t care)

• Simplification.

• Changing the way a function is represented.

• Network restructuring.

• Delay restructuring.

• Optimizations that reduce the height of critical paths.

Figure: network with nodes f1, f2, f3, f4 and output F, before and after restructuring

• Cover the function:

• Cost (number of inputs) doesn’t always increase with added functions:

• Cost metric for static gates is literal:

• ax + bx’ has four literals, requires 8 transistors.

• Cost metric for FPGAs is logic element:

• All functions that fit in an LE have the same cost.

• Find the largest logic cone that will fit into the LUT:

r = q + s’

s = d’

q = g’ + h

d = a + b
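To see why the whole cone of r fits one 4-input LUT, the sketch below collects the cone's primary inputs and builds the LUT contents by exhaustive evaluation. The `NODES` table and the helper names are assumptions made for illustration.

```python
from itertools import product

# Node functions from the example: name -> (fanins, local function).
NODES = {
    "d": (("a", "b"), lambda v: v["a"] or v["b"]),
    "s": (("d",), lambda v: not v["d"]),
    "q": (("g", "h"), lambda v: (not v["g"]) or v["h"]),
    "r": (("q", "s"), lambda v: v["q"] or (not v["s"])),
}

def cone_inputs(node):
    """Primary inputs at the leaves of the cone rooted at `node`."""
    if node not in NODES:
        return {node}
    return set().union(*(cone_inputs(f) for f in NODES[node][0]))

def evaluate(node, values):
    """Evaluate `node` given a dict of primary-input values."""
    if node not in NODES:
        return values[node]
    fanins, fn = NODES[node]
    return bool(fn({f: evaluate(f, values) for f in fanins}))

def lut_contents(root):
    """Return the sorted cone inputs and the truth table of `root` over them."""
    inputs = sorted(cone_inputs(root))
    table = [
        int(evaluate(root, dict(zip(inputs, bits))))
        for bits in product([False, True], repeat=len(inputs))
    ]
    return inputs, table
```

The cone of r has inputs a, b, g, h, so the four nodes collapse into a single 16-entry table: one LUT rather than four.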

Figure: nodes A, B, C, D

How much fits in a LUT?

• One 2-input NAND gate frequently used for comparison.

• Approximately 12 ~ 15 gates per four-input LUT.

• 216 functions -> 80 after IO swapping -> 14 after IO inversion

• 4-input determined to be optimal

[Rose 1990]
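The class counts can be reproduced by brute force. The sketch below assumes the counts refer to 3-input functions, of which there are 2^8 = 256 truth tables; it groups them first by input permutation only, then by permutation plus input and output inversion. The function names are illustrative.

```python
from itertools import permutations

N = 3                 # assumed number of LUT inputs
ROWS = 1 << N         # truth-table rows (8 for N = 3)

def transform(tt, perm, in_mask, out_neg):
    """Apply input inversion, input permutation and output inversion to a truth table."""
    new = 0
    for row in range(ROWS):
        src = 0
        for i in range(N):
            bit = ((row >> i) & 1) ^ ((in_mask >> i) & 1)
            src |= bit << perm[i]
        new |= (((tt >> src) & 1) ^ out_neg) << row
    return new

def count_classes(allow_inversion):
    """Count equivalence classes of N-input functions under the allowed transforms."""
    masks = range(1 << N) if allow_inversion else [0]
    outs = (0, 1) if allow_inversion else (0,)
    canonical = set()
    for tt in range(1 << ROWS):
        # The smallest image over the whole transform group names the class.
        canonical.add(min(
            transform(tt, p, m, o)
            for p in permutations(range(N))
            for m in masks
            for o in outs
        ))
    return len(canonical)
```

Under this assumption, `count_classes(False)` gives 80 and `count_classes(True)` gives 14.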

• Improve circuit based on cost

• Keep same functionality

• Boolean Evaluation/decomposition

• Simple factoring -> minimizing literals

Before:

f = ac + ad + bc + bd

g = a + b + c

After:

e = a + b

g = e + c

f = e(c + d)
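A quick check of the factoring example: the sketch below counts literals in both forms and confirms by exhaustive evaluation that the functions are unchanged. The string encoding and helper names are illustrative only.

```python
from itertools import product

def literals(expr):
    """Count literals: every variable occurrence in the expression string."""
    return sum(ch.isalpha() for ch in expr)

BEFORE = {"f": "ac + ad + bc + bd", "g": "a + b + c"}
AFTER = {"e": "a + b", "g": "e + c", "f": "e(c + d)"}

def same_functions():
    """Check that factoring through e = a + b preserves both f and g."""
    for a, b, c, d in product([0, 1], repeat=4):
        e = a | b
        if (a & c | a & d | b & c | b & d) != (e & (c | d)):
            return False
        if (a | b | c) != (e | c):
            return False
    return True

literals_before = sum(literals(x) for x in BEFORE.values())
literals_after = sum(literals(x) for x in AFTER.values())
```

The literal count drops from 11 to 7 while `same_functions()` stays true.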

• Based on division:

• formulate candidate divisor;

• test how it divides into the function;

• if g = f/c, we can use c as an intermediate function for f.

• Algebraic division: does not take Boolean simplification into account. Less expensive than Boolean division.
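A minimal sketch of algebraic (weak) division, assuming a sum-of-products is stored as a set of cubes and each cube as a frozenset of literals. It treats literals purely as symbols, which is what keeps it cheaper than Boolean division. The representation and the name `algebraic_divide` are assumptions for illustration.

```python
def cube(literals):
    """Build a cube from a string of single-character literals, e.g. cube("ac")."""
    return frozenset(literals)

def algebraic_divide(f, divisor):
    """Return (quotient, remainder) such that f = quotient * divisor + remainder.

    Assumes the quotient and divisor end up with disjoint support, as
    algebraic division requires; this is not checked here.
    """
    quotient = None
    for d in divisor:
        # Cubes of f that contain d, with d stripped off.
        partial = {c - d for c in f if d <= c}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    covered = {q | d for q in quotient for d in divisor}
    return quotient, f - covered
```

Dividing f = ac + ad + bc + bd by the candidate divisor c + d yields quotient a + b with an empty remainder, matching the factored form f = (a + b)(c + d).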

NAND2, cost 3

AOI-21, cost 4

Library-based Technology Mapping – MIS II

• Three steps: decomposition, matching, covering

• Circuit first decomposed into NAND representations

• Different collections of NANDs can be implemented differently in VLSI

Cost =

MIS II

• Decompose into NAND-2 using Boolean techniques

• Use dynamic programming to match subtrees with libraries

• Choose lowest cost implementation that covers all primitives.
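The matching and covering steps can be sketched as dynamic programming over a tree of NAND2 and inverter primitives. This is a simplified illustration, not MIS II itself: the tuple encoding, the `LIBRARY` table and the inverter cost of 2 are assumptions, and only the NAND2 and AOI-21 costs come from the slide.

```python
from functools import lru_cache

# Trees are nested tuples: ("NAND", left, right), ("INV", child), or a string leaf.
# In a library pattern, a string leaf is a wildcard that matches any subtree.
LIBRARY = [
    ("INV", 2, ("INV", "a")),  # inverter cost is an assumption
    ("NAND2", 3, ("NAND", "a", "b")),
    ("AOI21", 4, ("INV", ("NAND", ("NAND", "a", "b"), ("INV", "c")))),
]

def match(pattern, node):
    """Return the subtrees bound to the pattern's wildcards, or None if no match."""
    if isinstance(pattern, str):
        return [node]
    if isinstance(node, str) or pattern[0] != node[0]:
        return None
    if pattern[0] == "INV":
        return match(pattern[1], node[1])
    # NAND is commutative, so try both orderings of the children.
    for left, right in ((node[1], node[2]), (node[2], node[1])):
        a = match(pattern[1], left)
        b = match(pattern[2], right)
        if a is not None and b is not None:
            return a + b
    return None

@lru_cache(maxsize=None)
def best_cover(node):
    """Lowest-cost cover of the subtree at `node`: (cost, gates used)."""
    if isinstance(node, str):
        return 0, ()
    best = None
    for name, cost, pattern in LIBRARY:
        leaves = match(pattern, node)
        if leaves is None:
            continue
        total, gates = cost, (name,)
        for leaf in leaves:
            sub_cost, sub_gates = best_cover(leaf)
            total += sub_cost
            gates += sub_gates
        if best is None or total < best[0]:
            best = (total, gates)
    return best
```

With these costs, a subject tree shaped like NOT(xy + z) is covered by a single AOI-21 at cost 4 instead of two NAND2s and two inverters at cost 10.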

• Minimize total number of LUTs

• Minimize the number of levels of LUTs

• Many different approaches

• Partitioning -> Flowmap

• BDDs -> XMAP

• Chortle -> Covering

• Basic Xilinx tech mapping follows Chortle with modification to handle registers.

Figure: example network with nodes A through M and signals w, x, y, z

Chortle-crf

• Dynamic programming approach

• Minimize # LUTs – primary goal

• Minimize # inputs used by the root LUT of the circuit – secondary goal

• Operates on AND-OR circuits.

Locate boundaries

2-LUTs

Without decomp

4-LUTs

Chortle-crf

• Major innovation is bin packing

• Simultaneously addresses decomposition and matching

• Goal: Find decomposition of every node in the network that minimizes # LUTs in final circuit

• Dynamically visit each node in the graph

• Fanin nodes drive the node under evaluation

Boxes -> fanin LUTs, cost is number of inputs

Bins -> N input LUT (in this case 5)

First Fit Decreasing /* construct 2-level decomp */

box list <- fanin LUTs sorted by decreasing size

bin list <- empty

while (box list is not empty) {

box <- largest LUT in box list /* remove it from the list */

find first bin with enough unused inputs to contain box

if no such bin exists

create new bin containing box and add it to bin list

else

pack box into the existing bin

}
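The First Fit Decreasing step can be written out as a short runnable sketch. It assumes each box is just a count of used inputs and each bin is a LUT with `k` inputs (5, as on the slide); the function name is illustrative.

```python
def first_fit_decreasing(box_sizes, k=5):
    """Pack fanin LUTs (boxes) into k-input LUTs (bins), largest box first."""
    bins = []
    for box in sorted(box_sizes, reverse=True):
        for contents in bins:
            if sum(contents) + box <= k:
                contents.append(box)   # pack into the first bin with room
                break
        else:
            bins.append([box])         # no bin had room: open a new one
    return bins
```

For example, boxes of sizes 2, 3, 1, 2, 2 pack into two 5-input bins: one holding 3 + 2 and one holding 2 + 2 + 1.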

• Chain LUTs together

• Output of largest second level LUT connected to LUT with unused input

• May need to add a new LUT

• Leads to min LUTs and fanout LUT with smallest # input

• This fanout LUT used as input to next stage
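The chaining step can be sketched as follows, assuming each second-level LUT is described only by how many of its `k` inputs are used and that `k` is at least 2. The name `chain_luts` and the first-fit choice of destination are assumptions, not details given on the slide.

```python
def chain_luts(fills, k=5):
    """Chain second-level LUTs into one tree.

    fills: used-input counts of the LUTs from the two-level decomposition.
    Returns (total number of LUTs, inputs used by the final root LUT).
    """
    luts = list(fills)
    total = len(luts)
    while len(luts) > 1:
        luts.sort(reverse=True)
        luts.pop(0)                    # most-filled LUT; its output needs one input elsewhere
        for i, used in enumerate(luts):
            if used < k:
                luts[i] += 1           # connect to a LUT with an unused input
                break
        else:
            luts.append(1)             # every LUT is full: add a new one
            total += 1
    return total, luts[0]
```

Two full 5-input LUTs need a third LUT to combine them, so `chain_luts([5, 5])` reports three LUTs with a root using two inputs, whereas fills of 5 and 4 chain with no extra LUT.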

Examples (signals u, v, w, x, y; intermediate outputs z.1, z.2)

a) Fanin LUTs

b) Two-level Decomposition

c) Multi-level Decomposition

• For LUTs with fewer than 6 inputs, Chortle will create an optimal result for each subtree

• Combination of sub-trees is not optimized.

• Local optimizations needed to ensure global optimality.

Reconvergent paths -> net drives multiple gates.

Replicating logic -> creating additional fanout

• Improve 2-level decomposition to take fanout into account

• Replace FFD with an exhaustive search that repeatedly invokes FFD.

• Try both with and without reconvergent path and select best mapping (forced merging)

• Inputs must reconverge at node being decomposed.

• Frequently, more than one pair of fan-in LUTs share inputs

• For each combination of pairs that share inputs, perform FFD.

• Two-level decomp with fewest bins and smallest least filled bin retained

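Reconverge

pair list <- all pairs of fanin LUTs with shared inputs

best LUTs <- empty

for all possible combinations of pairs from pair list {

merged LUTs <- copy of fanin LUTs with the chosen pairs force-merged

result <- FFD(merged LUTs)

if result is better than best LUTs

best LUTs <- result /* keep best combo */

}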
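A runnable sketch of the exhaustive reconvergence search, under simplifying assumptions: fanin LUTs are sets of input names, each LUT takes part in at most one forced merge, and FFD itself counts inputs per box without noticing sharing. The names `ffd`, `quality` and `reconverge` are illustrative.

```python
from itertools import combinations

def ffd(sizes, k):
    """First Fit Decreasing on box sizes; returns a list of bins."""
    bins = []
    for size in sorted(sizes, reverse=True):
        for contents in bins:
            if sum(contents) + size <= k:
                contents.append(size)
                break
        else:
            bins.append([size])
    return bins

def quality(bins):
    """Fewer bins is better; ties go to the smaller least-filled bin."""
    return (len(bins), min(sum(b) for b in bins))

def reconverge(fanins, k=5):
    """Try every combination of forced merges and keep the best decomposition."""
    fanins = [frozenset(f) for f in fanins]
    pairs = [(i, j) for i, j in combinations(range(len(fanins)), 2)
             if fanins[i] & fanins[j]]
    best = ffd([len(f) for f in fanins], k)      # mapping without forced merging
    for r in range(1, len(pairs) + 1):
        for chosen in combinations(pairs, r):
            used = [i for pair in chosen for i in pair]
            if len(set(used)) < len(used):
                continue                         # a LUT may join only one merge here
            merged = [fanins[i] | fanins[j] for i, j in chosen]
            if any(len(m) > k for m in merged):
                continue                         # merged LUT would not fit
            rest = [f for n, f in enumerate(fanins) if n not in used]
            candidate = ffd([len(f) for f in merged + rest], k)
            if quality(candidate) < quality(best):
                best = candidate
    return best
```

With fanin LUTs {a, b, c}, {a, b, d} and {e}, plain FFD needs two 5-input bins, while force-merging the first two (4 distinct inputs) lets everything fit in one.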

• Exhaustive search prohibitive

• Select box using following criteria

• Greatest # inputs

• Shares greatest # inputs with any existing bin

• Shares greatest # of inputs with existing (remaining) boxes

• Reduces to FFD for no input sharing

• Points 2 and 3 optimize network sharing
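The selection criteria can be sketched as a greedy packer. This assumes the three criteria are applied in priority order and that a box goes into the fitting bin it shares the most inputs with; both are assumptions, as is the name `share_aware_pack`.

```python
def share_aware_pack(boxes, k=5):
    """Greedy packing that accounts for inputs shared between fanin LUTs.

    boxes: iterable of input-name sets. Returns bins as sets of distinct inputs.
    """
    remaining = [frozenset(b) for b in boxes]
    bins = []
    while remaining:
        def priority(box):
            with_bins = max((len(box & b) for b in bins), default=0)
            with_boxes = sum(len(box & o) for o in remaining if o is not box)
            return (len(box), with_bins, with_boxes)

        box = max(remaining, key=priority)
        remaining.remove(box)
        fits = [b for b in bins if len(b | box) <= k]
        if fits:
            target = max(fits, key=lambda b: len(b & box))
            target |= box              # in-place union: shared inputs count once
        else:
            bins.append(set(box))
    return bins
```

When no inputs are shared, every tie is broken in favour of the first candidate, so the procedure behaves like First Fit Decreasing.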

With Replication

Node Replication

• Apply replication to fanout nodes

• Map without replication first

• Locally decompose fanout nodes to determine savings

• Ordering important

• 20 netlists mapped to 5-input LUTs

• Reconvergence reduced LUTs by 2.7%

• Replication reduced LUTs by 3.7%

• Combined 14% reduction achieved

• Replication exposes reconvergent paths creating additional opportunities for optimization.

• Minimize delay through circuit

• Generally increases hardware required

• Reduced logic levels by 38%

• Increased # LUTs by 79%

• Note: most delay in an FPGA is in the interconnect

• MIS-PGA

• Groups inputs into LUTs

• Decompose into 4-LUTs (Roth-Karp)

• 47 times slower than Chortle

• 14% fewer LUTs

• XMAP

• Represent circuit as BDDs

• Effective for multiplexer based devices.

• Also, BDS-PGA

Flowmap

2. Determine point where minimum flow achieved for minimum cut

3. Cut until LUTs of size N achieved.

Taking Flip-Flops into Account

• FPGA devices contain fixed resources – FFs

• Technology mapping should take these into account

• Consider fanout nodes.

• Seed BLE – choose BLE with most inputs.

• Select next BLE -> BLE which shares most inputs and outputs with cluster

• Continue until cluster is full or adding any BLE will overflow I -> # inputs

• Hill Climbing – exceed I limit temporarily to find better minimum.
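The greedy clustering steps can be sketched as below. It assumes each BLE is given as a set of input nets plus one output net, that a cluster holds at most `n` BLEs, and that `max_inputs` is the limit I on distinct external inputs. Hill climbing is left out, and all names are illustrative.

```python
def cluster_inputs(members, bles):
    """External inputs of a cluster: nets read by members but not produced inside."""
    ins = set().union(*(bles[m][0] for m in members))
    outs = {bles[m][1] for m in members}
    return ins - outs

def greedy_cluster(bles, n, max_inputs):
    """bles: dict name -> (set of input nets, output net). Returns a list of clusters."""
    free = set(bles)
    clusters = []
    while free:
        # Seed: the unclustered BLE with the most inputs (name order breaks ties).
        seed = max(sorted(free), key=lambda b: len(bles[b][0]))
        free.remove(seed)
        members = [seed]
        while len(members) < n:
            nets = set().union(*(bles[m][0] | {bles[m][1]} for m in members))
            feasible = [b for b in sorted(free)
                        if len(cluster_inputs(members + [b], bles)) <= max_inputs]
            if not feasible:
                break                  # adding any BLE would exceed the input limit
            # Next BLE: the one sharing the most input and output nets with the cluster.
            nxt = max(feasible, key=lambda b: len((bles[b][0] | {bles[b][1]}) & nets))
            free.remove(nxt)
            members.append(nxt)
        clusters.append(members)
    return clusters
```

A BLE that consumes another member's output absorbs that net into the cluster, which is why adding it can leave the external input count unchanged or even lower.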

• Many tech mapping algorithms exist to minimize delay/area

• Chortle uses a dynamic programming heuristic to perform mapping

• Largely a solved problem

• More sophisticated techniques evaluated recently