Combinational and Sequential Mapping with Priority Cuts




Alan Mishchenko

Sungmin Cho

Satrajit Chatterjee

Robert Brayton

UC Berkeley

Outline
• Traditional cut-based LUT mapping
• Improved technology mapping with priority cuts
• Sequential mapping
• Other applications of priority cuts
• Experimental results
Technology Mapping

Input: A Boolean network (And-Inverter Graph)

Output: A netlist of K-LUTs that implements the Boolean network while optimizing a given cost function

[Figure: technology mapping turns the subject graph (left) into the mapped netlist (right); nodes are labeled a, b, c, d, e, f]

k-feasible Cuts

A cut of a node n is a set of nodes in the transitive fan-in of n such that every path from n to the PIs is blocked by nodes in the cut.

A cut is k-feasible if it contains at most k nodes.

[Figure: node r with fanins p and q; p has fanins a and b, q has fanins b and c]

The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)

k-feasible cuts are important in FPGA mapping since the logic between a node and the nodes in its cut can be replaced by a k-LUT.
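The definition can be checked mechanically. The sketch below encodes the slide's example as a fanin dictionary (a hypothetical representation, not ABC's data structure) and tests whether every path from a node to the PIs meets the cut. It walks paths explicitly, so it is only meant for small illustrations.

```python
# Toy network from the slide (hypothetical encoding): each node maps to its fanins;
# PIs have no fanins.
FANINS = {"r": ["p", "q"], "p": ["a", "b"], "q": ["b", "c"],
          "a": [], "b": [], "c": []}


def is_cut(node, leaves, fanins):
    """True if every path from `node` down to a PI passes through a node in `leaves`."""
    def blocked(n):
        if n in leaves:
            return True              # the path is stopped by a cut leaf
        if not fanins[n]:
            return False             # reached a PI without meeting the cut
        return all(blocked(f) for f in fanins[n])
    return blocked(node)


def is_k_feasible_cut(node, leaves, fanins, k):
    """A k-feasible cut is a cut with at most k leaves."""
    return len(leaves) <= k and is_cut(node, leaves, fanins)


print(is_k_feasible_cut("r", {"p", "b", "c"}, FANINS, 3))  # True
print(is_k_feasible_cut("r", {"p", "b"}, FANINS, 3))       # False: path r -> q -> c
```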

k-feasible Cut Computation

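The set of cuts of a node is the ‘cross product’ of the cut sets of its children: one cut is taken from each child, their leaves are merged by set union, and the trivial cut consisting of the node itself is added.

Computation is done bottom-up. For node r with fanins p and q, where p has fanins a and b and q has fanins b and c:

• a: { {a} }   b: { {b} }   c: { {c} }
• p: { {p}, {a, b} }
• q: { {q}, {b, c} }
• r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

Any cut of size greater than k is discarded.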

(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
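A minimal sketch of this bottom-up enumeration, assuming nodes are given in topological order and cuts are stored as Python frozensets (both our choices for illustration). It reproduces the five cuts of r listed above; dominated cuts are not removed here.

```python
from itertools import product

# Toy network from the slide (hypothetical encoding); PIs have no fanins.
FANINS = {"r": ["p", "q"], "p": ["a", "b"], "q": ["b", "c"],
          "a": [], "b": [], "c": []}


def enumerate_cuts(fanins, order, k):
    """Bottom-up k-feasible cut enumeration; `order` must list fanins before fanouts."""
    cuts = {}
    for n in order:
        trivial = frozenset([n])
        if not fanins[n]:
            cuts[n] = {trivial}          # a PI has only its trivial cut
            continue
        result = {trivial}
        # 'cross product': pick one cut per fanin and take the union of their leaves
        for combo in product(*(cuts[f] for f in fanins[n])):
            merged = frozenset().union(*combo)
            if len(merged) <= k:         # discard cuts larger than k
                result.add(merged)
        cuts[n] = result
    return cuts


cuts = enumerate_cuts(FANINS, ["a", "b", "c", "p", "q", "r"], k=3)
print(sorted(sorted(c) for c in cuts["r"]))
# [['a', 'b', 'c'], ['a', 'b', 'q'], ['b', 'c', 'p'], ['p', 'q'], ['r']]
```

With k = 2, only {r} and {p, q} survive at r, which shows how the size limit prunes the cross product.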

Basic Mapping Algorithm

Depth-optimal LUT mapping of a DAG using all cuts at each node

Input: And-Inverter Graph

• Compute K-feasible cuts for each node
• Compute best arrival time at each node
  • In topological order (from PI to PO)
  • Compute the depth of all cuts and choose the best one
• Perform area recovery
  • Using area flow
  • Using exact local area
• Choose the best cover
  • In reverse topological order (from PO to PI)

Output: Mapped Netlist
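The delay pass and cover extraction above can be sketched as follows, assuming a unit delay per LUT, PIs arriving at time 0, and cuts supplied per node as frozensets (the names and the toy cut sets are hypothetical). Area recovery is left out of this sketch.

```python
def depth_optimal_map(fanins, order, cuts, outputs):
    """Unit-delay depth-optimal cut selection, then cover extraction from the POs.

    `cuts[n]` holds the k-feasible cuts of n (frozensets, including the trivial one);
    `order` is topological (PIs first).
    """
    arrival, best = {}, {}
    for n in order:                                   # forward pass: PI -> PO
        if not fanins[n]:
            arrival[n] = 0                            # PIs arrive at time 0
            continue
        candidates = [c for c in cuts[n] if c != frozenset([n])]
        # a cut becomes one LUT: depth = 1 + latest leaf; ties go to smaller cuts
        best[n] = min(candidates,
                      key=lambda c: (1 + max(arrival[l] for l in c), len(c), sorted(c)))
        arrival[n] = 1 + max(arrival[l] for l in best[n])
    cover, stack = {}, [o for o in outputs if fanins[o]]
    while stack:                                      # backward pass: PO -> PI
        n = stack.pop()
        if n not in cover:
            cover[n] = best[n]                        # n is implemented as a LUT
            stack.extend(l for l in best[n] if fanins[l])
    return arrival, cover


FANINS = {"r": ["p", "q"], "p": ["a", "b"], "q": ["b", "c"],
          "a": [], "b": [], "c": []}
fs = frozenset
CUTS_K3 = {
    "a": {fs(["a"])}, "b": {fs(["b"])}, "c": {fs(["c"])},
    "p": {fs(["p"]), fs(["a", "b"])},
    "q": {fs(["q"]), fs(["b", "c"])},
    "r": {fs(["r"]), fs(["p", "q"]), fs(["p", "b", "c"]),
          fs(["a", "b", "q"]), fs(["a", "b", "c"])},
}
arrival, cover = depth_optimal_map(FANINS, list("abcpqr"), CUTS_K3, ["r"])
print(arrival["r"], len(cover))   # 1 1
```

Only nodes reached from the POs through chosen cuts become LUTs, which is why p and q disappear when r picks {a, b, c}.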

Area Recovery Summary
• Area recovery heuristics
  • Area flow (global view)
    • Chooses cuts with better logic sharing
  • Exact local area (local view)
    • Minimizes the number of LUTs needed to map each node
• The results of area recovery depend on
  • The order of processing nodes
  • The order of applying the two passes
  • The number of iterations
• This scheme works for the constant-delay model
  • Any change off the critical path does not affect the critical path
• For large designs, there may be many k-feasible cuts
  • On the order of millions
• Previous ways of dealing with the problem
  • Detect and remove cut dominance
  • Perform cut pruning
  • Store only cuts on the frontier of mapping
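The two area heuristics can be sketched as below. This follows a common textbook formulation (unit LUT area; area flow divides a cut's cost among the node's estimated fanouts; exact local area references and then dereferences the cut to count LUTs it would add), not necessarily ABC's code. The best cuts, fanout estimates, and node names are hypothetical.

```python
from collections import defaultdict

PIS = {"a", "b", "c"}
# Hypothetical current best cuts for the internal nodes of the toy network.
BEST = {"p": frozenset({"a", "b"}), "q": frozenset({"b", "c"})}
FANOUT = {"p": 1, "q": 1, "r": 1}                # r drives a PO: count it as 1


def cut_area_flow(node, cut, af, fanout):
    """Area flow: (LUT area 1 + leaf area flows) shared among the node's fanouts."""
    return (1 + sum(af[l] for l in cut)) / max(1, fanout[node])


def area_flows(order, best, fanout, pis):
    af = {}
    for n in order:                              # topological order
        af[n] = 0.0 if n in pis else cut_area_flow(n, best[n], af, fanout)
    return af


def _ref(cut, best, refs, pis):
    """Reference the cut's leaves; count LUTs that become newly needed."""
    area = 1
    for l in cut:
        refs[l] += 1
        if refs[l] == 1 and l not in pis:
            area += _ref(best[l], best, refs, pis)
    return area


def _deref(cut, best, refs, pis):
    """Undo _ref; returns the same count."""
    area = 1
    for l in cut:
        refs[l] -= 1
        if refs[l] == 0 and l not in pis:
            area += _deref(best[l], best, refs, pis)
    return area


def exact_local_area(cut, best, refs, pis):
    """LUTs added by choosing `cut`, given the reference counts of the current mapping."""
    area = _ref(cut, best, refs, pis)
    assert _deref(cut, best, refs, pis) == area   # reference counts are restored
    return area


af = area_flows(["a", "b", "c", "p", "q"], BEST, FANOUT, PIS)
refs = defaultdict(int)                          # nothing mapped yet
for cut in (frozenset({"p", "q"}), frozenset({"a", "b", "c"})):
    print(sorted(cut), cut_area_flow("r", cut, af, FANOUT),
          exact_local_area(cut, BEST, refs, PIS))
# ['p', 'q'] 3.0 3
# ['a', 'b', 'c'] 1.0 1
```

Area flow looks at the whole fanin cone with fanout-based sharing, while exact local area counts only the LUTs that would actually be added under the current reference counts.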
Outline
• Improved technology mapping
• Sequential mapping
• Other applications of priority cuts
• Experimental results
New Mapping Algorithm

Near-depth-optimal LUT mapping of a DAG using several cuts at each node

Input: And-Inverter Graph

• Compute K-feasible cuts for each node
• Compute arrival time at each node
  • In topological order (from PI to PO)
  • Compute at most C good cuts and choose the best one (instead of computing the depth of all cuts)
• Perform area recovery
  • Using area flow
  • Using exact local area
  • Re-compute at most C good cuts and choose the best one in each iteration
• Choose the best cover
  • In reverse topological order (from PO to PI)

Output: Mapped Netlist

Computing Priority Cuts
• Consider nodes in a topological order
• At each node, merge the two sets of fanin cuts (each containing C cuts), getting up to (C+1) * (C+1) + 1 cuts
• Sort these cuts using a given cost function, select C best cuts, and use them for computing priority cuts of the fanouts
• Select one best cut, and use it to map the node
• Sorting criteria
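A sketch of priority-cut computation for a two-input AIG, assuming a depth-oriented sorting criterion (depth, then size, then leaf names as a tie-break); the cost function and all names are hypothetical. Each fanin contributes its C stored cuts plus its trivial cut, giving (C+1) x (C+1) merges; our reading is that the node's own trivial cut is the extra "+1" in the slide's count.

```python
# Toy network (hypothetical encoding): AIG nodes have exactly two fanins.
FANINS = {"r": ["p", "q"], "p": ["a", "b"], "q": ["b", "c"],
          "a": [], "b": [], "c": []}


def depth_then_size(cut, depth):
    """Depth-oriented sorting: cut depth, then cut size, then leaf names (tie-break)."""
    return (1 + max(depth[l] for l in cut), len(cut), sorted(cut))


def priority_cuts(fanins, order, k, c, cost=depth_then_size):
    """Keep at most `c` cuts per node instead of all k-feasible cuts."""
    stored, best, depth = {}, {}, {}
    for n in order:                               # topological order
        if not fanins[n]:
            stored[n], depth[n] = [], 0           # a PI contributes only its trivial cut
            continue
        x, y = fanins[n]
        # C stored cuts + the trivial cut of each fanin: (C+1) x (C+1) merges
        left = stored[x] + [frozenset([x])]
        right = stored[y] + [frozenset([y])]
        candidates = {u | v for u in left for v in right if len(u | v) <= k}
        ranked = sorted(candidates, key=lambda cut: cost(cut, depth))
        stored[n] = ranked[:c]                    # priority cuts seen by the fanouts
        best[n] = ranked[0]                       # the best cut maps the node
        depth[n] = 1 + max(depth[l] for l in best[n])
    return stored, best, depth


stored, best, depth = priority_cuts(FANINS, list("abcpqr"), k=3, c=1)
print(sorted(best["r"]), depth["r"])              # ['a', 'b', 'c'] 1
```

Swapping `cost` for an area-flow or exact-area key is how different passes could use different sorting criteria.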
Discussion
• K - max cut size
• C - max number of cuts
• n - number of nodes
• m - number of edges
• Complexity analysis
  • FlowMap: $O(Kmn)$ (J. Cong et al, TCAD ’94)
  • CutMap: $O(2^K m n^K)$ (J. Cong et al, FPGA ’95)
  • Proposed mapping algorithm: $O(KC^2 n)$
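One way to arrive at the stated bound for the proposed algorithm, assuming each fanin keeps at most C cuts, cut leaves are stored as sorted lists, and the C best candidates are picked with a linear-time selection (these are our assumptions; a full sort would add a log C factor):

```latex
\begin{aligned}
\#\text{candidates per node} &\le (C+1)^2 + 1,\\
\text{cost of one merge} &= O(K) \quad (\text{union of two sorted leaf lists, each of size} \le K),\\
\text{cost per node} &= \underbrace{O\big(K(C+1)^2\big)}_{\text{merging}} + \underbrace{O\big((C+1)^2\big)}_{\text{selecting the } C \text{ best}} = O(KC^2),\\
\text{total for } n \text{ nodes} &= n \cdot O(KC^2) = O(KC^2 n).
\end{aligned}
```

For example, C = 8 gives at most 9^2 + 1 = 82 candidates per node, independent of how many k-feasible cuts the node actually has.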
Priority Cuts: A Bag of Tricks
• Compute and use priority cuts (a subset of all cuts)
• Dynamically update the cuts in each mapping pass
• Use different sorting criteria in each mapping pass
• Include the best cut from the previous pass into the set of candidate cuts of the current pass
• Consider several depth-oriented mappings to get a good starting point for area recovery
• Use complementary heuristics for area recovery
• Perform cut expansion as part of area recovery
• Use efficient memory management
Outline
• Improved technology mapping
• Sequential mapping
• Other applications of priority cuts
• Experimental results
Sequential Mapping
• Sequential mapping is combinational mapping and retiming combined
  • Minimizes the clock period in the combined solution space
• Previous work:
  • Pan et al, FPGA’98
• Our contribution: divide sequential mapping into steps
  • Step 1: Find the best clock period via sequential arrival time computation (Pan et al, FPGA’98)
  • Step 2: Run combinational mapping with the resulting arrival/required times of the register outputs/inputs
  • Step 3: Perform final retiming to bring the circuit to the best clock period computed in Step 1
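Step 1 can be illustrated with a simplified label computation. This sketch works on a gate-level retiming graph where every node has unit delay and no cuts are used (the real flow computes labels over cuts, following Pan et al.); it assumes every loop contains a register and that PI/PO latency is fixed. A period phi is accepted if the labels converge and no PO label exceeds phi; binary search then finds the smallest integer period. All names are hypothetical.

```python
NEG_INF = float("-inf")


def feasible(phi, nodes, edges, pis, pos):
    """Can a unit-delay circuit be retimed to clock period `phi`?

    edges: (u, v, w) = u drives v through w registers. Labels are sequential
    arrival times: l(v) = max over fanins u of l(u) - phi * w + 1 (unit delay).
    """
    label = {v: (0 if v in pis else NEG_INF) for v in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for u, v, w in edges:
            if label[u] == NEG_INF:
                continue
            cand = label[u] - phi * w + 1
            if cand > label[v]:
                label[v], changed = cand, True
        if not changed:                            # labels converged
            return all(label[o] <= phi for o in pos)
    return False                                   # still growing: a loop is too tight


def min_period(nodes, edges, pis, pos):
    """Binary search for the smallest integer feasible period."""
    lo, hi = 1, len(nodes)     # no register-free path has more than len(nodes) nodes
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid, nodes, edges, pis, pos):
            hi = mid
        else:
            lo = mid + 1
    return lo


# A loop x -> y -> z -> x with one register, fed by input i: 3 delays per register.
NODES = ["i", "x", "y", "z"]
EDGES = [("i", "x", 0), ("x", "y", 0), ("y", "z", 0), ("z", "x", 1)]
print(min_period(NODES, EDGES, {"i"}, ["z"]))      # 3
```

The resulting labels play the role of the register arrival/required times that Step 2 hands to combinational mapping.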
Sequential Mapping (continued)
• Uses priority cuts (L=1) for computing sequential arrival times
  • Very fast
• Reuses efficient area recovery available in combinational mapping
  • Almost no degradation in LUT count and register count
• Greatly simplifies implementation
  • Due to not computing sequential cuts (cuts crossing the register boundary)
• Quality of results
  • Leads to quality that is better (by ~15%) than combinational mapping followed by retiming
    • Due to searching the combined search space
  • Achieves almost the same (-1%) clock period as the general sequential mapping with sequential cuts
    • Due to using a transparent register boundary without computing sequential cuts
Outline
• Improved technology mapping
• Sequential mapping
• Other applications of priority cuts
• Experimental results
Speeding Up SAT Solving
• Perform technology mapping into K-LUTs for area
  • Define area as the number of CNF clauses needed to represent the Boolean function of the cut
  • Run several iterations of area recovery
• Reduced the number of CNF clauses by ~50%
  • Compared to a smart circuit-to-CNF translation (M. Velev)
• Improves SAT solver runtime by 3-10x
  • Experimental results will be given later
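A sketch of the clause-count cost of a cut, assuming the usual encoding of y = f(x): every cube of an SOP cover of f yields one clause implying y, and every cube of a cover of NOT f yields one clause implying NOT y. The brute-force minimum cover below is only practical for small K and is our illustration, not ABC's procedure.

```python
from itertools import combinations, product


def covers(cube, m):
    """A cube (tuple of 0/1/None) covers minterm m if every fixed literal matches."""
    return all(c is None or c == b for c, b in zip(cube, m))


def min_sop_size(onset, k):
    """Minimum number of cubes covering exactly `onset` (brute force, small k)."""
    if not onset:
        return 0
    space = list(product((0, 1), repeat=k))
    implicants = [c for c in product((0, 1, None), repeat=k)
                  if all(m in onset for m in space if covers(c, m))]

    def contains(a, b):                  # cube a contains cube b
        return all(x is None or x == y for x, y in zip(a, b))
    primes = [p for p in implicants
              if not any(q != p and contains(q, p) for q in implicants)]
    for r in range(1, len(primes) + 1):  # smallest set of primes covering the onset
        for combo in combinations(primes, r):
            if all(any(covers(c, m) for c in combo) for m in onset):
                return r
    raise AssertionError("unreachable: the primes always cover the onset")


def cnf_clauses(func, k):
    """Clauses for y <-> func(x): |cover of func| + |cover of NOT func|."""
    space = list(product((0, 1), repeat=k))
    onset = {m for m in space if func(*m)}
    return min_sop_size(onset, k) + min_sop_size(set(space) - onset, k)


# The AIG for a AND b AND c needs two AND2 nodes x 3 clauses = 6 clauses.
print(cnf_clauses(lambda a, b, c: a & b & c, 3))   # 4 clauses as one 3-input LUT
```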
Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network
• Perform technology mapping into K-LUTs for minimizing area under delay constraints
  • Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut
  • Run delay-oriented mapping, followed by several iterations of area recovery
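A sketch of the BDD-node cost of a cut, assuming a reduced ordered BDD without complemented edges, with the first function argument as the top variable (both are our simplifications; a real BDD package may count differently). Summing this value over the LUTs of a mapping gives the total under this cost model.

```python
from itertools import product


def truth_table(func, k):
    """Truth table of a k-input function; the first argument is the most significant bit."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def bdd_size(tt):
    """Internal node count of the ROBDD of `tt` (top variable = first argument,
    no complemented edges)."""
    unique = {}                                   # (level, low, high) -> node id

    def build(bits):
        if len(bits) == 1:
            return ("leaf", bits[0])
        half = len(bits) // 2
        low, high = build(bits[:half]), build(bits[half:])
        if low == high:
            return low                            # redundant test: no node needed
        key = (len(bits), low, high)              # the sub-table size identifies the level
        return unique.setdefault(key, ("node", len(unique)))

    build(list(tt))
    return len(unique)


# Hypothetical cut functions: the area of a mapping is the sum over its LUTs.
print(bdd_size(truth_table(lambda a, b, c: a & b & c, 3)))  # 3
print(bdd_size(truth_table(lambda a, b, c: a ^ b ^ c, 3)))  # 5
```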
Cut Sweeping
• Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)
  • By “shallow” equivalences, we mean equivalent points A and B for which there exists a K-cut C (K < 16) such that $F_A(C) = F_B(C)$
• A subset of “good” K-input priority cuts can be computed
  • The quality of a cut is determined by the number of fanouts of the cut leaves
  • The more fanouts, the more likely the cut is a common cut for two nodes
• Cut sweeping quickly reduces the circuit
  • Typically achieves ~50% of the reduction obtained by SAT sweeping (fraiging)
• Cut sweeping is much faster than SAT sweeping
  • Typically 10-100x, for large designs
• Can be used as a fast preprocessing step for (or a low-cost substitute for) SAT sweeping
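A simplified sketch of cut sweeping: priority cuts are ranked by the total fanout count of their leaves, each node is simulated over its cuts, and two nodes are reported as equivalent when they share a cut with the same truth table. It assumes an AND-only network without inverters and only reports merge pairs instead of rewiring fanouts; it is our illustration, not Een's implementation.

```python
from collections import Counter


def cut_sweep(pis, ands, k=4, c=4):
    """Find AND nodes equal over a shared priority cut.

    `ands` maps each AND node to its two fanins, in topological order; inverters are
    omitted in this simplified sketch. Returns {merged node: representative}.
    """
    fanout = Counter(f for fins in ands.values() for f in fins)
    stored = {p: [] for p in pis}
    seen, merged = {}, {}

    def simulate(node, leaves):
        # bit-parallel truth table of `node` over its cut leaves (leaf i = variable i)
        val = {l: sum(1 << j for j in range(1 << len(leaves)) if (j >> i) & 1)
               for i, l in enumerate(leaves)}

        def go(x):
            if x not in val:
                a, b = ands[x]
                val[x] = go(a) & go(b)
            return val[x]
        return go(node)

    for n, (x, y) in ands.items():
        left = stored[x] + [frozenset([x])]
        right = stored[y] + [frozenset([y])]
        cands = {u | v for u in left for v in right if len(u | v) <= k}
        # priority: more fanouts at the leaves -> more likely shared by two nodes
        stored[n] = sorted(cands, key=lambda cut: (-sum(fanout[l] for l in cut),
                                                   len(cut), sorted(cut)))[:c]
        for cut in stored[n]:
            leaves = tuple(sorted(cut))
            key = (leaves, simulate(n, leaves))
            if key in seen:
                merged[n] = seen[key]            # functionally equal over this cut
                break
            seen[key] = n
    return merged


ANDS = {"n1": ("a", "b"), "x": ("n1", "c"),       # x = (a & b) & c
        "n2": ("b", "c"), "y": ("a", "n2")}       # y = a & (b & c)
print(cut_sweep(["a", "b", "c"], ANDS))           # {'y': 'x'}
```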
Sequential Resynthesis for Delay
• Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD’07)
• Similar to sequential mapping
  • Computes sequential arrival times for the circuit
  • Considers both the current logic structure and the structure obtained by Shannon expansion with respect to the latest-arriving variables
  • Accepts transforms that reduce delay
  • In the end, retimes to the best clock period
• The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)
• This algorithm could benefit from the use of priority cuts
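Why Shannon expansion with respect to a late variable helps can be shown on a purely combinational toy case: f = x ? f1 : f0, where the cofactors f1 and f0 no longer depend on x, so x passes through only one mux level. The sketch assumes an AND-only network, unit gate and mux delays, and free constants; it ignores retiming and is not Soviani and Edwards' algorithm.

```python
CONST0, CONST1 = "const0", "const1"


def arrivals(nodes, pi_arr, x=None, val=None):
    """Unit-delay arrival times of an AND-only network, optionally with PI x := val.

    Signals that become constant are marked CONST0/CONST1 and cost no delay.
    `nodes` maps node -> (fanin, fanin) in topological order.
    """
    arr = dict(pi_arr)
    if x is not None:
        arr[x] = CONST1 if val else CONST0
    for n, (u, v) in nodes.items():
        a, b = arr[u], arr[v]
        if CONST0 in (a, b):
            arr[n] = CONST0                       # AND with 0 is 0
        elif a == CONST1:
            arr[n] = b                            # AND with 1 is a wire
        elif b == CONST1:
            arr[n] = a
        else:
            arr[n] = max(a, b) + 1
    return arr


def shannon_arrival(nodes, pi_arr, out, x):
    """Arrival of out = x ? f1 : f0 built from the two cofactors (unit-delay mux)."""
    f1 = arrivals(nodes, pi_arr, x, 1)[out]
    f0 = arrivals(nodes, pi_arr, x, 0)[out]
    data = [t for t in (f0, f1) if t not in (CONST0, CONST1)]
    return max([pi_arr[x]] + data) + 1


# Hypothetical chain f = ((x & a) & b) & c where x arrives late.
NODES = {"n1": ("x", "a"), "n2": ("n1", "b"), "f": ("n2", "c")}
PI_ARR = {"x": 5, "a": 0, "b": 0, "c": 0}
print(arrivals(NODES, PI_ARR)["f"], shannon_arrival(NODES, PI_ARR, "f", "x"))  # 8 6
```

The transform is accepted only when the expanded arrival time is smaller, mirroring the "accepts transforms that reduce delay" rule above.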
Outline
• Improved technology mapping
• Sequential mapping
• Other applications of priority cuts
• Experimental results
Experimental Comparison
• Compare the new mapping against the traditional mapping in terms of
  • Delay
  • Area
  • Runtime
  • Memory
• Compare on large industrial benchmarks with choices
• Analyze the performance of the new mapping for
  • Large designs
  • Large LUTs
• Explore the potential of sequential mapping
• Computer used for experiments
  • IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB RAM
Priority cuts vs. Cut enumeration (C=8)

Used a set of large public benchmarks

Priority Cuts vs. Cut Enumeration (K=6, C=16)

[Table: cut enumeration vs. priority cuts, for mapping without choices and for mapping with choices]

Used a set of large industrial benchmarks

Performance on Large Designs (C=1)

Using design wb_conmax.v (part of the IWLS 2005 benchmarks)

This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 Masters and 16 Slaves.

Source: http://www.opencores.org

Performance for Large LUTs (C=1)

Using 100 timeframes of design wb_conmax.v

Sequential Mapping (K=6, C=8)

Used a subset of ISCAS benchmarks, for which retiming reduced delay

Summary
• Cut computation
• Optimum-depth mapping
• Area recovery
• Presented an improved approach to mapping
  • Computes a small number of cuts at each node
  • Uses new ideas to dramatically reduce memory and runtime
• Reported experimental results
  • Compared priority cuts with exhaustive cut enumeration
    • Delay and area are comparable or better by 1-3%
    • Memory and runtime are greatly reduced (5x for 6-LUTs)
  • Showed performance on very large designs (2 sec to map 1M)
  • Compared combinational and sequential mapping
• Implemented in ABC
  • Google: “abc berkeley” (package “if”)