Combinational and Sequential Mapping with Priority Cuts


Alan Mishchenko

Sungmin Cho

Satrajit Chatterjee

Robert Brayton

UC Berkeley

Outline

- Traditional cut-based LUT mapping
- Improved technology mapping with priority cuts
- Sequential mapping
- Other applications of priority cuts
- Experimental results

Technology Mapping

Input: A Boolean network (And-Inverter Graph)

Output: A netlist of K-LUTs that implements the Boolean network while optimizing some cost function

[Figure: technology mapping transforms the subject graph (nodes a, b, c, d, e, f) into the mapped netlist]

k-feasible Cuts

A cut of a node n is a set of nodes in its transitive fan-in such that every path from the node to the PIs is blocked by nodes in the cut.

A cut is k-feasible if it contains at most k nodes.

[Figure: node r has fanins p and q; p has fanins a and b; q has fanins b and c]

The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)

k-feasible cuts are important in FPGA mapping since the logic between a node and the nodes in its cut can be replaced by a k-LUT.
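The definition can be checked directly by walking every path downward from the node. The following is a minimal Python sketch, assuming the example graph (r has fanins p and q, p has fanins a and b, q has fanins b and c) is stored in a hypothetical `FANINS` dictionary; the function names are illustrative, not taken from any tool.

```python
from typing import Dict, List, Set

# Hypothetical encoding of the example graph: node -> list of fanins (PIs have none).
FANINS: Dict[str, List[str]] = {
    "a": [], "b": [], "c": [],
    "p": ["a", "b"],
    "q": ["b", "c"],
    "r": ["p", "q"],
}


def is_cut(node: str, leaves: Set[str], fanins: Dict[str, List[str]]) -> bool:
    """True if every path from `node` down to a PI is blocked by a node in `leaves`."""
    if node in leaves:
        return True            # this path is blocked by the cut
    if not fanins[node]:
        return False           # reached a PI without crossing the cut
    return all(is_cut(f, leaves, fanins) for f in fanins[node])


def is_k_feasible_cut(node: str, leaves: Set[str], fanins: Dict[str, List[str]], k: int) -> bool:
    """A cut is k-feasible if it has at most k leaves."""
    return len(leaves) <= k and is_cut(node, leaves, fanins)


print(is_k_feasible_cut("r", {"p", "b", "c"}, FANINS, 3))  # True
print(is_k_feasible_cut("r", {"p", "b", "c"}, FANINS, 2))  # False: too many leaves
print(is_cut("r", {"p", "b"}, FANINS))                     # False: path r -> q -> c is not blocked
```

The recursion revisits shared nodes, which is fine for a small example but exponential on large DAGs.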

k-feasible Cut Computation

The set of cuts of a node is a ‘cross product’ of the sets of cuts of its children

Computation is done bottom-up. For the example graph:

- a: { {a} }; b: { {b} }; c: { {c} }
- p: { {p}, {a, b} }
- q: { {q}, {b, c} }
- r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

Any cut that is of size greater than k is discarded

(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
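The bottom-up "cross product" amounts to this: the cuts of a node are its trivial cut plus every union of one cut per fanin that has at most k leaves. A minimal Python sketch on the same example (the graph encoding and names are hypothetical; cut dominance removal and pruning are omitted):

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set

Cut = FrozenSet[str]

FANINS: Dict[str, List[str]] = {
    "a": [], "b": [], "c": [],
    "p": ["a", "b"], "q": ["b", "c"], "r": ["p", "q"],
}
ORDER = ["a", "b", "c", "p", "q", "r"]  # topological order (PIs first)


def enumerate_cuts(order: List[str], fanins: Dict[str, List[str]], k: int) -> Dict[str, Set[Cut]]:
    """Exhaustive k-feasible cut enumeration, computed bottom-up."""
    cuts: Dict[str, Set[Cut]] = {}
    for node in order:
        node_cuts: Set[Cut] = {frozenset([node])}        # the trivial cut {node}
        if fanins[node]:
            # 'Cross product': pick one cut from each fanin and take their union.
            for combo in product(*(cuts[f] for f in fanins[node])):
                merged = frozenset().union(*combo)
                if len(merged) <= k:                      # discard cuts larger than k
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Prints the five 3-feasible cuts of r, one per line.
for cut in sorted(enumerate_cuts(ORDER, FANINS, 3)["r"], key=sorted):
    print(sorted(cut))
```

With k = 2 only {r} and {p, q} survive at r, since every other union has three leaves.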

Basic Mapping Algorithm

Depth-optimal LUT mapping of a DAG using all cuts at each node

Input: And-Inverter Graph

- Compute K-feasible cuts for each node
- Compute best arrival time at each node
  - In topological order (from PI to PO)
  - Compute the depth of all cuts and choose the best one
- Perform area recovery
  - Using area flow
  - Using exact local area
- Choose the best cover
  - In reverse topological order (from PO to PI)

Output: Mapped Netlist
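The delay-oriented part of this flow can be sketched end to end under a unit-delay model (each LUT has delay 1 and PIs arrive at time 0), which is an assumption of this example; area recovery is omitted, and ties between equally deep cuts are broken by cut size, an arbitrary choice. The example graph (r has fanins p and q, p has fanins a and b, q has fanins b and c) and all names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Cut = FrozenSet[str]


def enumerate_cuts(order: List[str], fanins: Dict[str, List[str]], k: int) -> Dict[str, Set[Cut]]:
    """Exhaustive k-feasible cuts, computed bottom-up (trivial cut included)."""
    cuts: Dict[str, Set[Cut]] = {}
    for node in order:
        cuts[node] = {frozenset([node])}
        if fanins[node]:
            for combo in product(*(cuts[f] for f in fanins[node])):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    cuts[node].add(merged)
    return cuts


def map_for_depth(order: List[str], fanins: Dict[str, List[str]], pos: List[str],
                  k: int) -> Tuple[Dict[str, int], Dict[str, Cut]]:
    cuts = enumerate_cuts(order, fanins, k)
    arrival: Dict[str, int] = {}
    best: Dict[str, Cut] = {}

    def depth(cut: Cut) -> int:
        return 1 + max(arrival[leaf] for leaf in cut)

    # Topological order (PI -> PO): best arrival time over all non-trivial cuts.
    for node in order:
        if not fanins[node]:
            arrival[node] = 0
            continue
        candidates = [c for c in cuts[node] if c != frozenset([node])]
        best[node] = min(candidates, key=lambda c: (depth(c), len(c), sorted(c)))
        arrival[node] = depth(best[node])

    # Reverse topological order (PO -> PI): one LUT per node the cover needs.
    needed = set(pos)
    cover: Dict[str, Cut] = {}
    for node in reversed(order):
        if node in needed and fanins[node]:
            cover[node] = best[node]
            needed |= best[node]
    return arrival, cover


FANINS = {"a": [], "b": [], "c": [], "p": ["a", "b"], "q": ["b", "c"], "r": ["p", "q"]}
ORDER = ["a", "b", "c", "p", "q", "r"]
arrival, cover = map_for_depth(ORDER, FANINS, ["r"], k=3)
print(arrival["r"], {n: sorted(c) for n, c in cover.items()})  # 1 {'r': ['a', 'b', 'c']}
```

With k = 2 the same graph needs three LUTs (r, p, q) and depth 2.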

Area Recovery Summary

- Area recovery heuristics
  - Area flow (global view)
    - Chooses cuts with better logic sharing
  - Exact local area (local view)
    - Minimizes the number of LUTs needed to map each node
- The results of area recovery depend on
  - The order of processing nodes
  - The order of applying the two passes
  - The number of iterations
- This scheme works for the constant-delay model
  - Any change off the critical path does not affect the critical path
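Area flow is commonly defined recursively: the area flow of a node is the LUT area of its chosen cut plus the area flow of the cut's leaves, divided by the node's fanout count, so logic shared by many fanouts looks cheaper. A minimal Python sketch of one area-flow pass, assuming unit LUT area, precomputed cut sets, and fanout counts from the subject graph (all names and the example are hypothetical; exact local area is not shown):

```python
from typing import Dict, FrozenSet, List, Tuple

Cut = FrozenSet[str]


def area_flow_pass(order: List[str], cuts: Dict[str, List[Cut]], fanouts: Dict[str, int],
                   lut_area: float = 1.0) -> Tuple[Dict[str, float], Dict[str, Cut]]:
    """Choose, at each node, the cut with the smallest area flow."""
    af: Dict[str, float] = {}
    best: Dict[str, Cut] = {}
    for node in order:                                   # topological order
        if node not in cuts:                             # PI: contributes no area
            af[node] = 0.0
            continue

        def flow(cut: Cut) -> float:
            return lut_area + sum(af[leaf] for leaf in cut)

        best[node] = min(cuts[node], key=lambda c: (flow(c), sorted(c)))
        # Spread the flow over the fanouts (POs without fanouts count as 1).
        af[node] = flow(best[node]) / max(1, fanouts.get(node, 0))
    return af, best


# p = AND(a, b) feeds both r = AND(p, c) and s = AND(p, d); K = 3.
ORDER = ["a", "b", "c", "d", "p", "r", "s"]
CUTS = {
    "p": [frozenset("ab")],
    "r": [frozenset("pc"), frozenset("abc")],
    "s": [frozenset("pd"), frozenset("abd")],
}
FANOUTS = {"a": 1, "b": 1, "c": 1, "d": 1, "p": 2}
af, best = area_flow_pass(ORDER, CUTS, FANOUTS)
print(af["p"], sorted(best["r"]))  # 0.5 ['a', 'b', 'c']
```

Dividing by the fanout count is what lets area flow account for logic sharing without tracking the actual cover.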

Drawbacks of Traditional Mapping Based on Exhaustive Cut Enumeration

- For large designs, there may be many k-feasible cuts
  - On the order of millions
- Previous ways of dealing with the problem
  - Detect and remove cut dominance
  - Perform cut pruning
  - Store only cuts on the frontier of mapping

Outline

- Traditional cut-based technology mapping
- Improved technology mapping
- Sequential mapping
- Other applications of priority cuts
- Experimental results

New Mapping Algorithm

Near-depth-optimal LUT mapping of a DAG using several cuts at each node

Input: And-Inverter Graph

- Compute K-feasible cuts for each node
- Compute arrival time at each node
  - In topological order (from PI to PO)
  - Instead of computing the depth of all cuts and choosing the best one, compute at most C good cuts and choose the best one
- Perform area recovery
  - Using area flow
  - Using exact local area
  - Re-compute at most C good cuts and choose the best one in each iteration
- Choose the best cover
  - In reverse topological order (from PO to PI)

Output: Mapped Netlist

Computing Priority Cuts

- Consider nodes in a topological order
- At each node, merge the two sets of fanin cuts (each containing C cuts), obtaining up to (C+1) * (C+1) + 1 candidate cuts
- Sort these cuts using a given cost function, select the C best cuts, and use them to compute the priority cuts of the fanouts
- Select the single best cut and use it to map the node
- Sorting criteria depend on the mapping pass
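A minimal Python sketch of computing priority cuts during a delay-oriented pass, assuming unit LUT delay, K ≥ 2, and a sorting criterion of depth first, then cut size (an illustrative choice; the criteria actually used per pass are not listed here). Each fanin contributes its C priority cuts plus its trivial cut, so two fanins yield at most (C+1) * (C+1) merged candidates; the optional `prev_best` argument adds the best cut from a previous pass as one more candidate, and reading the slide's "+ 1" that way is an assumption. Names and the example graph are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Cut = FrozenSet[str]


def priority_cuts(node: str, fanins: Dict[str, List[str]], prio: Dict[str, List[Cut]],
                  k: int, c: int, cost: Callable[[Cut], Tuple],
                  prev_best: Optional[Cut] = None) -> List[Cut]:
    """Merge the fanins' priority cuts and keep the C best k-feasible results."""
    # Each fanin contributes its (at most C) priority cuts plus its trivial cut.
    fanin_sets = [prio[f] + [frozenset([f])] for f in fanins[node]]
    candidates = set()
    for combo in product(*fanin_sets):
        merged = frozenset().union(*combo)
        if len(merged) <= k:
            candidates.add(merged)
    if prev_best is not None:          # optional: reuse the previous pass's best cut
        candidates.add(prev_best)
    ranked = sorted(candidates, key=lambda cut: (cost(cut), sorted(cut)))
    return ranked[:c]


def map_with_priority_cuts(order: List[str], fanins: Dict[str, List[str]], k: int, c: int):
    """Delay-oriented pass: arrival times computed from priority cuts only."""
    arrival: Dict[str, int] = {}
    prio: Dict[str, List[Cut]] = {}
    best: Dict[str, Cut] = {}
    for node in order:                              # topological order
        if not fanins[node]:                        # PI
            arrival[node], prio[node] = 0, []
            continue

        def cost(cut: Cut) -> Tuple[int, int]:      # assumed criterion: depth, then size
            return (1 + max(arrival[leaf] for leaf in cut), len(cut))

        prio[node] = priority_cuts(node, fanins, prio, k, c, cost)
        best[node] = prio[node][0]                  # the best cut maps the node
        arrival[node] = cost(best[node])[0]
    return arrival, best, prio


FANINS = {"a": [], "b": [], "c": [], "p": ["a", "b"], "q": ["b", "c"], "r": ["p", "q"]}
ORDER = ["a", "b", "c", "p", "q", "r"]
arrival, best, prio = map_with_priority_cuts(ORDER, FANINS, k=3, c=2)
print(arrival["r"], [sorted(cut) for cut in prio["r"]])  # 1 [['a', 'b', 'c'], ['p', 'q']]
```

Only C cuts per node are stored, which is where the memory and runtime savings over exhaustive enumeration come from.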

Discussion

- K: max cut size
- C: max number of cuts
- n: number of nodes
- m: number of edges

- Complexity analysis
  - Traditional mapping algorithms
    - FlowMap: O(K · m · n) (J. Cong et al, TCAD ’94)
    - CutMap: O(2^K · m · n^K) (J. Cong et al, FPGA ’95)
  - Proposed mapping algorithm
    - O(K · C^2 · n)

Priority Cuts: A Bag of Tricks

- Compute and use priority cuts (a subset of all cuts)
- Dynamically update the cuts in each mapping pass
- Use different sorting criteria in each mapping pass
- Include the best cut from the previous pass in the set of candidate cuts of the current pass
- Consider several depth-oriented mappings to get a good starting point for area recovery
- Use complementary heuristics for area recovery
- Perform cut expansion as part of area recovery
- Use efficient memory management

Outline

- Traditional cut-based technology mapping
- Improved technology mapping
- Sequential mapping
- Other applications of priority cuts
- Experimental results

Sequential Mapping

- Sequential mapping combines combinational mapping and retiming
  - Minimizes the clock period in the combined solution space
- Previous work:
  - Pan et al, FPGA’98
  - Cong et al, TCAD’98
- Our contribution: divide sequential mapping into steps
  - Step 1: Find the best clock period via sequential arrival time computation (Pan et al, FPGA’98)
  - Step 2: Run combinational mapping with the resulting arrival/required times of the register outputs/inputs
  - Step 3: Perform final retiming to bring the circuit to the best clock period computed in Step 1
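Step 1 rests on sequential arrival times: for a trial clock period, a register on an edge moves the arrival time back by one period, and the period is feasible if the arrival times converge (no loop has more delay than its registers can absorb) and the POs meet the period. A minimal Python sketch of the fixed-delay special case, assuming every node is already one LUT with a known integer delay and using a Bellman-Ford-style relaxation plus binary search over integer periods; the method on the slide additionally selects the best priority cut at each node while computing these times, which is left out here. All names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, int]  # (driver, sink, number of registers on the edge)


def seq_arrival_times(nodes: List[str], pis: Set[str], edges: List[Edge],
                      delay: Dict[str, int], period: int) -> Optional[Dict[str, float]]:
    """Sequential arrival times for a given clock period, or None if they diverge."""
    fanins: Dict[str, List[Tuple[str, int]]] = {v: [] for v in nodes}
    for u, v, w in edges:
        fanins[v].append((u, w))
    label = {v: (0.0 if v in pis else -math.inf) for v in nodes}
    for _ in range(len(nodes)):                  # at most |V| relaxation rounds
        changed = False
        for v in nodes:
            if v in pis:
                continue
            # Crossing w registers moves the arrival time back by w clock periods.
            latest = max((label[u] - period * w for u, w in fanins[v]), default=-math.inf)
            if latest + delay[v] > label[v]:
                label[v] = latest + delay[v]
                changed = True
        if not changed:
            return label                          # converged
    return None                                   # still growing: some loop is too slow


def min_clock_period(nodes, pis, pos, edges, delay, max_period: int) -> Optional[int]:
    """Smallest integer period whose arrival times converge and meet it at the POs."""
    def feasible(period: int) -> bool:
        label = seq_arrival_times(nodes, pis, edges, delay, period)
        return label is not None and all(label[o] <= period for o in pos)

    if not feasible(max_period):
        return None
    lo, hi = 1, max_period
    while lo < hi:                                # feasibility is monotone in the period
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Loop a -> b -> c -> a with unit delays, driven by PI x, observed at a.
NODES = ["x", "a", "b", "c"]
DELAY = {"a": 1, "b": 1, "c": 1}
for regs in (1, 2):
    edges = [("x", "a", 0), ("a", "b", 0), ("b", "c", 0), ("c", "a", regs)]
    print(regs, min_clock_period(NODES, {"x"}, ["a"], edges, DELAY, max_period=10))
```

With one register in the three-node loop the best period is 3; with two registers it drops to 2, which a final retiming (Step 3) would then realize.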

Sequential Mapping (continued)

- Advantages
  - Uses priority cuts (L=1) for computing sequential arrival times
    - Very fast
  - Reuses the efficient area recovery available in combinational mapping
    - Almost no degradation in LUT count and register count
  - Greatly simplifies implementation
    - Sequential cuts (cuts crossing the register boundary) are not computed
- Quality of results
  - Better (by ~15%) than combinational mapping followed by retiming
    - Due to searching the combined search space
  - Achieves almost the same clock period (-1%) as general sequential mapping with sequential cuts
    - Due to using a transparent register boundary without computing sequential cuts

Outline

- Traditional cut-based technology mapping
- Improved technology mapping
- Sequential mapping
- Other applications of priority cuts
- Experimental results

Speeding Up SAT Solving

- Perform technology mapping into K-LUTs for area
  - Define area as the number of CNF clauses needed to represent the Boolean function of the cut
  - Run several iterations of area recovery
- Reduces the number of CNF clauses by ~50%
  - Compared to a smart circuit-to-CNF translation (M. Velev)
- Improves SAT solver runtime by 3-10x
  - Experimental results will be given later
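One way to compute this cost: a LUT y = F(leaves) can be encoded with one clause per cube of an SOP cover of F (each cube implies y) plus one clause per cube of a cover of NOT F (each cube implies NOT y). A minimal Python sketch, assuming the function is given as a truth-table integer (bit m holds F on the minterm whose bit i is input x_i); covers are built from Quine-McCluskey primes with essential primes first and greedy selection after, so the count is an upper bound rather than a guaranteed minimum. All names are hypothetical.

```python
from typing import Set, Tuple

Cube = Tuple[int, int]  # (literal values, don't-care mask); bit i refers to input x_i


def covers(cube: Cube, minterm: int) -> bool:
    bits, dc = cube
    return (minterm & ~dc) == bits


def prime_implicants(minterms: Set[int]) -> Set[Cube]:
    """Quine-McCluskey: merge cubes differing in one literal until nothing merges."""
    current = {(m, 0) for m in minterms}
    primes: Set[Cube] = set()
    while current:
        merged, used = set(), set()
        cubes = sorted(current)
        for i, (b1, d1) in enumerate(cubes):
            for b2, d2 in cubes[i + 1:]:
                diff = b1 ^ b2
                if d1 == d2 and diff and (diff & (diff - 1)) == 0:  # exactly one bit differs
                    merged.add((b1 & ~diff, d1 | diff))
                    used.update({(b1, d1), (b2, d2)})
        primes |= current - used
        current = merged
    return primes


def cover_size(minterms: Set[int]) -> int:
    """Cubes in an SOP cover: essential primes first, then greedy (not always minimum)."""
    primes = sorted(prime_implicants(minterms))
    chosen: Set[Cube] = set()
    for m in minterms:
        candidates = [p for p in primes if covers(p, m)]
        if len(candidates) == 1:
            chosen.add(candidates[0])
    uncovered = {m for m in minterms if not any(covers(p, m) for p in chosen)}
    while uncovered:
        best = max(primes, key=lambda p: sum(covers(p, m) for m in uncovered))
        chosen.add(best)
        uncovered = {m for m in uncovered if not covers(best, m)}
    return len(chosen)


def cnf_clause_count(truth_table: int, k: int) -> int:
    """Clauses for y = F(x_0..x_{k-1}): cubes of F (imply y) plus cubes of NOT F (imply NOT y)."""
    onset = {m for m in range(1 << k) if (truth_table >> m) & 1}
    offset = set(range(1 << k)) - onset
    return cover_size(onset) + cover_size(offset)


print(cnf_clause_count(0b1000, 2))  # 2-input AND: 3 clauses
print(cnf_clause_count(0b0110, 2))  # 2-input XOR: 4 clauses
```

A 3-input multiplexer mapped into one LUT also costs 4 clauses here, fewer than encoding its gates separately, which illustrates why mapping for clause count can shrink the CNF.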

Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network

- Perform technology mapping into K-LUTs for minimizing area under delay constraints
- Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut
- Run delay-oriented mapping, followed by several iterations of area recovery
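A minimal sketch of this cost, assuming a fixed variable order x_0, x_1, ... and reduced ordered BDDs without complemented edges (a real BDD package may differ on both points). By canonicity, the internal nodes labeled x_i correspond one-to-one to the distinct cofactors, obtained by fixing x_0..x_{i-1}, that still depend on x_i. The function name and the truth-table encoding (bit m holds F on the minterm whose bit i is x_i) are hypothetical.

```python
def bdd_node_count(truth_table: int, k: int) -> int:
    """Internal ROBDD nodes of a k-input function, order x0 < x1 < ..., no complement edges."""
    bits = [(truth_table >> m) & 1 for m in range(1 << k)]
    total = 0
    for i in range(k):
        level = set()
        for a in range(1 << i):                 # assignment to x0..x(i-1)
            # Cofactor as a tuple indexed by the values of x_i..x_(k-1) (x_i is bit 0 of r).
            sub = tuple(bits[(r << i) | a] for r in range(1 << (k - i)))
            # A node at level i exists only if the cofactor depends on x_i.
            if any(sub[r] != sub[r ^ 1] for r in range(len(sub))):
                level.add(sub)
        total += len(level)
    return total


print(bdd_node_count(0b1000, 2))  # x0 AND x1: 2 nodes
print(bdd_node_count(0b0110, 2))  # x0 XOR x1: 3 nodes
print(bdd_node_count(202, 3))     # x2 ? x1 : x0 (select last): 5 nodes
print(bdd_node_count(228, 3))     # x0 ? x2 : x1 (select first): 3 nodes
```

The last two lines show the same multiplexer function under two variable placements, so the cost of a cut depends on the chosen order.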

Cut Sweeping

- Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)
  - By “shallow” equivalences, we mean equivalent points, A and B, for which there exists a K-cut C (K < 16) such that F_A(C) = F_B(C)
- A subset of “good” K-input priority cuts can be computed
  - The quality of a cut is determined by the number of fanouts of the cut leaves
  - The more fanouts, the more likely the cut is a common cut for two nodes
- Cut sweeping quickly reduces the circuit
  - Typically achieves ~50% of the reduction obtained by SAT sweeping (fraiging)
- Cut sweeping is much faster than SAT sweeping
  - Typically 10-100x for large designs
- Can be used as a fast preprocessing step for (or a low-cost substitute for) SAT sweeping
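A minimal sketch of cut sweeping, assuming an AIG stored as a hypothetical dictionary of two-input AND nodes with complemented-edge flags, and cuts already computed per node. Each node's C highest-priority cuts (ranked by the total fanout count of their leaves, as described above) are hashed together with the node's truth table over the cut; a repeated key signals a shallow equivalence. Only same-phase matches are detected, and rewiring the fanouts of merged nodes is left out. All names are illustrative.

```python
from typing import Dict, List, Tuple

Fanin = Tuple[str, bool]              # (driver node, complemented edge?)
Aig = Dict[str, Tuple[Fanin, Fanin]]  # AND node -> its two fanins; PIs are absent


def cut_truth_table(node: str, leaves: Tuple[str, ...], aig: Aig) -> int:
    """Truth table of `node` in terms of `leaves` (bit m = value on minterm m)."""
    k = len(leaves)
    mask = (1 << (1 << k)) - 1
    value: Dict[str, int] = {}
    for i, leaf in enumerate(leaves):
        # Elementary truth table of leaf i: bit m is set iff bit i of m is set.
        value[leaf] = sum(1 << m for m in range(1 << k) if (m >> i) & 1)

    def evaluate(n: str) -> int:
        if n not in value:                    # a KeyError here means `leaves` is not a cut
            (x, cx), (y, cy) = aig[n]
            vx = evaluate(x) ^ (mask if cx else 0)
            vy = evaluate(y) ^ (mask if cy else 0)
            value[n] = vx & vy
        return value[n]

    return evaluate(node)


def cut_sweep(order: List[str], cuts: Dict[str, List[Tuple[str, ...]]], aig: Aig,
              fanouts: Dict[str, int], c: int) -> Dict[str, str]:
    """Map each node to an earlier node with the same function over a common cut."""
    seen: Dict[Tuple[Tuple[str, ...], int], str] = {}
    merged: Dict[str, str] = {}
    for node in order:
        # Priority: cuts whose leaves have more fanouts are more likely to be shared.
        ranked = sorted(cuts.get(node, []),
                        key=lambda cut: -sum(fanouts[leaf] for leaf in cut))[:c]
        for cut in ranked:
            leaves = tuple(sorted(cut))
            key = (leaves, cut_truth_table(node, leaves, aig))
            if key in seen:
                merged[node] = seen[key]      # shallow equivalence found
                break
            seen[key] = node
    return merged


AIG: Aig = {
    "n1": (("a", False), ("b", False)),    # a & b
    "n2": (("n1", False), ("c", False)),   # (a & b) & c
    "n3": (("b", False), ("c", False)),    # b & c
    "n4": (("a", False), ("n3", False)),   # a & (b & c)
}
CUTS = {"n2": [("n1", "c"), ("a", "b", "c")], "n4": [("a", "n3"), ("a", "b", "c")]}
FANOUTS = {"a": 2, "b": 2, "c": 2, "n1": 1, "n3": 1}
print(cut_sweep(["n1", "n2", "n3", "n4"], CUTS, AIG, FANOUTS, c=2))  # {'n4': 'n2'}
```

The two structurally different ANDs of a, b, and c are merged because they share the cut {a, b, c} and have the same function over it; no SAT call is needed.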

Sequential Resynthesis for Delay

- Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD’07)
- Similar to sequential mapping
  - Computes sequential arrival times for the circuit
  - Uses the current logic structure, as well as the logic structure transformed using Shannon expansion with respect to the latest-arriving variables
  - Accepts transforms leading to delay reduction
  - In the end, retimes to the best clock period
- The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)
- This algorithm could benefit from the use of priority cuts


Outline

- Traditional cut-based technology mapping
- Improved technology mapping
- Sequential mapping
- Other applications of priority cuts
- Experimental results

Experimental Comparison

- Compare the new mapping against the traditional mapping in terms of
  - Delay
  - Area
  - Runtime
  - Memory
- Compare on large industrial benchmarks with choices
- Analyze the performance of the new mapping for
  - Large designs
  - Large LUTs
- Explore the potential of sequential mapping
- Computer used for experiments
  - IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB RAM

Priority cuts vs. Cut enumeration (C=8)

Used a set of large public benchmarks

Priority Cuts vs. Cut Enumeration (K=6, C=16)

[Table: cut enumeration vs. priority cuts, for mapping without choices and for mapping with choices]

Used a set of large industrial benchmarks

Performance on Large Designs (C=1)

Using design wb_conmax.v (part of IWLS 2005 benchmarks)

This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 Masters and 16 Slaves

Source: http://www.opencores.org

Performance for Large LUTs (C=1)

Using 100 timeframes of design wb_conmax.v

Sequential Mapping (K=6, C=8)

Used a subset of ISCAS benchmarks for which retiming reduced delay

Summary

- Reviewed traditional technology mapping
  - Cut computation
  - Optimum-depth mapping
  - Area recovery
- Presented an improved approach to mapping
  - Computes a small number of cuts at each node
  - Uses new ideas to dramatically reduce memory and runtime
- Reported experimental results
  - Compared priority cuts with exhaustive cut enumeration
    - Delay and area are comparable or better by 1-3%
    - Memory and runtime are greatly reduced (5x for 6-LUTs)
  - Showed performance on very large designs (2 sec to map 1M)
  - Compared combinational and sequential mapping
- Implemented in ABC
  - Google: “abc berkeley” (package “if”)
