Combinational and sequential mapping with priority cuts


Combinational and Sequential Mapping with Priority Cuts. Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton UC Berkeley. Outline. Traditional cut-based LUT mapping Improved technology mapping with priority cuts Sequential mapping Other applications of priority cuts


Combinational and Sequential Mapping with Priority Cuts

Alan Mishchenko

Sungmin Cho

Satrajit Chatterjee

Robert Brayton

UC Berkeley


Outline

  • Traditional cut-based LUT mapping

  • Improved technology mapping with priority cuts

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Technology Mapping

Input: A Boolean network (And-Inverter Graph)

Output: A netlist of K-LUTs implementing the Boolean network optimizing some cost function

[Figure: technology mapping turns the subject graph (nodes a, b, c, d, e, f) into the mapped netlist]


k-feasible Cuts

A cut of a node n is a set of nodes in its transitive fan-in such that every path from n to the PIs is blocked by a node in the cut.

A k-feasible cut is a cut of size k or less.

[Figure: node r with fanins p and q; p has fanins a and b; q has fanins b and c]

The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)

k-feasible cuts are important in FPGA mapping since the logic between a node and the nodes in its cut can be replaced by a k-LUT.
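
The definition can be checked mechanically. The following is a minimal sketch, assuming the small example graph in which r has fanins p and q, p has fanins a and b, and q has fanins b and c. The helper names `is_cut` and `is_k_feasible_cut` are hypothetical and chosen only for illustration.

```python
# Fanins of each internal node; primary inputs (PIs) have no entry.
FANINS = {"r": ("p", "q"), "p": ("a", "b"), "q": ("b", "c")}


def is_cut(node, leaves):
    """Return True if every path from `node` down to a PI passes through `leaves`."""
    if node in leaves:
        return True   # this path is blocked here
    if node not in FANINS:
        return False  # reached a PI without being blocked
    return all(is_cut(fanin, leaves) for fanin in FANINS[node])


def is_k_feasible_cut(node, leaves, k):
    """A k-feasible cut is a cut with at most k leaves."""
    return len(leaves) <= k and is_cut(node, leaves)


print(is_k_feasible_cut("r", {"p", "b", "c"}, 3))  # True
print(is_k_feasible_cut("r", {"p", "b", "c"}, 4))  # True
print(is_k_feasible_cut("r", {"a", "b"}, 3))       # False: the path r, q, c is not blocked
```

The check only tests the blocking property; it does not test whether a cut contains redundant leaves.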


k-feasible Cut Computation

The set of cuts of a node is a ‘cross product’ of the sets of cuts of its children. Computation is done bottom-up, and any cut of size greater than k is discarded.

Cuts at each node of the example graph:

  • a: { {a} }

  • b: { {b} }

  • c: { {c} }

  • p: { {p}, {a, b} }

  • q: { {q}, {b, c} }

  • r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
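
The bottom-up cross product can be sketched in a few lines. This is an illustrative sketch on the three-input example graph, not the implementation from the cited papers; the function name `enumerate_cuts` is an assumption, and dominated cuts are not filtered out.

```python
from itertools import product

# Internal nodes listed in topological order (fanins before fanouts).
FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
PIS = ("a", "b", "c")


def enumerate_cuts(k):
    """Return, for every node, the set of all cuts with at most k leaves."""
    cuts = {pi: {frozenset([pi])} for pi in PIS}
    for node, (f0, f1) in FANINS.items():
        merged = {frozenset([node])}          # the trivial cut {node}
        for c0, c1 in product(cuts[f0], cuts[f1]):
            union = c0 | c1
            if len(union) <= k:               # discard cuts larger than k
                merged.add(union)
        cuts[node] = merged
    return cuts


cuts = enumerate_cuts(k=3)
for cut in sorted(cuts["r"], key=lambda c: (len(c), sorted(c))):
    print(sorted(cut))
```

With k = 3 node r gets the five cuts listed for it; lowering k to 2 leaves only {r} and {p, q}.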


Basic Mapping Algorithm

Depth-optimal LUT mapping of a DAG using all cuts at each node

Input: And-Inverter Graph

  • Compute K-feasible cuts for each node

  • Compute best arrival time at each node

    • In topological order (from PI to PO)

    • Compute the depth of all cuts and choose the best one

  • Perform area recovery

    • Using area flow

    • Using exact local area

  • Choose the best cover

    • In reverse topological order (from PO to PI)

Output: Mapped Netlist
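
The depth-oriented part of this flow (cut computation, arrival times, cover selection) can be sketched as follows. This is a simplified illustration under a unit-delay LUT model, on the small example graph with r as the only PO; area recovery is left out, and the name `map_for_depth` is hypothetical.

```python
from itertools import product

FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}  # topological order
PIS, POS = ("a", "b", "c"), ("r",)


def map_for_depth(k):
    cuts = {pi: [frozenset([pi])] for pi in PIS}
    arrival = {pi: 0 for pi in PIS}
    best = {}

    def depth(cut):
        # A LUT placed on this cut adds one level on top of its latest leaf.
        return 1 + max(arrival[leaf] for leaf in cut)

    # Arrival times in topological order (from PI to PO).
    for node, (f0, f1) in FANINS.items():
        cands = {c0 | c1 for c0, c1 in product(cuts[f0], cuts[f1])
                 if len(c0 | c1) <= k}
        best[node] = min(cands, key=lambda c: (depth(c), len(c), sorted(c)))
        arrival[node] = depth(best[node])
        cuts[node] = list(cands) + [frozenset([node])]  # add the trivial cut

    # Cover selection in reverse topological order (from PO to PI).
    cover, stack = {}, list(POS)
    while stack:
        node = stack.pop()
        if node in PIS or node in cover:
            continue
        cover[node] = best[node]
        stack.extend(best[node])
    return arrival, cover


arrival, cover = map_for_depth(k=3)
print(arrival["r"], {n: sorted(c) for n, c in cover.items()})
```

With k = 3 the whole graph fits into one LUT rooted at r; with k = 2 the cover needs three LUTs and two logic levels.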


Area Recovery Summary

  • Area recovery heuristics

    • Area-flow (global view)

      • Chooses cuts with better logic sharing

    • Exact local area (local view)

      • Minimizes the number of LUTs needed to map each node

  • The results of area recovery depend on

    • The order of processing nodes

    • The order of applying the two passes

    • The number of iterations

  • This scheme works for the constant-delay model

    • Any change off the critical path does not affect the critical path
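
The area-flow heuristic is not defined on the slide. The sketch below assumes one common formulation from the LUT-mapping literature: the area flow of a cut is the area of its LUT (1) plus the area flow of each leaf divided by that leaf's fanout count, with PIs contributing 0. The graph, the fanout counts, and the function name `cut_area_flow` are assumptions for illustration.

```python
# Fanout counts in the small example graph (b feeds both p and q).
FANOUT = {"a": 1, "b": 2, "c": 1, "p": 1, "q": 1}


def cut_area_flow(leaves, flow):
    """Assumed formulation: 1 LUT plus each leaf's flow shared among its fanouts."""
    return 1.0 + sum(flow[leaf] / FANOUT[leaf] for leaf in leaves)


# Primary inputs cost nothing.
flow = {"a": 0.0, "b": 0.0, "c": 0.0}
flow["p"] = cut_area_flow({"a", "b"}, flow)
flow["q"] = cut_area_flow({"b", "c"}, flow)

# Candidate cuts of r, ranked by area flow.
candidates = [frozenset("pq"), frozenset("pbc"), frozenset("abq"), frozenset("abc")]
best_cut = min(candidates, key=lambda c: (cut_area_flow(c, flow), sorted(c)))
flow["r"] = cut_area_flow(best_cut, flow)
print(sorted(best_cut), flow["r"])
```

Dividing by the fanout count is what gives the heuristic its global view: a leaf that is shared by many cuts is charged only fractionally to each of them.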


Drawbacks of Traditional Mapping Based on Exhaustive Cut Enumeration

  • For large designs, there may be many k-feasible cuts

    • Order of millions

  • Previous ways of dealing with the problem

    • Detect and remove cut dominance

    • Perform cut pruning

    • Store only cuts on the frontier of mapping


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


New Mapping Algorithm

Near-depth-optimal LUT mapping of a DAG using several cuts at each node

Input: And-Inverter Graph

  • Compute K-feasible cuts for each node

  • Compute arrival time at each node

    • In topological order (from PI to PO)

    • Compute the depth of all cuts and choose the best one

    • Compute at most C good cuts and choose the best one

  • Perform area recovery

    • Using area flow

    • Using exact local area

    • Re-compute at most C good cuts and choose the best one in each iteration

  • Choose the best cover

    • In reverse topological order (from PO to PI)

Output: Mapped Netlist


Computing Priority Cuts

  • Consider nodes in a topological order

    • At each node, merge two sets of fanin cuts (each containing C cuts) getting (C+1) * (C+1) + 1 cuts

    • Sort these cuts using a given cost function, select C best cuts, and use them for computing priority cuts of the fanouts

    • Select one best cut, and use it to map the node

  • Sorting criteria
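
The merge, sort, and select loop above can be sketched as follows. This is an illustration only: the sorting criterion used here (depth first, then cut size) is an assumption, since the slide's own list of sorting criteria did not survive, and the name `priority_cuts` is hypothetical.

```python
from itertools import product


def priority_cuts(fanins, pis, k, c):
    """Keep at most c cuts per node; fanins must be in topological order."""
    cuts = {pi: [] for pi in pis}       # stored (non-trivial) priority cuts
    arrival = {pi: 0 for pi in pis}
    best = {}

    def cost(cut):
        # Assumed criterion: smaller depth first, then fewer leaves.
        return (1 + max(arrival[leaf] for leaf in cut), len(cut), sorted(cut))

    for node, (f0, f1) in fanins.items():
        # Each fanin offers its stored cuts plus its trivial cut: (c+1)*(c+1) pairs.
        set0 = cuts[f0] + [frozenset([f0])]
        set1 = cuts[f1] + [frozenset([f1])]
        cands = {a | b for a, b in product(set0, set1) if len(a | b) <= k}
        ranked = sorted(cands, key=cost)
        cuts[node] = ranked[:c]          # C best cuts, passed on to the fanouts
        best[node] = ranked[0]           # the one cut used to map the node
        arrival[node] = cost(ranked[0])[0]
    return cuts, best, arrival


FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
cuts, best, arrival = priority_cuts(FANINS, ("a", "b", "c"), k=3, c=1)
print(sorted(best["r"]), arrival["r"])
```

In this sketch the trivial cut of a node is appended only when its fanouts merge, which accounts for the extra "+ 1" in the candidate count.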


Discussion

  • K - max cut size

  • C - max number of cuts

  • n - number of nodes

  • m – number of edges

  • Complexity analysis

    • Traditional mapping algorithm

      • FlowMap O(Kmn) (J. Cong et al, TCAD ’94)

      • CutMap O(2KmnK) (J. Cong et al, FPGA ’95)

    • Proposed mapping algorithm

      • O(K C^2 n)
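
The bound for the proposed algorithm can be read off the merging step. The following is a back-of-the-envelope derivation, assuming that merging two cuts of at most K leaves costs O(K) and that selecting the C best candidates is not the dominant term:

```latex
\underbrace{(C+1)^2}_{\text{cut pairs per node}} \cdot
\underbrace{O(K)}_{\text{cost of one merge}} = O(KC^2)
\quad\Longrightarrow\quad
T_{\text{total}} = n \cdot O(KC^2) = O(KC^2 n)
```

For fixed K and C the runtime is therefore linear in the number of nodes n.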


Priority Cuts: A Bag of Tricks

  • Compute and use priority cuts (a subset of all cuts)

  • Dynamically update the cuts in each mapping pass

  • Use different sorting criteria in each mapping pass

  • Include the best cut from the previous pass into the set of candidate cuts of the current pass

  • Consider several depth-oriented mappings to get a good starting point for area recovery

  • Use complementary heuristics for area recovery

  • Perform cut expansion as part of area recovery

  • Use efficient memory management


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Sequential Mapping

  • That is, combinational mapping and retiming combined

    • Minimizes clock period in the combined solution space

    • Previous work:

      • Pan et al, FPGA’98

      • Cong et al, TCAD’98

  • Our contribution: divide sequential mapping into steps

    • Find the best clock period via sequential arrival time computation (Pan et al, FPGA’98)

    • Run combinational mapping with the resulting arrival/required times of the register outputs/inputs

    • Perform final retiming to bring the circuit to the best clock period computed in Step 1
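
Step 1 can be illustrated with a strongly simplified sketch. It assumes unit-delay nodes and no cut selection (every node is its own LUT), so it shows only the sequential arrival time iteration and the search for the clock period, not the full method of Pan et al. The circuit, the edge register counts, and the names `feasible` and `best_period` are hypothetical.

```python
NEG_INF = float("-inf")

# Each node lists (fanin, number of registers on that edge), in topological
# order of the register-free edges. "a" is a PI; z feeds back to x through
# one register.
FANINS = {"x": [("a", 0), ("z", 1)], "y": [("x", 0)], "z": [("y", 0)]}
PIS, POS = ("a",), ("z",)


def feasible(period):
    """Iterate sequential arrival times; True if they converge within `period`."""
    arrival = {node: NEG_INF for node in FANINS}
    arrival.update({pi: 0 for pi in PIS})
    for _ in range(len(FANINS) + 1):
        changed = False
        for node, fanins in FANINS.items():
            # Crossing a register gives the signal one extra clock period.
            new = 1 + max(arrival[u] - period * regs for u, regs in fanins)
            if new > arrival[node]:
                arrival[node], changed = new, True
        if not changed:
            # Converged: accept if every output arrives within one period.
            return all(arrival[po] <= period for po in POS)
    return False  # still growing: some loop is too slow for this period


def best_period():
    """Binary search for the smallest feasible integer clock period."""
    lo, hi = 1, len(FANINS)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(best_period())
```

In the example the loop x, y, z has three unit delays and one register, so no period below 3 can converge.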


Sequential Mapping (continued)

  • Advantages

    • Uses priority cuts (L=1) for computing sequential arrival times

      • very fast

    • Reuses efficient area recovery available in combinational mapping

      • almost no degradation in LUT count and register count

    • Greatly simplifies implementation

      • due to not computing sequential cuts (cuts crossing register boundary)

  • Quality of results

    • Leads to quality that is better (by ~15%) than combinational mapping followed by retiming

      • due to searching the combined search space

    • Achieves almost the same (-1%) clock period as the general sequential mapping with sequential cuts

      • due to using transparent register boundary without computing sequential cuts


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Speeding Up SAT Solving

  • Perform technology mapping into K-LUTs for area

    • Define area as the number of CNF clauses needed to represent the Boolean function of the cut

    • Run several iterations of area recovery

  • Reduces the number of CNF clauses by ~50%

    • Compared to a smart circuit-to-CNF translation (M. Velev)

  • Improves SAT solver runtime by 3-10x

    • Experimental results will be given later
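
To make the clause-count cost concrete, the sketch below encodes y = f(leaves) with one clause per cube of a cover of f and one clause per cube of a cover of its complement, then checks the encoding by brute force. The covers are supplied by hand here; how they are computed, and the function names, are assumptions for illustration rather than the procedure used in the experiments.

```python
from itertools import product


def cubes_to_clauses(on_cubes, off_cubes):
    """Cubes are strings over '0', '1', '-' with one character per cut leaf.

    A literal is (variable, polarity); the cut output is the variable 'y'.
    """
    clauses = []
    for cubes, y_polarity in ((on_cubes, True), (off_cubes, False)):
        for cube in cubes:
            # "cube implies y" (or "not y"): negate every literal of the cube.
            clause = [(i, ch == "0") for i, ch in enumerate(cube) if ch != "-"]
            clause.append(("y", y_polarity))
            clauses.append(clause)
    return clauses


def cnf_matches(clauses, func, k):
    """Brute-force check that the CNF is satisfied exactly when y == func(leaves)."""
    for bits in product([False, True], repeat=k):
        for y in (False, True):
            value = dict(enumerate(bits))
            value["y"] = y
            satisfied = all(any(value[v] == pol for v, pol in cl) for cl in clauses)
            if satisfied != (y == func(*bits)):
                return False
    return True


and2 = cubes_to_clauses(on_cubes=["11"], off_cubes=["0-", "-0"])
xor2 = cubes_to_clauses(on_cubes=["01", "10"], off_cubes=["00", "11"])
print(len(and2), cnf_matches(and2, lambda a, b: a and b, 2))
print(len(xor2), cnf_matches(xor2, lambda a, b: a != b, 2))
```

Under this cost a 2-input AND cut has area 3 and a 2-input XOR cut has area 4, so the mapper is steered toward cuts whose functions have small covers.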


Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network

  • Perform technology mapping into K-LUTs for minimizing area under delay constraints

    • Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut

    • Run delay-oriented mapping, followed by several iterations of area recovery


Cut Sweeping

  • Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)

    • By “shallow” equivalences, we mean equivalent points, A and B, for which there exists a K-cut C (K < 16) such that F_A(C) = F_B(C)

    • A subset of “good” K-input priority cuts can be computed

    • The quality of a cut is determined by the number of fanouts of the cut leaves

      • The more fanouts, the more likely the cut is a common cut for two nodes

  • Cut sweeping quickly reduces the circuit

    • Typically achieves ~50% of the gain of SAT sweeping (fraiging)

  • Cut sweeping is much faster than SAT sweeping

    • Typically 10-100x, for large designs

  • Can be used as a fast preprocessing step for (or a low-cost substitute for) SAT sweeping
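
The core of cut sweeping can be sketched as a hash table keyed by a cut and the node's truth table over that cut. The example below is a toy illustration with AND-only nodes and hand-written cuts; the graph, the `CUTS` table, and the function names are hypothetical.

```python
from itertools import product

# Toy network of 2-input AND nodes in topological order.
# y = a AND (a AND b) is equivalent to x = a AND b.
FANINS = {"x": ("a", "b"), "y": ("a", "x"), "z": ("x", "c")}

# Hypothetical precomputed priority cuts, leaves stored as sorted tuples.
CUTS = {
    "x": [("a", "b")],
    "y": [("a", "b"), ("a", "x")],
    "z": [("c", "x"), ("a", "b", "c")],
}


def evaluate(node, assignment):
    """Evaluate `node` given Boolean values for the leaves of a cut."""
    if node in assignment:
        return assignment[node]
    f0, f1 = FANINS[node]
    return evaluate(f0, assignment) and evaluate(f1, assignment)


def truth_table(node, leaves):
    return tuple(evaluate(node, dict(zip(leaves, bits)))
                 for bits in product([False, True], repeat=len(leaves)))


def cut_sweep():
    """Map each redundant node to an earlier node with the same function on a common cut."""
    seen, merges = {}, {}
    for node in FANINS:
        for leaves in CUTS[node]:
            key = (leaves, truth_table(node, leaves))
            representative = seen.setdefault(key, node)
            if representative != node:
                merges[node] = representative
    return merges


print(cut_sweep())
```

No SAT calls are involved: two nodes are merged only when they share a cut and have identical truth tables over it, which is why the method is fast but finds only shallow equivalences.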


Sequential Resynthesis for Delay

  • Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD’07)

    • Similar to sequential mapping

    • Computes sequential arrival times for the circuit

    • Uses the current logic structure as well as the logic structure transformed using Shannon expansion w.r.t. the latest-arriving variables

    • Accepts transforms leading to delay reduction

    • In the end, retimes to the best clock period

  • The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)

  • This algorithm could benefit from the use of priority cuts


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Experimental Comparison

  • Compare the new mapping against the traditional mapping in terms of

    • Delay

    • Area

    • Runtime

    • Memory

  • Compare on large industrial benchmarks with choices

  • Analyze the performance of the new mapping for

    • Large designs

    • Large LUTs

  • Explore the potential of sequential mapping

  • Computer used for experiments

    • IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB of RAM


Priority Cuts vs. Cut Enumeration (C=8)

Used a set of the large public benchmarks


Priority Cuts vs. Cut Enumeration (K=6, C=16)

[Tables: cut enumeration vs. priority cuts, for mapping without choices and for mapping with choices]

Used a set of large industrial benchmarks


Performance on Large Designs (C=1)

Using design wb_conmax.v (part of IWLS 2005 benchmarks)

This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 Masters and 16 Slaves

Source: http://www.opencores.org


Performance for Large LUTs (C=1)

Using 100 timeframes of design wb_conmax.v


Sequential Mapping (K=6, C=8)

Used a subset of ISCAS benchmarks, for which retiming reduced delay


Summary

  • Reviewed traditional technology mapping

    • Cut computation

    • Optimum-depth mapping

    • Area recovery

  • Presented an improved approach to mapping

    • Computes a small number of cuts at each node

    • Uses new ideas to dramatically reduce memory and runtime

  • Reported experimental results

    • Compared priority cuts with exhaustive cut enumeration

      • Delay and area are comparable or better by 1-3%

      • Memory and runtime are greatly reduced (5x for 6-LUTs)

    • Showed performance on very large designs (2 sec to map 1M)

    • Compared combinational and sequential mapping

  • Implemented in ABC

    • Google: “abc berkeley” (package “if”)


The End