Combinational and Sequential Mapping with Priority Cuts




Alan Mishchenko

Sungmin Cho

Satrajit Chatterjee

Robert Brayton

UC Berkeley


Outline

  • Traditional cut-based LUT mapping

  • Improved technology mapping with priority cuts

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Technology Mapping

Input: A Boolean network (And-Inverter Graph)

Output: A netlist of K-LUTs implementing the Boolean network while optimizing some cost function

[Figure: technology mapping transforms the subject graph (nodes a, b, c, d, e, f) into the mapped netlist over the same signals.]


k-feasible Cuts

A cut of a node n is a set of nodes in its transitive fan-in such that every path from the node to the PIs is blocked by nodes in the cut.

A k-feasible cut is a cut whose size is k or less.

[Figure: example graph with root r, internal nodes p and q, and inputs a, b, c.]

The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)

k-feasible cuts are important in FPGA mapping since the logic between a node and the nodes in its cut can be replaced by a k-LUT.


k-feasible Cut Computation

The set of cuts of a node is a ‘cross product’ of the sets of cuts of its children

Computation is done bottom-up

Any cut that is of size greater than k is discarded

[Figure: cut sets annotated on the example graph]

  • a: { {a} }

  • b: { {b} }

  • c: { {c} }

  • p: { {p}, {a, b} }

  • q: { {q}, {b, c} }

  • r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
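
The bottom-up cross-product computation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `enumerate_cuts`, the dictionary-based graph encoding, and the restriction to two-input nodes are assumptions made for the example.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """Return {node: set of cuts}, each cut a frozenset of node names.

    fanins: dict mapping each internal node to its (left, right) children;
            nodes absent from the dict are primary inputs (assumed encoding).
    order:  all nodes in topological order (inputs first).
    """
    cuts = {}
    for node in order:
        trivial = frozenset([node])
        if node not in fanins:            # primary input: only the trivial cut
            cuts[node] = {trivial}
            continue
        left, right = fanins[node]
        merged = {trivial}
        for cl, cr in product(cuts[left], cuts[right]):
            cut = cl | cr                 # 'cross product' step: union of child cuts
            if len(cut) <= k:             # discard cuts larger than k
                merged.add(cut)
        cuts[node] = merged
    return cuts

# The example graph: r has children p and q, p has a and b, q has b and c.
fanins = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
order = ["a", "b", "c", "p", "q", "r"]
cuts = enumerate_cuts(fanins, order, k=3)
for cut in sorted(cuts["r"], key=lambda c: (len(c), sorted(c))):
    print(sorted(cut))
```

For k = 3 the five cuts printed for r are the ones listed for r in the example: {r}, {p, q}, {a, b, q}, {p, b, c} and {a, b, c}.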


Basic Mapping Algorithm

Depth-optimal LUT mapping of a DAG using all cuts at each node

Input: And-Inverter Graph

  • Compute K-feasible cuts for each node

  • Compute best arrival time at each node

    • In topological order (from PI to PO)

    • Compute the depth of all cuts and choose the best one

  • Perform area recovery

    • Using area flow

    • Using exact local area

  • Choose the best cover

    • In reverse topological order (from PO to PI)

Output: Mapped Netlist
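
The depth-oriented part of this flow (cut computation, arrival times, cover selection) might look as follows. This is a hedged sketch under a unit-delay model; area recovery is left out, and `map_for_depth` with its graph encoding is a hypothetical helper invented for the illustration.

```python
from itertools import product

def map_for_depth(fanins, order, outputs, k):
    """Unit-delay LUT mapping sketch; assumes k >= 2 and two-input nodes.

    fanins:  dict node -> (left, right); nodes not in the dict are PIs.
    order:   all nodes in topological order (PIs first).
    outputs: primary-output nodes.
    Returns (arrival, cover); cover maps each used LUT root to its cut.
    """
    cuts, arrival, best = {}, {}, {}

    def depth(cut):
        # a LUT adds one level on top of its latest-arriving leaf
        return 1 + max(arrival[leaf] for leaf in cut)

    for node in order:                      # from PIs to POs
        if node not in fanins:
            cuts[node], arrival[node] = {frozenset([node])}, 0
            continue
        left, right = fanins[node]
        merged = {cl | cr for cl, cr in product(cuts[left], cuts[right])
                  if len(cl | cr) <= k}
        # best cut: smallest depth, ties broken by size and then by name
        best[node] = min(merged, key=lambda c: (depth(c), len(c), sorted(c)))
        arrival[node] = depth(best[node])
        cuts[node] = merged | {frozenset([node])}

    cover, stack = {}, list(outputs)        # from POs back to PIs
    while stack:
        node = stack.pop()
        if node in fanins and node not in cover:
            cover[node] = best[node]
            stack.extend(best[node])        # leaves of the cut become LUT roots
    return arrival, cover

fanins = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
order = ["a", "b", "c", "p", "q", "r"]
arrival3, cover3 = map_for_depth(fanins, order, ["r"], k=3)
arrival2, cover2 = map_for_depth(fanins, order, ["r"], k=2)
print(arrival3["r"], len(cover3))   # 1 1  (a single 3-LUT)
print(arrival2["r"], len(cover2))   # 2 3  (three 2-LUTs in two levels)
```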


Area Recovery Summary

  • Area recovery heuristics

    • Area-flow (global view)

      • Chooses cuts with better logic sharing

    • Exact local area (local view)

      • Minimizes the number of LUTs needed to map each node

  • The result of area recovery depends on

    • The order of processing nodes

    • The order of applying the two passes

    • The number of iterations

  • This scheme works for the constant-delay model

    • Any change off the critical path doesn’t affect the critical path
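
The slides do not spell out the area-flow formula. The sketch below assumes the definition commonly used in the LUT-mapping literature: the area flow of a node is the LUT area plus the area flows of the cut leaves, divided by the node's fanout count, with primary inputs at zero. The function name, the data layout, and the example numbers are illustrative assumptions.

```python
def best_area_flow_cuts(cuts, order, fanout_count, lut_area=1.0):
    """Pick, for every internal node, the cut with the smallest area flow.

    cuts:         dict node -> list of candidate cuts (frozensets of leaves);
                  nodes absent from the dict are primary inputs.
    order:        nodes in topological order.
    fanout_count: dict node -> number of fanouts (assumed known up front).
    """
    flow, choice = {}, {}
    for node in order:
        if node not in cuts:
            flow[node] = 0.0                 # primary inputs cost nothing
            continue

        def area_flow(cut):
            # assumed definition: (LUT area + leaf flows) / fanouts of the node
            shared = max(1, fanout_count.get(node, 1))
            return (lut_area + sum(flow[leaf] for leaf in cut)) / shared

        choice[node] = min(cuts[node], key=lambda c: (area_flow(c), sorted(c)))
        flow[node] = area_flow(choice[node])
    return flow, choice

# Node t can be built on top of s (shared by 4 fanouts) or u (1 fanout).
cuts = {
    "s": [frozenset("ab")],
    "u": [frozenset("ac")],
    "t": [frozenset("sd"), frozenset("ud")],
}
order = ["a", "b", "c", "d", "s", "u", "t"]
flow, choice = best_area_flow_cuts(cuts, order, {"s": 4, "u": 1, "t": 1})
print(sorted(choice["t"]), flow["t"])   # ['d', 's'] 1.25
```

The cut through the heavily shared node s wins because its cost is split across four fanouts, which is the "better logic sharing" effect named in the list above.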


Drawbacks of Traditional Mapping Based on Exhaustive Cut Enumeration

  • For large designs, there may be many k-feasible cuts

    • Order of millions

  • Previous ways of dealing with the problem

    • Detect and remove cut dominance

    • Perform cut pruning

    • Store only cuts on the frontier of mapping


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


New Mapping Algorithm

Near-depth-optimal LUT mapping of a DAG using several cuts at each node

Input: And-Inverter Graph

  • Compute K-feasible cuts for each node

  • Compute arrival time at each node

    • In topological order (from PI to PO)

    • Compute the depth of all cuts and choose the best one

    • Compute at most C good cuts and choose the best one

  • Perform area recovery

    • Using area flow

    • Using exact local area

    • Re-compute at most C good cuts and choose the best one in each iteration

  • Choose the best cover

    • In reverse topological order (from PO to PI)

Output: Mapped Netlist


Computing Priority Cuts

  • Consider nodes in a topological order

    • At each node, merge two sets of fanin cuts (each containing C cuts) getting (C+1) * (C+1) + 1 cuts

    • Sort these cuts using a given cost function, select C best cuts, and use them for computing priority cuts of the fanouts

    • Select one best cut, and use it to map the node

  • Sorting criteria
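
The merge, sort, and select steps listed above can be sketched as follows. The cost function used here (depth first, then cut size) is only one plausible sorting criterion chosen for the illustration, and `priority_cuts` with its graph encoding is a hypothetical helper rather than the authors' code.

```python
from itertools import product

def priority_cuts(fanins, order, k, c):
    """Keep at most c cuts per node; assumes k >= 2 and two-input nodes.

    fanins: dict node -> (left, right); nodes not in the dict are PIs.
    order:  all nodes in topological order (PIs first).
    Returns (stored, arrival, best).
    """
    stored, arrival, best = {}, {}, {}
    for node in order:
        if node not in fanins:
            stored[node], arrival[node] = [frozenset([node])], 0
            continue
        left, right = fanins[node]
        # each fanin offers its stored cuts plus its own trivial cut,
        # so at most (c + 1) * (c + 1) merged candidates are formed
        lcuts = set(stored[left]) | {frozenset([left])}
        rcuts = set(stored[right]) | {frozenset([right])}
        merged = {cl | cr for cl, cr in product(lcuts, rcuts)
                  if len(cl | cr) <= k}

        def cost(cut):
            # assumed criterion: depth first, then size, then name order
            return (1 + max(arrival[x] for x in cut), len(cut), sorted(cut))

        ranked = sorted(merged, key=cost)
        best[node] = ranked[0]              # the cut used to map this node
        arrival[node] = cost(ranked[0])[0]
        stored[node] = ranked[:c]           # the c best cuts seen by the fanouts
    return stored, arrival, best

fanins = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
order = ["a", "b", "c", "p", "q", "r"]
stored, arrival, best = priority_cuts(fanins, order, k=3, c=1)
print(sorted(best["r"]), arrival["r"])   # ['a', 'b', 'c'] 1
```

Even with c = 1 this small example reaches the same depth as exhaustive enumeration, while each node stores a single cut.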


Discussion

  • K - max cut size

  • C - max number of cuts

  • n - number of nodes

  • m - number of edges

  • Complexity analysis

    • Traditional mapping algorithm

      • FlowMap O(Kmn) (J. Cong et al, TCAD ’94)

      • CutMap O(2KmnK) (J. Cong et al, FPGA ’95)

    • Proposed mapping algorithm

      • O(KC^2n)


Priority Cuts: A Bag of Tricks

  • Compute and use priority cuts (a subset of all cuts)

  • Dynamically update the cuts in each mapping pass

  • Use different sorting criteria in each mapping pass

  • Include the best cut from the previous pass into the set of candidate cuts of the current pass

  • Consider several depth-oriented mappings to get a good starting point for area recovery

  • Use complementary heuristics for area recovery

  • Perform cut expansion as part of area recovery

  • Use efficient memory management


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Sequential Mapping

  • That is, combinational mapping and retiming combined

    • Minimizes clock period in the combined solution space

    • Previous work:

      • Pan et al, FPGA’98

      • Cong et al, TCAD’98

  • Our contribution: divide sequential mapping into three steps

    • Step 1: Find the best clock period via sequential arrival time computation (Pan et al, FPGA’98)

    • Step 2: Run combinational mapping with the resulting arrival/required times of the register outputs/inputs

    • Step 3: Perform final retiming to bring the circuit to the best clock period computed in Step 1
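
To make the first step more concrete, here is a simplified sketch of a sequential arrival time computation. It is not the procedure from the cited work: every node is treated as one unit-delay LUT (no minimization over cuts), registers appear as integer weights on edges, and the names `period_feasible` and `best_period` are invented for the example. A candidate period phi is accepted when the arrival times converge and every output arrives no later than phi.

```python
def period_feasible(nodes, edges, outputs, phi):
    """Check whether clock period phi is reachable by retiming.

    edges: list of (src, dst, num_registers); each node with fanins has
           unit delay, nodes without fanins are primary inputs (assumption).
    """
    has_fanin = {dst for _, dst, _ in edges}
    arr = {n: (float("-inf") if n in has_fanin else 0) for n in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for src, dst, regs in edges:
            # crossing a register gives the signal one extra period
            candidate = arr[src] - phi * regs + 1
            if candidate > arr[dst]:
                arr[dst] = candidate
                changed = True
        if not changed:                       # converged
            return all(arr[o] <= phi for o in outputs)
    return False    # still growing: some loop is too slow for this period

def best_period(nodes, edges, outputs):
    """Smallest feasible integer period, or None for a combinational loop."""
    for phi in range(1, len(nodes) + 1):
        if period_feasible(nodes, edges, outputs, phi):
            return phi
    return None

# x -> a -> b -> c -> d with one register on x->a and two on the loop d->a.
nodes = ["x", "a", "b", "c", "d"]
edges = [("x", "a", 1), ("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
print(best_period(nodes, edges, ["d"]))   # 2
```

Before retiming, the longest register-free path a, b, c, d has delay 4; the loop holds two registers for four units of delay, so a period of 2 is the best any retiming can reach.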


Sequential Mapping (continued)

  • Advantages

    • Uses priority cuts (L=1) for computing sequential arrival times

      • very fast

    • Reuses efficient area recovery available in combinational mapping

      • almost no degradation in LUT count and register count

    • Greatly simplifies implementation

      • due to not computing sequential cuts (cuts crossing register boundary)

  • Quality of results

    • Leads to quality that is better (by ~15%) than combinational mapping followed by retiming

      • due to searching the combined search space

    • Achieves almost the same (-1%) clock period as the general sequential mapping with sequential cuts

      • due to using transparent register boundary without computing sequential cuts


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Speeding Up SAT Solving

  • Perform technology mapping into K-LUTs for area

    • Define area as the number of CNF clauses needed to represent the Boolean function of the cut

    • Run several iterations of area recovery

  • Reduces the number of CNF clauses by ~50%

    • Compared to a smart circuit-to-CNF translation (M. Velev)

  • Improves SAT solver runtime by 3-10x

    • Experimental results will be given later


Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network

  • Perform technology mapping into K-LUTs for minimizing area under delay constraints

    • Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut

    • Run delay-oriented mapping, followed by several iterations of area recovery


Cut Sweeping

  • Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)

    • By “shallow” equivalences, we mean equivalent points, A and B, for which there exists a K-cut C (K < 16) such that F_A(C) = F_B(C)

    • A subset of “good” K-input priority cuts can be computed

    • The quality of a cut is determined by the number of fanouts of the cut leaves

      • The more fanouts, the more likely the cut is a common cut for two nodes

  • Cut sweeping quickly reduces the circuit

    • Typically achieves ~50% of the gain of SAT sweeping (fraiging)

  • Cut sweeping is much faster than SAT sweeping

    • Typically 10-100x, for large designs

  • Can be used as a fast preprocessing step for (or a low-cost substitute for) SAT sweeping
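
The idea of detecting shallow equivalences through a common cut can be illustrated with a small sketch. Several simplifications are assumed: the graph contains only two-input AND nodes without inverters, all cuts up to size k are enumerated instead of a fanout-ranked priority subset, and functions are compared by exhaustive truth tables over the cut leaves. The helper names are hypothetical.

```python
from itertools import product

def evaluate(node, assignment, fanins):
    """Value of `node` given Boolean values for the leaves of one of its cuts."""
    if node in assignment:
        return assignment[node]
    left, right = fanins[node]
    return evaluate(left, assignment, fanins) and evaluate(right, assignment, fanins)

def truth_table(node, leaves, fanins):
    return tuple(evaluate(node, dict(zip(leaves, bits)), fanins)
                 for bits in product((False, True), repeat=len(leaves)))

def cut_sweep(fanins, order, k):
    """Map each redundant node to an earlier node with the same cut function."""
    cuts, seen, merged_into = {}, {}, {}
    for node in order:
        trivial = frozenset([node])
        if node not in fanins:
            cuts[node] = {trivial}
            continue
        left, right = fanins[node]
        cuts[node] = {trivial} | {cl | cr
                                  for cl, cr in product(cuts[left], cuts[right])
                                  if len(cl | cr) <= k}
        for cut in cuts[node] - {trivial}:
            leaves = sorted(cut)
            key = (tuple(leaves), truth_table(node, leaves, fanins))
            first = seen.setdefault(key, node)   # first node with this function
            if first != node:
                merged_into[node] = first
    return merged_into

# n1 = (a AND b) AND c and n2 = a AND (b AND c) differ only in structure.
fanins = {"x": ("a", "b"), "y": ("b", "c"), "n1": ("x", "c"), "n2": ("a", "y")}
order = ["a", "b", "c", "x", "y", "n1", "n2"]
print(cut_sweep(fanins, order, k=3))   # {'n2': 'n1'}
```

Both nodes own the cut {a, b, c} and compute the same function of it, so n2 can be merged into n1 without calling a SAT solver.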


Sequential Resynthesis for Delay

  • Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD’07)

    • Similar to sequential mapping

    • Computes sequential arrival times for the circuit

    • Uses the current logic structure, as well as the logic structure transformed using Shannon expansion w.r.t. the latest-arriving variables

    • Accepts transforms leading to delay reduction

    • In the end, retimes to the best clock period

  • The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)

  • This algorithm could benefit from the use of priority cuts


Outline

  • Traditional cut-based technology mapping

  • Improved technology mapping

  • Sequential mapping

  • Other applications of priority cuts

  • Experimental results


Experimental Comparison

  • Compare the new mapping against the traditional mapping in terms of

    • Delay

    • Area

    • Runtime

    • Memory

  • Compare on large industrial benchmarks with choices

  • Analyze the performance of the new mapping for

    • Large designs

    • Large LUTs

  • Explore the potential of sequential mapping

  • Computer used for experiments

    • IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB RAM


Priority Cuts vs. Cut Enumeration (C=8)

Used a set of large public benchmarks


Priority Cuts vs. Cut Enumeration (K=6, C=16)

[Figure: cut enumeration compared with priority cuts, for mapping without choices and for mapping with choices.]

Used a set of large industrial benchmarks


Performance on Large Designs (C=1)

Using design wb_conmax.v (part of IWLS 2005 benchmarks)

This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 Masters and 16 Slaves

Source: http://www.opencores.org


Performance for Large LUTs (C=1)

Using 100 timeframes of design wb_conmax.v


Sequential Mapping (K=6, C=8)

Used a subset of ISCAS benchmarks, for which retiming reduced delay


Summary

  • Reviewed traditional technology mapping

    • Cut computation

    • Optimum-depth mapping

    • Area recovery

  • Presented an improved approach to mapping

    • Computes a small number of cuts at each node

    • Uses new ideas to dramatically reduce memory and runtime

  • Reported experimental results

    • Compared priority cuts with exhaustive cut enumeration

      • Delay and area are comparable or better by 1-3%

      • Memory and runtime are greatly reduced (5x for 6-LUTs)

    • Showed performance on very large designs (2 sec to map 1M)

    • Compared combinational and sequential mapping

  • Implemented in ABC

    • Google: “abc berkeley” (package “if”)


The End

