Combinational and Sequential Mapping with Priority Cuts

Alan Mishchenko

Sungmin Cho

Satrajit Chatterjee

Robert Brayton

UC Berkeley

Outline
  • Traditional cut-based LUT mapping
  • Improved technology mapping with priority cuts
  • Sequential mapping
  • Other applications of priority cuts
  • Experimental results
Technology Mapping

Input: A Boolean network (And-Inverter Graph)

Output: A netlist of K-LUTs that implements the Boolean network while optimizing some cost function

[Figure: technology mapping turns the subject graph into the mapped netlist; nodes are labeled a, b, c, d, e, f.]

k-feasible Cuts

A cut of a node n is a set of nodes in transitive fan-in such that every path from the node to PIs is blocked by nodes in the cut.

A k-feasible cut means the size of the cut must be k or less.

[Figure: example graph with root node r, internal nodes p and q, and inputs a, b, c.]

The set {p, b, c} is a 3-feasible cut of node r. (It is also a 4-feasible cut.)

k-feasible cuts are important in FPGA mapping since the logic between a node and the nodes in its cut can be replaced by a k-LUT.

k-feasible Cut Computation

The set of cuts of a node is a ‘cross product’ of the sets of cuts of its children

Computation is done bottom-up

Any cut that is of size greater than k is discarded

Example (r has children p and q; p has children a and b; q has children b and c):
  • a: { {a} }
  • b: { {b} }
  • c: { {c} }
  • p: { {p}, {a, b} }
  • q: { {q}, {b, c} }
  • r: { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }

(P. Pan et al, FPGA ’98; J. Cong et al, FPGA ’99)
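The bottom-up cross product can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited work: the dictionary `graph`, the function name `enumerate_cuts`, and the convention that the dictionary is listed in topological order are all assumptions made for the example.

```python
from itertools import product

def enumerate_cuts(fanins, k):
    """Return {node: set of cuts} for a DAG listed in topological order.

    `fanins` maps each node to a tuple of its fanin nodes; primary inputs
    map to an empty tuple. Each cut is a frozenset of node names.
    """
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}          # the trivial cut {node}
        if ins:
            # Cross product: pick one cut per fanin and take the union.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:             # discard cuts larger than k
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# The example graph from the slide: p has fanins a, b; q has b, c; r has p, q.
graph = {"a": (), "b": (), "c": (), "p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
cuts = enumerate_cuts(graph, k=3)
print(sorted(sorted(c) for c in cuts["r"]))
```

With k = 3 the five cuts of r listed on the slide are produced; with k = 2 only {r} and {p, q} survive.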

Basic Mapping Algorithm

Depth-optimal LUT mapping of a DAG using all cuts at each node

Input: And-Inverter Graph

  • Compute K-feasible cuts for each node
  • Compute best arrival time at each node
    • In topological order (from PI to PO)
    • Compute the depth of all cuts and choose the best one
  • Perform area recovery
    • Using area flow
    • Using exact local area
  • Choose the best cover
    • In reverse topological order (from PO to PI)

Output: Mapped Netlist
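The depth-oriented part of this flow (cut computation, arrival times, cover selection) can be sketched as follows. The sketch assumes a unit-delay model and k no smaller than the node fanin count, and it omits area recovery entirely; `map_for_depth` and the graph encoding are hypothetical names chosen for the example.

```python
from itertools import product

def map_for_depth(fanins, outputs, k):
    """Unit-delay, depth-oriented LUT mapping using all k-feasible cuts."""
    cuts, arrival, best = {}, {}, {}
    for node, ins in fanins.items():             # topological order, PI to PO
        trivial = frozenset([node])
        cuts[node] = {trivial}
        if not ins:
            arrival[node] = 0                    # primary input
            continue
        for combo in product(*(cuts[i] for i in ins)):
            merged = frozenset().union(*combo)
            if len(merged) <= k:
                cuts[node].add(merged)
        # Depth of a cut is the latest leaf arrival; prefer smaller cuts on ties.
        candidates = [c for c in cuts[node] if c != trivial]
        best[node] = min(candidates, key=lambda c: (max(arrival[l] for l in c), len(c)))
        arrival[node] = 1 + max(arrival[l] for l in best[node])
    # Choose the cover in reverse topological order (from PO to PI).
    cover, stack = {}, list(outputs)
    while stack:
        node = stack.pop()
        if node in cover or not fanins[node]:
            continue
        cover[node] = best[node]                 # one LUT rooted at this node
        stack.extend(best[node])                 # its leaves must be implemented too
    return arrival, cover

graph = {"a": (), "b": (), "c": (), "p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
arrival, cover = map_for_depth(graph, outputs=["r"], k=3)
print(arrival["r"], {n: sorted(c) for n, c in cover.items()})
```

For the small example graph, a 3-LUT covers r directly with the cut {a, b, c}, so the mapped depth is 1; with k = 2 three LUTs (r, p, q) are needed and the depth is 2.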

Area Recovery Summary
  • Area recovery heuristics
    • Area-flow (global view)
      • Chooses cuts with better logic sharing
    • Exact local area (local view)
      • Minimizes the number of LUTs needed to map each node
  • The results of area recovery depend on
    • The order of processing nodes
    • The order of applying two passes
    • The number of iterations
  • This scheme works for the constant-delay model
    • Any change off the critical path does not affect the critical path
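As an illustration of the area-flow heuristic, the sketch below uses one common formulation: the area flow of a cut is the LUT area plus each leaf's area flow divided by that leaf's fanout count. The slide does not give the formula, so this formulation, the function name `area_flow`, and the sample numbers are assumptions.

```python
def area_flow(cut, flow, fanout_count, lut_area=1.0):
    """Area flow of a cut: LUT area plus the leaves' flow shared among their fanouts."""
    return lut_area + sum(flow[leaf] / max(fanout_count[leaf], 1) for leaf in cut)

# Hypothetical data: inputs cost nothing; p and q each already cost one LUT.
flow = {"a": 0.0, "b": 0.0, "c": 0.0, "p": 1.0, "q": 1.0}
fanout_count = {"a": 1, "b": 2, "c": 1, "p": 2, "q": 1}

candidates = [frozenset("pq"), frozenset("abc")]
best = min(candidates, key=lambda cut: area_flow(cut, flow, fanout_count))
print(sorted(best), area_flow(best, flow, fanout_count))
```

Here the cut {p, q} has area flow 1 + 1/2 + 1 = 2.5, because p is shared by two fanouts and q by one, while {a, b, c} has area flow 1, so the second cut is preferred.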
Drawbacks of Traditional Mapping Based on Exhaustive Cut Enumeration
  • For large designs, there may be many k-feasible cuts
    • Order of millions
  • Previous ways of dealing with the problem
    • Detect and remove cut dominance
    • Perform cut pruning
    • Store only cuts on the frontier of mapping
Outline
  • Traditional cut-based technology mapping
  • Improved technology mapping
  • Sequential mapping
  • Other applications of priority cuts
  • Experimental results
New Mapping Algorithm

Near-depth-optimal LUT mapping of a DAG using several cuts at each node

Input: And-Inverter Graph

  • Compute K-feasible cuts for each node
  • Compute arrival time at each node
    • In topological order (from PI to PO)
    • Compute the depth of all cuts and choose the best one
    • Compute at most C good cuts and choose the best one
  • Perform area recovery
    • Using area flow
    • Using exact local area
    • Re-compute at most C good cuts and choose the best one in each iteration
  • Choose the best cover
    • In reverse topological order (from PO to PI)

Output: Mapped Netlist

Computing Priority Cuts
  • Consider nodes in a topological order
    • At each node, merge two sets of fanin cuts (each containing C cuts) getting (C+1) * (C+1) + 1 cuts
    • Sort these cuts using a given cost function, select C best cuts, and use them for computing priority cuts of the fanouts
    • Select one best cut, and use it to map the node
  • Sorting criteria
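The merge-sort-select loop can be sketched as below. The sorting criteria did not survive in this transcript, so the cost function used here (depth first, then cut size, then leaf names for determinism) is an assumption, as are the function name `priority_cuts` and the restriction to two-input nodes as in an AIG.

```python
def priority_cuts(fanins, k, c):
    """Keep at most `c` non-trivial cuts per node, ranked by (depth, size)."""
    stored, arrival = {}, {}
    for node, ins in fanins.items():             # topological order
        trivial = frozenset([node])
        if not ins:
            stored[node], arrival[node] = [trivial], 0
            continue
        left, right = (stored[i] for i in ins)   # assumes two-input nodes
        # Merge the two fanin cut sets pairwise, dropping cuts larger than k.
        merged = {a | b for a in left for b in right if len(a | b) <= k}

        def cost(cut):
            return (1 + max(arrival[leaf] for leaf in cut), len(cut), sorted(cut))

        ranked = sorted(merged, key=cost)[:c]    # select the C best cuts
        arrival[node] = cost(ranked[0])[0]       # the best cut maps the node
        stored[node] = ranked + [trivial]        # fanouts also see the trivial cut
    return stored, arrival

graph = {"a": (), "b": (), "c": (), "p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
stored, arrival = priority_cuts(graph, k=3, c=1)
print(sorted(stored["r"][0]), arrival["r"])
```

Even with C = 1 the cut {a, b, c} of r is found in this small example, because each fanin passes on its single best cut together with its trivial cut.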
Discussion
  • K - max cut size
  • C - max number of cuts
  • n - number of nodes
  • m - number of edges
  • Complexity analysis
    • Traditional mapping algorithm
      • FlowMap O(Kmn) (J. Cong et al, TCAD ’94)
      • CutMap O(2KmnK) (J. Cong et al, FPGA ’95)
    • Proposed mapping algorithm
      • O(KC²n)
Priority Cuts: A Bag of Tricks
  • Compute and use priority cuts (a subset of all cuts)
  • Dynamically update the cuts in each mapping pass
  • Use different sorting criteria in each mapping pass
  • Include the best cut from the previous pass into the set of candidate cuts of the current pass
  • Consider several depth-oriented mappings to get a good starting point for area recovery
  • Use complementary heuristics for area recovery
  • Perform cut expansion as part of area recovery
  • Use efficient memory management
Outline
  • Traditional cut-based technology mapping
  • Improved technology mapping
  • Sequential mapping
  • Other applications of priority cuts
  • Experimental results
Sequential Mapping
  • That is, combinational mapping and retiming combined
    • Minimizes clock period in the combined solution space
    • Previous work:
      • Pan et al, FPGA’98
      • Cong et al, TCAD’98
  • Our contribution: divide sequential mapping into steps
    • Step 1: Find the best clock period via sequential arrival time computation (Pan et al, FPGA’98)
    • Step 2: Run combinational mapping with the resulting arrival/required times of the register outputs/inputs
    • Step 3: Perform final retiming to bring the circuit to the best clock period computed in Step 1
Sequential Mapping (continued)
  • Advantages
    • Uses priority cuts (L=1) for computing sequential arrival times
      • very fast
    • Reuses efficient area recovery available in combinational mapping
      • almost no degradation in LUT count and register count
    • Greatly simplifies implementation
      • due to not computing sequential cuts (cuts crossing register boundary)
  • Quality of results
    • Leads to quality that is better (by ~15%) than combinational mapping followed by retiming
      • due to searching the combined search space
    • Achieves almost the same (-1%) clock period as the general sequential mapping with sequential cuts
      • due to using transparent register boundary without computing sequential cuts
Outline
  • Traditional cut-based technology mapping
  • Improved technology mapping
  • Sequential mapping
  • Other applications of priority cuts
  • Experimental results
Speeding Up SAT Solving
  • Perform technology mapping into K-LUTs for area
    • Define area as the number of CNF clauses needed to represent the Boolean function of the cut
    • Run several iterations of area recovery
  • Reduced the number of CNF clauses by ~50%
    • Compared to a smart circuit-to-CNF translation (M. Velev)
  • Improves SAT solver runtime by 3-10x
    • Experimental results will be given later
Minimizing the Total Number of BDD Nodes Needed to Represent a Boolean Network
  • Perform technology mapping into K-LUTs for minimizing area under delay constraints
    • Define area of a cut as the number of BDD nodes needed to represent the Boolean function of the cut
    • Run delay-oriented mapping, followed by several iterations of area recovery
Cut Sweeping
  • Reduce the circuit by detecting and merging shallow equivalences (proposed by Niklas Een)
    • By “shallow” equivalences, we mean equivalent points, A and B, for which there exists a K-cut C (K < 16) such that F_A(C) = F_B(C)
    • A subset of “good” K-input priority cuts can be computed
    • The quality of a cut is determined by the number of fanouts of the cut leaves
      • The more fanouts, the more likely the cut is a common cut for two nodes
  • Cut sweeping quickly reduces the circuit
    • Typically achieves ~50% of the gain of SAT sweeping (fraiging)
  • Cut sweeping is much faster than SAT sweeping
    • Typically 10-100x, for large designs
  • Can be used as a fast preprocessing step for (or a low-cost substitute for) SAT sweeping
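The idea of detecting shallow equivalences can be sketched by hashing each node's function over each of its cuts. This is a simplified illustration under several assumptions: the AIG encoding (`aig` maps a node to two `(fanin, inverted)` pairs), the function names, and the hand-written `cuts` table are hypothetical; cuts are taken as given rather than ranked by leaf fanout; and equivalences up to complementation are not detected.

```python
from itertools import product

def evaluate(node, aig, values):
    """Value of `node` given 0/1 values for the cut leaves (AND gates with inverters)."""
    if node in values:
        return values[node]
    (a, inv_a), (b, inv_b) = aig[node]
    return (evaluate(a, aig, values) ^ inv_a) & (evaluate(b, aig, values) ^ inv_b)

def cut_sweep(aig, cuts):
    """Map each redundant node to an earlier node with the same function on a common cut."""
    seen, merged = {}, {}
    for node, node_cuts in cuts.items():
        for cut in node_cuts:
            leaves = sorted(cut)
            # Truth table of the node as a function of the cut leaves.
            table = tuple(
                evaluate(node, aig, dict(zip(leaves, bits)))
                for bits in product((0, 1), repeat=len(leaves))
            )
            key = (tuple(leaves), table)
            if key in seen and seen[key] != node:
                merged[node] = seen[key]
            seen.setdefault(key, node)
    return merged

# (a AND b) AND c built two ways: n2 = n1 AND c, n4 = a AND n3.
aig = {
    "n1": (("a", 0), ("b", 0)),
    "n2": (("n1", 0), ("c", 0)),
    "n3": (("b", 0), ("c", 0)),
    "n4": (("a", 0), ("n3", 0)),
}
cuts = {"n1": [{"a", "b"}], "n2": [{"a", "b", "c"}], "n3": [{"b", "c"}], "n4": [{"a", "b", "c"}]}
print(cut_sweep(aig, cuts))
```

Nodes n2 and n4 share the cut {a, b, c} and have the same truth table over it, so n4 is reported as mergeable into n2 without calling a SAT solver.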
Sequential Resynthesis for Delay
  • Restructure logic along the tightest sequential loops to reduce delay after retiming (Soviani/Edwards, TCAD’07)
    • Similar to sequential mapping
    • Computes seq arrival times for the circuit
    • Uses the current logic structure, as well as the logic structure transformed using Shannon expansion w.r.t. the latest-arriving variables
    • Accepts transforms leading to delay reduction
    • In the end, retimes to the best clock period
  • The improvement is 7-60% in delay with 1-12% area degradation (ISCAS circuits)
  • This algorithm could benefit from the use of priority cuts
Outline
  • Traditional cut-based technology mapping
  • Improved technology mapping
  • Sequential mapping
  • Other applications of priority cuts
  • Experimental results
Experimental Comparison
  • Compare the new mapping against the traditional mapping in terms of
    • Delay
    • Area
    • Runtime
    • Memory
  • Compare on large industrial benchmarks with choices
  • Analyze the performance of the new mapping for
    • Large designs
    • Large LUTs
  • Explore the potential of sequential mapping
  • Computer used for experiments
    • IBM ThinkPad laptop with a 1.6 GHz CPU and 2 GB of RAM
Priority Cuts vs. Cut Enumeration (C=8)

Used a set of large public benchmarks

Priority Cuts vs. Cut Enumeration (K=6, C=16)

[Figure: priority cuts compared with cut enumeration, for mapping without choices and for mapping with choices.]

Used a set of large industrial benchmarks

Performance on Large Designs (C=1)

Using design wb_conmax.v (part of IWLS 2005 benchmarks)

This is a WISHBONE Interconnect Matrix IP core. It can interconnect up to 8 masters and 16 slaves.

Source: http://www.opencores.org

Performance for Large LUTs (C=1)

Using 100 timeframes of design wb_conmax.v

Sequential Mapping (K=6, C=8)

Used a subset of ISCAS benchmarks, for which retiming reduced delay

Summary
  • Reviewed traditional technology mapping
    • Cut computation
    • Optimum-depth mapping
    • Area recovery
  • Presented an improved approach to mapping
    • Computes a small number of cuts at each node
    • Uses new ideas to dramatically reduce memory and runtime
  • Reported experimental results
    • Compared priority cuts with exhaustive cut enumeration
      • Delay and area are comparable or better by 1-3%
      • Memory and runtime are greatly reduced (5x for 6-LUTs)
    • Showed performance on very large designs (2 sec to map 1M)
    • Compared combinational and sequential mapping
  • Implemented in ABC
    • Google: “abc berkeley” (package “if”)