Advanced Mapping into LUT Structures for FPGA Optimization

Mapping into LUT Structures
Sayak Ray, Alan Mishchenko, Niklas Een, Robert Brayton (Department of EECS, UC Berkeley)
Stephen Jang, Chao Chen (Agate Logic Inc.)

Contributions (in a nutshell)
• New mapping algorithm for FPGAs, which maps into LUT structures instead of LUTs
• It has two applications:
  (1) Improving the quality of mapping into LUTs
      • Area improves by 7.4% on average
      • Delay improves by 11.3% on average
  (2) Improving delay for specialized hardware that supports non-routable connections
      • Delay improves by 40% on average, with some area penalty

LUT Structure
• LUT structure: a group of LUTs connected by direct, non-routable wires
[Figure: a 7-input LUT structure "44" and a 10-input LUT structure "444", with the non-routable wires between the LUTs marked]
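The input counts of the two examples follow from the fact that each direct wire between LUTs occupies one input of the LUT it drives. The Python sketch below illustrates that arithmetic under this assumption; the function name structure_inputs and the encoding of a structure as a string of single-digit LUT sizes are hypothetical, not taken from the authors' tool.

```python
def structure_inputs(spec):
    """Primary inputs of a LUT structure written as LUT sizes, e.g. "44".

    Hypothetical helper. Assumes every LUT except the last drives exactly
    one direct, non-routable wire, and each such wire occupies one input
    of the LUT it feeds.
    """
    sizes = [int(c) for c in spec]  # one digit per LUT size
    wires = len(sizes) - 1          # non-routable wires inside the structure
    return sum(sizes) - wires
```

Here structure_inputs("44") gives 7 and structure_inputs("444") gives 10, matching the two examples above.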

Some Terminology
Let $f(X)$ be a Boolean function, and let $X_1 \subseteq X$ be a subset of its support.
Suppose $\{q_1(X), q_2(X), \ldots, q_\mu(X)\}$ is the set of distinct cofactors of $f$ w.r.t. $X_1$. Then $\mu$ is called the column multiplicity of $f$ w.r.t. $X_1$.
Given a partition of $X$ into two disjoint subsets $X_1$ and $X_2$, we say that an Ashenhurst-Curtis decomposition of $f(X)$ exists if $f(X)$ can be expressed as
$$f(X) = h(g_1(X_1), g_2(X_1), \ldots, g_k(X_1), X_2)$$
where the smallest possible $k$ is $\lceil \log_2 \mu \rceil$.
$X_1$: bound set; $X_2$: free set
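To make the definitions concrete, here is a minimal Python sketch that counts the distinct cofactors (columns) of a truth table with respect to a chosen bound set, and derives the smallest number of bound-set functions k from μ. The integer truth-table encoding and the names column_multiplicity and min_bound_outputs are assumptions for illustration only.

```python
def column_multiplicity(tt, n, bound):
    """Return mu, the number of distinct cofactors of f w.r.t. bound set X1.

    tt    : truth table of f as an int (assumed encoding): bit m holds f at
            minterm m, where bit i of m is the value of variable x_i.
    bound : indices of the bound-set variables X1; the others form X2.
    Each cofactor ("column") is a function of the free variables X2.
    """
    free = [v for v in range(n) if v not in bound]
    cols = {}
    for m in range(1 << n):
        # index of the bound-set assignment and of the free-set assignment
        b_idx = sum(((m >> v) & 1) << k for k, v in enumerate(bound))
        f_idx = sum(((m >> v) & 1) << k for k, v in enumerate(free))
        cols[b_idx] = cols.get(b_idx, 0) | (((tt >> m) & 1) << f_idx)
    return len(set(cols.values()))


def min_bound_outputs(mu):
    """Smallest k with 2**k >= mu, i.e. ceil(log2(mu)) for mu >= 1."""
    return (mu - 1).bit_length()
```

For f = (x0 & x1) | x2 (truth table 0xF8 under this encoding), the bound set {x0, x1} gives μ = 2, so a single function g1 = x0 & x1 suffices; the bound set {x0, x2} gives μ = 3 and would need k = 2.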

Flow of performLutMatchingXY
1 SupportMinimize: removes vacuous variables
2 findOutputDecomposition: checks for $f = x \circ G$
  • Variable reordering in the truth table
  • Allows cases with column multiplicity $\mu = 2, 3, 4$
  • For $\mu = 3, 4$, considers a special decomposition with one shared variable only
3 findGoodBoundSet
4 checkSpecialNonDisjoint
5 reverseVariableOrder: a heuristic to find a suitable decomposition
6 findGoodBoundSet
7 checkSpecialNonDisjoint
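The sketch below illustrates steps 1 and 3 of this flow for a two-LUT structure XY under simplifying assumptions: vacuous variables are removed by comparing the two cofactors of each variable, and the bound set is found by exhaustive enumeration rather than by the authors' heuristics. It only accepts simple disjoint decompositions with μ ≤ 2 (a single wire between the LUTs) and does not model findOutputDecomposition, checkSpecialNonDisjoint, or the variable-order reversal. All function names here are hypothetical.

```python
from itertools import combinations


def depends_on(tt, n, v):
    """True if f (truth table tt over n variables) depends on variable v."""
    for m in range(1 << n):
        if not (m >> v) & 1:
            # compare f at x_v = 0 and x_v = 1 with all other inputs fixed
            if ((tt >> m) & 1) != ((tt >> (m | (1 << v))) & 1):
                return True
    return False


def support_minimize(tt, n):
    """Step 1 sketch (hypothetical): drop vacuous variables.

    Returns (reduced truth table, original indices of the kept variables).
    """
    keep = [v for v in range(n) if depends_on(tt, n, v)]
    reduced = 0
    for j in range(1 << len(keep)):
        # map minterm j of the reduced function back to the original space;
        # dropped variables are vacuous, so setting them to 0 is safe
        m = sum(((j >> k) & 1) << v for k, v in enumerate(keep))
        reduced |= ((tt >> m) & 1) << j
    return reduced, keep


def column_multiplicity(tt, n, bound):
    """Number of distinct cofactors of f w.r.t. the bound-set variables."""
    free = [v for v in range(n) if v not in bound]
    cols = {}
    for m in range(1 << n):
        b_idx = sum(((m >> v) & 1) << k for k, v in enumerate(bound))
        f_idx = sum(((m >> v) & 1) << k for k, v in enumerate(free))
        cols[b_idx] = cols.get(b_idx, 0) | (((tt >> m) & 1) << f_idx)
    return len(set(cols.values()))


def find_bound_set(tt, n, x, y):
    """Step 3 sketch (hypothetical): exhaustive search for f = h(g(X1), X2).

    g must fit the y-input fanin LUT: |X1| <= y and mu <= 2 (one wire).
    h must fit the x-input main LUT: |X2| + 1 <= x.
    Returns the bound set as a tuple, () if f fits the main LUT alone,
    or None if no such bound set exists.
    """
    if n <= x:
        return ()
    # |X1| runs from min(y, n) down to n - x + 1, the smallest size that
    # leaves an input of the main LUT free for the wire from the fanin LUT
    for size in range(min(y, n), n - x, -1):
        for bound in combinations(range(n), size):
            if column_multiplicity(tt, n, list(bound)) <= 2:
                return bound
    return None
```

For example, with f = (x0 & x1 & x2 & x3) | (x4 ^ x5 ^ x6) and structure "44" (x = y = 4), find_bound_set returns (0, 1, 2, 3): the AND goes into the fanin LUT, and the main LUT combines its output with x4, x5, and x6. Exhaustive enumeration is adequate for small cuts like this one, but its cost grows combinatorially with the number of inputs.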

Checking for XYZ decomposition
• X, Y, and Z are the sizes of the main/fanin LUTs
• Two-step process:
  • Check for an XW decomposition, where W = Y + Z – 2
  • If it exists, check the remainder function G for a YZ decomposition
• The priority-cut-based technology mapper is modified to accommodate the algorithms for XY and XYZ
• The results of decomposition checking are cached, which substantially reduces runtime on large designs
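Caching can be sketched as a memo table keyed by the truth table, the number of variables, and the target structure, wrapped around whatever decomposition checker is used. The class name DecompositionCache and the checker signature are hypothetical; the sketch also stores failed checks, which is a design choice rather than something stated on the slide.

```python
from typing import Callable, Dict, Tuple

# Hypothetical checker signature: (truth table, number of variables,
# structure string such as "444") -> decomposition result, or None on failure.
Checker = Callable[[int, int, str], object]


class DecompositionCache:
    """Memoizes decomposition checks keyed by (truth table, #vars, structure)."""

    def __init__(self, checker: Checker) -> None:
        self.checker = checker
        self.table: Dict[Tuple[int, int, str], object] = {}
        self.hits = 0
        self.misses = 0

    def check(self, tt: int, n: int, spec: str) -> object:
        key = (tt, n, spec)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.misses += 1
        result = self.checker(tt, n, spec)
        # negative results (None) are stored too, so functions that
        # failed once are not re-checked
        self.table[key] = result
        return result
```

Because the key is the exact truth table, two functions that differ only by an input permutation or negation are cached separately; the semi-canonical form listed under Future Work is aimed at increasing the number of hits in such a table.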

Experiment 1

Experiment 2

Experiment 3

Experiment 4 – Delay Optimization

Experiment 7: industrial design

Experiment 8: industrial design

Future Work
• Improving implementation
  • Handling delay-driven decomposition
    • Currently we ignore arrival times and only check whether any decomposition exists
  • Using a semi-canonical form to increase the number of hits in the hash table of computed results
  • Making truth-table-based decomposition even faster
• Combining Boolean decomposition into LUT structures with structural mapping of LUTs into clusters
• Evaluating results after place and route
  • This will be especially interesting when specialized hardware is available

Questions
