
Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates

Yu Hu1, Satyaki Das2, Steve Trimberger2, and Lei He1

1. Electrical Engineering Dept., UCLA

2. Research Labs, Xilinx Inc.

Presented by Yu Hu

Address comments to lhe@ee.ucla.edu

Outline

Introduction

Design of the Macro-gates

Synthesis for the Proposed FPGA Architecture

Comparison of Heterogeneous FPGA Architectures

Conclusions and Future Work

Heterogeneity in FPGA Architectures
  • Heterogeneity among SLICEs
    • Programmable logic and routing
    • Tiles are not identical
      • soft logic fabric [Kaviani, FPGA’96]
      • hard structures [Jamieson, FPL’05]
    • Dedicated hard structures
      • e.g. DSP
      • e.g. memory block
  • Heterogeneity within a SLICE
    • Programmable logic and routing
    • Tiles (SLICEs) are identical
    • Different logic elements exist within a SLICE
      • e.g. LUTs of different sizes [Cong, FPGA’99]
      • e.g. mixed PLAs and LUTs [Cong, TODAES’05]
      • e.g. mixed macro-gates and LUTs

(source: Jamieson@FPL’05)

Heterogeneous FPGA with Macro-Gates
  • There is a trade-off between programmability and cost when choosing between LUTs and macro-gates
    • Xilinx V4 benefits from small gates (MUX2, XOR2) built into SLICEs.
  • The benefit of wider macro-gates
    • The effectiveness of incorporating wider logic functions (macro-gates) is not yet clear.
  • Our contributions
    • Design a new FPGA architecture with mixed LUTs and macro-gates
    • Propose a new automatic synthesis flow for mapping a circuit to the proposed FPGA architecture
    • Evaluate the architecture and show that the proposed architecture reduces delay and area by 16.5% and 30%, respectively, compared to the LUT-only architecture.
Outline
  • Introduction
  • Design of the Macro-gates
  • Synthesis for the Proposed FPGA Architecture
  • Comparison of Heterogeneous FPGA Architectures
  • Conclusions and Future Work
Overview of Macro-Gate Design
  • Key problem
    • Select the logic functions for the macro-gate
  • Problem formulation:
    • Input: a set of training circuits, which have been mapped to K-input LUTs
    • Output: N K-input Boolean functions: f1, …, fN
    • Objective: Maximize the number of logic functions (in the training circuit set) that can be implemented by f1, …, fN
  • The proposed solution
    • Ranking of the logic functions for a set of training circuits
NPN-Class Diagram: Organization of Logics
[Figure: NPN-class diagram arranged in levels. Level0: constant; Level1: 1-input; Level2: 2-input; Level3: 3-input.]
  • Canonical and efficient representation of all NPN classes
    • NPN-equivalent: functionally equivalent under input negation, input permutation, and output negation
    • E.g., f(a,b,c)=a+bc and g(a,b,c)=b’a+b’c are NPN-equivalent
  • NPN-Cofactor relationship is indicated
  • DAG: easy to manipulate
  • It becomes impractical to compute for functions with more than 6 inputs!
    • Solution: Utilization NPN-Class Diagram (UND)
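
The NPN-equivalence example above can be checked by brute force. The sketch below is an illustration only, not the authors' implementation: the helper names `truth_table` and `npn_canonical` are hypothetical, and the canonical form is simply the smallest truth table in the NPN orbit. The search visits n!·2^(n+1) variants per function, which is one way to see why exhaustive handling stops being practical as the input count grows.

```python
from itertools import permutations, product


def truth_table(func, n):
    """Truth table of func as a tuple of 0/1, rows ordered 00..0 to 11..1."""
    return tuple(int(bool(func(*bits))) for bits in product((0, 1), repeat=n))


def npn_canonical(table, n):
    """Smallest table reachable by input permutation/negation and output negation."""
    rows = list(product((0, 1), repeat=n))
    index = {bits: i for i, bits in enumerate(rows)}
    best = None
    for perm in permutations(range(n)):
        for flips in product((0, 1), repeat=n):
            # Each row of the variant reads the original function at the
            # permuted and selectively inverted inputs.
            variant = tuple(
                table[index[tuple(bits[perm[i]] ^ flips[i] for i in range(n))]]
                for bits in rows
            )
            for out_flip in (0, 1):
                cand = tuple(v ^ out_flip for v in variant)
                if best is None or cand < best:
                    best = cand
    return best


# The two functions from the slide, plus a 3-input XOR as a counterexample.
f = truth_table(lambda a, b, c: a or (b and c), 3)
g = truth_table(lambda a, b, c: ((not b) and a) or ((not b) and c), 3)
h = truth_table(lambda a, b, c: a ^ b ^ c, 3)

same_class = npn_canonical(f, 3) == npn_canonical(g, 3)
xor_in_class = npn_canonical(f, 3) == npn_canonical(h, 3)
```

Here `same_class` is true for the slide's f and g, while the XOR function falls in a different class.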

[Figure label: “Wider inputs”, for the levels beyond 3-input functions.]
UND: Utilization NPN-Class Diagram
  • The UND is a DAG and a sub-graph of the NPN-class diagram (NCD)
  • It helps in scoring and ranking functions

[Figure: example UND. Each node is labeled in the form “function / number / percentage”: ab’c’+a’bc’ / 1 / xx%, abc / 1 / xx%, ab’+a’b / 0 / xx%, ab / 0 / xx%, a / 0 / xx%, -0- / 0 / xx%. Callouts name the fields functionality, appearance frequency, and implementation capability.]
UND: Utilization NPN-Class Diagram

[Figure: the same UND at a later step. The labels of ab’+a’b and a change from “/ 0 /” to “/ 1 /”; the other nodes remain ab’c’+a’bc’ / 1 / xx%, abc / 1 / xx%, ab / 0 / xx%, and -0- / 0 / xx%.]
UND: Utilization NPN-Class Diagram
  • Calculate Implementation Capability

[Figure: UND with node labels ab’c’+a’bc’ / 1 / 75%, abc / 1 / 50%, ab’+a’b / 1 / 50%, ab / 0 / 25%, a / 1 / 25%, and -0- / 0 / xx%; the fanout cone of ab’c’+a’bc’ is marked.]

The topological property (DAG) of the UND enables us to efficiently explore different metrics for functionality ranking, e.g., utilization rate.
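
One reading that is consistent with the percentages on this slide is that a function's implementation capability is the share of training-set appearances covered by the function itself plus every NPN class reachable from it by repeatedly fixing inputs to constants (its cone in the UND). The sketch below makes that assumption explicit. The names `canon`, `cofactors`, `reachable_classes`, and `capability` are hypothetical, and the appearance counts are the ones shown in the figure.

```python
from itertools import permutations, product

N = 3
ROWS = list(product((0, 1), repeat=N))
INDEX = {bits: i for i, bits in enumerate(ROWS)}


def table(func):
    """Truth table over three inputs (a, b, c) as a tuple of 0/1."""
    return tuple(int(bool(func(*bits))) for bits in ROWS)


def canon(t):
    """NPN representative: the smallest table in the NPN orbit of t."""
    best = None
    for perm in permutations(range(N)):
        for flips in product((0, 1), repeat=N):
            v = tuple(
                t[INDEX[tuple(b[perm[i]] ^ flips[i] for i in range(N))]]
                for b in ROWS
            )
            for cand in (v, tuple(1 - x for x in v)):
                if best is None or cand < best:
                    best = cand
    return best


def cofactors(t):
    """Tables obtained by fixing one input to 0 or to 1."""
    for i in range(N):
        for val in (0, 1):
            yield tuple(t[INDEX[b[:i] + (val,) + b[i + 1:]]] for b in ROWS)


def reachable_classes(t):
    """NPN classes of t and of everything reachable by repeated cofactoring."""
    seen, stack = set(), [canon(t)]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(canon(x) for x in cofactors(c))
    return seen


FUNCS = {
    "ab'c'+a'bc'": table(lambda a, b, c: (a != b) and not c),
    "abc": table(lambda a, b, c: a and b and c),
    "ab'+a'b": table(lambda a, b, c: a != b),
    "ab": table(lambda a, b, c: a and b),
    "a": table(lambda a, b, c: a),
}
# Appearance counts as shown in the figure (assumed to be the whole training set).
APPEARANCES = {"ab'c'+a'bc'": 1, "abc": 1, "ab'+a'b": 1, "ab": 0, "a": 1}


def capability(name):
    """Percentage of appearances covered by `name` and its cofactor classes."""
    total = sum(APPEARANCES.values())
    freq = {canon(FUNCS[k]): APPEARANCES[k] for k in FUNCS}
    covered = sum(freq.get(c, 0) for c in reachable_classes(FUNCS[name]))
    return 100 * covered / total


scores = {name: capability(name) for name in FUNCS}
```

Under this assumption the computed scores match the figure: 75% for ab’c’+a’bc’, 50% for abc and ab’+a’b, and 25% for ab and a.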

Recap: Overall Flow for Macro-Gate Design
  • Map the training circuits with LUT-N
  • Extract logic functions as truth tables (0000001000000000, 0000010000000000, 0000100000000000, ……)
  • Generate the Utilization NPN-Class Diagram
  • Calculate scores for the logic functions
  • Rank the logic functions
  • Best function in the example: ab’c’+a’bc’

[Figure: an example circuit with nodes a to h and output F mapped to LUTs, with gate labels and2(3), nand2(2), and inv(1); the UND from the previous slides; and the score calculations 1+1*2/3+1*1/3=2, 1+1*1/3=1.33, 1*1/2=0.5, and 1+1*1/2=1.5.]

Proposed Macro-Gates and FPGA Architecture
  • For the IWLS’05 benchmarks, the following four 6-input functions have the highest ranks
    • GI1=a b c d e f (AND-6)
    • GI2=a’ b’ c’ + b c f’ + b c’ d’ + b’ c e (MUX-4)
    • GI3=a b’ c d’ e + b c e f + d e f
    • GI4=a b’ + a’ c d’ + b’ c’ + e’ + f’
  • The resulting macro-gate can implement over 50% of the logic functions in the IWLS’05 benchmarks.
  • The architectures of the proposed macro-gate and FPGA SLICE are shown in the slide figure, which is not reproduced in this transcript.
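
As a small sanity check on the “MUX-4” label, the sketch below evaluates GI2 on all 64 input combinations and compares it with a plain 4:1 multiplexer whose select lines are b and c and whose data inputs are a’, e, d’, and f’. The function names `gi2` and `mux4` and the chosen data ordering are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def gi2(a, b, c, d, e, f):
    """GI2 = a'b'c' + bcf' + bc'd' + b'ce, with all inputs given as 0 or 1."""
    na, nb, nc, nd, nf = 1 - a, 1 - b, 1 - c, 1 - d, 1 - f
    return (na & nb & nc) | (b & c & nf) | (b & nc & nd) | (nb & c & e)


def mux4(sel_hi, sel_lo, d0, d1, d2, d3):
    """Plain 4:1 multiplexer: returns the data input at index 2*sel_hi + sel_lo."""
    return (d0, d1, d2, d3)[2 * sel_hi + sel_lo]


def gi2_is_mux4():
    """True if GI2 equals a 4:1 mux selected by (b, c) over a', e, d', f'."""
    return all(
        gi2(a, b, c, d, e, f) == mux4(b, c, 1 - a, e, 1 - d, 1 - f)
        for a, b, c, d, e, f in product((0, 1), repeat=6)
    )


result = gi2_is_mux4()
```

The check passes, so GI2 is a 4:1 multiplexer up to inversion of three of its data inputs.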
Outline
  • Design of the Embedded Macro-gates
  • Synthesis for the Proposed FPGA Architecture
    • Technology Mapping for Heterogeneous FPGAs
    • SAT-based Packing
    • Placement and Routing
  • Comparison of Heterogeneous FPGA Architectures
  • Conclusions and Future Work
Functional & Structural Cut Enumeration

[Figure: a small netlist over inputs w, x, y, z with internal nodes a, b, c, d, checked against a 4-input macro-gate library stored as truth tables (0000001000000000, 0000010000000000, 0000100000000000, ……).]
  • a=(x+y)’
  • b=y+wz
  • d=ab=(x+y)’(y+wz)=x’y’wz
  • Is x’y’wz in the library? Yes

  • Phase 1: Enumerate and label cuts from PIs to POs
    • Check the feasibility of a cut w.r.t. the macro-gate
  • Phase 2: Select the best choice from POs to PIs
  • A general yet efficient solution is SAT-based Boolean matching
    • Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA Technology Mapping, Session 5C.1, ICCAD 07
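
The feasibility check in Phase 1 can be illustrated on the slide's example by computing the function of the cut rooted at d and looking its truth table up in a library. This is a minimal sketch under stated assumptions: the library here is hypothetical (every single-minterm 4-input function, which matches the style of the truth tables shown on the slide), the row ordering (w, x, y, z) is a choice made for the example, and a direct table lookup stands in for the SAT-based Boolean matching used in the talk.

```python
from itertools import product


def table(func):
    """16-character truth table, rows ordered by (w, x, y, z) from 0000 to 1111."""
    return "".join(
        str(int(bool(func(*bits)))) for bits in product((0, 1), repeat=4)
    )


def node_a(w, x, y, z):
    return not (x or y)           # a = (x + y)'


def node_b(w, x, y, z):
    return y or (w and z)         # b = y + wz


def node_d(w, x, y, z):
    return node_a(w, x, y, z) and node_b(w, x, y, z)   # d = ab


# Hypothetical library: all 4-input functions with exactly one minterm.
LIBRARY = {"0" * i + "1" + "0" * (15 - i) for i in range(16)}

cut_table = table(node_d)
expected = table(lambda w, x, y, z: (not x) and (not y) and w and z)  # x'y'wz
feasible = cut_table in LIBRARY
```

With this ordering the cut function has a single 1 at row w=1, x=0, y=0, z=1, so the lookup succeeds, mirroring the “Yes” in the figure.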
Key in Technology Mapping: Balance Resource Utilization
  • An asymmetric architecture causes problems for resource utilization
  • Exclusive use of one logic resource leaves much of the fabric unused
  • Simple yet effective solution:
    • Change the LUT-MG ratio by adjusting the area weights of LUTs and macro-gates.
    • Precise calibration is hard to reach with this approach.

[Figure: LUT-MG ratio study, where LUT-MG ratio = LUT#/MG#. Objective architecture: LUT6:MacroGate6 = 1:1. Annotations: “Total# too large!”, “Best LUT-MG ratio = 1:1”, “Hard to obtain precise calibration”.]

Post-Mapping Area Recovery (motivation example)
  • Given:
    • Target architecture = LUT6 + MG6
    • LUT-MG ratio in target architecture = 1:1
    • LUT# < MG# in the mapped design
    • Intrinsic delay (LUT6 : MG6) = 5:4
  • Objective: balance the numbers of LUTs and MGs without increasing delay

[Figure, shown in three builds: a netlist from PI to PO with one LUT6 and six MG6 cells. The cells are annotated with the timing pairs 5 / 5, 9 / 13, 9 / 9, 13 / 13, 17 / 17, 4 / 5, and 8 / 9.]
  • Build 2: one MG6 is replaced by a LUT6 and its label changes from 9 / 13 to 10 / 13; the other labels are unchanged.
  • Build 3: two more MG6 cells are replaced by LUT6 cells; the labels 4 / 5, 8 / 9, 13 / 13, and 17 / 17 become 5 / 5, 10 / 9, 14 / 13, and 18 / 17 (“Timing target violation!”).
  • Timing slack budgeting is necessary!
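
The timing pairs in the example read naturally as arrival time / required time. The sketch below reproduces them on a hypothetical seven-cell netlist; the connectivity in `FANIN` is an assumption reconstructed so that the numbers agree with the figure, not something stated in the slides. Delays follow the given 5:4 ratio and the clock target is 17.

```python
DELAY = {"LUT6": 5, "MG6": 4}

# Hypothetical netlist: cell -> fan-in cells (an empty list means the cell is
# driven by primary inputs). Cell ids are already in topological order.
FANIN = {1: [], 2: [1], 3: [], 4: [3], 5: [2, 4], 6: [1], 7: [5, 6]}
TARGET = 17  # required time at the primary output (cell 7)


def cell_types(luts):
    """Cells listed in `luts` are LUT6; all others are MG6."""
    return {n: ("LUT6" if n in luts else "MG6") for n in FANIN}


def arrival_times(kinds):
    """Longest-path arrival time at the output of every cell."""
    arr = {}
    for n in sorted(FANIN):
        arr[n] = max((arr[p] for p in FANIN[n]), default=0) + DELAY[kinds[n]]
    return arr


def required_times(kinds, target=TARGET):
    """Latest allowed arrival time at every cell output."""
    req = {n: target for n in FANIN}
    for n in sorted(FANIN, reverse=True):
        for p in FANIN[n]:
            req[p] = min(req[p], req[n] - DELAY[kinds[n]])
    return req


def meets_target(luts):
    return arrival_times(cell_types(luts))[7] <= TARGET


build1 = arrival_times(cell_types({1}))            # initial mapping
build2_ok = meets_target({1, 6})                   # swap the cell with slack 4
build3_ok = meets_target({1, 3, 4, 6})             # swap two more cells
```

In this reconstruction, swapping cell 6 uses slack that is available, while also swapping cells 3 and 4 pushes the output arrival time to 18, which exceeds the target of 17.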

Post Mapping Area Recovery by Timing Budgeting
  • Formulated as an Integer Linear Programming (ILP) problem
  • Objective (minimize the gap between the target and actual LUT-MG ratios): min |m_2 + … + m_7 − 7/2|
  • Arrival time constraints: a_i + d_j + b_j ≤ a_j
  • Clock period target: a_i ≤ 17
  • LUT assignment within the given timing slack: (5 − 4)·m_j ≤ b_j, m_j ∈ {0, 1}
  • Easily generalized to handle architectures
    • with multiple macro-gates
    • with different input pin numbers

[Figure: the example netlist from PI to PO, containing one LUT6, with arrival-time variables a1 through a7 on its cells.]
Outline
  • Design of the Embedded Macro-gates
  • Synthesis for the Proposed FPGA Architecture
    • Technology Mapping for Heterogeneous FPGAs
    • SAT-based Packing
  • Comparison of Heterogeneous FPGA Architectures
  • Conclusions and Future Work
SAT-Based Packing
  • Motivation
    • Traditional packing tools, e.g., T-VPack, hard-code the architecture specification of a SLICE
      • Re-implementation from scratch is needed when the architecture changes
    • We propose a unified implementation of the packer for different architectures, which makes architecture exploration easy
  • The architecture-dependent sub-problem in packing
    • Checking whether a sub-circuit is structurally feasible for the SLICE
  • Solution
    • Treat the validation of a SLICE packing as a local place-and-route problem
    • A SAT solver is used to carry out the validation
Example of SAT-Based SLICE Packing
  • Examples of constraints (for each class of constraint…)
  • Placement and routing choice variables: X@A, X@B, U5@N10
  • Exclusivity constraint: (¬X@A) ∨ (¬X@B)
  • Presence constraint: (X@A) ∨ (X@B)
  • Input/Output constraint: X@A → U5@N10
  • Routing constraint: (G0→out ∧ U5@N10) → U5@N12
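
To make the encoding concrete, the sketch below writes the four example constraints as CNF clauses and enumerates satisfying assignments by brute force. It is an illustration only: a real flow would hand the clauses to a SAT solver, the presence constraint is read as “X is placed at A or at B”, the routing constraint is read as (G0→out ∧ U5@N10) → U5@N12, and treating “G0→out” as a single Boolean variable is an assumption.

```python
from itertools import product

VARS = ["X@A", "X@B", "U5@N10", "G0->out", "U5@N12"]

# Each clause is a list of (variable, polarity) literals; an implication
# p -> q is written as the clause (not p) or q.
CLAUSES = [
    [("X@A", False), ("X@B", False)],                           # exclusivity
    [("X@A", True), ("X@B", True)],                             # presence
    [("X@A", False), ("U5@N10", True)],                         # input/output
    [("G0->out", False), ("U5@N10", False), ("U5@N12", True)],  # routing
]


def models(clauses, variables):
    """All assignments that satisfy every clause (brute-force enumeration)."""
    found = []
    for values in product((False, True), repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(assign[v] == pol for v, pol in clause) for clause in clauses):
            found.append(assign)
    return found


solutions = models(CLAUSES, VARS)
packing_is_feasible = len(solutions) > 0
```

A non-empty `solutions` list corresponds to a legal placement and routing of the sub-circuit inside the SLICE; an empty list would mean the packing is rejected.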
Recap: Overall Synthesis Flow
  • Area weight setting
  • Cut-based mapping
  • Area-balance trade-off? (decision with Y and N branches)
  • Post-mapping area recovery
  • Packing

[Figure: an example circuit with nodes a to h and output F, shown first as generic LUTs and then mapped to a mix of LUT6 and MG6 cells.]

Outline
  • Motivation and Objectives
  • Methodology for Logic Function Exploration
  • Technology Mapping for Heterogeneous FPGAs
  • Evaluation of Heterogeneous FPGA Architectures
  • Conclusions and Future Work
Experimental Setting
  • Design library parameters [Cong, TODAES’05]
  • Benchmark set: IWLS 2005
  • Four architectures are compared:
    • LUT4, LUT4 + macro gate, LUT6, and LUT6 + macro gate
    • The proposed macro-gate is synthesized with SIS 1.2
    • Delay and area model
  • Interconnect delay is ignored
Delay Comparisons
  • Compared to LUT4, LUT4+MG reduces both logic depth and delay by 9.2%.
  • Compared to LUT6, LUT6+MG reduces delay by 30% while increasing logic depth by 36.5%.
    • A LUT6 can implement more logic functions than a macro-gate
Logic Area Comparisons
  • Compared to LUT4, LUT4+MG reduces logic area by 12.5%.
  • Compared to LUT6, LUT6+MG reduces logic area by 16.9%.
Outline
  • Motivation and Objectives
  • Methodology for Logic Function Exploration
  • Technology Mapping for Heterogeneous FPGAs
  • Comparison of Heterogeneous FPGA Architectures
  • Conclusions and Future Work
Conclusions
  • Conclusions
    • A novel FPGA architecture with mixed LUTs and macro-gates is proposed
    • A synthesis flow for the proposed architecture is implemented
    • Preliminary experimental results show that the proposed architecture is effective in reducing area and delay
  • Future Work
    • Perform physical design for the synthesized circuits, compare routing costs, and evaluate the architecture with interconnect delay taken into account
    • Study the effectiveness of the proposed architecture for power reduction
    • Examine macro-gates with wider inputs