DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs. Deming Chen, Jason Cong. ICCAD 2004


Presented by: Wei Chen

FPGA Architecture

A K-input LUT can implement any Boolean function of K variables. This property is called completeness.

FPGA Technology Mapping

Given a circuit modeled as a DAG, partition the graph such that every partition has no more than K inputs while satisfying some objectives.

Terminology
• A Boolean network N can be modeled as a DAG
• Input(v): the set of fanin nodes of gate v
• Cone rooted on node v (Ov) is a sub-network of N consisting of v and some of its predecessors, such that for any node w∈Ov, there is a path from w to v that lies entirely in Ov.
• A cut is a partitioning (X, X’) of a cone Ov such that X’ is a cone of v.
• The cut set of the cut V(X,X’) consists of the inputs of cone X’.
• Cut size is the number of elements in the cut set.
• The level of a node v is the length of the longest path from any PI node to v.
• The depth of a network is the largest node level in the network.
• A Boolean network is L-bounded if |input(v)| ≤ L for each v.
Terminology cont.

[Figure: example network with nodes 1 to 6]

Input(6) = {4,5}

Level(6) = 2

Depth = 2

2-bounded

[Figure: a cone rooted at node 6, with a cut C]

Cut set (C) = {1,2,5}

Cut size(C) = 3
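The terminology can be checked on the six-node example with a short sketch. The fanin lists below are an assumption (the transcript does not preserve the figure's edges); they were chosen so that the network is 2-bounded and the cone {4, 6} has cut set {1, 2, 5}. The function names are illustrative only.

```python
# Assumed fanin lists for the six-node example; the edges are NOT given
# in the transcript and are chosen to agree with the values on the slide.
FANINS = {1: [], 2: [], 3: [], 4: [1, 2], 5: [2, 3], 6: [4, 5]}

def level(v):
    """Length of the longest path from any PI to v (a PI has level 0)."""
    return 0 if not FANINS[v] else 1 + max(level(u) for u in FANINS[v])

def depth():
    """Largest node level in the network."""
    return max(level(v) for v in FANINS)

def is_bounded(L):
    """True if every node has at most L fanins."""
    return all(len(f) <= L for f in FANINS.values())

def cut_set(cone):
    """Inputs of a cone: fanins of cone nodes that lie outside the cone."""
    return {u for v in cone for u in FANINS[v] if u not in cone}

print(level(6), depth(), is_bounded(2))  # 2 2 True
print(sorted(cut_set({4, 6})))           # [1, 2, 5]
```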

DAOmap overall view
• A cut-enumeration-based method that consists of cut generation and cut selection.
• Cut generation/enumeration: for each node being considered, generate all the K-feasible cuts.
• Cut selection: choose the nodes (and their best cuts) to implement as LUTs.
• Objective: Create a minimum area cover under the timing constraint (Optimal Depth).
Cut Enumeration
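• Guided by the following theorem:

$f(K,v) = \bigotimes_{u \in \mathrm{input}(v)} \left[\, u + f(K,u) \,\right]$, keeping only the resulting cuts with at most K elements

f(K,v) represents all the K-feasible cuts rooted at node v

[Figure: nodes 1 and 2 feed node 5, nodes 3 and 4 feed node 6, nodes 5 and 6 feed node 7]

f(4,5) = {1,2}

f(4,6) = {3,4}

f(4,7) = [5 + f(4,5)][6 + f(4,6)]

= {5,6} + {5, f(4,6)} + {f(4,5), 6} + {f(4,5), f(4,6)}

= {5,6} + {5,3,4} + {1,2,6} + {1,2,3,4}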
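The expansion of f(4,7) can be reproduced with a minimal sketch. It assumes each fanin contributes either itself or one of its own cuts, and that merged cuts larger than K are discarded. The function name `enumerate_cuts` and the dictionary layout are illustrative, not taken from the paper.

```python
from itertools import product

def enumerate_cuts(fanins, K):
    """Return {node: set of K-feasible cuts}; each cut is a frozenset of nodes."""
    cuts = {}

    def visit(v):
        if v in cuts:
            return cuts[v]
        if not fanins.get(v):          # primary input: nothing below it
            cuts[v] = set()
            return cuts[v]
        # For each fanin u the choices are "u itself" or any cut of u.
        choices = [[frozenset([u])] + list(visit(u)) for u in fanins[v]]
        # Merge one choice per fanin, then keep only the K-feasible results.
        merged = {frozenset().union(*combo) for combo in product(*choices)}
        cuts[v] = {c for c in merged if len(c) <= K}
        return cuts[v]

    for v in fanins:
        visit(v)
    return cuts

# Seven-node example: 5 = g(1, 2), 6 = g(3, 4), 7 = g(5, 6).
fanins = {1: [], 2: [], 3: [], 4: [], 5: [1, 2], 6: [3, 4], 7: [5, 6]}
for cut in sorted(sorted(c) for c in enumerate_cuts(fanins, 4)[7]):
    print(cut)
```

With K = 3 the cut {1,2,3,4} is dropped, since it has four inputs.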

Delay propagation
• Unit delay model: each cut (LUT) on the paths represents one unit delay.
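• The minimum arrival time for node v is: $\mathrm{Arr}_v = \min_{c \text{ on } v} \left[ \max_{u \in \mathrm{input}(c)} \mathrm{Arr}_u + 1 \right]$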

[Figure: example network with nodes 1 to 7, annotated with arrival times]

Xv = the set of cuts that provide the minimum arrival time

Arr_5 = 1, Arr_6 = 1, Arr_7 = 1
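The arrival times listed above can be reproduced under the unit delay model with a small sketch. It assumes PIs arrive at time 0 and that a cut's delay is one plus the latest arrival among its inputs. The cut lists are those of the seven-node example; `arrival_times` is an illustrative name.

```python
def arrival_times(cuts, order):
    """cuts: {node: list of cuts}; order: all nodes in topological order.

    Returns (arr, best), where best[v] plays the role of X_v: the cuts
    of v that achieve the minimum arrival time.
    """
    arr, best = {}, {}
    for v in order:
        if not cuts.get(v):                    # primary input (assumed time 0)
            arr[v] = 0
            continue
        # Unit delay: one LUT level on top of the slowest cut input.
        delay = {c: 1 + max(arr[u] for u in c) for c in cuts[v]}
        arr[v] = min(delay.values())
        best[v] = [c for c in cuts[v] if delay[c] == arr[v]]
    return arr, best

CUTS = {
    5: [frozenset({1, 2})],
    6: [frozenset({3, 4})],
    7: [frozenset({5, 6}), frozenset({3, 4, 5}),
        frozenset({1, 2, 6}), frozenset({1, 2, 3, 4})],
}
arr, best = arrival_times(CUTS, order=[1, 2, 3, 4, 5, 6, 7])
print(arr[5], arr[6], arr[7])         # 1 1 1
print([sorted(c) for c in best[7]])   # [[1, 2, 3, 4]]
```

Node 7 reaches arrival time 1 only through the cut {1,2,3,4}; its other three cuts each give 2.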

Area propagation
• The area of a cut c is calculated as:


Cut selection
• After cut enumeration, we obtain the optimal mapping depth of the network.
• Only critical paths need to use the cuts that lead to minimum delay.
• Cuts on non-critical paths can be reconstructed to search for a better solution in terms of area.

Iterative Cut Selection Procedure

Traverse the network in topological order starting from the POs. The inputs of each generated LUT are then mapped iteratively, and the procedure continues until all the PIs are reached.
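The traversal can be sketched as follows, assuming a cut has already been chosen for every internal node. How that cut is chosen is outside this sketch, so `best_cut` is a hypothetical input. A plain worklist is used instead of a strict topological order, which yields the same set of LUT roots when the cuts are fixed in advance.

```python
from collections import deque

def select_cover(pos, best_cut):
    """Walk from the POs toward the PIs, creating one LUT per visited node.

    pos: list of primary outputs.
    best_cut: {internal node: chosen cut}; nodes missing from it are PIs.
    Returns {LUT root: cut implemented by that LUT}.
    """
    luts, queue, seen = {}, deque(pos), set(pos)
    while queue:
        v = queue.popleft()
        if v not in best_cut:          # reached a PI, nothing to map
            continue
        luts[v] = best_cut[v]
        for u in best_cut[v]:          # the LUT's inputs must be mapped next
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return luts

base = {5: frozenset({1, 2}), 6: frozenset({3, 4})}
deep = select_cover([7], {**base, 7: frozenset({5, 6})})
flat = select_cover([7], {**base, 7: frozenset({1, 2, 3, 4})})
print(sorted(deep), sorted(flat))      # [5, 6, 7] [7]
```

Choosing the cut {1,2,3,4} for node 7 covers the example with a single LUT, while the cut {5,6} needs three.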

Pick-up Algorithm

Input Sharing

Slack Distribution

Cut Probing

Experimental results

On average, DAOmap is 16.02% better than CutMap in LUT count and runs 24.2X faster when both map to 5-LUTs.

Conclusion
• This paper presents a technology mapping algorithm, DAOmap, for FPGA architectures to minimize chip area under timing constraints.
• The algorithm consists of cut enumeration and cut selection.
• Novel heuristics have been designed to capture the mapping cost accurately, considering both local and global optimization information.
• Experimental results showed that DAOmap produced significant quality and run-time improvements compared to other mapping tools.