FPGA Intra-cluster Routing Crossbar Design. Dr. Philip Brisk, Department of Computer Science and Engineering, University of California, Riverside, CS 223. Generating Highly Routable Sparse Crossbars for PLDs. Guy Lemieux, Paul Leventis, David Lewis. International Symposium on FPGAs, 2000.


FPGA Intra-cluster Routing Crossbar Design

Dr. Philip Brisk

Department of Computer Science and Engineering

University of California, Riverside

CS 223


Generating Highly Routable Sparse Crossbars for PLDs

Guy Lemieux, Paul Leventis, David Lewis

International Symposium on FPGAs, 2000


Basic Notation


Fully Populated Crossbar

  • Full capacity – can connect as many signals as the number of outputs

  • Flexibility – Can connect any input to any output


Full-capacity Minimal Crossbars

  • Full capacity

  • Reduced Flexibility: you lose the ability to connect any input to any output

  • p = m(n – m + 1) switches


Full-capacity Minimal Crossbars

  • Area savings are minimal if n >> m: only m(m – 1) of the nm switches in a fully populated crossbar are removed
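
The comparison can be checked numerically. The sketch below assumes n inputs and m outputs, with nm switches in a fully populated crossbar and m(n – m + 1) in a full-capacity minimal one. The function names are illustrative and do not come from the paper.

```python
def full_switches(n: int, m: int) -> int:
    """Switches in a fully populated crossbar with n inputs and m outputs."""
    return n * m


def minimal_switches(n: int, m: int) -> int:
    """Switches in a full-capacity minimal crossbar: m * (n - m + 1)."""
    return m * (n - m + 1)


def relative_savings(n: int, m: int) -> float:
    """Fraction of switches removed; algebraically equal to (m - 1) / n."""
    return 1 - minimal_switches(n, m) / full_switches(n, m)


for n, m in [(8, 4), (168, 24), (1000, 4)]:
    print(n, m, full_switches(n, m), minimal_switches(n, m),
          round(relative_savings(n, m), 4))
```

Because the relative savings equal (m – 1)/n, they shrink toward zero as n grows with m fixed.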


Perfect and Sparse Crossbars

  • Perfect crossbars

    • Can disjointly route any m-sized subset of the n inputs to the m outputs

    • Both full and full-capacity minimal crossbars are perfect

  • Sparse crossbars

    • Have p < m(n – m + 1) switches

    • Cannot be perfect


Bipartite Graph Representation

[Figure: a crossbar with inputs I1–I6 and outputs O1–O4 redrawn as a bipartite graph; an edge joins an input to an output wherever a switch connects them]


Evaluation Challenge

  • How “routable” is a given crossbar?

    • Build an FPGA, map 20+ applications, observe results

      • Slow, highly subject to the application mix

    • Monte Carlo Test

      • Generate random test vectors

      • Route each test vector on the crossbar (network flow)

      • Report number of successes as a percentage

      • A highly routable sparse crossbar has a success rate of at least 95%
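
The Monte Carlo test above can be sketched as follows. This is a minimal illustration, not the authors' tool. It assumes a test vector is a random set of k distinct inputs, and it routes with augmenting-path bipartite matching, which is a special case of the network-flow formulation. The names `route` and `monte_carlo` are hypothetical.

```python
import random


def route(crossbar, signals):
    """Try to connect every signal to a distinct output.

    crossbar[i] is the set of outputs that input i can reach.
    Returns True if all signals can be routed simultaneously.
    """
    owner = {}  # output -> input currently assigned to it

    def augment(i, seen):
        for o in crossbar[i]:
            if o in seen:
                continue
            seen.add(o)
            # Take a free output, or evict its owner if the owner can move.
            if o not in owner or augment(owner[o], seen):
                owner[o] = i
                return True
        return False

    return all(augment(i, set()) for i in signals)


def monte_carlo(crossbar, k, trials=10_000, seed=0):
    """Percentage of random k-input test vectors that route successfully."""
    rng = random.Random(seed)
    inputs = range(len(crossbar))
    ok = sum(route(crossbar, rng.sample(inputs, k)) for _ in range(trials))
    return 100.0 * ok / trials


full = [set(range(4)) for _ in range(6)]     # 6 inputs, 4 outputs, fully populated
print(monte_carlo(full, k=4, trials=1000))   # 100.0, since every vector routes
```

A sparse crossbar is scored the same way: pass its per-input fan-out sets and compare the result against the 95% threshold.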


Hall’s Theorem

  • Given a bipartite graph G = (V, E)

    • X, Y are the bipartite independent sets of G

    • N(v) is the set of neighbors of vertex v

    • N(S) is the set of neighbors of all vertices in S

    • G has a matching that covers every vertex of X if and only if |N(S)| ≥ |S| for every subset S ⊆ X

  • Leverage Hall’s Theorem to generate routable sparse crossbars!
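
For a small graph, Hall's condition (|N(S)| ≥ |S| for every subset S of X) can be checked by brute force. The sketch below assumes the graph is a dictionary mapping each input vertex to its set of neighboring outputs. The vertex names and the helper `satisfies_hall` are illustrative only.

```python
from itertools import combinations


def neighbors(graph, subset):
    """N(S): union of the neighbor sets of every vertex in S."""
    return set().union(*(graph[v] for v in subset))


def satisfies_hall(graph, X):
    """True if |N(S)| >= |S| for every non-empty subset S of X."""
    return all(
        len(neighbors(graph, S)) >= len(S)
        for size in range(1, len(X) + 1)
        for S in combinations(X, size)
    )


graph = {"I1": {"O1", "O2"}, "I2": {"O1"}, "I3": {"O1"}}
print(satisfies_hall(graph, ["I1", "I2"]))  # True: I1 -> O2, I2 -> O1
print(satisfies_hall(graph, ["I2", "I3"]))  # False: both compete for O1
```

The check visits every subset of X, so its cost grows exponentially and it is only practical for small examples.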


Practical Issues

  • Cannot enumerate all m-input subsets of the n inputs: there are C(n, m) of them

  • |N(x)|, the number of outputs each input can reach, should be approximately equal for all input vertices x in X

    • Otherwise, any subset containing a large number of low-degree vertices is unlikely to be routable

  • |N(y)| should be approximately equal for all output vertices y in Y

    • Symmetric argument


Hamming Distance and Coding Theory

  • Represent N(v) as a bit vector bv

    • bv[i] = 1 if v fans out to Oi

  • Hamming Distance

    • d(bv1, bv2) = number of bit positions in which bv1 and bv2 differ

  • Strategy

    • Maximize d(bvi, bvj) for every pair of distinct vertices vi and vj
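
The bit-vector encoding and the distance d can be sketched as below. The code assumes outputs are numbered from 0 and stores each bit vector as a Python integer. The function names are illustrative.

```python
from itertools import combinations


def bit_vector(fanout, num_outputs):
    """Encode the set of output indices an input drives as an integer."""
    bv = 0
    for o in fanout:
        if not 0 <= o < num_outputs:
            raise ValueError(f"output index {o} out of range")
        bv |= 1 << o
    return bv


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def min_pairwise_distance(vectors):
    """Smallest Hamming distance over all pairs of distinct vectors."""
    return min(hamming(a, b) for a, b in combinations(vectors, 2))


v1 = bit_vector({0, 1}, 4)   # 0b0011
v2 = bit_vector({1, 2}, 4)   # 0b0110
v3 = bit_vector({2, 3}, 4)   # 0b1100
print(hamming(v1, v2), hamming(v1, v3))        # 2 4
print(min_pairwise_distance([v1, v2, v3]))     # 2
```

Inputs v1 and v3 share no outputs, so they never compete for a switch, and their distance is the maximum possible for two inputs of fan-out 2.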


Switch Placement Optimizer

  • Start with initial switch placement

  • Generate random swap of switch positions

    • Accept the swap if there is an improvement

    • Otherwise, reject the swap

  • Stop after a fixed number of swap candidates (e.g., 10K) fails to find an improvement

  • Objective is to minimize:
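
The accept/reject loop can be sketched as follows. Two parts of this sketch are assumptions, not the authors' method. First, the objective formula is not reproduced in this transcript, so `cost` is a hypothetical stand-in that penalizes pairs of inputs with a small Hamming distance. Second, a "swap" is modeled as moving one switch to an empty position in the same input row, which keeps each input's fan-out fixed.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of outputs reached by exactly one of the two inputs."""
    return len(a ^ b)


def cost(crossbar):
    """ASSUMED cost (not from the slides): small distances are expensive."""
    return sum(1.0 / (1 + hamming(a, b)) ** 2
               for a, b in combinations(crossbar, 2))


def optimize(crossbar, num_outputs, patience=10_000, seed=0):
    """Hill-climb until `patience` consecutive candidates fail to improve."""
    rng = random.Random(seed)
    rows = [set(r) for r in crossbar]   # rows[i] = outputs driven by input i
    best = cost(rows)
    failures = 0
    while failures < patience:
        row = rng.choice(rows)
        free = [o for o in range(num_outputs) if o not in row]
        if not row or not free:
            failures += 1
            continue
        old, new = rng.choice(sorted(row)), rng.choice(free)
        row.remove(old)
        row.add(new)
        trial = cost(rows)
        if trial < best:
            best, failures = trial, 0   # accept the improvement
        else:
            row.remove(new)             # reject: undo the move
            row.add(old)
            failures += 1
    return rows, best


start = [{0, 1}] * 4                    # four inputs with identical fan-out
rows, final = optimize(start, num_outputs=4, patience=500)
print(cost(start), final, rows)
```

Swapping in a different objective only requires replacing `cost`; the loop itself is unchanged.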


Example

Identical Hamming costs before and after the swap

Before: cannot route {1, 2, 3}

After: reduces Hamming costs


168x24 Crossbar, 10K Test Vectors


Altera Flex 8000 and HP Plasma Hextant


# Switches vs. Routability


Using Sparse Crossbars within LUT Clusters

Guy Lemieux, David Lewis

International Symposium on FPGAs, 2001


Five Questions

  • Will depopulation save area, require greater routing area, or create unroutable architectures?

  • Will depopulation reduce or increase routing delays?

  • What amount of depopulation is reasonable?

  • How much area or delay reduction can be attained, if any?

  • What are the other effects of depopulating the cluster?


Architecture and Parameters


Results


Designing Efficient Input Interconnect Blocks for LUT Clusters Using Counting and Entropy

Wenyi Feng and Sinan Kaptanoglu

ACM Transactions on Reconfigurable Technology and Systems (TRETS), 1(1): article #6, March, 2008

Note: Paper is from Actel (now Microsemi)


Count Configurations (Details Omitted)

312 Configurations

256 Configurations

784 Configurations


Routing Requirement Vector (RRV)

  • An ordered list of N subsets, each containing K distinct signals chosen from the M crossbar inputs

  • The ith subset is the K distinct signals to route to the ith K-LUT

  • Total number of RRVs for the crossbar: C(M, K)^N

M inputs

KN outputs


Entropy of an Intra-cluster Routing Crossbar

  • H = lg(# routable RRVs)

    • Accounts for equivalence of LUT inputs

  • Why Entropy?

    • # routable RRVs is huge

    • Minimum number of configuration bits to program the crossbar

    • Inversely correlated with usage of global routing muxes (details omitted)

      • If we reduce the routability of the crossbar, we will end up programming more global routing muxes to compensate for the entropy loss
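
A small numeric sketch of the entropy definition follows. It counts RRVs directly from the definition: N ordered subsets, each K distinct signals out of M inputs, giving C(M, K)^N. It then takes a fully populated crossbar, where every RRV is routable. The configuration-bit count assumes each of the K·N LUT-input muxes has a fully encoded select of lg M bits; that encoding and the function names are assumptions for illustration.

```python
from math import comb, log2


def total_rrvs(M, K, N):
    """Ordered list of N subsets, each K distinct signals out of M inputs."""
    return comb(M, K) ** N


def full_crossbar_entropy(M, K, N):
    """H = lg(# routable RRVs) when every RRV is routable."""
    return N * log2(comb(M, K))


def full_crossbar_config_bits(M, K, N):
    """ASSUMED encoding: K*N independent M-input muxes, lg(M) bits each."""
    return K * N * log2(M)


M, K, N = 16, 4, 4
print(total_rrvs(M, K, N))
print(round(full_crossbar_entropy(M, K, N), 2))
print(full_crossbar_config_bits(M, K, N))
```

For these values the entropy is well below the number of configuration bits. The gap comes from distinct configurations that realize the same RRV, for example permutations of one LUT's inputs.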


Conceptual Idea

intra-cluster

crossbar

global routing


Theorem

  • Let P and L be the number of muxes and switches in a crossbar, respectively

    • The entropy is at most P lg(L/P)

    • The entropy per switch is at most lg(L/P) / (L/P)

    • These bounds are achieved only when each mux has size L/P and each configuration realizes a unique RRV

  • Proof omitted because I DO NOT HATE YOU!
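
The two bounds are easy to tabulate. The sketch below evaluates P lg(L/P) and lg(L/P) / (L/P) for a few mux sizes; the function names are illustrative.

```python
from math import log2


def entropy_bound(P, L):
    """Upper bound on crossbar entropy: P * lg(L / P)."""
    return P * log2(L / P)


def entropy_per_switch_bound(P, L):
    """Upper bound on entropy per switch: lg(L / P) / (L / P)."""
    ratio = L / P
    return log2(ratio) / ratio


P = 4
for mux_size in (2, 3, 4, 8, 16):
    L = P * mux_size
    print(mux_size, round(entropy_bound(P, L), 3),
          round(entropy_per_switch_bound(P, L), 3))
```

The per-switch bound depends only on the mux size L/P. As a function of a real-valued ratio it peaks at L/P = e, so among integer sizes a 3-input mux gives the highest bound, and larger muxes buy total entropy at a falling rate per switch.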


What are we doing here?

  • Lemieux and Lewis

    • Routability: Monte Carlo simulations

    • Area: Count switches

  • Feng and Kaptanoglu

    • Routability: Crossbar entropy

    • Area: Entropy per switch

    • Caveat: Focus only on crossbars where we can count routable, non-redundant RRVs!


Type-1 Crossbar

  • 1-level

    • L2 muxes are driven directly by crossbar input signals

    • #routable RRVs depends on L2 crossbar topology

  • Not area-efficient due to big L2 muxes

  • Xilinx Virtex-style


Type-2 Crossbar

  • 2-level

    • L1 is sparsely populated

    • L2 is fully populated

  • Fully populated L2 reduces area efficiency

  • VPR

    • Fc,in determines L1 population density


Type-3 Crossbar

  • 2-level, Partitioned

    • L1 partition Pi only drives L2 partition Oi

    • From input m to LUT input n, all paths go through muxes in Pi and Oi exclusively

    • #Routable RRVs is the product of #Routable RRVs for each disjoint sub-crossbar
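
Because the count for a partitioned crossbar is a product, its entropy is the sum of the sub-crossbar entropies. The sketch below illustrates this; the per-partition counts are made-up values, not figures from the paper.

```python
from math import log2, prod


def partitioned_entropy(routable_counts):
    """Entropy of a crossbar made of disjoint sub-crossbars.

    routable_counts[i] is the number of routable RRVs of sub-crossbar i.
    lg of a product equals the sum of the lgs.
    """
    return sum(log2(c) for c in routable_counts)


counts = [1024, 4096]                    # hypothetical per-partition counts
print(partitioned_entropy(counts))       # 22.0
print(log2(prod(counts)))                # 22.0
```

Summing logarithms also avoids forming the product, which becomes very large for realistic crossbars.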


Proposed Type-3 Crossbar and Generation Algorithm

  • Each sub-crossbar is Type-2

  • Can count #routable RRVs (Details omitted)


Entropy vs. # Switches


Entropy vs. Global Routing Mux Usage


The Bottom Line…

  • Who cares…

    • Theoretical properties are cute

    • Actel/Microsemi did not use these crossbars in their FPGAs

  • Practical observation…

    • The cheaper you make the intra-cluster routing crossbar, the more expensive the global routing…


A 65nm flash-based FPGA fabric optimized for low cost and power

Jonathan W. Greene, et al.

International Symposium on FPGAs, 2011

Note: Paper is from Microsemi

(Feng and Kaptanoglu are co-authors)


Corporate Secrets Divulged

  • They used a Clos Network

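    • Three parameters: m (middle-stage switches), n (inputs per ingress switch and outputs per egress switch), r (ingress switches, and likewise egress switches)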


Clos Network Properties

  • Used when the physical circuit switching needs to exceed the capacity of the largest feasible single crossbar

  • Much cheaper than a fully populated nxn crossbar
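
The cost claim can be checked by counting crosspoints. The sketch assumes the usual three-stage convention: r ingress switches of size n×m, m middle switches of size r×r, and r egress switches of size m×n, serving nr inputs and nr outputs. The function names are illustrative.

```python
def clos_crosspoints(m, n, r):
    """Crosspoints in a 3-stage Clos network.

    Ingress: r switches of n x m, middle: m switches of r x r,
    egress: r switches of m x n.
    """
    return 2 * r * n * m + m * r * r


def full_crosspoints(n, r):
    """Crosspoints in one fully populated crossbar with n*r inputs and outputs."""
    return (n * r) ** 2


for m, n, r in [(8, 8, 8), (15, 8, 8), (3, 2, 2)]:
    print((m, n, r), clos_crosspoints(m, n, r), full_crosspoints(n, r))
```

With n = r = 8 the Clos network needs far fewer crosspoints than the 64×64 crossbar, even with 15 middle switches. For very small networks such as (3, 2, 2) the Clos network costs more, so the saving only appears at scale.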


Strict-sense Nonblocking Clos Network (m ≥ 2n – 1)

  • An unused input on an ingress switch can always be connected to an unused output on an egress switch, without reconfiguration!


Rearrangeably Nonblocking Clos Network (m ≥ n)

  • An unused input on an ingress switch can always be connected to an unused output on an egress switch, but reconfiguration may be necessary!
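
The two conditions can be folded into one classifier. The sketch uses the standard thresholds for a three-stage Clos network, m ≥ 2n – 1 for strict-sense and m ≥ n for rearrangeably nonblocking. The function name is illustrative.

```python
def clos_class(m, n):
    """Classify a 3-stage Clos network by its middle-stage count m.

    n is the number of inputs per ingress switch.
    """
    if m >= 2 * n - 1:
        return "strict-sense nonblocking"
    if m >= n:
        return "rearrangeably nonblocking"
    return "blocking"


for m in (7, 8, 14, 15):
    print(m, clos_class(m, n=8))
```

For n = 8, any m from 8 to 14 is rearrangeably nonblocking only, which roughly halves the middle stage compared with the 15 switches needed for strict-sense operation.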


Recursive Clos Network Design

  • Scalable to any ODD number of stages

    • Replace center crossbar with a 3-stage Clos Network

