
CS137: Electronic Design Automation




  1. CS137: Electronic Design Automation. Day 2: January 6, 2006. Spatial Routing

  2. Today • Idea • Challenges • Path Selection • Victimization • Allocation • Methodology • Quality, Timing • Parallelism • Mesh • FPGA Implementation

  3. Global/Detail (review from CS137a, Day 22) • With limited switching (e.g., an FPGA) • we can represent the routing graph exactly

  4. Pathfinder Review • Key step: find shortest path from src to sink • Mark links by usage • Used links cost most • Shortest path tries to avoid them • Negotiated congestion w/ history • Increase cost of congested nodes • Adaptive cost makes historically congested nodes expensive, so later searches try to avoid them
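
For reference, a minimal sketch of the negotiated-congestion cost in the standard Pathfinder formulation (the function and parameter names are illustrative, not from the slides):

    # Minimal sketch of Pathfinder's negotiated-congestion node cost.
    def node_cost(base, history, occupancy, capacity, hist_weight=1.0):
        # present-congestion factor: grows when adding one more user
        # would exceed this node's capacity
        present = 1.0 + max(0, occupancy + 1 - capacity)
        return (base + hist_weight * history) * present

    # After each routing iteration, still-congested nodes accumulate
    # history, which is what makes historically congested nodes
    # expensive on later iterations:
    #   history[n] += max(0, occupancy[n] - capacity[n])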

  5. Slow? • Why is routing slow? • Each route: • search all possible paths from source to sink • Number of paths expands as distance² • Graph of the network is MBs large • Large, complicated data structure to walk • Won't all fit in cache • Number of nets = number of edges • Perform many iterations to converge

  6. Parallelism? • Search all paths in parallel for a single route • Search routes for multiple nets in parallel • nets that don't overlap • nets that overlap?

  7. Initial Key Ideas • Augment existing static network structure to route itself • Use hardware to exploit parallelism in routing • Search all paths in parallel • Route multiple nets in parallel • Avoid walking irregular graph • Specialized/pipelined hardware at each switch • Hardware can perform a route trial in 10s of cycles vs. 10K-100K cycles for software

  8. Hardware Route Search in Action

  9. Path Search Hardware

  10. Path Search Hardware: Idea • Existing paths are already allocated • Drive a one into the search paths • All free paths pass the one up
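
A software model of that search, below: the one propagates through every free switch in parallel (modeled here as a BFS wavefront), and anything it reaches lies on a free path. The graph encoding and names are illustrative assumptions:

    # Software model of the one-hot hardware path search: a 1 is driven
    # from the source and ORed through every free switch; allocated
    # switches block it. Each while-iteration models one hardware cycle.
    def search_free_paths(graph, allocated, src):
        reached = {src}
        frontier = [src]
        while frontier:
            next_frontier = []
            for node in frontier:
                for nbr in graph[node]:
                    if nbr not in allocated and nbr not in reached:
                        reached.add(nbr)
                        next_frontier.append(nbr)
            frontier = next_frontier
        return reached    # every switch some free path can reach

In the hardware, all of these propagations happen concurrently, so a search completes in roughly path-length cycles rather than in time proportional to the number of switches visited.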

  11. Challenges • How select among paths? • What if there are no free paths? • Can we work without Pathfinder’s history? • How handle fanout? • How handle allocation and victimization?

  12. Select Among Paths? • Easy: Randomly • Use PRNG at xover switchbox • Otherwise, need to represent costs…
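
One hardware-cheap source of that randomness is a linear-feedback shift register. Below, a sketch using a standard maximal-length 16-bit polynomial (the specific polynomial and seed are assumptions, not from the slides):

    # 16-bit Fibonacci LFSR, polynomial x^16 + x^14 + x^13 + x^11 + 1
    # (maximal length). One pseudo-random bit per cycle is enough to
    # choose between two free paths meeting at a crossover switch.
    def lfsr16_step(state):
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        return (state >> 1) | (bit << 15)

    state = 0xACE1                 # any nonzero seed works
    state = lfsr16_step(state)
    choose_left = bool(state & 1)  # pick one of two arriving paths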

  13. No Paths? • Try stealing a path (rip-up) → victimize an existing path • Which one? • Randomly select a victim • History-free Pathfinder suggests: • CountCost: victimize the path with the fewest switches shared with other routes • CountNet: victimize the path that intersects the fewest existing nets
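
The difference between the two heuristics is just what gets counted along a candidate victim path. A sketch, assuming each switch records the set of nets currently using it (the nets_on structure is a hypothetical stand-in for hardware state):

    # Victim-scoring sketch; nets_on[sw] = set of nets occupying switch sw.
    def count_cost(path, nets_on):
        # CountCost: how many switches on the path are already occupied
        return sum(1 for sw in path if nets_on[sw])

    def count_net(path, nets_on):
        # CountNet: how many distinct existing nets the path disturbs
        victims = set()
        for sw in path:
            victims |= nets_on[sw]
        return len(victims)

    # Victimize the candidate path that minimizes the chosen score.

The next slide's example shows how far the two can diverge: the same candidate scores 6 under CountCost but only 1 under CountNet.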

  14. CountNet vs. CountCost (figure: the same candidate path scores CountCost 6 but CountNet 1)

  15. Implement Counting? • Idea: delay the congested signal • Free paths are not delayed • The least congested signal arrives at the xover first

  16. CountNet Approximation • Keeping track of which net uses each switch would require much more state/complexity • Approximate CountNet by only delaying at conflicting switches (sketch below)
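
A software model of the delay trick, assuming a conflicts predicate that reports whether using a switch would override a disagreeing existing setting (the predicate and graph encoding are illustrative):

    import heapq

    # Delay-based CountNet approximation: the search wave loses one
    # cycle at each conflicting switch, so the earliest arrival
    # approximates the path with the fewest conflicts.
    def delayed_search(graph, conflicts, src):
        arrival = {src: 0}
        pq = [(0, src)]
        while pq:
            t, node = heapq.heappop(pq)
            if t > arrival[node]:
                continue                  # stale queue entry
            for nbr in graph[node]:
                t2 = t + (1 if conflicts(nbr) else 0)
                if t2 < arrival.get(nbr, float('inf')):
                    arrival[nbr] = t2
                    heapq.heappush(pq, (t2, nbr))
        return arrival   # earliest arrival (≈ conflict count) per switch

As slide 18 notes, the hardware's actual figure of merit is the max of the source-side and sink-side arrivals at the crossover, not their sum.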

  17. Implement CountNet Approximation • Allow the signal to pass undelayed if it agrees with the existing switch setting

  18. Cost is max of sides • Also note: the actual cost is max(src→xover, sink→xover) rather than the sum; e.g., sides costing 3 and 5 give a path cost of 5, not 8

  19. Algorithm Comparison – Random Netlist (figure: total channels vs. HSRA array size)

  20. How Improve? • Apologize for the lack of history? • Exploit the speed: try multiple starts and exploit randomness • Like multiple starts of FM

  21. Trading Routing Time for Quality

  22. Choosing the Right Victims

  23. CountNet vs. CountNet with best of 20 starts (figure)

  24. Hypergraphs (Fanout) • Sequentially route each two-point net, trying to re-use as much as possible from existing allocated paths.

  27. Hypergraphs (Fanout) • Sequentially route each two-point net, trying to reuse as much as possible of the already-allocated paths (see the sketch below) • Add a state bit at every switch • Set when allocated during the current net's search • Cleared when we begin to route a new net • Order the destinations associated with a single source • For each destination: • Search from the sink as before (only from the sink) • If we reach a switch whose state bit is set and the sink side is congestion-free, we have found an available path • Otherwise, drive ones into all available source paths and allocate a new path, like a standard route search
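
A simplified software sketch of this reuse scheme (it folds the congestion-free check into the BFS's allocated-switch filter; all structures are illustrative; in hardware the 'tree' membership is exactly the per-switch state bit, which is cleared between nets because tree is rebuilt per call):

    from collections import deque

    # graph is an adjacency dict; allocated holds switches owned by
    # OTHER nets; 'tree' = switches claimed so far by the current net.
    def route_fanout_net(source, sinks, graph, allocated):
        tree = {source}                    # state bit set at the source
        for sink in sinks:                 # destinations, in caller's order
            prev, seen = {}, {sink}
            frontier = deque([sink])
            while frontier:                # search from the sink only
                node = frontier.popleft()
                if node in tree:           # hit the existing tree: done
                    while node != sink:    # claim the path back to sink
                        tree.add(node)
                        node = prev[node]
                    tree.add(sink)
                    break
                for nbr in graph[node]:
                    if nbr not in seen and nbr not in allocated:
                        seen.add(nbr)
                        prev[nbr] = node
                        frontier.append(nbr)
        return tree                        # all switches used by this net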

  30. High Fanout Nets • Victimizing a high-fanout net causes considerable re-route work • Might want to penalize victimizing high-fanout nets • CountNetFanout? • Requires more state… expensive… • Simple hack: lock high-fanout nets against victimization (sketch below) • What's a high-fanout net? >10?
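
The lock is cheap to model; a sketch (the >10 threshold echoes the slide's question and is an assumption to tune, not a measured value):

    # Nets above the fanout threshold are simply never offered as victims.
    HIGH_FANOUT = 10

    def eligible_victims(candidate_nets, fanout):
        # fanout maps each net to its number of sinks
        return [n for n in candidate_nets if fanout[n] <= HIGH_FANOUT]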

  31. Toronto20 - Quality

  32. So far • All quality • …haven't dealt with all the performance details • Had a basis for confidence in performance • Wanted to make sure the idea was worthwhile first

  33. Hardware Allocation • Idea: send a one down the selected path

      Add all nets to R
      While nets in R > 0 and routeTrial < RTmax
          For each unrouted net
              Find all possible routes
              If found possible routes
                  Randomly select and allocate a route
              Else
                  Select a route to victimize and allocate the route
          Endfor
          Adjust R
      Endwhile

  34. With Victimization

      Add all nets to R
      While nets in R > 0 and routeTrial < RTmax
          For each unrouted net
              Find all possible routes
              If found possible routes
                  Randomly select and allocate a route
              Else
                  Randomly select a route to victimize and allocate the route
          Endfor
          Adjust R
      Endwhile
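
A runnable paraphrase of the loop above, with the hardware search, allocation, and rip-up passed in as plain callables (their names and the re-queue detail are illustrative assumptions):

    import random

    # find_routes, allocate, free, and pick_victim stand in for the
    # hardware operations on the slide.
    def route_all(nets, find_routes, allocate, free, pick_victim, rt_max=50):
        unrouted = list(nets)
        trial = 0
        while unrouted and trial < rt_max:
            trial += 1
            requeued = []
            for net in unrouted:
                routes = find_routes(net)
                if not routes:
                    victim = pick_victim(net)   # random / CountCost / CountNet
                    free(victim)                # rip up the victim's switches
                    requeued.append(victim)     # victim must be re-routed
                    routes = find_routes(net)   # assumes rip-up exposed a path
                allocate(net, random.choice(routes))
            unrouted = requeued                 # the slide's "Adjust R" step
        return unrouted                         # empty list on success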

  35. Analysis Methodology • Sequential version that does effectively the same thing (perhaps inefficiently) • Count key operations/variables • Number of net searches • Number of victims • Timing model for key operations • Calculate Performance under various timing assumptions

  36. Timing Models • Hardware timing • T_path = length of path ≈ log(N) • T_allocate ≈ T_path • T_victim ≈ 4·T_path • Software timing • T_allocate ≈ N_path_sw · (T_m + T_c + T_wb + T_a) • T_victim ≈ N_path_sw · (T_m + T_c) + V·T_alloc • T_m = main-memory reference • T_c = cache reference; T_wb = write buffer; T_a = bit allocation
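
Plugging illustrative numbers into these models (every constant below is an assumption, not a figure from the talk) reproduces the rough 10s-of-cycles vs. 10K-100K-cycles gap claimed on slide 7:

    import math

    # Back-of-envelope comparison under the slide's timing models.
    N = 1 << 14                              # network endpoints (assumed)
    T_path = math.log2(N)                    # ~14 cycles in hardware
    T_alloc_hw = T_path
    T_victim_hw = 4 * T_path                 # ~56 cycles

    N_path_sw = 1000                         # nodes a software search visits
    T_m, T_c, T_wb, T_a = 100, 2, 3, 1       # mem / cache / write-buf / alloc
    T_alloc_sw = N_path_sw * (T_m + T_c + T_wb + T_a)   # ~106,000 cycles

    print(f"hardware allocate ~{T_alloc_hw:.0f} cycles, "
          f"software allocate ~{T_alloc_sw} cycles")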

  37. Route Time • N_try – number of route starts • N_RT – number of path searches • N_RO – number of rip-ups • N_FO – number of fanout searches • N_FOA – number of fanout allocations

  38. Raw Data

  39. Making Comparisons • There is a quality/time tradeoff • Want to compare at iso-quality

  40. More Parallelism • So far, only exploiting parallelism within a single path search • Subtrees are independent • Route the root • Then route the next two channels in parallel • Then route the next 4…

  41. Still Not Exploiting • Multiple path searches in parallel that overlap routing resources…

  42. Extension to Mesh Networks • No well-defined crossover point • The path back to the source is not implied directly by the topology of the routing network • Paths of different lengths • and non-minimal-length paths may be important components of a good solution

  43. Mesh Approach • Single-ended search from the source • Larger delay on congestion → allows non-minimal-length paths • Breadcrumb approach → leave state in switches pointing back to the source (sketch below)
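
A sketch combining the congestion delay with breadcrumbs (the graph encoding and congestion_delay penalty function are illustrative assumptions):

    import heapq

    # Single-ended mesh search: extra delay at congested switches lets
    # non-minimal paths win, and a breadcrumb back-pointer is left at
    # every switch reached, marking the way back to the source.
    def mesh_search(graph, congestion_delay, src, sink):
        crumb = {src: None}           # breadcrumbs pointing back to source
        best = {src: 0}
        pq = [(0, src)]
        while pq:
            t, node = heapq.heappop(pq)
            if node == sink:
                break
            if t > best[node]:
                continue              # stale queue entry
            for nbr in graph[node]:
                t2 = t + 1 + congestion_delay(nbr)   # 1 cycle + penalty
                if t2 < best.get(nbr, float('inf')):
                    best[nbr] = t2
                    crumb[nbr] = node
                    heapq.heappush(pq, (t2, nbr))
        if sink not in crumb:
            return None               # no path found
        path, node = [], sink         # follow breadcrumbs back to source
        while node is not None:
            path.append(node)
            node = crumb[node]
        return path[::-1]             # source ... sink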

  44. Extension to Mesh Networks

  45. Extension to Mesh Networks – Results (simulator too slow to run larger cases)

  46. BFT FPGA Implementation • 21 4-LUTs to implement the switch logic • + 9 4-LUTs to manage the PRNG/allocation • = 30 4-LUTs per T-switch • 13/3 switches per PE per domain • → 30 × 13/3 = 130 4-LUTs per PE per domain • C = 10 domains • → 10 × 130 = 1300 4-LUTs per PE

  47. Mesh FPGA Implementation
