1 / 15

Integration of Retiming with Architectural Floorplanning: A New Design Methodology for DSM

Integration of Retiming with Architectural Floorplanning: A New Design Methodology for DSM. Abdallah and Bassam Tabbara Profs: R.K.Brayton, A.R.Newton, and K.Keutzer The NexSIS Project. Issues in DSM. timing at the module level not an issue timing at the chip level is an issue

noahh
Download Presentation

Integration of Retiming with Architectural Floorplanning: A New Design Methodology for DSM

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Integration of Retiming with Architectural Floorplanning:A New Design Methodology for DSM Abdallah and Bassam Tabbara Profs: R.K.Brayton, A.R.Newton, and K.Keutzer The NexSIS Project

  2. Issues in DSM • timing at the module level not an issue • timing at the chip level is an issue • bigger raw capacity that can be used by: • replication • reuse • NTRS Projections: • 1997: • .25u 4M tr/cm2 600 pins 6 layers 3cm2 • 2006: • .1u 40M tr/cm2 1500 pins 8 layers 10cm2

  3. Problem Description • one-level hierarchy of design • minimum number of levels to support reuse • mid way between flat and two level • placement and wireplanning of: • 200-2000 modules, average size: 50k gates • dynamic range of modules sizes: 1-500k gates • types of modules: hard, firm, soft • hard: layout • firm: gates + aspect ratio • soft: RTL • large number of nets: 40k-100k • pins per module: 10-100

  4. Problem Statement We address chip level assembly of predesigned IP blocks, each under 100k gates in size, either as hard or soft macros, optimizing for performance, power and area (emphasis in that order).

  5. Goals • develop a tool that has an impact in DSM by supporting IP reuse • handle IP blocks that have constraints and should be combined to result in a certain functionality. User design constraints include: • delay • power dissipation • area • generate a final layout within 12-24 hours (overnight) • complexity of algorithms within O(m2) -O(m3) m = design complexity • final result should: • be within 5-10% of human design (may not be able to compare) • meet the user constraints if possible or make design suggestions

  6. Challenges • size issues: • bigger block sizes, aspect ratios and relative sizes • number of pins, nets much bigger than blocks • placement issues: • special design for memories? • partitioning hard, clustering easy • routing issues: • no channels, point to point • busses • many metal layers to be assigned • timing at the chip level

  7. Conventional Flows • integration of various steps and tools: • Logic Synthesis - Physical Design • Global - Detailed • separation of concerns: • front end - back end • no contract • separation entails hundreds of iterations: • number of iterations can be proportional to complexity of design

  8. Conventional Flow Architecture

  9. New Design Flow • minimize design iteration: • planning at the early stages of the flow • support incremental changes • need for a proof of convergence • introduce retiming into the architectural floorplanning stage • better handle on timing issues • path-based vs. net-based

  10. Design Flow Architecture

  11. Functional Decomposition • provides an entry point for reused IPs • RTL may already be well characterized • area-delay trade-off as an important performance characteristic • result is: • a set of blocks • some area-delay trade-off estimates

  12. Retiming • takes in lower bound constraints • creates upper bound constraints • reduces area of modules whenever possible • can be made refinable and incremental • depends on granularity of the representation • path-based

  13. Placement / Routing • initial placement/routing step • can be a min-cut or any constructive approach • has to be fast • gives lower bounds on delays between modules • placement/routing: • takes in upper bounds from retiming as flexibility on placement • replaces modules resulting in better lower bound constraints • objective is to reduce total chip area • delay is reduced indirectly

  14. Logic Synthesis • assumption: • problems can be solved at the module level • predictable for given size modules • can be run in parallel for the different modules • provides better estimates of area-delay trade-offs for subsequent iterations

  15. Iterations • loop between placement and retiming • until no further improvements are possible • may iterate many times • very similar to: • initial min-cut partitioning • low temperature simulated annealing • have to prove some convergence criteria • loop between floorplanning/wireplanning and layout • only a few iterations • each iteration information is retained through area-delay trade-offs • also proof of convergence

More Related