
RAMP Infrastructure

RAMP Infrastructure. Krste Asanovic, UC Berkeley. RAMP Tutorial, ISCA/FCRC, San Diego, June 10, 2007. RAMP: An infrastructure to build simulators using FPGAs.


Presentation Transcript


  1. RAMP Infrastructure
     Krste Asanovic, UC Berkeley
     RAMP Tutorial, ISCA/FCRC, San Diego, June 10, 2007

  2. RAMP: An infrastructure to build simulators using FPGAs

  3. Run Target Model on Host Platform
     [Figure: Target Model (four CPUs, Interconnect Network, DRAM) and Host Platform; label: Hard Work]

  4. Reduce, Reuse, Recycle
     • Reduce effort to build target models
       • Users just build components; infrastructure handles connections (the RDL Compiler)
     • Reuse components by having good abstractions
       • Across different target models
       • Across different host platforms
         • XUP, Calinx, BEE2, BEE3, also Altera (see Greg)
     • Recycle existing IP for use as simulation models
       • Commercial processor RTL is its own model

  5. RAMP Target Models
     [Figure: Unit A, Unit B, and Unit C connected by a Pipeline Channel and a FIFO Channel]
     Units
     • Relatively large chunks of functionality
       • e.g., processor + L1 cache
     • User-written in some HDL or software
     Channels
     • Point-to-point, unidirectional, two kinds:
       • FIFO channel: flow-controlled interface
       • Pipeline channel: simple shift register, bits drop off end
     • Generated by RAMP infrastructure

  6. Target FIFO Channel Parameters
     • Need buffering of at least (Forward + Reverse) latency to get full bandwidth over link
     • RAMP infrastructure instantiates channel with desired parameters
     [Figure: FIFO channel with signals D, RDY, ENQ, DEQ and parameters Datawidth, Buffering, Forward Latency, Reverse Latency]
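The buffering rule on slide 6 can be checked with a toy model. The sketch below is not RAMP code: it assumes a credit-based sender, a receiver that dequeues every item on arrival, and latencies of at least one cycle. The function name `simulate_throughput` and its parameters are hypothetical.

```python
from collections import deque

def simulate_throughput(buffering, fwd_latency, rev_latency, cycles=10_000):
    """Toy credit-based link; returns items delivered per cycle.

    Assumptions (not from the slides): one credit per buffer slot, the
    receiver dequeues on arrival, and both latencies are >= 1 cycle.
    """
    credits = buffering
    data_in_flight = deque()     # arrival cycles of data heading forward
    credits_in_flight = deque()  # arrival cycles of credits heading back
    delivered = 0
    for now in range(cycles):
        # Credits that have completed the reverse trip reach the sender.
        while credits_in_flight and credits_in_flight[0] <= now:
            credits_in_flight.popleft()
            credits += 1
        # Data that has completed the forward trip is dequeued,
        # which frees a slot and sends a credit back.
        while data_in_flight and data_in_flight[0] <= now:
            data_in_flight.popleft()
            delivered += 1
            credits_in_flight.append(now + rev_latency)
        # The sender enqueues one item per cycle while it holds a credit.
        if credits > 0:
            credits -= 1
            data_in_flight.append(now + fwd_latency)
    return delivered / cycles

if __name__ == "__main__":
    for depth in (1, 2, 3, 4, 8):
        print(depth, simulate_throughput(depth, fwd_latency=2, rev_latency=2))
```

In this model the sustained rate approaches min(1, buffering / (forward + reverse)), so with forward and reverse latencies of 2 cycles each the link only reaches full bandwidth once the buffering is at least 4.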

  7. Target Pipeline Channel Parameters
     • Only recommended for expert use in target models
       • (Should use FIFO channels and latency-insensitive protocols in target design)
     [Figure: pipeline channel with data signal D and parameters Datawidth, Forward Latency]

  8. RAMP Description Language (RDL) [ Greg Gibeling, UCB ]
     • User describes target model topology, channel parameters, and (manual) mapping to host platform FPGAs using RDL
     • RDL Compiler (RDLC) generates configurations
     [Figure: Target (Unit A, Unit B, Unit C) translated by RDLC to Host (FPGA1, FPGA2), with generated unit wrappers and generated links that carry channels]

  9. Virtual Target Clock

  10. Virtualized RTL Improves FPGA Resource Usage
      • RAMP allows units to run at varying target-host clock ratios to optimize area and overall performance
      • Example 1: Multiported register file
        • For example, Sun Niagara has 3 read ports and 2 write ports to 6KB of register storage
        • If RTL is mapped directly, requires 48K flip-flops
          • Slow cycle time, large area
        • If mapped into block RAMs (one read + one write per cycle), takes 3 host cycles and 3x2KB block RAMs
          • Faster cycle time (~3X) and far fewer resources
      • Example 2: Large L2/L3 caches
        • Current FPGAs only have ~1MB of on-chip SRAM
        • Use on-chip SRAM to build a cache of the active piece of the L2/L3 cache; stall the target cycle if an access misses and fetch data from off-chip DRAM
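The register file numbers in Example 1 follow from simple arithmetic, sketched below. The 6KB capacity, port counts, and 2KB block RAM size come from the slide. The rule that the host cycle count equals the larger of the read and write port counts is an assumption made here to reproduce the slide's figure of 3, and the function names are hypothetical.

```python
def flip_flops_needed(storage_bytes):
    """Direct RTL mapping: one flip-flop per stored bit."""
    return storage_bytes * 8

def block_ram_mapping(storage_bytes, read_ports, write_ports, bram_bytes=2 * 1024):
    """Return (block RAMs, host cycles per target cycle).

    Assumption: each block RAM serves one read and one write per host
    cycle, so the ports are time-multiplexed over
    max(read_ports, write_ports) host cycles.
    """
    brams = -(-storage_bytes // bram_bytes)  # ceiling division
    host_cycles = max(read_ports, write_ports)
    return brams, host_cycles

if __name__ == "__main__":
    storage = 6 * 1024  # 6KB of register storage
    print(flip_flops_needed(storage))        # 49152 bits, i.e. 48K flip-flops
    print(block_ram_mapping(storage, 3, 2))  # (3, 3)
```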

  11. Start/Done Timing Interface
      • Wrapper generated by RDL asserts “Start” on the physical FPGA cycle when the inputs to the unit are ready for the next target cycle
      • Unit asserts “Done” when it finishes the target cycle and its outputs are ready
        • Unit can take variable amount of time
        • Unvirtualized RTL unit can connect “Done” to “Start” (but must not clock until “Start”)
      [Figure: Wrapper around Unit with inputs In1 and In2, output Out, and Start/Done signals]
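The Start/Done handshake on slide 11 can be sketched as a small host-cycle simulation. This is an illustrative model only, not the wrapper RDL generates: the `Unit` class, `run_wrapper`, and the `inputs_ready_at` schedule are all hypothetical names, and the unit is assumed to need a fixed number of host cycles per target cycle.

```python
class Unit:
    """Hypothetical unit needing `cost` host cycles per target cycle."""

    def __init__(self, cost):
        self.cost = cost
        self.remaining = 0
        self.target_cycle = 0

    def tick(self, start):
        """Advance one host cycle; return True (Done) when a target cycle ends."""
        if start and self.remaining == 0:
            self.remaining = self.cost
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.target_cycle += 1
                return True
        return False


def run_wrapper(unit, inputs_ready_at, host_cycles):
    """Assert Start once the inputs for the next target cycle are ready.

    inputs_ready_at[i] is the host cycle at which the inputs for target
    cycle i become available. Returns the host cycle of each Done.
    """
    busy = False
    done_at = []
    for now in range(host_cycles):
        nxt = len(done_at)  # index of the next target cycle to run
        start = (not busy and nxt < len(inputs_ready_at)
                 and inputs_ready_at[nxt] <= now)
        if start:
            busy = True
        if unit.tick(start):
            busy = False
            done_at.append(now)
    return done_at


if __name__ == "__main__":
    # Virtualized unit: three host cycles per target cycle.
    print(run_wrapper(Unit(3), [0, 1, 10], 20))  # [2, 5, 12]
    # Single-cycle unit: Done in the same host cycle as Start.
    print(run_wrapper(Unit(1), [0, 1, 10], 20))  # [0, 1, 10]
```

The second call corresponds to the unvirtualized case on the slide, where the unit finishes in the same host cycle in which it is started and only advances when Start is asserted.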

  12. Distributed Timing Models

  13. Distributed Timing Example
      • Pipeline target channel implemented as distributed FIFO with at least L buffers
      [Figure: Target: Unit A connected to Unit B by channel D with Latency L. Host: Unit A and Unit B, each with Start and Done, linked by a FIFO with D, ENQ, DEQ, and RDY signals]
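One way to read slide 13 is that a target channel of latency L becomes a host FIFO that starts out holding L empty tokens, so each unit can advance its own target cycle whenever the FIFO allows it. The sketch below illustrates that reading under stated assumptions: the pre-loaded `None` tokens, the fixed firing periods, and the name `simulate_channel` are hypothetical and not taken from the RAMP implementation.

```python
from collections import deque

def simulate_channel(latency, capacity, a_period, b_period, host_cycles):
    """Unit A sends its target cycle number to Unit B over a channel
    with target latency `latency`, hosted on a FIFO of `capacity` slots.

    A may fire every `a_period` host cycles, B every `b_period`.
    Returns (B's target cycle, value received) pairs.
    """
    assert capacity >= latency, "need at least L buffers"
    fifo = deque([None] * latency)  # B sees nothing for its first L cycles
    a_cycle = b_cycle = 0
    received = []
    for now in range(host_cycles):
        # B advances one target cycle only when a token is available.
        if now % b_period == 0 and fifo:
            received.append((b_cycle, fifo.popleft()))
            b_cycle += 1
        # A advances one target cycle only when there is buffer space.
        if now % a_period == 0 and len(fifo) < capacity:
            fifo.append(a_cycle)
            a_cycle += 1
    return received

if __name__ == "__main__":
    # Fast sender, slow receiver: A stalls on a full FIFO.
    print(simulate_channel(2, 2, a_period=1, b_period=3, host_cycles=15))
    # Slow sender, fast receiver: B stalls on an empty FIFO.
    print(simulate_channel(2, 4, a_period=3, b_period=1, host_cycles=15))
```

In both runs B's target cycle t receives the value A produced at target cycle t - L, regardless of how fast each unit runs on the host, which is the property the distributed timing scheme is meant to preserve.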

  14. Timing Target FIFO Channel
      • Can build timed credit-based flow control (CBFC) FIFO inside target model, using pipeline channels for communicating data forwards and credits backwards
      • But this puts two CBFCs in series (one in target unit, one hidden in host implementation of pipeline channels)
      • RDL can generate a unified FIFO that merges both of these behind the FIFO interface
      [Figure: Target FIFO with Latency L, data stages D, credit control, Credits path, and RDY, ENQ, DEQ signals]

  15. Other Automatically Generated Networks
      • Control network has workstation as master and every unit as slave device
        • Memory-mapped interface with block transfers
        • Used for initialization, stats gathering, debugging, and monitoring
      • Units can connect to DRAM resources outside of timed target channels
        • Used to support emulation and virtualization state
      • Units can communicate with each other outside of timed target channels
        • Support arbitrary communication, e.g., for distributed stats gathering

  16. Wide Variety of RAMP Simulators

  17. Simulator Design Choices
      • Structural Analog versus Highly Virtualized
      • Functional-only versus Functional+Timing
      • Timing via (virtual) RTL design versus separate functional and timing models
      • Hybrid software/hardware simulators
      We’re trying to build layers of abstractions that are useful to all types of simulator.
      Also, trying to make modules in different styles inter-operate.

  18. Effective Abstractions Hide Details

  19. …But Provide Inter-Operability

  20. Work in Progress: Stay Tuned
