HAsim FPGA-Based Processor Models: Fast, Accurate and Flexible

Michael Adler, Elliott Fleming, Michael Pellauer, Joel Emer

Presentation Transcript


  1. HAsim FPGA-Based Processor Models: Fast, Accurate and Flexible • Michael Adler, Elliott Fleming, Michael Pellauer, Joel Emer

  2. Outline • Problem & goals • Basic model structure • Modeling a pipelined microarchitecture • Modeling memory hierarchies • Modeling multiprocessors • FPGA implementation details

  3. Standard Scaling Problem Slide • Single core targets: model performance scaled with processor speed • Multi-core targets: problem size grows with each generation • Solutions: • Reduce fidelity: • Shorter runs • Subset of available cores • Lightweight model • Structural simulator change: • Parallelize it • Find a new method

  4. Dependence Problems in Parallel Software Models • Option 1: Target CPUs ➞ Simulator Threads • Uncore causes dependence between simulator threads • High performance models (e.g. Graphite) relax the dependence • Option 2: Target Pipeline Stages ➞ Simulator Threads • Lots of data movement • Cyclic pipelines impose complex dependence (Diagram labels: Uncore; Fetch, Decode, Execute; Core 0, Core 1.)

  5. Why is Hardware Difficult to Model in Software? • Constant data movement through pipelines • Many points of dependence between “parallel” regions • Large, irregular, memory footprint • Difficult to vectorize • Branchy

  6. Software Model Compromises • Speed: Detailed model • Slow • Studies limited by run-time (e.g. large cache replacement policy) • Accuracy: Simplified model • Model writer makes decisions about fidelity, hoping not to affect predictions • Multi-core interactions remain difficult to parallelize • Find a new method?

  7. FPGAs • Share the same properties as the target machine • Abundant wires • Abundant parallelism • Abundant registers • Obvious mapping of pipelines • Already ubiquitous for RTL verification • Fast: detailed FPGA models are often faster than simple models!

  8. Aggregate Simulator Throughput (Parsec Black-Scholes)

  9. Classification of FPGA-Based Designs

  10. Prototype • Final RTL, mapped to a different technology • E.g. an ASIC emulated on an FPGA • This is what most people imagine for FPGA-based models • Characteristics: • Useful for verification before producing final hardware • Shorter debugging loop • Internal state is more visible than final hardware • Masks are expensive • Too late to make big micro-architectural decisions • Often too large to fit on a single FPGA • Often too late or too slow to be useful for software development

  11. Functional Emulator • Model architectural semantics • No prediction of run-time • Characteristics: • Can be written faster than prototypes • Potentially more FPGA-area efficient • Use FPGA-friendly structures (e.g. no big CAMs) • Multiplex functional pipelines (like SMT) • Useful as a software development platform • Not useful for microarchitectural research

  12. Model • Project metrics of interest (e.g. timing, power, reliability) • Emulate functional behavior as needed to compute metrics Characteristics: • Metric may be computed algorithmically (even time) • An extension of functional emulators: function + metrics
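The "function + metrics" idea on slide 12 can be sketched in software. The example below is a minimal illustration, not HAsim code: the two-instruction ISA, the `Instr` type and the `LATENCY` table are all hypothetical. It shows the functional part computing architecturally correct results while modeled time is computed algorithmically, independent of how long the host takes to run the loop.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str   # "add" or "load" (hypothetical two-instruction ISA)
    dst: int  # destination register index
    src: int  # source register index

# Hypothetical per-operation latencies, in target (modeled) cycles.
LATENCY = {"add": 1, "load": 3}

def run(program, regs, memory):
    """Emulate the program and return (registers, modeled cycle count)."""
    model_cycles = 0
    for ins in program:
        # Functional part: produce the architecturally correct result.
        if ins.op == "add":
            regs[ins.dst] += regs[ins.src]
        elif ins.op == "load":
            regs[ins.dst] = memory[regs[ins.src]]
        else:
            raise ValueError(f"unknown op: {ins.op}")
        # Metric part: advance modeled time. This number is computed,
        # not measured, so it does not depend on host speed.
        model_cycles += LATENCY[ins.op]
    return regs, model_cycles

if __name__ == "__main__":
    prog = [Instr("load", 0, 1), Instr("add", 0, 0)]
    print(run(prog, [0, 2], [5, 6, 7]))
```

Dropping the `model_cycles` lines leaves a pure functional emulator in the sense of slide 11, which is why the deck describes a model as an extension of one.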

  13. Model Terminology Modeling hardware on hardware leads to terminology confusion: • Both have caches, pipelines, memories… • Target machine means the microarchitecture being studied • FPGA, functional-model and timing-model all refer to implementation details. (E.g. functional memory cache is an FPGA structure.) • Host is the general purpose machine to which FPGAs are connected

  14. Why isn’t everyone building timing models with FPGAs?

  15. Fast, Accurate or Now? (Figure with three labels: Accuracy, Model Speed, Development Time.)

  16. FPGA Picture is Different (Figure with three labels: Accuracy, Model Speed, Development Time.)

  17. Reducing Development Time: Managing Complexity • How do I: • Use FPGAs while focusing on my algorithm? HAsim, LEAP • Model time? A-Ports • Re-use components? Split functional / timing models, AWB • Fit a large problem on FPGAs? Multiplexing, latency insensitivity, multiple FPGAs

  18. STDIO on General Purpose Machines FILE *f = fopen(path, "w"); const char *name = "Kenneth"; fprintf(f, "%s, what is the frequency?\n", name);

  19. I/O in Hardware Description Languages (SystemVerilog) integer f = $fopen(path, "w"); string name = "Kenneth"; $fwrite(f, "%s, what is the frequency?\n", name);

  20. Nothing Comes from Nothing • FPGAs have: • No standard physical device • No standard device model • No standard system interface • No standard API

  21. What Makes Hardware General Purpose? • The software! • Compilers and library APIs make code “universal” • Hardware standards (ACPI, PCIe) make OS development and compiler writing easier. Little impact on user programs. • ISA matters if you want to avoid recompiling. ISA is part of the software API, along with standard libraries.

  22. LEAP Platform (Diagram spanning an FPGA side and a software side. Labels: Timing Partition with Fetch, Decode, Exe; Functional Partition; Software Services; Control; Emulate; Memory State; Streams; Platform Interface; Virtual Platform on both sides; Scratchpad Memory; STDIO; RRR; Remote Memory; Channel; FPGA Physical Platform; Software Physical Platform.)

  23. Hello World in LEAP module [CONNECTED_MODULE] mkConnectedApplication(); STDIO#(Bit#(32)) stdio <- mkStdIO(); let msg <- getGlobalStringUID("Hello, World!\n"); Reg#(STATE) state <- mkReg(STATE_start); rule hello (state == STATE_start); stdio.printf(msg, List::nil); state <= STATE_finish; endrule endmodule

  24. Bluespec on One Foot • Functional language derived from Haskell • Generates Verilog • Modules – the analog of C++ classes • May be polymorphic (types are abstract) • Methods are the callable routines exposed by modules • Inlined statically at compile time into a calling rule • Rules are: • Executed atomically • Guarded (predicated) • Guard is both explicit (user specified) and implicit • Implicit guards come from guards on methods called in a rule
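The rule semantics listed on slide 24 can be mimicked in a few lines of software. This is a sketch of the one-rule-at-a-time reading of guarded atomic actions, not of how the Bluespec compiler schedules hardware; `Rule`, `Printer`, `step` and `make_hello` are hypothetical names invented for the illustration. The `hello` rule mirrors the Hello World slide: its effective guard is the explicit condition on the state combined with the implicit guard inherited from the method it calls.

```python
class Rule:
    """A guarded atomic action: `action` may run only when `guard` holds."""
    def __init__(self, name, guard, action):
        self.name = name
        self.guard = guard
        self.action = action

class Printer:
    """Hypothetical module exposing a guarded method (a stand-in for STDIO)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = []

    def can_printf(self):
        # The method's own guard: room for one more line.
        return len(self.lines) < self.capacity

    def printf(self, text):
        assert self.can_printf(), "called while the method guard was false"
        self.lines.append(text)

def step(state, rules):
    """Test each rule in order; run a ready rule to completion (atomically)."""
    fired = []
    for rule in rules:
        if rule.guard(state):
            rule.action(state)
            fired.append(rule.name)
    return fired

def make_hello(printer):
    def guard(state):
        # Explicit guard AND the implicit guard from printer.printf.
        return state["phase"] == "start" and printer.can_printf()

    def action(state):
        printer.printf("Hello, World!\n")
        state["phase"] = "finish"

    return Rule("hello", guard, action)

if __name__ == "__main__":
    printer = Printer(capacity=1)
    state = {"phase": "start"}
    rules = [make_hello(printer)]
    print(step(state, rules), step(state, rules), printer.lines)
```

In Bluespec the implicit guard is derived by the compiler from the methods a rule calls; here it has to be written out by hand inside `guard`, which is the main simplification in the sketch.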

  25. Hello World in LEAP (same code as slide 23, with callout: main())

  26. Hello World in LEAP (same code as slide 23, with callout: Control Logic)

  27. Hello World in LEAP (same code as slide 23, with callout: STDIO)

  28. LEAP Gives FPGAs Key “General Purpose” Properties • Virtual Platform • I/O • Virtual memory abstraction (scratchpads) • Topology • Named channels (FIFOs) instead of hard-coded wires • Host/FPGA remote procedure calls • Automated mapping to multiple FPGAs • Debugging Aids • Deadlock detection • Automated scan chains • User scan chains

  29. LEAP Platform Users • HAsim timing models • Prototypes • SSD Functional Model • AirBlue wireless network stack • Algorithmic accelerators • H.264 decoder • Matrix multiplication • …

  30. Key Concept: Latency Insensitivity

  31. Latency Insensitive Channel Semantics • Guaranteed: • FIFO • Accurate • Always allow at least one message to be in flight • Not guaranteed: • Latency • Why? • Allows for replacement of algorithms – even to software • Permits use of hierarchical memories (caches) • Simplifies communication – especially off-chip • This is a common software strategy (pipes, TCP/IP, pthread mutex)
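The guarantees on slide 31 can be demonstrated with a toy channel. The sketch below is an assumption-laden illustration rather than LEAP's implementation: `LIChannel` and `run_pipeline` are hypothetical names, the buffer is unbounded for brevity (a real channel would be bounded and apply back-pressure), and latency is counted in host steps. The point it makes is that a consumer written against FIFO order alone produces the same results whatever the latency; only the number of host steps changes.

```python
from collections import deque

class LIChannel:
    """FIFO channel whose delivery latency is not part of its contract."""
    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (ready_time, message), in send order
        self.now = 0

    def send(self, msg):
        self.in_flight.append((self.now + self.latency, msg))

    def try_recv(self):
        # Deliver the oldest message only once its latency has elapsed.
        if self.in_flight and self.in_flight[0][0] <= self.now:
            return self.in_flight.popleft()[1]
        return None

    def tick(self):
        self.now += 1

def run_pipeline(latency, inputs):
    """Producer sends one input per step; consumer doubles what it receives."""
    chan = LIChannel(latency)
    pending = list(inputs)
    out = []
    host_steps = 0
    while len(out) < len(inputs):
        if pending:
            chan.send(pending.pop(0))
        msg = chan.try_recv()
        if msg is not None:  # consumer simply waits when nothing is ready
            out.append(msg * 2)
        chan.tick()
        host_steps += 1
    return out, host_steps

if __name__ == "__main__":
    print(run_pipeline(0, [1, 2, 3]))
    print(run_pipeline(5, [1, 2, 3]))
```

Because the consumer never assumes when a message arrives, the channel could be replaced by one that crosses chips or goes through software without changing `out`.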

  32. Named Channels • Name both endpoints of a FIFO • Software builds the connection • Replaces user’s hand-routed Verilog channels • Automatically routed, even across FPGAs • Common in software: • Named ports in software timing models • UUCP has been dead for a long time (for a reason)
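Named channels as described on slide 32 amount to a registry that pairs endpoints by name instead of by hand-routed wires. The following is a minimal sketch under that reading; `ChannelRegistry`, `send_end`, `recv_end` and `unmatched` are hypothetical names, not LEAP's API, and all endpoints here live in one process, whereas the slide describes routing that can also cross FPGAs.

```python
from collections import deque

class ChannelRegistry:
    """Pairs a send endpoint and a receive endpoint that share a name."""
    def __init__(self):
        self._fifos = {}
        self._senders = set()
        self._receivers = set()

    def send_end(self, name):
        """Claim the sending end of channel `name`; returns a send function."""
        if name in self._senders:
            raise ValueError(f"duplicate send endpoint: {name}")
        self._senders.add(name)
        return self._fifos.setdefault(name, deque()).append

    def recv_end(self, name):
        """Claim the receiving end of channel `name`; returns a receive function."""
        if name in self._receivers:
            raise ValueError(f"duplicate receive endpoint: {name}")
        self._receivers.add(name)
        return self._fifos.setdefault(name, deque()).popleft

    def unmatched(self):
        """Names that have only one endpoint; a build step could reject these."""
        return sorted(self._senders ^ self._receivers)

if __name__ == "__main__":
    reg = ChannelRegistry()
    # The two modules never reference each other, only the channel name.
    send = reg.send_end("fetch_to_decode")
    recv = reg.recv_end("fetch_to_decode")
    send("instr0")
    send("instr1")
    print(recv(), recv(), reg.unmatched())
```

Since neither module holds a reference to the other, either side can be moved or replaced as long as the name stays the same, which is the property the slide contrasts with hand-routed wires.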

  33. Finally, an Explanation of our Project’s Name • LINC: Latency-Insensitive Named Channel • LEAP: LINC-based Environment for Application Programming • HAsim: Hardware-based micro-Architecture Simulator

  34. http://asim.csail.mit.edu/redmine