
The Mark-II Performance Simulator for VIRAM-1. Gagan Prakash, Brian Gaeke. CS 252, Spring 2001. http://www-inst.eecs.berkeley.edu/~brg/vsimII brg@eecs.berkeley.edu, gagpcool@hkn.eecs.berkeley.edu


Presentation Transcript


  1. The Mark-II Performance Simulator for VIRAM-1
     Gagan Prakash, Brian Gaeke
     CS 252, Spring 2001
     http://www-inst.eecs.berkeley.edu/~brg/vsimII
     brg@eecs.berkeley.edu, gagpcool@hkn.eecs.berkeley.edu

  2. Problems in Performance Simulation
     - (Simulation) Runtime Does Matter!
     - Current performance simulator => 1500X slowdown
     - Many problems now assume "normal" VIRAM-1 chip config
     - When simulator was designed, "normal" chip not known
     - Lots of parameters no longer needed
     - Many computation-intensive datasets cannot be simulated
     - Software architecture of current simulator non-portable
     - Current simulator no longer maintained
     - Out of date with respect to simpler (functional) simulator
     - Can't use today's machines (180 MHz fastest simulation machine)
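
To get a feel for what a 1500X slowdown means in practice, a back-of-the-envelope calculation may help. The sketch below assumes the slowdown is a plain multiplier on some baseline run time (the slide does not say which baseline), and the one-second workload is a hypothetical example, not a measurement.

```python
def simulated_wall_clock_seconds(baseline_seconds, slowdown=1500):
    """Wall-clock time to simulate a workload, assuming the slowdown
    is a simple multiplier on the baseline run time (an assumption)."""
    return baseline_seconds * slowdown

# Hypothetical workload: one second of baseline execution.
seconds = simulated_wall_clock_seconds(1.0)
print(f"{seconds / 60:.0f} minutes")  # 1500 s is 25 minutes
```

Under that assumption, a workload that takes one minute on the baseline would need 25 hours of simulation.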

  3. Solutions for Simulation
     - Trace-based cycle-level simulator
     - Traces from actively-maintained functional simulator
     - No more version skew!
     - Emphasis on portability
     - Faster simulation machines => faster simulations
     - Streamline parametrization
     - Make it look like the "normal" VIRAM-1 chip
     - Restrict parametrization
     - Potential pitfall: traces are huge
     - Support compressed traces
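
One way to support compressed traces is to decompress them as a stream while reading. The sketch below is only an illustration: the slides do not name a compression format or a trace record layout, so gzip, the `read_trace` helper, and the made-up mnemonics in the demo are all assumptions.

```python
import gzip
import os
import tempfile

def read_trace(path):
    """Yield one trace record (a list of whitespace-separated fields) per line.

    Files ending in .gz are decompressed on the fly, so the full
    uncompressed trace never has to exist on disk.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as trace:
        for line in trace:
            fields = line.split()
            if fields:  # skip blank lines
                yield fields

def demo():
    """Write a tiny hypothetical trace, then read it back."""
    path = os.path.join(tempfile.mkdtemp(), "demo.trace.gz")
    with gzip.open(path, "wt") as out:
        out.write("vadd v1 v2 v3\n\nvld v4 0x1000\n")
    return list(read_trace(path))

print(demo())
```

Because `read_trace` is a generator, the simulator would hold only one record in memory at a time, which matters when traces run to hundreds of megabytes.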

  4. High-Level Simulator Architecture
     - Design of simulator mirrors design of VIRAM-1 chip
     - Software Units
     - Lexer & Parser
     - Performance Analyzer
     - Control Unit and queue
     - Issue Unit and queue
     - Functional Units
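
The units listed above can be pictured as a chain of queues that instructions flow through. The following is a minimal sketch under stated assumptions, not the Mark-II implementation: the `Stage` class, the one-item-per-cycle rule, and the single functional unit are hypothetical simplifications.

```python
from collections import deque

class Stage:
    """A unit with an input queue that forwards one item per cycle."""
    def __init__(self, name, downstream=None):
        self.name = name
        self.queue = deque()
        self.downstream = downstream
        self.retired = []  # items that left the last stage

    def tick(self):
        if not self.queue:
            return
        item = self.queue.popleft()
        if self.downstream is not None:
            self.downstream.queue.append(item)
        else:
            self.retired.append(item)

def simulate(instructions):
    """Push parsed instructions through control -> issue -> functional unit."""
    fu = Stage("functional unit")
    issue = Stage("issue", downstream=fu)
    control = Stage("control", downstream=issue)
    control.queue.extend(instructions)
    cycles = 0
    while control.queue or issue.queue or fu.queue:
        # Tick back to front so an item moves only one stage per cycle.
        for stage in (fu, issue, control):
            stage.tick()
        cycles += 1
    return cycles, fu.retired

print(simulate(["a", "b", "c"]))
```

In this toy model three instructions take five cycles: two cycles to fill the chain, then one retirement per cycle.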

  5. Low-Level Simulator Architecture
     - Functional Units (FUs)
     - Memory Functional Unit
     - Memory system
     - Translation system
     - Flag Functional Unit
     - Arithmetic Functional Units: 1 Int+FP, 1 Int only
     - Element group queues

  6. Wall clock time to simulate

  7. Peak simulator memory usage
     - Measured resident size using ps (pages actually touched)

  8. Predicted cycle count
     - Measuring inner loops only
     - Percent difference: Update: 13%, Transitive: 5%, Pointer: 17%
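
The slide does not define how "percent difference" was computed. One common definition, shown here purely as an assumption, is the absolute difference between the two cycle counts relative to the reference count. The cycle counts in the example are invented for illustration and are not results from the presentation.

```python
def percent_difference(predicted_cycles, reference_cycles):
    """Absolute difference as a percentage of the reference count."""
    if reference_cycles <= 0:
        raise ValueError("reference_cycles must be positive")
    return 100 * abs(predicted_cycles - reference_cycles) / reference_cycles

# Made-up counts: the new simulator predicts 1200 cycles, the reference says 1000.
print(percent_difference(1200, 1000))
```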

  9. Project Successes
     - Useful parametrizations! Lanes, banks/subbanks, memory size
     - Reduced simulator memory size
     - Lots of simple optimizations: don't simulate empty queues, retire no-ops early
     - Reduced implementation complexity: 7,500 LOC vs 117,000 LOC
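
The two optimizations named above can be sketched in a few lines. This is a hypothetical illustration, assuming each unit has its own queue and that a no-op is recognizable when it is enqueued; the function names and the `"nop"` marker are not taken from the simulator.

```python
from collections import deque

def enqueue(units, unit_name, op, retired):
    """Queue an operation, retiring no-ops immediately instead of queueing them."""
    if op == "nop":            # optimization: retire no-ops early
        retired.append(op)
        return
    units[unit_name].append(op)

def step(units, retired):
    """Advance every unit by one cycle, skipping units with nothing queued."""
    work_done = 0
    for queue in units.values():
        if not queue:          # optimization: don't simulate empty queues
            continue
        retired.append(queue.popleft())
        work_done += 1
    return work_done

units = {"memory": deque(), "arith": deque()}
retired = []
enqueue(units, "arith", "nop", retired)
enqueue(units, "arith", "vadd", retired)
print(step(units, retired), retired)
```

Here the no-op never occupies a queue slot, and the idle memory unit costs only a single emptiness check per cycle.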

  10. Project "Not-So-Successes"
      - Cycle-level simulation
        - Memory FU resolves hazards per-element-group
        - Element groups from many instructions in any cycle
        - Interlocks between memory unit and other FUs
        - Control/issue unit simulations basically trivial
      - Trace size
        - Small traces range from 50 - 250 megs
        - Simulator spends 70 - 95% of time in I/O
      - Memory system: implementor information starvation!
        - Memory bandwidth numbers are unavailable
        - TLB undocumented
      - Scalar core????
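
The I/O share quoted above caps how much any speedup of the simulation logic itself can help. The sketch below applies Amdahl's Law to the two ends of the 70 - 95% range; it assumes I/O time stays fixed while everything else is made arbitrarily fast, which is an idealization.

```python
def max_overall_speedup(io_fraction):
    """Amdahl's Law upper bound when only the non-I/O part gets faster.

    If a fraction io_fraction of run time is I/O and the remaining
    work became infinitely fast, total time shrinks to io_fraction
    of the original, so the speedup is at most 1 / io_fraction.
    """
    if not 0 < io_fraction <= 1:
        raise ValueError("io_fraction must be in (0, 1]")
    return 1.0 / io_fraction

for fraction in (0.70, 0.95):
    print(f"I/O {fraction:.0%}: at most {max_overall_speedup(fraction):.2f}x")
```

With 95% of the time in I/O the bound is about 1.05x, so at that point trace size, not simulator detail, dominates run time.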

  11. Conclusions and Future Work
      - Program-dependent average analysis
      - Multiple idealized models
      - Each with a queue model and a few typical kernels
      - Could enable multicycle simulation
      - You need a general simulator to enable this, though
      - Cut the fat out of the old simulator
      - Port it to other platforms?
      - Exception modeling untouched
      - "We still don't have an OS"
      - Software-managed TLB effects unknown
      - Is this simulation really better? (Hennessy)

  12. What We Learned
      - Leverage Existing Work First!
      - Why rewrite when you can port, extend, or document…?
      - Need extremely detailed docs to write simulator
      - A good simulator can be documentation…
      - Need access to random notes, not just theses
      - Emphasize leaving behind good docs when you graduate?
      - Devising good approximations for complex HW is a black art
      - But… approximations are indispensable
      - Trading off accuracy vs. complexity
      - Experiment with compilers and standard libraries
      - Portability and efficiency

  13. Backup Slides

  14. Why We Ditched Multicycle Simulation
      1. Finding register file structural hazards requires per-cycle simulation
         - Suppose full pipeline...
         - Every cycle, some FU is doing a reg read
         - Could cause structural hazard w/ first memory unit stage
      2. Memory unit must be synched with other FUs
         - Memory unit controls other units' stalls
         - To figure out whether other units can go ahead…
         - Need all the details of memory unit state per pipeline stage
      3. Added overhead of multicycle => Amdahl's Law
         - Simplify implementation by always assuming single cycle
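
The first point can be made concrete with a small sketch of why hazard detection needs per-cycle information: a conflict only shows up when the readers in one specific cycle are counted against the available ports. The port count, unit names, and schedule below are hypothetical and do not describe the VIRAM-1 register file.

```python
def structural_hazard_cycles(reads_per_cycle, read_ports):
    """Return the cycle indices where more units want a register read
    than there are read ports (a structural hazard)."""
    return [cycle for cycle, readers in enumerate(reads_per_cycle)
            if len(readers) > read_ports]

# Hypothetical schedule: which units read the register file in each cycle.
schedule = [
    ["arith0"],
    ["arith0", "arith1"],
    ["arith0", "arith1", "memory"],
]
print(structural_hazard_cycles(schedule, read_ports=2))
```

A simulator that advances several cycles at once would only see the total number of reads, which is not enough to tell whether they collided in the same cycle.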

  15. Compiler Effects
      - Fallacy: The compiler that understands the language better produces the faster code.
      - Stepanov Abstraction Penalty Benchmark
      - Measures speedup of C++ library algos/data abstractions versus naive (FORTRAN-like) hand-coded loops
      - On same floating point vector kernel
      - You pay 2.3x in runtime for using a smarter compiler

  16. Library Effects
      - Pitfall: Relying on standard library for programmer efficiency.
      - Surprises in profile for early version: lib calls (string) and object constructors???
      - When you are dealing with 200MB traces you want to be I/O bound.
      - Workaround:
        - Don't use objects
        - Make everything extern "C" {...}
        - Use C strcpy instead of C++ string::assign
      - Result: Time in I/O reduced from 95% to 70%

  17. Ideal Simulator Construction Experience
      - User-selectable multiple levels of detail
      - Having a detailed understanding of processor first
      - Access to documentation, notes
      - Information about design decisions
      - A better mix of C and C++
      - Well-defined input format (parsing traces is Evil)
      - Component framework for simulator construction
      - Standardize interface between pieces: RTL, coarse-grained cycle, queueing theory, memory interface, hand hacked...
      - (Diagram labels: custom RTL, queue model, queue model)
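
A standardized interface between pieces is what would let a user pick the level of detail per component. The sketch below shows one possible shape for such a framework; the `Component` interface, both model classes, and the latencies are hypothetical and are not part of the Mark-II simulator.

```python
from abc import ABC, abstractmethod

class Component(ABC):
    """Common interface every model of a hardware piece would implement."""
    @abstractmethod
    def cycles_for(self, operation):
        """Return the number of cycles this model charges for an operation."""

class FixedLatencyModel(Component):
    """Coarse model: every operation costs the same number of cycles."""
    def __init__(self, latency):
        self.latency = latency

    def cycles_for(self, operation):
        return self.latency

class TableModel(Component):
    """More detailed model: per-operation latencies with a default."""
    def __init__(self, latencies, default):
        self.latencies = latencies
        self.default = default

    def cycles_for(self, operation):
        return self.latencies.get(operation, self.default)

def total_cycles(component, operations):
    """Works with any Component, so the level of detail is swappable."""
    return sum(component.cycles_for(op) for op in operations)

ops = ["load", "add", "load"]
print(total_cycles(FixedLatencyModel(2), ops))
print(total_cycles(TableModel({"load": 4}, default=1), ops))
```

The caller never checks which model it was given, so a queue model, a coarse cycle model, or hand-written RTL glue could sit behind the same call.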
