Combining Statistical and Symbolic Simulation

Presentation Transcript


  1. Combining Statistical and Symbolic Simulation
     Mark Oskin, Fred Chong, and Matthew Farrens
     Dept. of Computer Science, University of California at Davis

  2. Overview
     • HLS is a hybrid performance simulator
     • Statistical + Symbolic
     • Fast
     • Accurate
     • Flexible

  3. Motivation
     [Figure labels: I-cache hit rate, basic block size, dispatch bandwidth, I-cache miss penalty, branch mispredict penalty]

  4. Motivation
     • Fast simulation
       • seconds instead of hours or days
       • ideally interactive
     • Abstract simulation
       • simulate performance of unknown designs
       • application characteristics, not applications

  5. Outline
     • Simulation technologies and HLS
     • From applications to profiles
     • Validation
     • Examples
     • Issues
     • Conclusion

  6. Design Flow with HLS
     [Flow diagram; box labels: Design Issue (three instances), HLS, Possible solution, Profile, Estimate Performance, Cycle-by-Cycle Simulation]

  7. Traditional Simulation Techniques
     • Cycle-by-cycle (SimpleScalar, SimOS, etc.)
       + accurate
       – slow
     • Native emulation / basic block models (Atom, Pixie)
       + fast, complex applications
       – useful to a point (no low-level modifications)

  8. Statistical / Symbolic Execution
     • HLS
       + fast (near interactive)
       + accurate / – within regions
       + permits variation of low-level parameters
       + arbitrary design points / – use carefully

  9. HLS: A Superscalar Statistical and Symbolic Simulator
     [Block diagram; components: L1 I-cache, L1 D-cache, L2 cache, main memory, branch predictor, fetch unit, out-of-order dispatch unit, out-of-order execution core, out-of-order completion unit. Legend: Statistical, Symbolic]

  10. Workflow
      [Flow diagram; node labels: Code, Binary, sim-stat, sim-outorder, R10k, app profile, machine-profile, machine-configuration, Stat-binary, HLS]

  11. Machine Configurations
      • Number of functional units (I,F,[L,S],B)
      • Functional unit pipeline depths
      • Fetch, dispatch, and completion bandwidths
      • Memory access latencies
      • Mis-speculation penalties

  12. Profiles
      • Machine profile:
        • cache hit rates => ()
        • branch prediction accuracy => ()
      • Application profile:
        • basic block size => (,)
        • instruction mix (% of I,F,L,S,B)
        • dynamic instruction distance (histogram)
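The application-profile fields listed on this slide can be illustrated with a short sketch. It is not the HLS profiler: the trace format, the `AppProfile` and `build_app_profile` names, and the reading of "dynamic instruction distance" as the number of dynamic instructions back to the producer of each source operand are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class AppProfile:
    """Hypothetical container for the application-profile fields on the slide."""
    block_size_mean: float
    block_size_std: float
    instruction_mix: dict   # instruction class -> fraction of dynamic instructions
    distance_hist: dict     # dependence distance -> number of occurrences


def build_app_profile(trace):
    """Summarize a dynamic trace of (kind, dest_reg, src_regs) tuples.

    kind is one of "I", "F", "L", "S", "B"; a "B" ends the current basic block.
    This trace format is an assumption made for illustration.
    """
    if not trace:
        raise ValueError("trace must contain at least one instruction")
    block_sizes, current = [], 0
    mix, distances, last_writer = Counter(), Counter(), {}
    for index, (kind, dest, srcs) in enumerate(trace):
        mix[kind] += 1
        current += 1
        # Distance from this instruction back to the producer of each source.
        for reg in srcs:
            if reg in last_writer:
                distances[index - last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = index
        if kind == "B":
            block_sizes.append(current)
            current = 0
    if current:
        block_sizes.append(current)  # trailing block with no closing branch
    total = sum(mix.values())
    return AppProfile(
        block_size_mean=mean(block_sizes),
        block_size_std=pstdev(block_sizes),
        instruction_mix={kind: count / total for kind, count in mix.items()},
        distance_hist=dict(distances),
    )


# Tiny made-up trace: two basic blocks, five instructions.
toy_trace = [
    ("L", "r1", []),
    ("I", "r2", ["r1"]),
    ("B", None, ["r2"]),
    ("I", "r3", ["r1"]),
    ("B", None, ["r3"]),
]
toy_profile = build_app_profile(toy_trace)
```

The point of the summary is that the simulator never needs the original program again: only these few numbers and one histogram are carried forward.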

  13. Statistical Binary
      • 100 basic blocks
      • Correlated:
        • random instruction mix
        • random assignment of dynamic instruction distance
        • random distribution of cache and branch behaviors

  14. Statistical Binary
      [Diagram: example symbolic instruction stream with annotations]
        load    (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0)
        integer (l1 i-cache, l2 i-cache, dependence 0, dependence 1)
        integer (l1 i-cache, l2 i-cache, dependence 0, dependence 1)
        branch  (l1 i-cache, l2 i-cache, branch-predictor accr., dep 0, dep 1)
        store   (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dep 0, dep 1)
        load    (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0)
      Annotation labels: core functional unit requirements; cache behavior during I-fetch; cache behavior during data access; branch predictor behavior; dynamic instruction distance
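A rough sketch of how a statistical binary of this shape could be generated from profile numbers follows. It assumes independent random draws for every field, which may differ from whatever the slide means by "Correlated"; the profile keys, the numbers in `EXAMPLE_PROFILE`, and all function and class names are hypothetical. Following the diagram, loads carry one dependence and all other instructions carry two.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SymbolicInstruction:
    """One entry of the statistical binary (field names are illustrative)."""
    kind: str                                    # "I", "F", "L", "S" or "B"
    l1_icache_hit: bool
    l2_icache_hit: bool
    deps: Tuple[int, ...]                        # dynamic instruction distances
    l1_dcache_hit: Optional[bool] = None         # loads and stores only
    l2_dcache_hit: Optional[bool] = None         # loads and stores only
    predicted_correctly: Optional[bool] = None   # branches only


# Made-up numbers purely for illustration; not measurements from the talk.
EXAMPLE_PROFILE = {
    "block_mean": 5.0, "block_std": 2.0,
    "mix": {"I": 0.5, "F": 0.1, "L": 0.2, "S": 0.1, "B": 0.1},
    "distance_hist": {1: 40, 2: 25, 4: 20, 8: 15},
    "l1i_hit": 0.98, "l2i_hit": 0.95,
    "l1d_hit": 0.95, "l2d_hit": 0.90,
    "branch_accuracy": 0.90,
}


def generate_statistical_binary(profile, rng, num_blocks=100):
    """Build num_blocks basic blocks, each ending in a single branch."""
    body_kinds = [k for k in profile["mix"] if k != "B"]
    body_weights = [profile["mix"][k] for k in body_kinds]
    distances = list(profile["distance_hist"].keys())
    distance_weights = list(profile["distance_hist"].values())

    def draw(rate):
        # True with probability `rate`.
        return rng.random() < rate

    def make(kind):
        n_deps = 1 if kind == "L" else 2  # per the diagram on the slide
        inst = SymbolicInstruction(
            kind=kind,
            l1_icache_hit=draw(profile["l1i_hit"]),
            l2_icache_hit=draw(profile["l2i_hit"]),
            deps=tuple(rng.choices(distances, distance_weights, k=n_deps)),
        )
        if kind in ("L", "S"):
            inst.l1_dcache_hit = draw(profile["l1d_hit"])
            inst.l2_dcache_hit = draw(profile["l2d_hit"])
        if kind == "B":
            inst.predicted_correctly = draw(profile["branch_accuracy"])
        return inst

    blocks = []
    for _ in range(num_blocks):
        # Block size drawn from a normal distribution, clamped to at least 1.
        size = max(1, round(rng.gauss(profile["block_mean"], profile["block_std"])))
        body = [make(rng.choices(body_kinds, body_weights)[0]) for _ in range(size - 1)]
        blocks.append(body + [make("B")])
    return blocks


statistical_binary = generate_statistical_binary(EXAMPLE_PROFILE, random.Random(0))
```

Passing an explicit `random.Random` instance keeps a run reproducible, which is convenient when comparing design points against the same synthetic program.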

  15. HLS Instruction Fetch Stage
      Fetches symbolic instructions and interacts with a statistical memory system and branch predictor model.
      [Diagram: symbolic instruction stream: integer (...), branch (...), store (...), load (...), integer (...), branch (...), load (...), integer (...)]
      Similar to conventional instruction fetch:
        - has a PC
        - has a fetch window
        - interacts with caches
        - utilizes branch predictor
        - passes instructions to dispatch
      Differences:
        - caches and branch predictor are statistical models
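The idea of a fetch stage backed by statistical models can be sketched as below. This is a simplified illustration, not HLS itself: outcomes are drawn at fetch time rather than read from per-instruction annotations, the next basic block is simply the following one in the list (wrapping around), and the configuration keys, latencies, and names are all assumptions.

```python
import random


class StatisticalModel:
    """Stands in for a cache or branch predictor: answers 'success?' with a
    fixed probability instead of tracking addresses or branch history."""

    def __init__(self, rate, rng):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be within [0, 1]")
        self.rate = rate
        self.rng = rng

    def query(self):
        return self.rng.random() < self.rate


def fetch(blocks, l1_icache, l2_icache, predictor, config, max_instructions):
    """Fetch symbolic instructions block by block and count fetch cycles.

    blocks: list of basic blocks, each a list of instruction kinds whose last
    entry is a branch "B". Returns (fetched_instructions, cycles).
    """
    pc = 0                      # index of the current basic block
    cycles = 0
    fetched = []                # instructions handed on to dispatch
    width = config["fetch_width"]
    while len(fetched) < max_instructions:
        block = blocks[pc]
        for start in range(0, len(block), width):
            window = block[start:start + width]   # one fetch window per cycle
            cycles += 1
            if not l1_icache.query():             # statistical L1 I-cache miss
                if l2_icache.query():
                    cycles += config["l2_latency"]
                else:
                    cycles += config["memory_latency"]
            fetched.extend(window)
        if block[-1] == "B" and not predictor.query():
            cycles += config["mispredict_penalty"]
        pc = (pc + 1) % len(blocks)               # assumed next-block policy
    return fetched, cycles


config = {"fetch_width": 2, "l2_latency": 6, "memory_latency": 10,
          "mispredict_penalty": 5}
toy_blocks = [["I", "I", "I", "B"], ["L", "B"]]
rng = random.Random(1)
fetched, cycles = fetch(
    toy_blocks,
    StatisticalModel(0.98, rng),   # L1 I-cache hit rate (illustrative)
    StatisticalModel(0.95, rng),   # L2 hit rate (illustrative)
    StatisticalModel(0.90, rng),   # branch prediction accuracy (illustrative)
    config,
    max_instructions=1000,
)
```

Because the models hold only a rate, changing a cache hit rate or prediction accuracy is a one-number edit, which is what allows design points to be varied freely.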

  16. Validation - SimpleScalar vs. HLS

  17. Validation - R10k vs. HLS

  18. HLS Multi-Value Validation with SimpleScalar (Perl)
      [Figure panels: HLS, SimpleScalar]

  19. HLS Multi-Value Validation with SimpleScalar (Xlisp)
      [Figure panels: HLS, SimpleScalar]

  20. Example use of HLS (Perl)
      An intuitive result: branch prediction accuracy becomes less important (it crosses fewer iso-IPC contour lines) as basic block size increases.

  21. Example use of HLS (Perl)
      Another intuitive result: gains in IPC due to basic block size are front-loaded.
      Trade-off between front-end (fetch/dispatch) and back-end (ILP) processor performance.

  22. Example use of HLS (Perl)
      This space intentionally left blank.

  23. Related work
      • R. Carl and J.E. Smith. Modeling superscalar processors via statistical simulation. PAID Workshop, June 1998.
      • N. Jouppi. The non-uniform distribution of instruction-level and machine parallelism and its effect on performance. IEEE Trans., 1989.
      • D. Noonburg and John Shen. Theoretical modeling of superscalar processor performance. MICRO27, November 1994.

  24. Questions & Future Directions
      • How important are different well-performing benchmarks anyway?
        • easily summarized
        • summaries are not precise => yet precise enough
      • Will the statistical + symbolic technique work for poorly behaved applications?
      • Will it extend to deeper pipelines and more real processors (e.g., Alpha, P6 architecture)?

  25. Conclusion
      • HLS: Statistical + Symbolic Execution
      • Intuitive design space exploration
      • Fast
      • Accurate
      • Flexible
      • Validated against cycle-by-cycle and R10k
      • Future work: deeper pipelines, more hardware validations, additional domains
      • Source code at: http://arch.cs.ucdavis.edu/~oskin
