
A Fully Buffered Memory System Simulator

FBsim 1.0. Rami Nasr, M.S. Thesis and ENEE 759H Course Project. Thursday May 12th, 2005.


Presentation Transcript


  1. A Fully Buffered Memory System Simulator
     FBsim 1.0
     Rami Nasr, M.S. Thesis and ENEE 759H Course Project
     Thursday May 12th, 2005

  2. Another Simulator?
     Sim-DRAM exists and supports FB-DIMM. Why write another simulator?
     • Sim-DRAM still had a few unworkable bugs in its FB-DIMM model when I began my study.
     • FB-DIMM is radically different from other memory architectures. New simulator => fresh start.
     • FBsim is made exclusively for simulating and studying the FB-DIMM architecture. It is easier to study FB-DIMM with an exclusive simulator.
     • Different scheduler, mapping algorithm, approach, style, and section of study in the FB-DIMM design space.
     • FBsim is ideal for simulating 'unreasonably' high memory request rates and studying channel saturation effects.
     • The two simulators can be used to validate each other's results in FB-DIMM studies.
     • Writing a memory simulator was a great experience for me.

  3. FBsim Overview
     • All code written from scratch.
     • Standalone product. Does not currently interface with CPU simulators or memory traces. Instead, it probabilistically models memory transactions according to user specifications.
       => Does not actually store memory data.
     • Written in ANSI C. ~5000 lines of code. Code is organized into header files, commented, and quite easy to hack.
     • Fast. For each memory channel, 1 second of run time simulates ~10 ms of memory activity (or ~1 ms during channel saturation) on a 2.4 GHz Pentium 4.
     • Supports Open & Closed Page Mode, Fixed & Variable Latency Mode.
     • Supports output of macro and micro (frame-by-frame) simulation data.
     • Does not model channel init, maintenance, or sync. overhead.
     • Does not model memory refresh.
     • Does not model power consumption or power timing limitations (tFAW etc.).
     • The features not yet modeled can be incorporated readily into future versions.

  4. FBsim Overview 2
     [Diagram labels: Input Transaction Generator; Address Mapper; Channel Scheduler 0; Channel Scheduler 1; Channel Scheduler 7]
     A Frame Iteration:
     • Try to generate transactions.
     • Map any generated transactions to their channel schedulers.
     • Fire each scheduler once.
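The frame iteration above can be sketched in C roughly as follows. This is only an illustration of the three steps, not FBsim's actual code: the types `txn_t` and `channel_sched_t`, the helper functions, the queue depth, and the toy generator and mapper are all hypothetical stand-ins. The channel count of 8 is taken from the diagram labels (schedulers 0 through 7).

```c
#define NUM_CHANNELS 8   /* the diagram shows Channel Scheduler 0..7 */
#define QUEUE_DEPTH  64  /* hypothetical per-channel queue size */

/* Hypothetical transaction record; FBsim's real structures are not shown. */
typedef struct {
    unsigned long addr;
    int is_write;
} txn_t;

/* Hypothetical per-channel scheduler state. */
typedef struct {
    txn_t queue[QUEUE_DEPTH];
    int   count;   /* transactions currently waiting */
    long  fired;   /* how many times this scheduler has been fired */
} channel_sched_t;

/* Stand-in generator: emits one transaction on every other frame. */
static int generate_txn(long frame, txn_t *out)
{
    if (frame % 2 != 0)
        return 0;
    out->addr = (unsigned long)frame * 64UL;
    out->is_write = (frame % 6 == 0);
    return 1;
}

/* Stand-in mapper: interleaves 64-byte blocks across the channels. */
static int map_to_channel(const txn_t *t)
{
    return (int)((t->addr / 64UL) % NUM_CHANNELS);
}

/* Stand-in scheduler step: retires at most one queued transaction. */
static void fire_scheduler(channel_sched_t *s)
{
    if (s->count > 0)
        s->count--;
    s->fired++;
}

/* One frame iteration, following the three steps listed on the slide. */
static void frame_iteration(channel_sched_t sched[], long frame)
{
    txn_t t;
    int ch;

    if (generate_txn(frame, &t)) {            /* 1. try to generate */
        ch = map_to_channel(&t);              /* 2. map to a channel scheduler */
        if (sched[ch].count < QUEUE_DEPTH)
            sched[ch].queue[sched[ch].count++] = t;
    }
    for (ch = 0; ch < NUM_CHANNELS; ch++)     /* 3. fire each scheduler once */
        fire_scheduler(&sched[ch]);
}
```

Calling `frame_iteration` in a loop over frame numbers drives the whole model; every scheduler is fired exactly once per frame whether or not a transaction was generated.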

  5. Input Transaction Model
     • Step Distributions
     • Normal (Gaussian) Distributions
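The slide names the two distribution families without giving details, so the following C sketch rests on assumptions: a "step distribution" is taken to mean a piecewise-constant probability over a set of value ranges, and the normal draw uses the simple sum-of-twelve-uniforms approximation rather than whatever FBsim actually implements. The names `step_t`, `sample_step`, and `sample_normal_approx` are hypothetical. Both functions take the uniform random inputs as arguments so that the behaviour is reproducible.

```c
/* One step of a piecewise-constant distribution (hypothetical layout). */
typedef struct {
    double lo, hi;   /* value range covered by this step */
    double weight;   /* relative probability of landing in this step */
} step_t;

/* Map a uniform u in [0,1) to a value drawn from the step distribution. */
static double sample_step(const step_t *steps, int n, double u)
{
    double total = 0.0, acc = 0.0, target;
    int i;

    for (i = 0; i < n; i++)
        total += steps[i].weight;
    target = u * total;

    for (i = 0; i < n; i++) {
        if (target < acc + steps[i].weight) {
            /* Interpolate uniformly inside the chosen step. */
            double frac = (target - acc) / steps[i].weight;
            return steps[i].lo + frac * (steps[i].hi - steps[i].lo);
        }
        acc += steps[i].weight;
    }
    return steps[n - 1].hi;  /* only reached through rounding at u close to 1 */
}

/* Approximate normal draw: the sum of 12 uniforms in [0,1) has mean 6 and
 * variance 1, so (sum - 6) approximates a standard normal variate. */
static double sample_normal_approx(double mean, double sd, const double u[12])
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 12; i++)
        sum += u[i];
    return mean + sd * (sum - 6.0);
}
```

In a simulator the uniform inputs would come from a random number generator; a quantity such as the gap between transactions could then be drawn from either function according to the user's specification.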

  6. Input Transaction Model 2
     [Figure labels: Bus Trace Viewer; FBsim Model]

  7. Address Mapping
     [Figure panels: Closed Page Mode; Open Page Mode]
     • The physical address must be mapped somehow to the right channel, DIMM, rank, bank, row, and column.
     • FBsim is built to support different DIMM capacities, different channel capacities, and even unbalanced configurations.
     • => An algorithm is needed to map each incoming transaction to a DIMM:

         WHILE (a non-zero row sum exists) {
             WHILE (visit each channel with a non-zero row sum exactly once) {
                 The next 'result' is that channel's DIMM with the highest number.
                 Decrement that DIMM's number by 1.
                 Decrement the row sum by 1.
             }
         }

     Modulus = 4+2+1+2 = 9
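The pseudocode above can be turned into a small C routine. The sketch below rests on several assumptions that the slide does not confirm: each DIMM's "number" is treated as a relative capacity weight, each channel's row sum is the total of its DIMM numbers, ties go to the lowest DIMM index, and the resulting sequence is indexed by block number modulo the modulus. The names `slot_t`, `build_map`, and `lookup` are hypothetical.

```c
#define MAX_CH   8
#define MAX_DIMM 8

/* One entry of the generated sequence: which channel and DIMM gets it. */
typedef struct {
    int channel;
    int dimm;
} slot_t;

/* Build the mapping sequence from per-DIMM weights.
 * Returns the sequence length (the "modulus"), or -1 if out is too small. */
static int build_map(int weights[MAX_CH][MAX_DIMM], int nch, int ndimm,
                     slot_t *out, int max_out)
{
    int counts[MAX_CH][MAX_DIMM];
    int row_sum[MAX_CH];
    int ch, d, n = 0, remaining = 0;

    /* Work on a copy so the caller's configuration is left untouched. */
    for (ch = 0; ch < nch; ch++) {
        row_sum[ch] = 0;
        for (d = 0; d < ndimm; d++) {
            counts[ch][d] = weights[ch][d];
            row_sum[ch] += counts[ch][d];
        }
        remaining += row_sum[ch];
    }
    if (remaining > max_out)
        return -1;

    while (remaining > 0) {                 /* a non-zero row sum exists */
        for (ch = 0; ch < nch; ch++) {      /* visit each such channel once */
            int best = 0;
            if (row_sum[ch] == 0)
                continue;
            for (d = 1; d < ndimm; d++)     /* DIMM with the highest number */
                if (counts[ch][d] > counts[ch][best])
                    best = d;
            out[n].channel = ch;
            out[n].dimm = best;
            n++;
            counts[ch][best]--;
            row_sum[ch]--;
            remaining--;
        }
    }
    return n;
}

/* Assumed use of the modulus: pick the slot for a given block number. */
static slot_t lookup(const slot_t *map, int modulus, unsigned long block)
{
    return map[block % (unsigned long)modulus];
}
```

As a usage example, a hypothetical four-channel configuration with weights {2,2}, {2}, {1}, {1,1} has row sums 4, 2, 1, and 2, matching the slide's modulus of 9. The routine then yields the sequence (0,0), (1,0), (2,0), (3,0), (0,1), (1,0), (3,1), (0,0), (0,1) as (channel, DIMM) pairs, so consecutive blocks rotate across channels first and larger channels absorb the remainder.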

  8. Channel Scheduler

  9. FB-DIMM Frame Format Review
     • A SouthBound (SB) Frame could be a:
       • Channel Frame (not modeled in FBsim)
       • Command Frame (up to three DRAM commands, with only one command possible to each DIMM in the channel)
       • Command + Wdata Frame (holds one DRAM command, plus one DDR beat of write data)
     • A NorthBound (NB) Frame could be a:
       • Channel Frame (not modeled in FBsim)
       • Read Response Frame (holds two DDR beats of returned read data)
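The southbound frame rules listed above can be captured as a small validity check. This is a sketch of the constraints as the slide states them, not of the real frame encoding: the type names, the field layout, and the integer `opcode` values are all hypothetical, and channel frames are left out because FBsim does not model them.

```c
/* Payload sizes stated on the slide, in DDR beats per frame. */
#define SB_WDATA_BEATS 1   /* write data carried by a Command + Wdata frame */
#define NB_RDATA_BEATS 2   /* read data carried by a Read Response frame */

#define SB_MAX_CMDS 3      /* a Command frame holds up to three DRAM commands */

typedef enum {
    SB_COMMAND,            /* Command Frame */
    SB_COMMAND_WDATA       /* Command + Wdata Frame */
} sb_kind_t;

/* Hypothetical DRAM command: a target DIMM and an opaque opcode. */
typedef struct {
    int dimm;
    int opcode;
} dram_cmd_t;

typedef struct {
    sb_kind_t  kind;
    int        ncmds;
    dram_cmd_t cmds[SB_MAX_CMDS];
} sb_frame_t;

/* Return 1 if the frame obeys the slide's rules, 0 otherwise. */
static int sb_frame_valid(const sb_frame_t *f)
{
    int i, j;

    /* Command + Wdata: one DRAM command, as the slide describes it. */
    if (f->kind == SB_COMMAND_WDATA)
        return f->ncmds == 1;

    /* Command frame: at most three commands ... */
    if (f->ncmds < 0 || f->ncmds > SB_MAX_CMDS)
        return 0;

    /* ... and no two of them addressed to the same DIMM. */
    for (i = 0; i < f->ncmds; i++)
        for (j = i + 1; j < f->ncmds; j++)
            if (f->cmds[i].dimm == f->cmds[j].dimm)
                return 0;
    return 1;
}
```

A scheduler built on these rules would call such a check before committing a candidate frame. The two payload constants record the two-to-one ratio between read data per NB frame and write data per SB frame.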

  10. Some of my Results
     • 1x8 achieved 7.9 GBps before saturating (82%)
     • 2x4 achieved 15.6 GBps (82%)
     • 4x2 achieved 31.3 GBps (82%)
     • 8x1 achieved 45.2 GBps (59%!)
     Case Study Conclusion
     • With at least two DIMMs on each channel, performance scales very well in FB-DIMM.
     • More than two DIMMs per channel only increase capacity, not throughput.
     • Each added DIMM adds ~5 ns to average channel latency in FLM, and slightly over half that in VLM.
     • In closed page mode, only 82% of the peak theoretical throughput of a channel can be reached.
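As a consistency check on these figures, the per-channel peak implied by each result can be computed as achieved throughput divided by the reported fraction and by the number of channels. The sketch below assumes that "NxM" means N channels with M DIMMs each, a reading suggested by the conclusion about DIMMs per channel but not stated on the slide. The names `config_result_t`, `slide_results`, and `implied_channel_peak` are hypothetical.

```c
/* One row of the slide's results (assumed reading: channels x DIMMs). */
typedef struct {
    int    channels;
    int    dimms_per_channel;
    double achieved_gbps;      /* throughput reported on the slide */
    double fraction_of_peak;   /* percentage reported on the slide */
} config_result_t;

static const config_result_t slide_results[4] = {
    {1, 8,  7.9, 0.82},
    {2, 4, 15.6, 0.82},
    {4, 2, 31.3, 0.82},
    {8, 1, 45.2, 0.59},
};

/* Peak throughput of a single channel implied by one result row. */
static double implied_channel_peak(const config_result_t *r)
{
    return r->achieved_gbps / r->fraction_of_peak / (double)r->channels;
}
```

With the slide's numbers, all four rows imply roughly 9.5 to 9.6 GBps per channel. Under the assumed reading, the 8x1 case therefore reaches a smaller fraction of about the same per-channel peak, rather than facing a lower peak.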

  11. Some of my Results 2
     • In Closed Page Mode with a 2:1 read/write ratio, a reordering window of ~12 transactions achieves the best possible performance (channel saturation) for an FB-DIMM channel scheduler. Increasing the window size beyond this has no benefit.
     • The more skewed the read/write ratio, the bigger the scheduling window needs to be (at 4:1, it's ~18).
     • In Variable Latency Mode, a reordering window of size ~20 achieves the best possible performance.

  12. Some of my Results 3
     • A micro-study shows that in Closed Page Mode, the FB channel can at most reach ~93% write data utilization on the SB, and ~84% read data utilization on the NB.
     • The micro-study also showed that FBsim channel utilization was slightly worse for non-2:1 read/write ratios (it was 2% worse for 4:1).
     • The FBsim scheduler can quite straightforwardly be made more adaptive to the read/write ratio of the transactions it holds.

  13. Future Ideas with FBsim
     • I'm graduating this semester (if Dr Jacob and Mr (Dr?) Wang so please), and escaping to the corporate world.
     • => Writing a guide for FBsim along with some ideas for future work. Anyone who wishes to take over development is eagerly encouraged to.
     • If so, I would be happy to help get things rolling by email or in person. Feel free to access & use anything in FBsim or my thesis paper.
     • I strongly believe a very interesting paper or three can quite quickly come out of this research area (me)

  14. Future Ideas with FBsim 2
     • For credibility in a paper, add an interface between FBsim and a CPU simulator or memory traces. Run real benchmarks through FBsim. Compare and contrast these results with the transaction modeling results.
     • AND/OR add more functionality and provable realism to the transaction modeler. Study this.
     • Best yet, integrate FBsim into the Sim-DRAM package as an added option.
     • Add modeling for channel overhead, memory refresh overhead, error simulation and error handling, power consumption constraints and metrics.
     • Enhance the adaptivity of the FBsim scheduler to non-2:1 read/write ratios.
     • Experiment with the address mapping algorithm and load balancing.
     • Experiment with different types of scheduler implementations (e.g. ones not based on pattern matching). *involved*
     • Study hardware constraints in FB-DIMM channel scheduling.

  15. More Possible FB-DIMM Studies
     • Channel utilization and configuration trade-offs for Open Page Mode
     • Performance degradation from shrinking the scheduler reorder window size
     • Relaxation of critical DRAM device parameters (density, nBanks, timing constraints, clock frequency) allowed by the FB-DIMM architecture
     • OR optimizing the FB-DIMM architecture by increasing the SB and NB channel widths (adding lines) or bitrates, and maybe modifying the frame protocol
     • AMB is a logic device on a memory module!! Can add buffers, arithmetic units, processing power, etc.

  16. Special Thanks to...
     • Dr Jacob for introducing me to the field and guiding my progress
     • David Wang for the course lectures and material
