Presentation Transcript


  1. DRAM background • Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh, HPCA'07 • CS 8501, Mario D. Marino, 02/08

  2. DRAM Background

  3. Typical Memory • Buses: address, command, data, DIMM (Dual In-Line Memory Module) selection

  4. DRAM cell

  5. DRAM array

  6. DRAM device or chip

  7. Command/data movement: DRAM chip

  8. Protocol, timing, operations (commands)

  9. Examples of DRAM operations (commands)

  10. “The purpose of a row access command is to move data from the DRAM arrays to the sense amplifiers.” • tRCD and tRAS

  11. “A column read command moves data from the array of sense amplifiers of a given bank to the memory controller.” • tCAS, tBurst

  12. Precharge: a separate phase that is a prerequisite for the subsequent row access operation (bitlines are set to Vcc/2 or Vcc)
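
  To make slides 10-12 concrete, here is a minimal sketch of the row-access / column-read / precharge sequencing. The timing values are illustrative DDR2-style placeholders (not numbers from the paper or from any datasheet):

```python
tRCD   = 4   # ACTIVATE -> first READ/WRITE to the open row (assumed)
tCAS   = 4   # READ -> first data beat on the data bus (assumed)
tBurst = 4   # data-bus occupancy of one burst (assumed)
tRAS   = 12  # ACTIVATE -> earliest PRECHARGE of the same bank (assumed)
tRP    = 4   # PRECHARGE -> next ACTIVATE; bitlines restored (assumed)

def one_read(start: int) -> None:
    """Walk one isolated read through a precharged (idle) bank."""
    activate = start
    read = activate + tRCD                      # row must reach the sense amps first
    data_done = read + tCAS + tBurst            # burst completes on the data bus
    precharge = max(read + tBurst, activate + tRAS)  # precharge must honor tRAS
    next_activate = precharge + tRP
    print(f"ACT@{activate}  RD@{read}  data done@{data_done}  "
          f"PRE@{precharge}  next ACT@{next_activate}")

one_read(0)   # ACT@0  RD@4  data done@12  PRE@12  next ACT@16
```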

  13. Organization, access, protocols

  14. Logical Channels: set of physical channels connected to the same memory controller

  15. Examples of Logical Channels

  16. Rank = set of banks

  17. Row = DRAM page

  18. Width: aggregating DRAM chips
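
  To make slides 14-18 concrete, a sketch of how a controller might split a physical address into the fields named here: channel, rank, bank, row (DRAM page), column. The field widths and their order are assumptions for illustration, not any real controller's mapping:

```python
# Field widths below are assumptions chosen only for illustration.
FIELDS = [("column", 10), ("channel", 1), ("rank", 1), ("bank", 3), ("row", 14)]

def decode(addr: int) -> dict[str, int]:
    """Peel fields off the low bits first. Keeping column bits lowest
    means consecutive addresses stay in the same row (DRAM page),
    which is what an open-page policy wants."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

print(decode(0x12345678))
# {'column': 632, 'channel': 1, 'rank': 0, 'bank': 5, 'row': 9320}
```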

  19. Scheduling: banks

  20. Scheduling: banks

  21. Scheduling: ranks

  22. Open-page vs. close-page • Open-page: data access to and from the cells requires separate row and column commands • Favors accesses to the same row (sense amps stay open) • Typical of general-purpose computers (desktop/laptop) • Close-page: favors random accesses under an intense request stream • Typical of large multiprocessor/multicore systems
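
  A minimal sketch of the trade-off on this slide, with illustrative cycle counts (tRCD, tCAS, tRP here are placeholders): row-buffer hits make open-page cheap on same-row streams, while close-page pays a fixed cost that wins when most accesses would miss anyway:

```python
tRCD, tCAS, tRP = 4, 4, 4   # illustrative cycle counts, not real part numbers

def open_page_read(row: int, open_row: int | None) -> tuple[int, int]:
    """Returns (latency, row left open). The row stays open afterwards."""
    if row == open_row:
        return tCAS, row                  # row-buffer hit: column command only
    if open_row is None:
        return tRCD + tCAS, row           # bank idle: activate, then read
    return tRP + tRCD + tCAS, row         # conflict: precharge the old row first

def close_page_read() -> int:
    """Bank is always precharged, so every access pays the same cost."""
    return tRCD + tCAS

# Four reads to the same row: open-page amortizes one activation.
total, cur = 0, None
for _ in range(4):
    lat, cur = open_page_read(7, cur)
    total += lat
print("open-page, 4 same-row reads:", total)                 # 8 + 3*4 = 20
print("close-page, 4 reads:", 4 * close_page_read())         # 4*8 = 32
```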

  23. Available Parallelism in DRAM System Organization • Channel • Pros: performance; different logical channels with independent memory controllers; scheduling strategies • Cons: pin count and power delivery; smart but not adaptive firmware

  24. Available Parallelism in DRAM System Organization • Rank • Pros: accesses can proceed in parallel in different ranks (bus availability) • Cons: rank-to-rank switching penalties at high frequency; globally synchronous DRAM (global clock)

  25. Available Parallelism in DRAM System Organization • Bank: different banks can proceed in parallel (bus availability) • Row: only one row per bank can be active at any time • Column: depends on the management policy (close-page / open-page) • See the sketch below
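
  A sketch of the bank-level point above (and of the scheduling slides 19-21): requests to different banks overlap their row cycles, while requests to the same bank serialize. tRC and tCCD are assumed, illustrative values:

```python
tRC  = 16  # minimum ACTIVATE-to-ACTIVATE interval within one bank (assumed)
tCCD = 2   # minimum spacing between column commands on the shared bus (assumed)

def last_issue_cycle(requests: list[int]) -> int:
    """`requests` is one bank number per read. Different banks overlap
    their row cycles; the shared data bus still spaces column commands
    by tCCD. Returns the cycle at which the last read issues."""
    bank_ready: dict[int, int] = {}
    bus_ready = 0
    t = 0
    for bank in requests:
        t = max(bank_ready.get(bank, 0), bus_ready)
        bank_ready[bank] = t + tRC   # same bank: wait a full row cycle
        bus_ready = t + tCCD         # the bus frees up much sooner
    return t

print("4 reads, same bank:", last_issue_cycle([0, 0, 0, 0]))   # 48
print("4 reads, 4 banks:  ", last_issue_cycle([0, 1, 2, 3]))   # 6
```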

  26. Paper: Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Ganesh, HPCA'07

  27. Parallel bus scaling: frequency, width, length, depth (more hops => more latency) • #memory controllers increases with CPUs and GPUs • #DIMMs/channel (depth) decreases: 4 DIMMs/channel in DDR, 2 DIMMs/channel in DDR2, 1 DIMM/channel in DDR3 • Scheduling issues
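
  A quick worked illustration of the depth trend on this slide. The per-generation DIMM counts come from the slide; the 4 GB/DIMM capacity is a hypothetical figure chosen only to show the trend:

```python
# DIMMs/channel per generation (from the slide); 4 GB/DIMM is hypothetical.
depth = {"DDR": 4, "DDR2": 2, "DDR3": 1}
GB_PER_DIMM = 4

for gen, dimms in depth.items():
    print(f"{gen}: {dimms} DIMM(s)/channel -> {dimms * GB_PER_DIMM} GB/channel")
# Capacity per channel shrinks each generation unless you add channels
# (more pins, more power) -- the scaling problem FBDIMM targets.
```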

  28. Contributions • Applied DDR-based memory controller policies to FBDIMM memory • Performance evaluation • Exploits FBDIMM depth: rank (DIMM) parallelism • Latency and bandwidth for FBDIMM and DDR • High channel utilization, FBDIMM: 7% in latency, 10% in bandwidth • Low channel utilization: 25% in latency, 10% in bandwidth

  29. Northbound channel: reads / Southbound channel: writes • AMB (Advanced Memory Buffer): pass-through switch, buffer, serial/parallel converter
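
  A minimal sketch of the channel geometry this slide describes: DIMMs are daisy-chained through their AMBs, so a read to DIMM i pays i pass-through hops southbound and i hops back northbound, plus serialize/deserialize at the target AMB. All latency numbers below are assumptions for illustration:

```python
t_hop   = 2   # AMB pass-through delay per DIMM hop (assumed)
t_serde = 4   # serial<->parallel conversion at the target AMB (assumed)
t_dram  = 20  # DRAM access behind the AMB, e.g. tRCD + tCAS + burst (assumed)

def fbdimm_read_latency(dimm_index: int) -> int:
    """Idle-channel read latency to the DIMM at chain position
    `dimm_index` (0 = nearest the memory controller)."""
    south = dimm_index * t_hop   # command travels down the southbound chain
    north = dimm_index * t_hop   # read data travels back up northbound
    return south + t_serde + t_dram + north

for i in range(4):
    print(f"DIMM {i}: {fbdimm_read_latency(i)} cycles")
# 24, 28, 32, 36: depth buys capacity and rank parallelism, but every
# extra DIMM on the chain adds hop latency -- the serialization cost
# the paper measures at low utilization.
```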

  30. Methodology • DRAMsim simulator • Execution-driven simulation • Detailed models of FBDIMM and DDR2 based on real standard configurations • Standalone / coupled with M5/SimpleScalar/Sesc • Benchmarks: bandwidth-bound • SVM from Bio-Parallel (reads: 90%) • SPEC-mixed: 16 independent copies (r:w = 2:1) • UA from NAS (r:w = 3:2) • ART (SPEC-2000, OpenMP) (r:w = 2:1)

  31. Methodology (cont.) • Different scheduling policies: greedy, OBF, most/least pending, and RIFF • 16-way CMP, 8 MB L2 • Multi-threaded traces gathered with CMP$im • SPEC traces using SimpleScalar with 1 MB L2, in-order core • 1 rank/DIMM

  32. High channel utilization • FBDIMM: better bandwidth but larger latency

  33. ART and UA: latency reduction

  34. Low utilization: serialization cost dominates • Depth: the FBDIMM scheduler offsets the serialization cost

  35. Overheads: queueing, southbound-channel and rank availability • Single-rank configurations: higher latency

  36. Scheduling • Best: RIFF, which prioritizes reads over writes
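
  A minimal sketch of the RIFF idea (reads and instruction fetches first): reads are latency-critical because a core is usually stalled on them, so they bypass queued writes. This is not the paper's implementation; a real controller would also bound write-queue occupancy so writes are not starved:

```python
from collections import deque

class RiffQueue:
    """Two-queue scheduler sketch: drain reads before any write."""

    def __init__(self) -> None:
        self.reads: deque[str] = deque()    # reads + instruction fetches
        self.writes: deque[str] = deque()

    def enqueue(self, req: str, is_read: bool) -> None:
        (self.reads if is_read else self.writes).append(req)

    def next_request(self) -> str | None:
        """Issue a pending read if any exist, else a write, else nothing."""
        if self.reads:
            return self.reads.popleft()
        if self.writes:
            return self.writes.popleft()
        return None

q = RiffQueue()
q.enqueue("W addr=0x100", is_read=False)
q.enqueue("R addr=0x200", is_read=True)
print(q.next_request())   # the read issues first despite arriving later
```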

  37. Bandwidth is less sensitive than latency • Higher latency in open-page mode • More channels => decreased channel utilization
