1 / 8

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access. Niladrish Chatterjee Manjunath Shevgoor Rajeev Balasubramonian Al Davis Zhen Fang ‡† Ramesh Illikkal * Ravi Iyer *. University of Utah , NVidia ‡ and Intel Labs* † Work done while at Intel.

clio
Download Presentation

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access NiladrishChatterjee ManjunathShevgoor Rajeev Balasubramonian Al Davis Zhen Fang‡† Ramesh Illikkal* Ravi Iyer* University of Utah , NVidia‡ and Intel Labs* †Work done while at Intel

  2. Memory Bottleneck • DRAM power as high as 25% of total datacenter power • Low-Power DRAM in place of DDR3. • BOOM from HP Labs • Energy Proportional Memory from Stanford Low Power Memory BASELINE DDR3 LPDDR CPU CPU LPDDR LPDDR DDR3 DDR3 LPDDR DDR3

  3. Latency Wall • Memory latency wall not going away • Emerging scale-out workloads e.g. Cloudsuite • Move towards energy-efficient in-order cores • Reduced Latency DRAM offers very low latency • Row-cycle time (tRC) of 8-12ns (DDR3 tRC = 48.75ns, LPDDR2 tRC = 60ns)

  4. No one memory works best

  5. Heterogeneous Memory • Combine high-performance and low-power dram to outperform DDR3 at a lower energy cost • Large number of possible designs • Different DRAM device combinations • Channel Organization • Data Placement Granularity CPU PERFORMANCE OPTIMIZED DRAM POWER OPTIMIZED DRAM

  6. Critical Word Regularity • Most DRAM requests are for word-0 of the cache-line Frequency of accesses to individual words of a cache-line

  7. Critical Word Acceleration Word 0 RLDRAM Words 1 - 7 CPU LPDDR LPDDR RLDRAM • Critical Word fetched from RLDRAM to boost performance • Rest of the cache-line placed & retrieved from LPDRAM for energy efficiency.

  8. Results • Throughput improved by 12.9% • System energy improved by 6%

More Related