
A Performance Comparison of DRAM Memory System Optimizations for SMT Processors

Presentation Transcript


  1. A Performance Comparison of DRAM Memory System Optimizations for SMT Processors
  Zhichun Zhu, ECE Department, Univ. Illinois at Chicago
  Zhao Zhang, ECE Department, Iowa State Univ.

  2. DRAM Memory Optimizations
  Optimizations at the DRAM side can make a big difference on single-threaded processors:
  • Enhancement of the chip interface/interconnect
  • Access scheduling [Hong et al. HPCA’99, Mathew et al. HPCA’00, Rixner et al. ISCA’00]
  • DRAM-side locality [Cuppu et al. ISCA’99, ISCA’01, Zhang et al. MICRO’00, Lin et al. HPCA’01]

  3. How Does SMT Impact the Memory Hierarchy?
  • Less performance loss per cache miss to DRAM memory
    – Lower benefit from DRAM-side optimizations?
  • But more cache misses due to cache contention
    – Much more pressure on main memory
  • Is DRAM memory design more important or not?

  4. Outline
  • Motivation
  • Memory optimization techniques
  • Thread-aware memory access scheduling
    • Outstanding request-based
    • Resource occupancy-based
  • Methodology
  • Memory performance analysis on SMT systems
    • Effectiveness of single-thread techniques
    • Effectiveness of thread-aware schemes
  • Conclusion

  5. Memory Optimization Techniques
  • Page modes (see the sketch below)
    • Open page: good for programs with good locality
    • Close page: good for programs with poor locality
  • Mapping schemes
    • Exploitation of concurrency (multiple channels, chips, banks)
    • Row-buffer conflicts
  • Memory access scheduling
    • Reordering of concurrent accesses
    • Reduces average latency and improves bandwidth utilization
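
A minimal sketch of the open-page vs. close-page trade-off mentioned above. The latency constants, class, and method names here are illustrative assumptions, not the parameters or code of the paper's simulator; they only show why locality favors open page and a scattered access stream favors close page.

```python
# Hypothetical, illustrative timing constants (in memory-bus cycles);
# these are NOT the parameters used in the paper's simulator.
T_ACT, T_CAS, T_PRE = 3, 3, 3

class Bank:
    """One DRAM bank with a single row buffer (illustrative model)."""
    def __init__(self, policy):
        self.policy = policy      # "open" or "close"
        self.open_row = None      # row currently latched in the row buffer

    def access(self, row):
        """Return the latency of one column access under the chosen page policy."""
        if self.policy == "open":
            if self.open_row == row:              # row-buffer hit
                latency = T_CAS
            elif self.open_row is None:           # bank idle/precharged
                latency = T_ACT + T_CAS
            else:                                 # row-buffer conflict
                latency = T_PRE + T_ACT + T_CAS
            self.open_row = row                   # leave the row open
        else:                                     # close-page policy
            latency = T_ACT + T_CAS               # every access activates its row
            self.open_row = None                  # auto-precharge afterwards
        return latency

# A stream with good locality (same row) favors open page;
# a stream that keeps switching rows favors close page.
local, scattered = [0, 0, 0, 0], [0, 5, 2, 7]
for trace in (local, scattered):
    for policy in ("open", "close"):
        bank = Bank(policy)
        print(policy, trace, sum(bank.access(r) for r in trace))
```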

  6. Memory Access Scheduling for Single-Threaded Systems
  • Hit-first: a row-buffer hit has a higher priority than a row-buffer miss
  • Read-first: a read has a higher priority than a write
  • Age-based: an older request has a higher priority than a newer one
  • Criticality-based: a critical request has a higher priority than a non-critical one
  (A sketch combining the first three priorities follows below.)
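
A minimal sketch of how the hit-first, read-first, and age-based priorities on this slide can be combined into one scheduling decision. The `Request` fields and the `pick_next` helper are assumptions for illustration, not the paper's implementation; criticality-based priority is omitted.

```python
from dataclasses import dataclass

@dataclass
class Request:
    row: int            # DRAM row targeted by the request
    is_read: bool       # reads are favored over writes
    arrival_cycle: int  # used for the age-based tie-break

def pick_next(pending, open_row):
    """Select the next request for a bank whose row buffer holds `open_row`:
    row-buffer hits before misses, reads before writes, oldest first."""
    return min(
        pending,
        key=lambda r: (
            r.row != open_row,   # False (0) = hit, True (1) = miss -> hit-first
            not r.is_read,       # read-first
            r.arrival_cycle,     # age-based: older requests win ties
        ),
    )

# Example: a younger read hit beats an older write miss.
queue = [Request(row=7, is_read=False, arrival_cycle=10),
         Request(row=3, is_read=True,  arrival_cycle=50)]
assert pick_next(queue, open_row=3).row == 3
```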

  7. Memory Access Concurrency with Multithreaded Processors
  [Diagram: requests flowing from processor to memory, contrasting the single-threaded and multi-threaded cases]

  8. Thread-Aware Memory Scheduling
  • New dimension in memory scheduling for SMT systems: considering the current state of each thread
  • States related to memory accesses
    • Number of outstanding requests
    • Number of processor resources occupied

  9. Outstanding Request-Based Scheme
  • Request-based: a request generated by a thread with fewer pending requests has a higher priority
  [Timeline: requests HA1 HA2 HB1 HA3 HB2 HA4 from threads A and B are reordered to HB1 HB2 HA1 HA2 HA3 HA4, so thread B, which has fewer pending requests, is served first]

  10. Outstanding Request-Based Scheme
  • Request-based, with hit-first and read-first applied on top (see the sketch below)
  • For SMT processors, sustained memory bandwidth is more important than the latency of an individual access
  [Timeline: requests HA1 HA2 MB1 HA3 MB2 HA4 are reordered to HA1 HA2 HA3 HA4 MB1 MB2, serving thread A's row-buffer hits before thread B's misses]
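
Reading slides 9 and 10 together, a minimal sketch of the outstanding request-based scheme: hit-first and read-first are applied on top, and among otherwise equal requests the thread with fewer pending memory requests wins. The priority order and data structures are inferred from the slide examples and are assumptions, not the authors' code; `Request` is the dataclass from the previous sketch extended with a `thread_id` field.

```python
from dataclasses import dataclass

@dataclass
class Request:
    thread_id: int
    row: int
    is_read: bool
    arrival_cycle: int

def pick_next_request_based(pending, open_row, outstanding):
    """`outstanding[t]` is the number of memory requests thread t currently
    has pending. Hit-first and read-first are applied on top; among equal
    requests, the thread with FEWER outstanding requests is served first,
    so lightly loaded threads are not starved by memory-intensive ones."""
    return min(
        pending,
        key=lambda r: (
            r.row != open_row,          # hit-first
            not r.is_read,              # read-first
            outstanding[r.thread_id],   # fewer pending requests -> higher priority
            r.arrival_cycle,            # age as the final tie-break
        ),
    )

# Slide 9's example: all requests are hits, thread B has 2 pending vs.
# thread A's 4, so B's requests are scheduled ahead of A's.
```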

  11. Resource Occupancy-Based Scheme
  • ROB-based: higher priority to requests from threads holding more ROB entries
  • IQ-based: higher priority to requests from threads holding more IQ entries
  • Hit-first and read-first are applied on top (see the sketch below)
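
A minimal sketch of the resource occupancy-based idea: requests from threads that tie up more ROB (or issue-queue) entries are favored so those entries are released sooner, with hit-first and read-first again applied on top. The occupancy dictionary and priority tuple are illustrative assumptions; `Request` is the same dataclass as in the previous sketch.

```python
def pick_next_occupancy_based(pending, open_row, rob_entries):
    """`rob_entries[t]` is the number of reorder-buffer entries thread t
    currently occupies (an IQ-based variant would pass issue-queue counts
    instead). Hit-first and read-first are applied on top; among equals,
    the request whose thread holds MORE ROB entries is served first."""
    return min(
        pending,
        key=lambda r: (
            r.row != open_row,            # hit-first
            not r.is_read,                # read-first
            -rob_entries[r.thread_id],    # more occupied entries -> higher priority
            r.arrival_cycle,              # age as the final tie-break
        ),
    )
```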

  12. Methodology
  • Simulator
    • SMT extension of sim-Alpha
    • Event-driven memory simulator (DDR SDRAM and Direct Rambus DRAM)
  • Workload
    • Mixtures of SPEC 2000 applications
    • 2-, 4-, and 8-thread workloads
    • “ILP”, “MIX”, and “MEM” workload mixes

  13. Simulation Parameters

  14. Workload Mixes

  15. Performance Loss Due to Memory Access

  16. Memory Access Concurrency

  17. Memory Channel Configurations

  18. Memory Channel Configurations

  19. Mapping Schemes

  20. Memory Access Concurrency

  21. Thread-Aware Schemes

  22. Conclusion
  DRAM optimizations have a significant impact on the performance of SMT (and likely CMP) processors:
  • Most effective when a workload mix includes some memory-intensive programs
  • Performance is sensitive to the memory channel organization
  • DRAM-side locality is harder to exploit due to contention
  • Thread-aware access scheduling schemes do bring good performance
