Reactive NUMA

Presentation Transcript

  1. Reactive NUMA A Design for Unifying S-COMA and CC-NUMA Babak Falsafi and David A. Wood University of Wisconsin

  2. Some Terminology • NUMA • Non Uniform Memory Access • CC-NUMA • Cache Coherent NUMA • COMA • Cache Only Memory Architecture • S-COMA • Simple COMA

  3. SMP Clusters • Approach for large-scale shared memory parallel machines • Directory based cache coherence • RAD responsible for remote memory access

  4. CC-NUMA • First processor causes page fault • OS Maps Virtual Address to Global Physical address • RAD snoops memory bus • Block Cache • Remote request

  5. CC-NUMA • References global addresses directly • Remote cluster cache • Only holds remote data • Another level in cache hierarchy • Block cache is small • Sensitive to data allocation and placement • Good for scientific workloads
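
The block cache behavior on slides 4 and 5 can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `BlockCache`, its direct-mapped organization, and the return strings are hypothetical and not taken from the presentation. It shows that the cache holds only remote blocks and that a small cache turns conflicts into repeated remote requests.

```python
class BlockCache:
    """Toy remote-only block cache (direct-mapped is an assumption)."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.lines = [None] * num_sets   # one global block id per set
        self.remote_requests = 0

    def read(self, global_block, is_remote):
        # Local data never enters the block cache; local memory serves it.
        if not is_remote:
            return "local_memory"
        idx = global_block % self.num_sets
        if self.lines[idx] == global_block:
            return "block_cache_hit"
        # Miss: the resident block (if any) is evicted and the data is
        # requested from the remote home node.
        self.lines[idx] = global_block
        self.remote_requests += 1
        return "remote_request"
```

With two sets, blocks 4 and 6 map to the same line, so alternating between them causes a remote request on every access. This is the sensitivity to cache size and data placement that the slide mentions.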

  6. S-COMA • First access causes page fault • OS initializes page table, RAD translation table and access control tags • Hits serviced by local memory • Misses detected by RAD • Inhibit memory • Request data
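
The S-COMA access sequence on this slide can be sketched as follows. All names here (`SComaNode`, `access`, the event strings, four blocks per page) are assumptions made for illustration, and the global page id is stood in for by the virtual page number. The model only tracks which blocks are valid locally.

```python
class SComaNode:
    """Toy model of one node's S-COMA state (hypothetical structure)."""

    def __init__(self, blocks_per_page=4):
        self.blocks_per_page = blocks_per_page
        self.page_table = {}    # virtual page -> local physical page (OS)
        self.translation = {}   # local physical page -> global page (RAD)
        self.tags = {}          # (local page, block) -> valid? (access control)
        self.log = []           # events, in order

    def access(self, vpage, block):
        if vpage not in self.page_table:
            # First access: page fault; the OS sets up all three structures.
            local = len(self.page_table)
            self.page_table[vpage] = local
            self.translation[local] = vpage  # stand-in for the global page id
            for b in range(self.blocks_per_page):
                self.tags[(local, b)] = False
            self.log.append("page_fault")
        local = self.page_table[vpage]
        if self.tags[(local, block)]:
            event = "local_hit"       # serviced by local memory
        else:
            # RAD detects the miss from the tags, inhibits memory,
            # and requests the block from its home node.
            self.tags[(local, block)] = True
            event = "remote_fetch"
        self.log.append(event)
        return event
```

After the first fetch of a block, later accesses to it hit in local memory, which is why S-COMA benefits from reuse but pays the page-fault setup cost up front.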

  7. S-COMA • Remote data in memory or cache • Allocated/Mapped at page granularity • S-COMA • OS handles allocation and migration • Large memory and cache • Fully associative • Large page size • Requires large granularity spatial locality • Possible Thrashing

  8. R-NUMA • Combine S-COMA and CC-NUMA • Map CC-NUMA pages to Global PA • Map S-COMA pages to Local PA • Often requires no additional hardware • Distinguish 2 types of pages • Reuse pages Data used frequently on the same node • Communication pages Data exchange between nodes

  9. Switching Mechanism • Reuse pages • Capacity and Conflict Misses • S-COMA • Communication pages • Coherence Misses • CC-NUMA • Detect refetches of evicted blocks • Trivial for read-only blocks in non-notifying protocol (still shared) • Additional hardware required for read-write blocks • Count refetches on per-node, per-page basis
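
The switching mechanism can be sketched as a per-node table of per-page refetch counters. This is a minimal illustration, not the authors' implementation: `PagePolicy`, the `evicted` set used to recognize refetches, and the `threshold` parameter are all hypothetical. A miss on a block that was previously evicted counts as a refetch (capacity or conflict miss), while a miss on a block that was never evicted (cold or coherence miss) is not counted.

```python
from collections import defaultdict
from enum import Enum


class Mode(Enum):
    CC_NUMA = "cc-numa"
    S_COMA = "s-coma"


class PagePolicy:
    """Per-node, per-page refetch counting (hypothetical structure)."""

    def __init__(self, threshold):
        self.threshold = threshold                      # assumed relocation threshold
        self.refetches = defaultdict(int)               # page -> refetch count
        self.mode = defaultdict(lambda: Mode.CC_NUMA)   # pages start as CC-NUMA
        self.evicted = set()                            # (page, block) evicted earlier

    def on_evict(self, page, block):
        # Remember that this block left the block cache due to replacement.
        self.evicted.add((page, block))

    def on_remote_miss(self, page, block):
        """Handle a miss to remote data; return the page's mode afterwards."""
        if self.mode[page] is Mode.S_COMA:
            return Mode.S_COMA
        if (page, block) in self.evicted:
            # Refetch of an evicted block: evidence of a reuse page.
            self.evicted.discard((page, block))
            self.refetches[page] += 1
            if self.refetches[page] >= self.threshold:
                self.mode[page] = Mode.S_COMA           # relocate the page
        return self.mode[page]
```

For example, with `threshold=2` a page stays in CC-NUMA mode through its cold miss and first refetch, and is relocated to S-COMA on the second refetch. Pages whose misses are all coherence misses never accumulate counts and remain CC-NUMA.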

  10. R-NUMA Figure

  11. Qualitative Performance • Analysis of worst case behavior • Performance depends on the S-COMA and CC-NUMA overheads • Realistically, R-NUMA is no more than 3 times worse than vanilla CC-NUMA or S-COMA • In practice the “bound” is much smaller

  12. Quantitative Results

  13. Conclusions • Dynamically react to program behavior • Exploit best caching strategy • Per-page basis • Worst case performance is bounded • Quantitative results indicate • R-NUMA usually no worse than best of CC-NUMA and S-COMA • If worse, still much better than worst case • Never worse than both • Less sensitive to relocation threshold or overhead than S-COMA • Less sensitive to cache size than CC-NUMA

  14. Questions • Sounds like a free lunch • Does R-NUMA really require no additional hardware? • Dynamic switching always looks good in research papers • What about in practice?