Efficient Metadata Management for Irregular Data Prefetching

Efficient Metadata Management for Irregular Data Prefetching. Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, Calvin Lin. Regular prefetching: some programs access memory sequentially (e.g., an MPEG player). Regular prefetchers are effective and widely used (e.g., the Best-Offset prefetcher).

Presentation Transcript


  1. Efficient Metadata Management for Irregular Data Prefetching • Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, Calvin Lin

  2. Regular Prefetching • Some programs access memory sequentially (e.g., an MPEG player) • Regular prefetchers are effective and widely used (e.g., the Best-Offset prefetcher) [Diagram labels: D, G, F, A, B, C, E]

  3. The Problem: Irregular Accesses • Common in many programs • ~30% performance opportunity for irregular SPEC2006 benchmarks [Diagram labels: D, C, E, A, X, B, Y]

  4. Temporal Prefetchers • Memorize correlations • Replay memorized accesses [Diagram labels: D, C, E, A, X, B, Y]
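
The "memorize, then replay" idea on this slide can be illustrated with a very small sketch. This is not the design of any prefetcher named in the talk; it assumes the simplest possible correlation (each address remembers only the address that followed it last time), and the class name and trace are hypothetical.

```python
class PairwiseTemporalPrefetcher:
    """Toy temporal prefetcher: remembers which address followed each address."""

    def __init__(self):
        self.successor = {}  # address -> address that followed it last time
        self.last = None     # previously observed address

    def access(self, addr):
        """Record the access and return a prefetch candidate (or None)."""
        if self.last is not None:
            self.successor[self.last] = addr  # memorize the correlation
        self.last = addr
        return self.successor.get(addr)       # replay what followed addr before


# Hypothetical trace: the same irregular sequence is traversed twice.
pf = PairwiseTemporalPrefetcher()
trace = ["D", "C", "E", "A", "D", "C", "E", "A"]
predictions = [pf.access(a) for a in trace]
```

The first traversal produces no predictions; the second traversal predicts each next block from what was memorized. The `successor` table is the metadata whose size the following slides are concerned with.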

  5. Temporal Prefetchers • High metadata overhead (10-20 MB) • Too large to fit on-chip • Metadata stored off-chip • Problematic! [Diagram labels: Cache, DRAM, Metadata, Demand Accesses, Metadata Traffic]

  6. Irregular Stream Buffer (ISB) [MICRO’13] • Introduced an on-chip metadata cache • Metadata cache synchronized with the TLB [Diagram labels: Cache, Metadata, DRAM, Metadata, Demand Accesses; annotation: ~4× overhead]

  7. Our Solution: Managed ISB (MISB) • A new metadata management scheme • Decouples metadata management from the TLB • Prefetches metadata [Diagram labels: Cache, Metadata, Metadata Prefetcher, Demand Accesses, DRAM, Metadata]

  8. Background: ISB • Assign a structural address to each access in a stream • Convert irregular access streams to sequential streams [Diagram labels: D, C, E, A, X, B, Y; Metadata]

  9. Background: ISB • Prefetch the next address in the structural address space [Diagram labels: D, C, E, A, X, B, Y; Metadata]
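
A minimal sketch of the structural-address idea from the two slides above, under simplifying assumptions: a single access stream (no PC localization), block names instead of real addresses, and a hypothetical `STREAM_GAP` spacing between streams. The class and its fields are illustrative names, not ISB's actual structures.

```python
class StructuralMapper:
    """Toy ISB-style mapper for a single access stream (no PC localization)."""

    STREAM_GAP = 16  # assumed spacing between streams in structural space

    def __init__(self):
        self.phys_to_struct = {}
        self.struct_to_phys = {}
        self.next_base = 0
        self.last = None  # previous physical address

    def _new_stream_base(self):
        # Skip regions that an earlier, longer stream has already grown into.
        while self.next_base in self.struct_to_phys:
            self.next_base += self.STREAM_GAP
        base = self.next_base
        self.next_base += self.STREAM_GAP
        return base

    def access(self, phys):
        """Train on one access and return the address to prefetch, if known."""
        if phys not in self.phys_to_struct:
            prev = self.phys_to_struct.get(self.last)
            if prev is not None and prev + 1 not in self.struct_to_phys:
                struct = prev + 1  # extend the predecessor's stream
            else:
                struct = self._new_stream_base()
            self.phys_to_struct[phys] = struct
            self.struct_to_phys[struct] = phys
        self.last = phys
        # Prefetch candidate: whatever sits at the next structural address.
        return self.struct_to_phys.get(self.phys_to_struct[phys] + 1)


mapper = StructuralMapper()
for block in ["D", "C", "E", "A", "X", "B", "Y"]:
    mapper.access(block)      # first pass: assign consecutive structural addresses
replayed = mapper.access("D")  # second pass: look up D, prefetch its successor
```

After the first pass the irregular sequence D, C, E, A, X, B, Y occupies structural addresses 0 through 6, so "prefetch the next structural address" on a later access to D yields C.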

  10. Background: ISB [Diagram labels: Cache, On-Chip Metadata, Demand Accesses, DRAM, Off-Chip Metadata]

  11. Background: ISB • ISB’s metadata cache is synchronized with the TLB [Diagram labels: TLB, Cache, On-Chip Metadata, Demand Accesses, DRAM, Off-Chip Metadata]

  16. Deficiencies of ISB • On-chip metadata size required = TLB size × cache lines per page [Diagram labels: On-Chip Metadata, TLB, Demand Accesses]

  17. Deficiencies of ISB • On-chip metadata size required = TLB size × cache lines per page • Metadata is managed at coarse granularity: ~90% of the traffic is useless due to lack of spatial locality • Metadata size is proportional to page size: ISB does not scale to large pages • Metadata size is proportional to TLB size: ISB does not work for two-level TLBs
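
The size formula on this slide can be worked through with example numbers. The TLB entry count, page sizes, and 64-byte line size below are assumptions chosen for illustration; the slides do not give the configuration.

```python
def isb_on_chip_entries(tlb_entries, page_bytes, line_bytes=64):
    """On-chip metadata entries = TLB size * cache lines per page."""
    lines_per_page = page_bytes // line_bytes
    return tlb_entries * lines_per_page


# Assumed configuration: 128 TLB entries, 64-byte cache lines.
small_pages = isb_on_chip_entries(tlb_entries=128, page_bytes=4 * 1024)
large_pages = isb_on_chip_entries(tlb_entries=128, page_bytes=2 * 1024 * 1024)
growth = large_pages // small_pages
```

With these assumed numbers, moving from 4 KB to 2 MB pages multiplies the required entries by 512, which is the scaling problem the bullets describe.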

  18. Components of MISB • Manage metadata at a fine granularity • Prefetch metadata • Filter out unnecessary accesses

  20. MISB Operation [Diagram labels: Cache, On-Chip Metadata, Metadata Prefetcher, Demand Accesses, DRAM, Off-Chip Metadata; lookup "A=?" at the on-chip metadata]

  21. MISB Operation [Same diagram; "A=71" returned from the off-chip metadata]

  22. MISB Operation [Same diagram; "A=71" now in the on-chip metadata]

  23. Components of MISB • Manage metadata at a fine granularity • Prefetch metadata • Filter out unnecessary accesses

  24. MISB Operation [Diagram labels: Cache, On-Chip Metadata, Metadata Prefetcher, Demand Accesses, DRAM, Off-Chip Metadata; "A=71" in the on-chip metadata]

  25. MISB Operation [Same diagram; requests "72=?, 73=?" issued]

  26. MISB Operation [Same diagram; "72=X, 73=B" returned from the off-chip metadata]

  27. MISB Operation [Same diagram; "72=X, 73=B" now in the on-chip metadata]
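
The sequence in the preceding operation slides (look up A, obtain structural address 71, then fetch the entries for 72 and 73 ahead of time) can be sketched as a small per-entry cache with a prefetch hook. This is only a sketch: the LRU policy, the capacity, the key format, and the prefetch degree of two are assumptions, and a plain dict stands in for off-chip metadata.

```python
from collections import OrderedDict


class MetadataCache:
    """Toy on-chip cache of individual metadata entries with LRU eviction."""

    def __init__(self, capacity, off_chip):
        self.capacity = capacity
        self.off_chip = off_chip      # dict standing in for metadata in DRAM
        self.entries = OrderedDict()  # on-chip entries, least recent first
        self.off_chip_reads = 0

    def _fill(self, key):
        self.off_chip_reads += 1      # every fill costs one off-chip access
        value = self.off_chip.get(key)
        if value is not None:
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return value

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.entries[key]
        return self._fill(key)

    def prefetch(self, keys):
        for key in keys:
            if key not in self.entries:
                self._fill(key)


# ("ps", x): physical -> structural entry; ("sp", s): structural -> physical.
off_chip = {("ps", "A"): 71, ("sp", 72): "X", ("sp", 73): "B"}
cache = MetadataCache(capacity=8, off_chip=off_chip)
s = cache.lookup(("ps", "A"))                   # miss, fetched off-chip
cache.prefetch([("sp", s + 1), ("sp", s + 2)])  # metadata prefetch for 72, 73
reads_after_prefetch = cache.off_chip_reads
next_block = cache.lookup(("sp", 72))           # now an on-chip hit
```

Because entries are cached one at a time rather than a page at a time, only the three entries that were actually needed cross the off-chip interface in this example.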

  28. Components of MISB • Manage metadata at a fine granularity • Prefetch metadata • Filter out unnecessary accesses

  29. MISB Operation [Diagram labels: Cache, On-Chip Metadata, Metadata Prefetcher, Demand Accesses, DRAM, Off-Chip Metadata; lookup "M=?" goes off-chip, annotated "Useless Traffic!"]

  30. MISB Operation [Same diagram with a Bloom Filter added; lookup "M=?"]

  31. MISB Operation [Same diagram with the Bloom Filter; result "M = ×"]
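
The filtering step in these slides can be sketched with a generic Bloom filter. The slides do not describe how MISB sizes or hashes its filter, so the bit count, hash count, and use of SHA-256 below are assumptions made only to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


has_metadata = BloomFilter()
skip_m = not has_metadata.might_contain("M")  # nothing recorded: skip off-chip lookup
has_metadata.add("A")                         # metadata for A was written off-chip
query_a = has_metadata.might_contain("A")     # recorded keys are always reported
```

An address that was never recorded is usually rejected on-chip, so its lookup does not generate off-chip traffic; a rare false positive only costs one unnecessary off-chip access.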

  32. Components of MISB • Manage metadata at a fine granularity • Prefetch metadata • Filter out unnecessary accesses

  33. Evaluation Methodology • Industrial simulator: ARMv8 AArch64; OoO core; 2-level TLB; bandwidth: 32 GB/s • Multicore (ChampSim): similar trends as the industrial simulator • SPEC2006: irregular subset • CloudSuite

  34. Evaluated Prefetchers • STMS & Domino: global correlation • MISB: PC localization • ISB: PC localization

  35. Global vs. PC-Localization • Example code: while (!end) { read tree->next; if (condition) read linked_list->next; } • Global stream: F B a1 A a2 D C E a3 … • PC-localized streams: F B A D C E … and a1 a2 a3 … • PC-localization: segregate the global stream by the load instruction’s PC • PC-localized streams are more predictable! [Diagram labels: F, a1, a2, a3, B, G, A, C, D, E]
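
PC localization as described on this slide amounts to grouping the global access stream by the program counter of the load that issued each access. The sketch below uses the slide's block names; the PC labels `pc_tree` and `pc_list` are hypothetical stand-ins for the two loads in the example loop.

```python
from collections import defaultdict


def localize_by_pc(trace):
    """Split a global (pc, address) trace into one ordered stream per PC."""
    streams = defaultdict(list)
    for pc, addr in trace:
        streams[pc].append(addr)
    return dict(streams)


# Global stream from the slide, tagged with hypothetical PC labels.
global_trace = [
    ("pc_tree", "F"), ("pc_tree", "B"), ("pc_list", "a1"),
    ("pc_tree", "A"), ("pc_list", "a2"), ("pc_tree", "D"),
    ("pc_tree", "C"), ("pc_tree", "E"), ("pc_list", "a3"),
]
streams = localize_by_pc(global_trace)
```

Each resulting stream contains only one data structure's traversal, which is why the localized streams repeat more predictably than the interleaved global stream.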

  36. Evaluated Prefetchers • Idealized STMS & Domino: global correlation; metadata not cacheable • MISB: PC localization; metadata cacheable; prefetches metadata • ISB: PC localization; metadata cacheable; syncs metadata with TLB

  37. Performance

  40. Traffic Overhead

  41. Traffic Overhead 1316%

  42. Traffic Overhead 1316% 70%

  43. Performance - SPEC2006 on 8 Cores

  44. Performance - CloudSuite on 4 Cores

  45. Traffic - CloudSuite on 4 Cores

  46. Conclusions • MISB manages metadata effectively: uses fine-grained metadata caching; introduces a metadata prefetcher • Empirical results: 70% traffic overhead vs. 342% for STMS; 23% speedup vs. 10% for idealized STMS • MISB makes temporal prefetching practical

  47. Thank you!
