
PATH: Page Access Tracking Hardware to Improve Memory Management

Reza Azimi, Livio Soares, Michael Stumm, Tom Walsh, and Angela Demke Brown. University of Toronto, Canada.



Presentation Transcript


  1. PATH: Page Access Tracking Hardware to Improve Memory Management Reza Azimi, Livio Soares, Michael Stumm, Tom Walsh, and Angela Demke Brown University of Toronto, Canada

  2. Page Access Tracking Challenge
  • Storage Management Research
    • Many sophisticated algorithms
    • Most require accurate knowledge of the memory access trace
    • Adopted mostly for file systems or databases
    • Not straightforward for virtual memory
  • Problem: limited page access tracking
    • Hard to measure either reuse distance or temporal locality
  • Conventional access tracking mechanisms
    • Monitoring page faults: most page accesses are missed
    • Scanning page table bits: high scanning overhead => low scanning frequency

  3. Page Access Tracking Challenge (cont’d)
  • Access tracking with performance counters
    • Statistical data sampling:
      • Favours only hot pages
      • Hard to track reuse distance or temporal locality
    • Recording TLB misses: high overhead
      • TLBs are small, so TLB misses are very frequent
      • TLB miss handling is performance-critical
  • Hardware approach [Zhou et al., ASPLOS’04]
    + Effective for its purpose (but inflexible)
    - Impractical hardware resource requirements: ~1 MB of hardware buffer per 1 GB of physical memory!
  • Software approach [Yang et al., OSDI’06]
    • Divides pages into active and inactive sets
    • Page-protects members of the inactive set
    - Overhead can still be too high

  4. Page Access Tracking in Software
  [Figure: performance of adaptive page replacement for FFT vs. runtime overhead of page access tracking in software]
  • 10% overhead even with a large active set, but poor performance
  • 90% overhead to get acceptable performance

  5. Page Access Tracking Hardware (PATH)
  [Diagram: on a TLB miss, the CPU core’s virtual address (VADDR) is looked up in the Page Access Buffer in addition to the page tables; a buffer miss appends the VADDR to the Page Access Log, which raises an overflow interrupt when full]
  • Advantages
    • Extra hardware resources required are small (around 10 KB)
    • Off the common path
    • Scalable (does not grow with physical memory)

  6. Information Provided by PATH
  [Diagram: the same PATH hardware as on the previous slide]
  • Raw form: the Page Access Log
  • Abstraction: precise LRU stack
  • Abstraction: Miss Rate Curve (MRC)

  7. Basic Abstraction: LRU Stack
  • Accessed and updated for each entry in the Page Access Log
  • Implementation:
    • Lookup: a page table-like structure with O(1) lookup time
    • Update: a doubly linked list; only a few pointers are updated per page access
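The lookup/update split above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a dict stands in for the page table-like lookup structure, and the depth walk is shown explicitly for clarity (a real implementation would avoid it).

```python
class Node:
    """One page in the doubly linked LRU stack."""
    __slots__ = ("page", "prev", "next")

    def __init__(self, page):
        self.page, self.prev, self.next = page, None, None


class LRUStack:
    def __init__(self):
        self.table = {}   # page -> Node: the O(1) "page table-like" lookup
        self.head = None  # most recently used page

    def access(self, page):
        """Move `page` to the top; return its old depth, or None on a cold miss."""
        node = self.table.get(page)
        depth = None
        if node is not None:
            # Walk from the top to find the stack depth (illustration only).
            d, cur = 0, self.head
            while cur is not node:
                cur, d = cur.next, d + 1
            depth = d
            # Unlink the node: only a few pointer updates per access.
            if node.prev:
                node.prev.next = node.next
            if node.next:
                node.next.prev = node.prev
            if self.head is node:
                self.head = node.next
        else:
            node = Node(page)
            self.table[page] = node
        # Push the node on top of the stack.
        node.prev, node.next = None, self.head
        if self.head:
            self.head.prev = node
        self.head = node
        return depth
```

The returned depth is exactly the LRU stack distance used to build the miss rate curve on the later slides.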

  8. Basic Abstraction: Miss Rate Curve (MRC)
  • Basic info: the number of misses for a given memory size over a period of time
  • Basic use: estimating the “memory needs” of an application

  9. Computing the MRC Online
  [Diagram: a page access and its MRU and LRU distances in the stack]
  • Mattson’s Stack Algorithm, for LRU:
    • Memory sizes < LRU distance: miss
    • Memory sizes >= LRU distance: hit
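The hit/miss rule above can be sketched with Mattson's stack algorithm. This is an illustrative version (the function name and O(n) list operations are for clarity only; the paper's LRU stack abstraction gives O(1) lookup): distances are 1-based, so an access at stack distance d hits in any memory of at least d pages.

```python
def miss_rate_curve(trace, max_size):
    """Compute the LRU miss rate for every memory size 0..max_size pages."""
    stack = []                   # index 0 = most recently used page
    hits = [0] * (max_size + 1)  # hits[s] = hits with s pages of memory
    for page in trace:
        if page in stack:
            d = stack.index(page) + 1  # 1-based LRU stack distance
            stack.remove(page)
            for s in range(d, max_size + 1):
                hits[s] += 1           # memory sizes >= d hit
        stack.insert(0, page)          # accessed page moves to the top
    n = len(trace)
    return [1.0 - hits[s] / n for s in range(max_size + 1)]
```

One pass over the trace yields the miss rate for every memory size at once, which is what makes the MRC abstraction cheap to maintain online.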

  10. Runtime Overhead Tradeoff
  [Diagram: the same PATH hardware; the Page Access Buffer serves as the active set]
  • The larger the Page Access Buffer (active set), the more page accesses are filtered:
    + Less run-time overhead
    - A less accurate page access trace
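A toy software model of this filtering tradeoff (an assumption for illustration, not the actual hardware): the Page Access Buffer is modeled as a small LRU set, and only buffer misses reach the Page Access Log, so a larger buffer filters more accesses out of the trace.

```python
from collections import OrderedDict


def filtered_log(trace, buffer_entries):
    """Return the accesses that miss in a `buffer_entries`-entry LRU buffer."""
    buf = OrderedDict()  # pages resident in the Page Access Buffer, LRU order
    log = []
    for page in trace:
        if page in buf:
            buf.move_to_end(page)  # buffer hit: filtered, never logged
        else:
            log.append(page)       # buffer miss: appended to the log
            buf[page] = True
            if len(buf) > buffer_entries:
                buf.popitem(last=False)  # evict the least recently used entry
    return log
```

Shrinking `buffer_entries` makes the log approach the full trace (accurate but expensive); growing it shortens the log (cheap but lossy), which is exactly the tradeoff the slide describes.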

  11. Runtime Overhead, Example: FFT
  [Figure: runtime overhead vs. number of active set entries]

  12. Runtime Overhead, Example: LU non-contiguous
  [Figure: runtime overhead vs. number of active set entries]

  13. Runtime Overhead: Summary
  • Overall, a 2K-entry Page Access Buffer seems to be the best point in the tradeoff between performance and runtime overhead.
  • PATH’s overhead is less than 6% across a wide variety of applications.
  • PATH’s overhead is negligible in most cases.

  14. Case 1: Adaptive Page Replacement
  • Region-based page replacement
    • Use different replacement policies for different regions of the virtual address space
    • Rationale: each region is likely to contain a data structure with a fairly stable access pattern
  • Low Inter-reference Recency Set (LIRS)
    • Handles sequential and looping patterns
    • Requires tracking page accesses
    • Originally developed for file system caching
    • Easily enabled by the PATH-generated information

  15. Region-based Replacement
  • Using the MRC for comparison:
  [Figure]

  16. Region-based Replacement (cont’d)
  • Dividing memory among regions:
    • Minimize the total miss rate by giving memory to the regions that have the greatest “benefit-per-page”.
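The benefit-per-page idea can be sketched as a greedy allocator over per-region MRCs. This is a hypothetical sketch, not the paper's algorithm: `allocate` and the region names are made up, and each region's MRC is a list indexed by the number of pages it holds.

```python
def allocate(mrcs, total_pages):
    """Greedily hand out pages to the region whose MRC drops the most.

    mrcs: {region: miss-rate list, indexed by pages allocated to that region}
    """
    alloc = {r: 0 for r in mrcs}

    def benefit(r):
        # Marginal benefit-per-page: miss-rate drop from one more page.
        m, a = mrcs[r], alloc[r]
        return m[a] - m[a + 1] if a + 1 < len(m) else 0.0

    for _ in range(total_pages):
        best = max(mrcs, key=benefit)
        if benefit(best) <= 0.0:
            break  # no region gains from another page
        alloc[best] += 1
    return alloc
```

Because each step takes the steepest available MRC drop, memory flows to whichever region currently offers the most benefit per page, approximating the minimum total miss rate.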

  17. Simulation Results: LU-contiguous (SPLASH2)
  [Figure]

  18. Simulation Results: BT (NAS Benchmark)
  [Figure]

  19. Case 2: Prefetching
  • Spatial locality-based
    • Prefetch pages spatially adjacent to the faulted page
    • Advantages: simple, easy to implement, and effective in many cases
    • Major drawback: oblivious to non-spatial access patterns
  • Temporal locality-based
    • Prefetch pages that are regularly accessed together
    • Use PATH to track the temporal locality of pages

  20. Temporal Locality-based Prefetching
  • Page Proximity Graph (PPG)
    • Each page is a node
    • There is an edge from p to q if q is regularly accessed shortly after p (temporal locality)
  • PPG update: add page q to p’s proximity set if q repeatedly appears in the LRU stack in close proximity to p
  • Basic prefetching scheme: breadth-first traversal starting from the faulted page
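The breadth-first scheme above can be sketched as follows, under assumptions not in the slides: the PPG is a plain adjacency map, the function name and the `budget` parameter (how many pages to prefetch per fault) are made up, and edge maintenance is left out.

```python
from collections import deque


def prefetch_candidates(ppg, faulted_page, budget):
    """BFS over the Page Proximity Graph from the faulted page.

    ppg: {page: set of pages regularly accessed shortly after it}
    Returns up to `budget` pages to prefetch, nearest-first.
    """
    seen = {faulted_page}          # the faulted page itself is already resident
    queue = deque([faulted_page])
    out = []
    while queue and len(out) < budget:
        p = queue.popleft()
        for q in ppg.get(p, ()):
            if q not in seen:
                seen.add(q)
                out.append(q)      # prefetch q: temporally close to the fault
                queue.append(q)
                if len(out) == budget:
                    break
    return out
```

Traversing breadth-first means pages one PPG edge away from the fault are fetched before pages two edges away, matching the intuition that tighter temporal locality deserves priority.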

  21. Prefetching: LU non-contiguous (SPLASH2)
  [Figure]

  22. Conclusions
  • Page Access Tracking Hardware
    • Small (about 10 KB in size)
    • Low overhead
    • Generic
  • Cases studied
    • Adaptive page replacement
    • Process memory allocation (see paper)
    • Prefetching
  • Significant performance improvements can be achieved by tracking page accesses.

  23. Future Directions
  • Other case studies
    • NUMA page placement
    • Superpage management
  • Per-thread page access tracking
    • Augmenting page accesses with thread info
  • Multiprocessor issues
    • Combining traces collected on multiple CPUs

  24. Questions
