
Virtual Memory and Paging


Presentation Transcript


  1. Virtual Memory and Paging J. Nelson Amaral

  2. Large Data Sets • Size of address space: • 32-bit machines: 2^32 bytes = 4 GB • 64-bit machines: 2^64 bytes = 16 EiB (about 1.8 × 10^19 bytes) • Size of main memory: • approaching 4 GB • How to handle: • Applications whose data set is larger than the main memory size? • Sets of applications that together need more space than the memory size? Baer, p. 60

  3. Multiprogramming • More than one program resides in memory at the same time • I/O is slow: • If the running program needs I/O, it relinquishes the CPU Baer, p. 60

  4. Multiprogramming Challenges • How and where is a program loaded into memory? • How does a program ask for more memory? • How is one program protected from another? Baer, p. 60

  5. Virtual Memory • Solution: • Give each program the illusion that it can address the whole address space • The CPU works with virtual addresses • Memory works with real (physical) addresses Baer, p. 60

  6. Virtual → Physical Address Translation • Paging System • Divide both the virtual and the physical address spaces into pages of the same size. • Virtual space: page • Physical space: frame • Fully associative mapping between pages and frames: • any page can be stored in any frame Baer, p. 60
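
To make the page/frame split concrete, here is a minimal C sketch of how a virtual address divides into a page number and an offset, assuming 4 KB pages and a 32-bit address space; the frame number is a made-up example, not something from the slides:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE   4096u        /* assumed 4 KB pages */
    #define OFFSET_BITS 12           /* log2(PAGE_SIZE) */

    int main(void) {
        uint32_t vaddr  = 0x12345678;                 /* example virtual address */
        uint32_t vpn    = vaddr >> OFFSET_BITS;       /* virtual page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);    /* offset within the page */

        /* Translation replaces the page number with a frame number;
           the offset within the page is unchanged. */
        uint32_t frame = 42;                          /* hypothetical frame */
        uint32_t paddr = (frame << OFFSET_BITS) | offset;

        printf("vpn=0x%x offset=0x%x paddr=0x%x\n", vpn, offset, paddr);
        return 0;
    }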

  7. Paging System • Virtual space is much larger than physical memory • Memory can be shared with little fragmentation • Pages can be shared among programs • Memory does not need to store the whole program and its data at the same time Baer, p. 61

  8. Address Translation valid bit = 0 implies a page fault (there is no frame in memory for this page) Baer, p. 62
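A page table entry can be pictured as a bit field holding the frame number plus the status bits the slides mention; this layout is purely illustrative, since real PTE formats are architecture-specific:

    #include <stdint.h>

    /* Hypothetical 32-bit PTE layout, for illustration only. */
    typedef struct {
        uint32_t frame  : 20;   /* physical frame number */
        uint32_t valid  : 1;    /* 0 => no frame in memory => page fault */
        uint32_t dirty  : 1;    /* set on a store; page needs write-back */
        uint32_t prot   : 3;    /* read/write/execute protection bits */
        uint32_t unused : 7;
    } pte_t;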

  9. Page Fault • Exception generated in program P1 because valid bit = 0 in the Page Table Entry (PTE) • The page fault handler initiates an I/O read for P1 • The I/O read takes several milliseconds to complete • a context switch occurs • The O.S. saves the processor state and starts the I/O operation • Hands CPU control to another program P2 • Restores P2’s state into the CPU Baer, p. 62

  10. Address Translation • Virtual and physical addresses can be of different sizes. • Example: 64-bit virtual addresses translated to 40- or 48-bit physical addresses. Baer, p. 62

  11. Translation Look-Aside Buffer (TLB) • Problem: • Storing page table entries (PTEs) in memory would require a load for each address translation. • Caching PTEs interferes with the flow of instructions or data into the cache • Solution: the TLB, a small, high-associativity cache dedicated to caching PTEs Baer, p. 62

  12. TLB organization • Each TLB entry consists of: • tag • data (a PTE) • valid bit • dirty bit • bits to encode memory protection • bits to encode recency of access • A set of TLB entries may be reserved for the Operating System Baer, p. 62
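Putting those fields together, here is a sketch of a TLB entry and a fully associative lookup, reusing the hypothetical pte_t above; the 64-entry size is illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64          /* illustrative size */

    typedef struct {
        uint32_t tag;               /* virtual page number */
        pte_t    pte;               /* cached page table entry */
        bool     valid;
        bool     dirty;
        uint8_t  prot;              /* memory protection bits */
        uint8_t  recency;           /* recency of access, for replacement */
        uint8_t  pid;               /* process ID tag (see slide 22) */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Fully associative lookup: compare the tag of every valid entry. */
    static tlb_entry_t *tlb_lookup(uint32_t vpn) {
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].tag == vpn)
                return &tlb[i];
        return NULL;                /* TLB miss */
    }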

  13. TLB Characteristics Baer, p. 63

  14. Large Pages • Recent processors implement large page sizes (typically 4 MB pages) • reduces page faults in applications with lots of data (scientific and graph applications) • requires that TLB entries be reserved for large pages. Baer, p. 63

  15. Referencing Memory Baer, p. 63

  16. Memory Reference Process • TLB hit? No → handle the TLB miss • valid bit = 0? → page fault • protection violation? → access violation exception • store? → turn the PTE dirty bit on • finally, update the recency bits Baer, p. 63
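
The flowchart maps directly onto code; here is a hedged sketch reusing the hypothetical tlb_lookup and constants from the earlier sketches (the three handlers are stubs standing in for hardware/OS mechanisms):

    /* Hypothetical stubs, declared so the sketch is self-contained. */
    extern tlb_entry_t *handle_tlb_miss(uint32_t vpn);  /* walks the page table */
    extern void page_fault(uint32_t vpn);
    extern void access_violation(uint32_t vpn);

    uint32_t reference(uint32_t vaddr, bool is_store, uint8_t needed_prot) {
        uint32_t vpn = vaddr >> OFFSET_BITS;
        tlb_entry_t *e = tlb_lookup(vpn);

        if (e == NULL)
            e = handle_tlb_miss(vpn);       /* TLB hit? No */
        if (!e->pte.valid)
            page_fault(vpn);                /* valid bit = 0 */
        if ((e->prot & needed_prot) != needed_prot)
            access_violation(vpn);          /* protection violation */
        if (is_store)
            e->dirty = e->pte.dirty = 1;    /* store? turn dirty bit on */
        e->recency = 0;                     /* update recency */

        return (e->pte.frame << OFFSET_BITS) | (vaddr & (PAGE_SIZE - 1u));
    }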

  17. Handling TLB Misses • Must access page table in memory • entirely in hardware • entirely in software • combination of both • Replacement Algorithms • LRU for 4-way associativity (Intel) • Not Most Recently Used for full associativity (Alpha) Baer, p. 64
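
As an illustration of the first policy, true LRU for a 4-way set can be kept with a small counter per way; this sketch is a generic scheme, not Intel's actual circuit:

    #include <stdint.h>

    #define WAYS 4

    /* On an access to way `hit`, age every way that was more recently
       used than it; the accessed way becomes most recent (counter 0). */
    void lru_update(uint8_t recency[WAYS], int hit) {
        for (int w = 0; w < WAYS; w++)
            if (recency[w] < recency[hit])
                recency[w]++;
        recency[hit] = 0;
    }

    /* On a miss, the victim is the way with the largest counter. */
    int lru_victim(const uint8_t recency[WAYS]) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (recency[w] > recency[victim])
                victim = w;
        return victim;
    }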

  18. Handling TLB Miss (cont.) • Serving a TLB miss takes 100-1000 cycles. • Too short to justify a context switch • Long enough to have significant impact on performance • even a small TLB miss rate affects CPI Baer, p. 64

  19. OS handling of page fault • Reserve a frame from the free list • Find a page to replace if there is no free frame • Locate the faulting page on disk • Invalidate cache lines mapping to the replaced page • Invalidate the corresponding portions of the TLB (and maybe the cache) • Write the replaced page to disk if it is dirty • Initiate the read for the faulting page Baer, p. 64
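
In outline, the handler might look like the following; every helper named here is a hypothetical stand-in for a real OS mechanism:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t disk_addr_t;                    /* hypothetical */
    extern int  free_list_pop(void);                 /* -1 if list is empty */
    extern int  choose_victim(void);                 /* replacement policy */
    extern void invalidate_cache_lines(int frame);
    extern void invalidate_tlb_entries(int frame);
    extern bool frame_is_dirty(int frame);
    extern void write_to_disk(int frame);
    extern disk_addr_t locate_on_disk(uint32_t vpn);
    extern void start_disk_read(disk_addr_t src, int frame);
    extern void schedule_another_process(void);

    void page_fault_handler(uint32_t vpn) {
        int frame = free_list_pop();                 /* reserve a free frame */
        if (frame < 0) {
            frame = choose_victim();                 /* no free frame: replace */
            invalidate_cache_lines(frame);
            invalidate_tlb_entries(frame);
            if (frame_is_dirty(frame))
                write_to_disk(frame);                /* write back dirty page */
        }
        start_disk_read(locate_on_disk(vpn), frame); /* completion raises the
                                                        I/O interrupt (slide 20) */
        schedule_another_process();                  /* context switch meanwhile */
    }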

  20. When page arrives in memory • An I/O interrupt is raised • The OS updates the PTE of the page • The OS schedules the requesting process for execution Baer, p. 64

  21. Invalidating TLB Entries on Context Switch • Page Fault → Exception → Context Switch • Let: • PR: Relinquishing process • PI: Incoming Process • Problem: TLB entries are for PR, not PI • Invalidating entire TLB on context switch leads to many TLB misses when PI is restored • Solution: Use a processor ID number (PID) Baer, p. 64

  22. Process ID (PID) Number • O.S. sets a PID for each program • The PID is added to the tag in the TLB entries • A PID Register stores the PID of the active process • Match PID Register with PID in TLB entry • No need to invalidate TLB entries on context switch • PIDs are recycled by the OS Baer, p. 64
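
With the pid field already present in the tlb_entry_t sketch above, the lookup changes by only one extra comparison; pid_register is a hypothetical name for the PID Register:

    static uint8_t pid_register;    /* PID of the active process */

    /* A hit now requires both the tag and the PID to match, so entries
       of other processes can survive a context switch untouched. */
    static tlb_entry_t *tlb_lookup_pid(uint32_t vpn) {
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].tag == vpn
                             && tlb[i].pid == pid_register)
                return &tlb[i];
        return NULL;
    }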

  23. Page Size vs. Read/Write Time • Disk access time = seek time (0 to 10 ms) + rotation time (~3 ms) + transfer time • Going from a page of size x to a page of size 2x only increases the transfer time; the seek and rotation costs are paid once either way • Amortizing I/O Time: • Large page size • Read/write consecutive pages Baer, p. 65

  24. Large Pages • Amortize I/O time to transfer pages • Smaller Page Tables • More PTEs are in main memory • lower probability of double page fault for a single memory reference • Fewer TLB misses • Single TLB entry translates more locations • Pages cannot be too large • Transfer time and fragmentation Baer, p. 65
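
A back-of-the-envelope calculation shows the amortization; the seek and rotation figures are the ones from slide 23, while the 100 MB/s transfer rate is an assumed value:

    #include <stdio.h>

    int main(void) {
        const double seek_ms = 10.0, rotation_ms = 3.0;  /* from slide 23 */
        const double mb_per_s = 100.0;                   /* assumed rate */

        for (double kb = 4.0; kb <= 64.0; kb *= 2.0) {
            double transfer_ms = kb / 1024.0 / mb_per_s * 1000.0;
            double total_ms = seek_ms + rotation_ms + transfer_ms;
            printf("%5.0f KB page: %6.3f ms total, %7.4f ms per KB\n",
                   kb, total_ms, total_ms / kb);
        }
        return 0;
    }

Doubling the page size barely changes the total access time, so the cost per kilobyte transferred roughly halves at each step.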

  25. Performance of Memory Hierarchy Baer, p. 66

  26. When to bring a missing item (to cache, TLB, or memory)? • On demand Baer, p. 66

  27. Where to put the missing item? • Cache: restrictive mapping (direct or low associativity) • TLB: fully associative or high set associativity • Paging System: general mapping Baer, p. 66

  28. How do we know it is there? • Cache: Compare tags and check valid bits • TLB: Compare tags, PID, check valid bits • Memory: Check Page Tables Baer, p. 67

  29. What happens on a replacement? • Caches and TLBs: (approximation to) LRU • Paging Systems: • Sophisticated algorithms to keep the page fault rate very low • O.S. policies allocate a number of pages to each program according to its working set Baer, p. 67

  30. Simulating Memory Hierarchy • Simulating the memory hierarchy alone is faster than simulating to assess IPC or execution time • Stack property of some replacement algorithms: • for a given sequence of memory references at a given level of the hierarchy, the number of misses is monotonically non-increasing with the size of the memory • can simulate a range of sizes in a single simulation pass. Baer, p. 67
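
The stack property is what stack-distance simulation exploits: maintain an LRU stack, and a reference hits in a memory of capacity C exactly when its stack distance is less than C, so one pass yields miss counts for every size. A minimal sketch, assuming a fully associative memory with unit-size blocks; the trace and depth limit are illustrative:

    #include <stdio.h>

    #define TRACE_LEN 10
    #define MAX_DEPTH 8

    int main(void) {
        unsigned trace[TRACE_LEN] = {1, 2, 3, 1, 2, 4, 1, 2, 3, 4};
        unsigned stack[MAX_DEPTH];
        int depth = 0;
        long hist[MAX_DEPTH + 1] = {0};  /* hist[MAX_DEPTH]: cold misses and
                                            distances beyond MAX_DEPTH */

        for (int t = 0; t < TRACE_LEN; t++) {
            unsigned a = trace[t];
            int d = 0;
            while (d < depth && stack[d] != a) d++;   /* stack distance */
            hist[d < depth ? d : MAX_DEPTH]++;
            if (d == depth && depth < MAX_DEPTH) depth++;  /* first touch */
            for (int i = (d < depth - 1 ? d : depth - 1); i > 0; i--)
                stack[i] = stack[i - 1];              /* push entries down */
            stack[0] = a;                             /* most recent on top */
        }

        /* Misses for capacity C = references with stack distance >= C. */
        for (unsigned C = 1; C <= 4; C++) {
            long misses = hist[MAX_DEPTH];
            for (int d = C; d < MAX_DEPTH; d++) misses += hist[d];
            printf("capacity %u: %ld misses\n", C, misses);
        }
        return 0;
    }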

  31. Belady’s Algorithm • Belady’s algorithm: replace the entry that will be accessed the furthest in the future. • It is the optimal algorithm • It needs to know the future • not realizable in practice • useful in simulation to compare with practical algorithms Baer, p. 67
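
Belady's algorithm is easy to express when the whole trace is available, which is exactly why it suits simulation; a minimal sketch, with an illustrative trace and resident-set size:

    #include <stdio.h>

    #define FRAMES 3
    #define TRACE_LEN 10

    /* Position where addr is next used after t; larger is "further". */
    static int next_use(const unsigned trace[], int t, unsigned addr) {
        for (int i = t + 1; i < TRACE_LEN; i++)
            if (trace[i] == addr) return i;
        return TRACE_LEN;              /* never used again: ideal victim */
    }

    int main(void) {
        unsigned trace[TRACE_LEN] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3};
        unsigned frames[FRAMES];
        int used = 0, misses = 0;

        for (int t = 0; t < TRACE_LEN; t++) {
            int hit = -1;
            for (int f = 0; f < used; f++)
                if (frames[f] == trace[t]) hit = f;
            if (hit >= 0) continue;            /* resident: nothing to do */

            misses++;
            if (used < FRAMES) {               /* free frame available */
                frames[used++] = trace[t];
            } else {                           /* evict the entry whose next
                                                  use is furthest away */
                int victim = 0;
                for (int f = 1; f < FRAMES; f++)
                    if (next_use(trace, t, frames[f]) >
                        next_use(trace, t, frames[victim]))
                        victim = f;
                frames[victim] = trace[t];
            }
        }
        printf("OPT misses: %d / %d\n", misses, TRACE_LEN);
        return 0;
    }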
