
Chapter 12 - Virtual Memory Systems



  1. Chapter 12 - Virtual Memory Systems • Page Replacement Algorithms • Main metric is the number of page faults per algorithm, given a fixed memory reference stream. • Will look at global replacement first (page frames allocated to all processes are candidates for replacement). • All global paging algorithms attempt to predict the future page requirements of the process.

  2. Page Replacement Algorithms • Optimal (aka MIN or OPT): Replace the page that will be used farthest in the future. • Not an actual algorithm, since it requires future knowledge. • Useful benchmark, though, for testing other algorithms. • Provides a lower bound on page faults. • Example of its use later. • Random: Replace a random page. • Does not attempt to predict future behavior from the past. • Easy to implement :) • Not the worst, but not the best. Consider it a baseline that any serious replacement algorithm should beat. • Example of its use later (Figure 12.1).

  3. Page Replacement Algorithms • FIFO: Replace the page that has been in memory the longest. • Theory is that the oldest page won't be used in the near future. • Bad theory :) (Figure 12.2). • Least-Recently Used (LRU): Replace the page that has not been accessed for the longest time. • Predict the recent future from the recent past. • Good theory & the best global paging algorithm (fault counts typically within 30-40% of Optimal's). • Expensive to record timestamps, though (Figure 12.4). • Easier to approximate LRU via software & hardware.

  4. LRU Approximations • A useful hardware device is a per-page reference bit that is set to 1 by the paging hardware whenever the page is accessed. It can be read and set to zero by software. • Crude LRU: Use the reference bit to provide one bit of resolution - it splits the pages into those that have been accessed at least once and those that have not been accessed at all (Figure 12.5). • LRU page counter: Keep an 8-bit (say) per-page counter. An LRU daemon sweeps every second or so: if the ref bit == 1, it increments the counter and clears the ref bit. This separates the pages into 256 sets (Figure 12.5); a sketch of the sweep follows.
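
To make the counter idea concrete, here is a minimal C sketch of the daemon's sweep. The array and field names are illustrative assumptions, not the book's code; replacement would favor the pages with the lowest counters.

    #include <stdint.h>

    #define NPAGES 1024

    /* Hypothetical per-page bookkeeping; field names are illustrative. */
    struct page_info {
        int     ref_bit;   /* set to 1 by the paging hardware on access */
        uint8_t counter;   /* 8-bit use count, yielding 256 recency sets */
    };

    struct page_info pages[NPAGES];

    /* Run by the LRU daemon every second or so. */
    void lru_daemon_sweep(void)
    {
        for (int i = 0; i < NPAGES; i++) {
            if (pages[i].ref_bit == 1) {
                pages[i].counter++;    /* page was used this interval */
                pages[i].ref_bit = 0;  /* re-arm for the next interval */
            }
        }
    }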

  5. LRU Approximations • True LRU: Each page carries a timestamp of its last access, so the pages can be ordered by access time. • Clock Page Replacement Algorithms • Visualize a circular list of page frames, with a single clock hand sweeping across it. • FIFO Clock algorithm: The clock hand always points to the page to replace; after replacement the hand moves on to the next page frame. This is just the FIFO algorithm (and thus not interesting).

  6. Clock Page Replacement Algorithms • Basic Clock algorithm (sketched below): • Hand looks at the reference bit of the current page frame. If ref bit == 0, this page hasn't been referenced recently and it is selected to be replaced. • If ref bit == 1, this page has been referenced. Give it another chance by setting the ref bit to 0 and continue looking for a page frame whose ref bit == 0. Notice that this ensures a victim page will eventually be found as the clock hand sweeps around. • A page has to be referenced at least once every revolution of the clock hand to keep from being replaced. • This is a modified FIFO algorithm (called First In, Not Used, First Out -- FINUFO or Not Recently Used -- NRU). • One measured clock implementation had to examine an average of 13 page frames to find one to replace.
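
A minimal C sketch of the basic clock victim search, assuming a hypothetical ref_bit[] array maintained by the paging hardware:

    #define NFRAMES 256

    int ref_bit[NFRAMES];   /* set to 1 by hardware on each access */
    int hand;               /* frame the clock hand points at */

    int clock_select_victim(void)
    {
        for (;;) {
            if (ref_bit[hand] == 0) {           /* not used recently */
                int victim = hand;
                hand = (hand + 1) % NFRAMES;    /* advance past victim */
                return victim;
            }
            ref_bit[hand] = 0;                  /* second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }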

  7. Clock Page Replacement Algorithms • Hopefully the hardware supports not only the ref bit but also a modified (dirty) bit - a bit that is set whenever a value is written anywhere inside the page frame (a memory write occurred). • This gives us two bits with which to “sort” the page references. A page with the dirty bit set is, in general, more expensive to handle (the page has to be written out to the swap device before a new page can be read into the page frame) -- it will take twice as long to process. • We can modify the basic clock algorithm to take the ref bit and dirty bit into account (Figure 12.7).

  8. Clock Page Replacement Algorithms • Clock algorithm with ref/dirty bits (aka Second Chance); a sketch follows. • Test the page frame pointed to by the clock hand: • If ref == 0 && dirty == 0, the test fails: select this page for replacement. • If ref == 1, set ref = 0 (a second chance). • If ref == 0 && dirty == 1, start writing the page to disk and set dirty = 0. • If the page passed the test, move the clock hand to the next page frame and go back to the test step above. • Idea is to replace the “cheapest, oldest” page frame.
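
A C sketch of the selection loop; schedule_writeback() and the frame bookkeeping are assumed stand-ins for real kernel machinery:

    #define NFRAMES 256

    struct pframe { int ref; int dirty; };
    struct pframe frames[NFRAMES];
    int hand;

    void schedule_writeback(int frame);   /* assumed: async write to swap */

    int second_chance_select(void)
    {
        for (;;) {
            struct pframe *f = &frames[hand];
            if (f->ref == 0 && f->dirty == 0) {
                int victim = hand;                /* cheapest, oldest */
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            if (f->ref == 1) {
                f->ref = 0;                       /* second chance */
            } else {
                schedule_writeback(hand);         /* ref==0, dirty==1 */
                f->dirty = 0;
            }
            hand = (hand + 1) % NFRAMES;
        }
    }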

  9. Clock Page Replacement Algorithms • In summary (the hand's page is replaced when the test fails):

    Variant         Test                       Modification
    FIFO            always fails               none
    Basic Clock     ref == 1?                  ref = 0
    Second Chance   (ref == 1 || mod == 1)?    if ref == 1 then ref = 0, else mod = 0

• Book quote: Clock algorithms are versatile and effective. • Examples of Page Replacement Algorithms • A typical program may contain millions of memory references. • We will use short reference streams with higher-than-normal page fault rates (to demo the algorithms).

  10. Examples of Page Replacement Algorithms • Reference stream is (column headers on pages 493-494): 1,2,3,2,5,6,3,4,6,3,7,3,1,5,3,6,3,4,2,4,3,4,5,1 • Assume four page frames are available (rows numbered 0 through 3 on pages 493-494). • An “*” following a number indicates a page fault. • A “+” following a number indicates a reference to a page that is already in the page table. • Results (not counting the 1st four page faults; a simulation sketch follows): • Optimal: 7 page faults • LRU: 10 • Clock: 10 • FIFO: 12 • Random: 12
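
The FIFO row is easy to check by replaying the stream; this standalone C sketch does so (the other algorithms differ only in how the victim frame is chosen). It reports 16 faults, i.e. 12 after discounting the four cold-start faults:

    #include <stdio.h>

    int main(void)
    {
        int refs[] = {1,2,3,2,5,6,3,4,6,3,7,3,1,5,3,6,3,4,2,4,3,4,5,1};
        int n = sizeof refs / sizeof refs[0];
        int frames[4] = {0, 0, 0, 0};   /* 0 means an empty frame */
        int next = 0, faults = 0;       /* circular insert = FIFO order */

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < 4; j++)
                if (frames[j] == refs[i])
                    hit = 1;
            if (!hit) {
                frames[next] = refs[i];     /* evict the oldest page */
                next = (next + 1) % 4;
                faults++;
            }
        }
        printf("FIFO: %d faults, %d beyond the cold start\n",
               faults, faults - 4);
        return 0;
    }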

  11. Local Page Replacement Algorithms • Global algorithms make no attempt to model the behavior of an individual process. • Better page replacement can be achieved using more accurate models of individual process execution. • Currently accepted model: the phase model of working sets. • Working set: the set of pages that a process is currently using. If a process has the pages of its working set mapped to page frames in its page table, then it won't page fault. • The working set is defined in terms of past page references.

  12. Working Set Model • Virtual time: each page reference is one unit of virtual time. • Time units are numbered from 1 (the oldest page reference) to T (the present). • Reference string: the sequence of pages r1, r2, r3, ..., rT, where rt is the page referenced at virtual time t. • Let Θ = working set window: the number of page references in which to look for a working set. • The working set is defined as: • W(T, Θ) = { p | p = rt for some t with T - Θ < t ≤ T } • In other words, the working set is the set of pages referenced within the working set window; a sketch of this computation follows.
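
A small C sketch of computing W(T, Θ) from a reference string; the 1-based indexing and MAXPAGE bound are assumptions made for the example:

    #define MAXPAGE 128

    /* Returns |W(T, theta)| and marks member[p] = 1 for each page p
     * referenced in the window (T - theta, T].  r[] is 1-based: r[t]
     * is the page referenced at virtual time t. */
    int working_set(const int r[], int T, int theta, int member[MAXPAGE])
    {
        int size = 0;
        for (int p = 0; p < MAXPAGE; p++)
            member[p] = 0;

        int start = T - theta + 1;        /* first time inside the window */
        if (start < 1)
            start = 1;
        for (int t = start; t <= T; t++)
            if (!member[r[t]]) {
                member[r[t]] = 1;
                size++;
            }
        return size;
    }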

  13. Working Set Model • Idea: if the number of page frames allocated to a process is kept at least equal to the size of its working set, the process will never page fault once the working set is “filled”. • The operating system should thus only run the process when all the pages in its working set are loaded into memory. • The working set will change over time as the process proceeds. • Studies show that working set transitions, or phases, tend to happen abruptly. • Measurements show: 95-98% of the virtual time of a process is spent in a stable working set phase.

  14. Working Set Model • Thus, only 2 to 5% of virtual time is spent between working sets. These transitions account for 40-50% of the total page faults of the process. • Figure 12.8 shows how working set sizes fluctuate between program phases. • Resident set: the set of pages that a process currently has in memory. If the resident set == working set, then page faults should be minimal. If the resident set < working set, excessive page faults will occur. If the resident set > working set, the result is inefficient use of allocated page frames. • Resident sets vary as the program runs. It makes sense to make sure the resident set tracks the working set.

  15. Working Set Paging Algorithm • Monitor the working set of each process by computing W (the working set size). • Change the resident set as W changes. • If W will not fit in memory, swap out the entire process until its working set will fit. • The WS paging algorithm is a good one, but expensive to implement. It is easier to approximate (just as LRU is approximated). • The working set algorithm requires a virtual time reference, which is easy to keep track of in the dispatcher at context switch time. We can use the accumulated execution time as the virtual time of the process.

  16. Approximating the Working Set • The working set approximation algorithm (sketched below): • When a page fault occurs, scan through all the pages that the process has in memory. • If the ref bit is set, set the “time of last use” of that page to the current virtual time (accumulated CPU time) of the process. • If the ref bit is NOT set, check the “time of last use” for the page. If the page has not been used within Θ (the working set window) time units, then remove the page. • Bring in the new page and add it to the resident set of the process. • Another variation, which we won't cover, is the WSClock paging algorithm. It is more efficient than the above approximation algorithm. • Skip section 12.5.
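
A C sketch of that fault-time scan, with assumed bookkeeping (a per-page ref bit and last-use virtual time) and an assumed remove_page() helper:

    struct rpage {
        int  ref_bit;      /* set by hardware since the last scan */
        long last_use;     /* virtual time of last known use */
    };

    void remove_page(struct rpage *pg);   /* assumed, not shown */

    /* Called at page-fault time; now is the process's accumulated
     * CPU time, theta the working set window. */
    void ws_scan(struct rpage pages[], int n, long now, long theta)
    {
        for (int i = 0; i < n; i++) {
            if (pages[i].ref_bit) {
                pages[i].last_use = now;   /* used since the last fault */
                pages[i].ref_bit = 0;
            } else if (now - pages[i].last_use > theta) {
                remove_page(&pages[i]);    /* fell out of the window */
            }
        }
    }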

  17. Thrashing & Load Control • Thrashing: overhead exceeds useful work per time unit. For paging, this means that page fault processing takes more time than the processes get to run over a period of time. • Classic thrashing behavior was seen with an old load-control policy that admitted more processes whenever the user CPU consumption rate decreased. This led to more paging, more overhead, and less user CPU consumption. A positive feedback loop resulted and the machine spent more time paging than running user processes. • This leads to the need for better load control (Figure 12.11).

  18. Thrashing & Load Control • A better metric for VM performance is the page fault rate. If the page fault rate increases, decrease the level of multiprogramming by swapping out processes. If the page fault rate decreases, increase the level of multiprogramming by swapping in processes. • Swapping: the entire process address space is moved to the paging disk. A swapped-out process has no page frames assigned to it and cannot be scheduled to run. • Swapping entails a higher degree of process control and is usually scheduled separately.

  19. Load Control • The short-term scheduler schedules processes that are in memory (or at least partially in memory). • The medium-term scheduler schedules processes that are out of memory (swapped out). • Some operating systems have a long-term scheduler that schedules processes/jobs to run over a period of hours or days (a batch queuing system). • Figure 12.12 shows the first two levels. • The short-term and medium-term schedulers must coordinate their activities to keep processes flowing freely between memory and the swap device.

  20. Load Control • The page replacement algorithms can provide useful load metrics: • Clock algorithm - if the clock hand rate is too slow, bring in more processes; if the clock hand rate is too fast, swap out processes. • Page fault frequency load control - track the page faults/second. Interesting choice: the “L == S” criterion. Keep the mean time between page faults (L) equal to the time it takes to handle a page fault (S). If L > S, swap processes out; if L < S, swap processes in (methinks the book has this backwards! The sketch below follows the corrected reading). • Most UNIX systems revert to swapping when the page fault rates get high. Separate daemons (pageout on xi; kswapd on Linux) perform the swapping.
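
A C sketch of the L == S policy, read the corrected way (faults arriving faster than they can be serviced means memory is overcommitted); swap_out_process() and swap_in_process() are hypothetical medium-term scheduler hooks:

    void swap_out_process(void);   /* assumed medium-term hook */
    void swap_in_process(void);    /* assumed medium-term hook */

    /* L: measured mean time between page faults.
     * S: measured mean time to service one page fault. */
    void load_control_tick(double L, double S)
    {
        if (L < S)
            swap_out_process();    /* thrashing: shed load */
        else if (L > S)
            swap_in_process();     /* headroom: admit a process */
    }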

  21. Load Control • In some cases it's useful for load control to be proactive (predictive) rather than reactive. • One mechanism separates out the paging activity of a loading process (LT: Load Time) from that of a running process (RT: Run Time). The algorithm can then treat initial process loading differently from a running process, avoiding misleading spikes in the page fault rate that might inadvertently decrease multiprogramming.

  22. Load Control • Another idea is to prepage the parts of a process that are expected to be used right away. • Rather than using a demand paging scheme, where pages are brought in as they are used, prepaging pages of a swapped-out process can eliminate the initial flurry of page faults as the resident set re-establishes itself. • Large Page Tables • Assume a 32-bit address space. • Assume 4 KB pages. This partitions the 32 bits into 20 bits for a page number and 12 bits for the offset (sketch below). • Assuming a PTE takes 4 bytes, this would require a 4 MB page table per process (2^20 PTEs × 4 bytes)!
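
The split is just two shift/mask operations; a tiny self-contained C illustration (the example address is arbitrary):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t vaddr = 0x12345678;       /* arbitrary example address */
        uint32_t page  = vaddr >> 12;      /* top 20 bits: page number */
        uint32_t off   = vaddr & 0xFFF;    /* low 12 bits: offset */
        printf("page 0x%05x, offset 0x%03x\n",
               (unsigned)page, (unsigned)off);
        /* One PTE per page: 2^20 entries * 4 bytes = 4 MB per process. */
        return 0;
    }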

  23. Large Page Tables • One solution: two-level paging. • Chop the page number into two fields: a master index and a secondary index (example: 10 bits for each; 12 bits for the offset). • Let the master page table contain not PTEs, but memory pointers to secondary page tables, which do contain PTEs. • Assuming the process address requirements are sparse, not all the secondary page tables will be needed, resulting in significant memory savings for page tables. • Figures 12.13 and 12.14 show two views of two-level paging; a lookup sketch follows. • While the master page table should stay in memory, why not page out the secondary tables? • Notice, though, the cost of an extra memory access; assume that the TLB will alleviate the 2-level costs.
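
A C sketch of the two-level lookup for the 10/10/12 split above; pte_t and the table layout are illustrative, not a particular MMU's format:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t pte_t;        /* a PTE holding a 20-bit frame number */

    pte_t *master_table[1024];     /* secondary-table pointers, or NULL */

    /* Returns 0 and fills *paddr on success, -1 on an unmapped address. */
    int translate(uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t mi  = vaddr >> 22;           /* 10-bit master index */
        uint32_t si  = (vaddr >> 12) & 0x3FF; /* 10-bit secondary index */
        uint32_t off = vaddr & 0xFFF;         /* 12-bit offset */

        pte_t *secondary = master_table[mi];
        if (secondary == NULL)
            return -1;                        /* unmapped: page fault */
        uint32_t frame = secondary[si];
        *paddr = (frame << 12) | off;
        return 0;
    }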

  24. Large Page Tables • Another solution: three-level paging. • Can further subdivide the page address into three levels! • Figure 12.15 • If the TLB miss rate is low enough, we can avoid having hardware support for two- and three-level paging and use software page table lookups instead (Figure 12.16). The table on page 513 shows the relative costs of TLB hit/miss/page fault for 1/2/3-level and software paging. • Yet another solution: inverted page tables. • Rather than have a page table entry for every possible virtual page, why not have one entry per physical page frame? After all, the processes together cannot have more pages resident than there is real memory.

  25. Large Page Tables • Inverted page tables • An inverted page table has one entry for each page frame in physical memory and is used to look up which virtual page is in the page frame. • Need some way to map from a virtual page number into the inverted page table, since the IPT is indexed by physical page frame number. • One mechanism is to use a hash table (Figure 12.18); a sketch follows. • Another is to include the process ID in the inverted page table so we know which page frame belongs to which process. • Skip sections 12.8, 12.9 & 12.10.
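
A C sketch of a hash-based IPT lookup; the hash function, chaining layout, and sizes are illustrative assumptions (bucket[] entries start at -1, meaning an empty chain):

    #include <stdint.h>

    #define NFRAMES  4096          /* physical page frames */
    #define NBUCKETS 4096

    struct ipt_entry {
        int      pid;              /* owning process */
        uint32_t vpage;            /* virtual page number held here */
        int      next;             /* next frame in this chain, or -1 */
    };

    struct ipt_entry ipt[NFRAMES]; /* indexed by physical frame number */
    int bucket[NBUCKETS];          /* head frame of each chain, or -1 */

    /* Returns the physical frame holding (pid, vpage), or -1 (fault). */
    int ipt_lookup(int pid, uint32_t vpage)
    {
        uint32_t h = ((uint32_t)pid * 31u + vpage) % NBUCKETS;
        for (int f = bucket[h]; f != -1; f = ipt[f].next)
            if (ipt[f].pid == pid && ipt[f].vpage == vpage)
                return f;
        return -1;
    }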

  26. Segmentation • Segment: a division of the logical (virtual) address space that is visible to the programmer. • An alternative to fixed-size pages, segmentation allows memory allocation to occur at “natural” boundaries, since segment sizes are variable. You can allocate segments based on the sizes of the objects in the load module. • Segmented addressing works similarly to paged addressing, with the obvious name changes (“segment table” rather than “page table”, etc.) and the need for a base/limit pair within each STE (Segment Table Entry). • Figure 12.22 shows segmentation addressing.

  27. Segmentation • Segment tables can have the same fields as a page table (ref bit, dirty bit, access bits, etc.). • Notice that finding a free memory region of the proper size for a segment brings up all the horrible external fragmentation problems seen in dynamic memory allocation. • Some operating systems combine segmentation with paging: the virtual address is first sent to a segment table, but the STE points to a page table rather than to a memory address. So segments are composed of same-sized pages, which avoids the dynamic memory allocation problem (sketch below).
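
A C sketch of the combined translation, assuming 4 KB pages; the STE layout and fault returns are illustrative:

    #include <stdint.h>

    typedef uint32_t pte_t;

    struct ste {                   /* segment table entry */
        uint32_t limit;            /* segment length in bytes */
        pte_t   *page_table;       /* pages backing this segment */
    };

    struct ste seg_table[64];

    /* Translate (segment, offset); returns -1 on a segment violation. */
    int seg_translate(uint32_t seg, uint32_t off, uint32_t *paddr)
    {
        if (off >= seg_table[seg].limit)
            return -1;                            /* limit check */
        uint32_t vpn   = off >> 12;               /* page within segment */
        uint32_t frame = seg_table[seg].page_table[vpn];
        *paddr = (frame << 12) | (off & 0xFFF);
        return 0;
    }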

  28. Segmentation • Note that in the “segmentation” vs. “paging” war, paging won :) • Paging is transparent to the user, while segmentation requires that compiler writers know how segments are assigned. • Segmentation, as previously mentioned, re-introduces the dynamic memory allocation hassles. • Interesting book comment: the only commercial systems using segmentation use paged segments. • The book also says, rightfully so, that you have to be aware of the potential misuse of the word “segment”; it is used to mean different things. Find out the context.

  29. Sharing Memory • Page tables and segment tables provide an extra bonus -- the ability to share memory between processes. • We saw how the use of shared libraries cuts down the memory requirements of processes through reentrant library code. • A simple adjustment to a page table: have the same frame number appear in more than one process's page table (Figure 12.23). Each process can then execute code or access data shared with another process. • Shared memory management requires new syscalls.

  30. Sharing Memory • New SOS calls:

    void *AttachSharedMemory(int smid, void *smaddr, int flags);
    void *DetachSharedMemory(void *smaddr);

• UNIX calls: shmget(), shmctl(), shmop(). • Virtual Memory System Examples • Swap area: a section of a disk reserved for pages. • Usually a separate partition (“swap -l” on xi or “cat /proc/meminfo” on Linux). • Sometimes part of the filesystem (Windows NT: Settings/Control Panel/System/Performance/Virtual Memory). • Rule of thumb: swap area = 2 × physical memory.
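
On the UNIX side, the calls behind shmget()/shmctl() and the shmop() operations (shmat()/shmdt()) can be exercised with a minimal sketch like this; error handling is abbreviated:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Create a private 4 KB segment (a real app would use a key). */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }

        char *mem = shmat(id, NULL, 0);   /* attach at a kernel-chosen address */
        if (mem == (char *)-1) { perror("shmat"); return 1; }

        strcpy(mem, "visible to every process attached to this segment");
        printf("%s\n", mem);

        shmdt(mem);                       /* detach from our address space */
        shmctl(id, IPC_RMID, NULL);       /* mark the segment for removal */
        return 0;
    }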

  31. Virtual Memory System Examples • Swap area: a section of a disk reserved for pages. • In emergencies, most UNIXes can swap to a file in the filesystem using the “mkfile” and “swapon” commands. • Various optimizations can be done at process creation to minimize initial swap disk traffic (Figure 12.24). • Copy-on-write: a useful trick for implementing UNIX fork(). The child initially shares all pages with its parent in a read-only fashion, and the parent's page table entries are all set to read-only as well. When the child or parent attempts to change a page, a new copy of the page is created in a new page frame and the page table is adjusted accordingly (Figure 12.25; sketch below). This is more efficient than simply duplicating the parent's memory.
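
A C sketch of the write-fault path that makes copy-on-write work; every helper here is an assumed stand-in for the real VM internals:

    #include <stdint.h>

    uint32_t alloc_frame(void);                     /* assumed */
    void copy_frame(uint32_t from, uint32_t to);    /* assumed */
    int  frame_refcount(uint32_t frame);            /* assumed */
    void map_writable(void *page_table, uint32_t vpn,
                      uint32_t frame);              /* assumed */

    /* Called on a write fault to a read-only, copy-on-write page. */
    void cow_write_fault(void *page_table, uint32_t vpn, uint32_t old_frame)
    {
        if (frame_refcount(old_frame) == 1) {
            /* Last sharer: just make the existing mapping writable. */
            map_writable(page_table, vpn, old_frame);
        } else {
            /* Still shared: copy to a fresh frame and remap. */
            uint32_t new_frame = alloc_frame();
            copy_frame(old_frame, new_frame);
            map_writable(page_table, vpn, new_frame);
        }
    }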

  32. Virtual Memory System Examples • More tricks: the two-handed clock algorithm, allowing pages to be reclaimed sooner rather than waiting for a full hand revolution (Figure 12.26). • Standby page lists: keep a list of free page frames that the page replacement algorithm can dip into if there are no clean free pages (ones that don't have to be written back to the swap device before re-use) (Figure 12.27). • Clustering pages: at swap-in, load in a clump of pages to prevent excessive page faults while the process is re-establishing its resident set. • File mapping: the virtual memory system acts as a disk cache.

  33. Virtual Memory System Examples • Portable VM systems: as with the rest of the OS, it's good to have as much of the VM system as possible written in a machine-independent fashion so that porting to a new architecture is quick. • Sparse address space: a VM system makes it easier to support the regioning of the VM address space of a process, something typically done by a compiler to separate code, data, heap and stack space. • VM Systems used in Actual OSes • OS/2 Version 2.0 • 4 KB page size; 32-bit address space • 2-level page tables with paged page tables

  34. VM Systems used in Actual OSes • OS/2 Version 2.0 • Single-handed clock alg w/ a standby list. • Copy-on-write • Windows NT • 4KB page size; 32-bit address space. • Swap space allocated only when needed. • Local FIFO page replacement alg. w/ a standby list. • Demand paging with clustering on initial swap in. • VM system built on top of HAL (Hardware Abstraction Layer) • Mach & OSF/1 • Goals: flexible sharing; large, sparse address space; memory-mapped files; user control of page replacement.

  35. VM Systems used in Actual OSes • Mach & OSF/1 • Object-oriented approach, even in the memory system. • Global FIFO replacement with a standby list. • Copy-on-write. • Clustered paging. • UNIX System V Release 4 • 2-handed clock page replacement • Address space divided into segments (but it is not a segmented VM system; it is paged). • Mostly hardware-independent implementation. • Clustered paging.

  36. VM Systems used in Actual OSes • UNIX 4.3 BSD • 2-handed clock replacement • UNIX 4.4 BSD: based on the Mach VM architecture. • DEC VMS: local FIFO with a standby list. • Macintosh: second-chance clock replacement. • IBM MVS: 4 KB page size; 3-level paging. • Useful UNIX commands to view paging behavior: • top • vmstat • monitor (AIX only)
