Module 10: Virtual Memory

• Background
• Demand Paging
• Performance of Demand Paging
• Page Replacement
• Page-Replacement Algorithms
• Allocation of Frames
• Thrashing
• Other Considerations
• Demand Segmentation
Annotations by instructor are in blue. Adapted for 6th ed. Last updated 11/12/03++. Applied Operating System Concepts

Background
• Virtual memory: separation of user logical memory from physical memory.
  - Only part of the program needs to be in memory for execution.
  - Logical address space can therefore be much larger than physical address space.
  - Paging is always used: not all pages need be in memory; pages are “swapped” in and out on demand.
• Virtual memory can be implemented via:
  - Demand paging
  - Demand segmentation

Virtual Memory That is Larger Than Physical Memory (Fig. 10.1)

Demand Paging - p. 303...
• No longer need the entire process in memory, only the portion that is needed. Contrast with swapping in chapter 9, where all pages of a process had to be resident when the process was loaded.
• Now use a “lazy swapper” that swaps a page into memory only if it is needed: demand paging.
  - Works because of locality of reference, a fundamental principle described later.
• Bring a page into memory only when it is needed:
  - Less I/O needed
  - Less memory needed
  - Faster response
  - More users
• Page is needed ==> reference to it
  - invalid reference ==> abort
  - not-in-memory ==> bring to memory ==> page fault
• See fig 10.1, 10.2, 10.3, 10.4 (VM scheme).
• Ideal model is to “pause” the instruction on a page fault. In reality the instruction is restarted, which is problematic for overlapped source and destination - see page 324.

Transfer of a Paged Memory to Contiguous Disk Space (Fig. 10.2)
• Paged swapping scheme as in chapter 9: entire process in memory.
• Suppose now we do not require all pages to be “resident”.

Valid-Invalid Bit
• With each page table entry a valid–invalid bit is associated:
  - 1 ==> in memory (and a “legal” page)
  - 0 ==> not in memory: either legal but on disk, or illegal
  - Distinguish a page fault from an invalid (illegal) reference - see fig. 10.3, 10.4.
• Initially the valid–invalid bit is set to 0 on all entries.
• Example of a page table snapshot (columns: frame #, valid–invalid bit); the valid–invalid bits read 1, 1, 1, 1, 0, ..., 0, 0.
• During address translation, if the valid–invalid bit in the page table entry is 0 ==> page fault (or possibly an illegal reference).

Page Table When Some Pages Are Not in Main Memory and Some Are Outside Process Space (Fig. 10.3)
• Meaning of the valid bit:
  - valid: the page is both in memory and legal (in the address space of the process)
  - invalid: the page is either outside the address space of the process (illegal), or legal but not currently resident in memory (a page fault, the most common case)
• NOTE: references to pages 6 & 7 are illegal; references to pages 3 & 4 are page faults. Both cases are marked invalid.

Page Fault
• The first reference to a page that is not in memory traps to the OS ==> page fault.
• OS looks at another table to decide:
  - illegal reference ==> abort, or
  - just not in memory ==> a page fault.
• Get an empty frame.
• “Swap” the page into the frame … “swap” is bad terminology in this context - use “page” as a verb.
• Reset tables, validation bit = 1.
• Problem: restarting the instruction. An overlapped block move is a problem (see p. 324), e.g. IBM Sys/360/370 MVC: restore memory before the trap, or make sure all pages are in memory before starting.
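The page-fault steps can be sketched as a toy model. This is only an illustration, not how any particular OS implements it: the class name, the `legal` array standing in for the "other table", and the free-frame queue are all assumptions made for the example, and the disk read is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the page-fault steps; every name here is hypothetical.
class PageFaultSketch {
    static final int NOT_RESIDENT = -1;

    final int[] frameOf;      // page-table entry: frame number, or NOT_RESIDENT
    final boolean[] valid;    // valid-invalid bit per page (false = 0)
    final boolean[] legal;    // the "other table": is the page in the address space?
    final Deque<Integer> freeFrames = new ArrayDeque<>();
    int faults = 0;

    PageFaultSketch(int pages, int legalPages, int frames) {
        frameOf = new int[pages];
        valid = new boolean[pages];          // all bits start at 0
        legal = new boolean[pages];
        for (int p = 0; p < pages; p++) {
            frameOf[p] = NOT_RESIDENT;
            legal[p] = p < legalPages;       // assumption: pages 0..legalPages-1 are legal
        }
        for (int f = 0; f < frames; f++) freeFrames.add(f);
    }

    // Returns the frame holding the page, paging it in on a fault.
    int access(int page) {
        if (valid[page]) return frameOf[page];           // resident: no trap
        // Trap to the OS: illegal reference or genuine page fault?
        if (!legal[page]) throw new IllegalStateException("illegal reference: abort");
        faults++;
        if (freeFrames.isEmpty()) {
            throw new IllegalStateException("no free frame: page replacement needed");
        }
        int frame = freeFrames.remove();                 // get an empty frame
        // A real OS would schedule the disk read here and block the process.
        frameOf[page] = frame;                           // reset tables
        valid[page] = true;                              // validation bit = 1
        return frame;                                    // the instruction is then restarted
    }
}
```

With 8 pages of which 6 are legal, a first access to page 3 counts one fault, a second access counts none, and an access to page 7 is rejected as illegal.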

Steps in Handling a Page Fault (Fig. 10.4)
• No page replacement needed in this case: a free frame was found.

But what happens if there are no free frames?
• Page replacement: find some page in memory that is not really in use; “swap” it out if modified, or overlay it if not modified.
  - Get rid of “dead wood”.
  - Come up with a replacement algorithm.
• The algorithm should maximize performance: we want the minimum number of page faults without excessive overhead.
• Two killers:
  - The same page may be brought into memory several times; if this happens too frequently, the result is “thrashing” - see later.
  - Too much overhead.

Performance of Demand Paging - p. 305
• See sect 10.2.2, page 325, on the page fault scenario: handled similarly to an I/O request.
• Page fault rate p, 0 ≤ p ≤ 1.0
  - if p = 0, no page faults
  - if p = 1, every reference is a fault
  - in practice p should be very small: p < .001
  - 1 - p is called the “hit ratio”
• Effective Access Time (EAT):
  EAT = (1 - p) x memory access + p x {page fault overhead + [swap page out] + swap page in + restart overhead}

Demand Paging Example from p. 327
• Memory access time = 100 nanoseconds
• Page fault service time = 25 milliseconds
• Let the fault rate be p.
• Effective access time: EAT = (1 - p)(100 ns) + p(25 ms) = 100(1 - p) + 25,000,000p nanoseconds
• For a degradation of no more than 10% we need: 110 > 100(1 - p) + 25,000,000p
• Solving for p gives p < 4 x 10^-7: page faults must occur no more often than 4 per 10 million references!
• ==> The fault rate must be kept extremely low … if you knew nothing else, would you say this is possible for real programs?
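The arithmetic in this example can be checked with a few lines of code. The sketch below hard-codes the slide's two timings; the class and method names are made up for illustration.

```java
// Effective access time for the example above (100 ns memory, 25 ms fault service).
class EffectiveAccessTime {
    static final double MEMORY_ACCESS_NS = 100.0;
    static final double FAULT_SERVICE_NS = 25_000_000.0;   // 25 ms in nanoseconds

    // EAT = (1 - p) * memory access + p * page-fault service time
    static double eat(double p) {
        return (1.0 - p) * MEMORY_ACCESS_NS + p * FAULT_SERVICE_NS;
    }

    // Largest fault rate that keeps the slowdown within the given fraction.
    // From (1 - p) * m + p * f <= (1 + slowdown) * m it follows that
    // p <= slowdown * m / (f - m).
    static double maxFaultRate(double slowdown) {
        return slowdown * MEMORY_ACCESS_NS / (FAULT_SERVICE_NS - MEMORY_ACCESS_NS);
    }
}
```

`maxFaultRate(0.10)` comes out at about 4 x 10^-7, matching the bound above. For comparison, `eat(0.001)` is about 25,100 ns, so even one fault per thousand references slows memory access by a factor of roughly 250.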

Process Creation
• Virtual memory allows other benefits during process creation:
  - Copy-on-Write
  - Memory-Mapped Files

Copy-on-Write
• If a forked child process immediately calls exec() to overlay itself with another program, then physically “cloning” (copying) the parent is unnecessary.
• If neither the child nor the parent modifies a page, there is no need to keep separate copies for both; page sharing in virtual memory lets them share a single copy.
• Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory. Only if either process modifies a shared page is that page copied.
• COW allows more efficient process creation, as only modified pages are copied.
• Free pages are allocated from a pool of zeroed-out pages.
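A rough user-level simulation of the copy-on-write idea follows. Real COW is done by the kernel with write-protected page table entries; here a share count per frame stands in for that mechanism, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of copy-on-write page sharing (not a kernel implementation).
class CowSketch {
    // A physical frame plus the number of processes mapping it.
    static final class Frame {
        final byte[] data;
        int sharers = 1;
        Frame(byte[] data) { this.data = data; }
    }

    static final class Proc {
        final List<Frame> pages = new ArrayList<>();

        // fork(): the child maps the very same frames; nothing is copied yet.
        Proc fork() {
            Proc child = new Proc();
            for (Frame f : pages) {
                f.sharers++;
                child.pages.add(f);
            }
            return child;
        }

        // A write to a shared page triggers the copy; private pages are written in place.
        void write(int page, int offset, byte value) {
            Frame f = pages.get(page);
            if (f.sharers > 1) {
                f.sharers--;                       // leave the original to the other sharer(s)
                f = new Frame(f.data.clone());     // private copy for the writer
                pages.set(page, f);
            }
            f.data[offset] = value;
        }

        byte read(int page, int offset) {
            return pages.get(page).data[offset];
        }
    }
}
```

After a fork, parent and child reference identical `Frame` objects. Only the page that one of them writes to is duplicated; untouched pages stay shared.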

Memory-Mapped Files
• Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory.
• Part of the virtual address space is logically associated with a file and can be demand paged into memory: an extension of the idea of paging in a portion of a process, now applied to files.
• A file is initially read using demand paging. A page-sized portion of the file is read from the file system into a physical page. Subsequent reads and writes to the file are treated as ordinary memory accesses.
• Simplifies file access by treating file I/O through memory rather than read()/write() system calls - useful for device drivers.
• Also allows several processes to map the same file, so the pages in memory can be shared.
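In Java, memory-mapped file I/O is available through `FileChannel.map`. The sketch below maps a whole file and upper-cases its ASCII letters in place, so every read and write is a buffer access rather than an explicit read()/write() call. The class name and the upper-casing task are chosen only for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative use of a memory-mapped file; the task itself is arbitrary.
class MappedFileSketch {
    // Maps the whole file read-write and upper-cases ASCII letters in place.
    static void upperCaseInPlace(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            for (int i = 0; i < buf.limit(); i++) {
                byte b = buf.get(i);               // ordinary memory read; the OS pages data in on demand
                if (b >= 'a' && b <= 'z') {
                    buf.put(i, (byte) (b - 32));   // ordinary memory write; marks the page dirty
                }
            }
            buf.force();                           // ask the OS to write modified pages back to the file
        }
    }
}
```

Several processes mapping the same file this way would typically share the same physical pages, which is the sharing benefit noted in the last bullet.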

Memory Mapped Files

Page Replacement - sect 10.3, p. 308
• Prevent over-allocation of memory (no free frames) by modifying the page-fault service routine to include page replacement.
  - You don’t prevent over-allocation - this is the name of the game - you just learn to live with it! ==> use page replacement.
• Use a modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written to disk.
• Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory.
• “Ideally”, the user is unaware of paging: VM paging is transparent to the user … at most a small performance hit.

Need For Page Replacement (Fig 10.6)
• Options when memory is full:
  - kill the process - not good
  - swap the process out - maybe
  - do page replacement - OK
• Figure annotations: a page hit; a page fault ==> B is needed but not in memory.
• Memory is full and the free-frame list is empty … must free up space: page replacement.

Basic Page Replacement
• Find the location of the desired page on disk.
• Find a free frame:
  - If there is a free frame, use it.
  - If there is no free frame, use a page replacement algorithm to select a victim frame … this is the likely case.
• If the “victim” page is modified, it will have to be paged out to the disk first.
• Read the desired page into the (newly) free frame. Update the page and frame tables.
• Restart the process (it was blocked during page fault processing).

Page Replacement (figure)
• The “victim” page is invalidated; its frame # is now garbage (0). It was originally in frame f.
• The new page is brought in and put in frame f.

Page-Replacement Algorithms
• Want lowest page-fault rate.
• Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string.
• In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.
• These are “crappy” examples. SEE the better examples in section 10.4 of Silberschatz 6th ed.; they have been added to this set of slides (marked Silb 6th ed).

Graph of Page Faults Versus The Number of Frames
(ideally - see “non-ideal” case for FIFO later)

First-In-First-Out (FIFO) Algorithm
• Diagram not clear - see slide 10.24 for a clearer example.
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process): 9 page faults
• 4 frames: 10 page faults
• FIFO Replacement - Belady’s Anomaly: more frames does not necessarily mean fewer page faults
  - only due to initial loading - poor example
• Student exercise: use the approach in slide 10.24 to determine the 9 and 10 faults in these two examples.
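One way to check the student exercise is a small FIFO simulation. This is a minimal sketch with an invented class name; it assumes at least one frame and counts the initial loads as faults, as the slide does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// FIFO page replacement: counts faults for a reference string (assumes frames >= 1).
class FifoSketch {
    static int faults(int[] refs, int frames) {
        Deque<Integer> arrivalOrder = new ArrayDeque<>();   // oldest arrival at the head
        Set<Integer> resident = new HashSet<>();
        int faults = 0;
        for (int page : refs) {
            if (resident.contains(page)) continue;          // hit: FIFO order is not updated
            faults++;
            if (resident.size() == frames) {
                Integer victim = arrivalOrder.remove();     // evict the oldest arrival
                resident.remove(victim);
            }
            arrivalOrder.add(page);
            resident.add(page);
        }
        return faults;
    }
}
```

On the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 this gives 9 faults with 3 frames and 10 faults with 4 frames, which is Belady's anomaly: the larger allocation faults more.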

FIFO Page Replacement
From Silb 6th ed - fig 10.9, p. 336

FIFO Illustrating Belady’s Anomaly

Optimal Algorithm (OPT)
• Diagram not clear - see slide 10.27 for a clearer example.
• Replace the page that will not be used for the longest period of time.
• 4 frames example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 ==> 6 page faults
• Not implementable because it requires knowledge of the future.
• Used for measuring how well your algorithm performs.
• Student exercise: use the approach in slide 10.27 to determine the 6 faults in this example.
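OPT cannot run online, but it is easy to simulate when the whole reference string is known in advance, which is how it serves as a yardstick. The sketch below uses invented names and assumes at least one frame. It evicts the resident page whose next use lies farthest ahead, treating "never used again" as infinitely far.

```java
import java.util.HashSet;
import java.util.Set;

// Offline simulation of the optimal (OPT) policy; assumes frames >= 1.
class OptSketch {
    static int faults(int[] refs, int frames) {
        Set<Integer> resident = new HashSet<>();
        int faults = 0;
        for (int i = 0; i < refs.length; i++) {
            int page = refs[i];
            if (resident.contains(page)) continue;          // hit
            faults++;
            if (resident.size() == frames) {
                int victim = -1;
                int farthest = -1;
                for (int candidate : resident) {
                    int next = nextUse(refs, i + 1, candidate);
                    if (next > farthest) {
                        farthest = next;
                        victim = candidate;
                    }
                }
                resident.remove(Integer.valueOf(victim));
            }
            resident.add(page);
        }
        return faults;
    }

    // Index of the next reference to page at or after 'from'; MAX_VALUE if none.
    static int nextUse(int[] refs, int from, int page) {
        for (int j = from; j < refs.length; j++) {
            if (refs[j] == page) return j;
        }
        return Integer.MAX_VALUE;
    }
}
```

For the slide's reference string this reports 6 faults with 4 frames, and 7 faults with 3 frames.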

Optimal Page Replacement
From Silb 6th ed - fig 10.11, p. 338

Least Recently Used (LRU) Algorithm
• Diagram not clear - see slide 10.29 for a clearer example.
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• Student exercise: use the approach in slide 10.29 to determine the 8 faults (4 frames).
• If we use the recent past as an approximation of the near future, then we replace the page that has not been used for the longest period of time ==> the LRU algorithm.
  - Same as the OPT algorithm, but looking back in time instead of into the future.
  - A good approximation of OPT - maybe the best outside of OPT.

LRU Page Replacement
From Silb 6th ed - fig 10.12, p. 339

LRU Algorithm (Cont.)
• Counter implementation - time stamp in the page table entry
  - There is a hardware clock incremented by the CPU each time a memory reference is made (a count of the total number of references).
  - Every page table entry has a counter; every time the page is referenced through this entry, copy the clock into the counter (a snapshot of the clock). Updating the counter has some overhead, but it is rolled into accessing the PT for each reference.
  - Replace the page with the smallest (oldest) time value.
  - Logically requires a search to find the LRU page, but only at page fault time - may be small compared to disk access.
  - A write to memory (PT) is required each time a page in the page table is referenced, to update the counter.
• Stack implementation - keep a stack of page numbers in doubly linked form - most intuitive, but lots of pointer-“flipping” overhead in memory:
  - Page referenced: move it to the top; the entries above it drop down by one.
  - Entry at the bottom is least recently used.
  - Entry at the top is most recently used.
  - Requires 6 pointers to be changed - in memory!
  - No search for replacement.
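The counter implementation can be simulated in software, with a map from resident page to its last time stamp standing in for the per-entry hardware counter. The names are illustrative and the sketch assumes at least one frame.

```java
import java.util.HashMap;
import java.util.Map;

// LRU via time stamps (the "counter implementation"); assumes frames >= 1.
class LruCounterSketch {
    static int faults(int[] refs, int frames) {
        Map<Integer, Long> lastUse = new HashMap<>();   // resident page -> clock value at last reference
        long clock = 0;                                 // incremented on every reference
        int faults = 0;
        for (int page : refs) {
            clock++;
            if (!lastUse.containsKey(page)) {
                faults++;
                if (lastUse.size() == frames) {
                    // Search for the smallest (oldest) time stamp; done only at fault time.
                    int victim = -1;
                    long oldest = Long.MAX_VALUE;
                    for (Map.Entry<Integer, Long> e : lastUse.entrySet()) {
                        if (e.getValue() < oldest) {
                            oldest = e.getValue();
                            victim = e.getKey();
                        }
                    }
                    lastUse.remove(Integer.valueOf(victim));
                }
            }
            lastUse.put(page, clock);                   // copy the clock into the page's counter
        }
        return faults;
    }
}
```

On the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 this yields 8 faults with 4 frames and 10 faults with 3 frames. Unlike FIFO, LRU never faults more when given more frames.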

Use Of A Stack to Record The Most Recent Page References
• Move address 7 to top; drop 2, 1, 0 down.
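The move-to-top step can be mimicked with a linked list. The starting stack contents (2, 1, 0, 7, 4 from top to bottom) are an assumption chosen to be consistent with the note above, and the class name is invented.

```java
import java.util.LinkedList;

// Stack implementation of LRU bookkeeping: index 0 is the top (most recently used).
class LruStackSketch {
    // On a reference, unlink the page from wherever it is and push it on top.
    static void reference(LinkedList<Integer> stack, int page) {
        stack.remove(Integer.valueOf(page));   // remove by value; no effect if the page is absent
        stack.addFirst(page);
    }

    // The replacement victim is always at the bottom, so no search is needed.
    static int victim(LinkedList<Integer> stack) {
        return stack.getLast();
    }
}
```

Referencing page 7 on the assumed stack turns 2, 1, 0, 7, 4 into 7, 2, 1, 0, 4, and page 4 remains the LRU victim at the bottom. Note that `LinkedList.remove(Object)` scans the list; a kernel would reach the entry directly through the page table and only relink pointers.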

LRU Approximation Algorithms
• Reference bit - in page table
  - With each page associate a bit, initially = 0.
  - When the page is referenced, the bit is set to 1.
  - Replace a page whose bit is 0 (if one exists). We do not know the order of use, however.
  - Need a scheme for resetting the ref bits, or they may all go to 1.
  - Use history bits: periodically right-shift the history bits, shifting the ref bit into the high-order position (a 1 if the page was referenced during the period, otherwise a 0). Replace the page with the smallest history byte, read as an integer - see page 341. If not referenced, the value decreases; if referenced, the value increases.
• Second chance, or Clock replacement algorithm
  - Needs the reference bit.
  - On a hit: set the ref bit to 1.
  - On a page fault: scan the PT in clock order, searching for the first 0 reference bit. If the page scanned has reference bit = 1, then:
    - set its reference bit to 0,
    - leave the page in memory.
  - Replace the first page (in clock order) that has a 0 reference bit.
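The clock scan can be sketched as a circular array of frames with a hand. The names are illustrative. One assumption is made explicit in the code: a newly loaded page gets its reference bit set to 1, since the faulting instruction references it on restart.

```java
import java.util.Arrays;

// Second-chance (clock) replacement over a fixed set of frames.
class ClockSketch {
    final int[] pages;          // page held by each frame, -1 if the frame is empty
    final boolean[] refBit;     // reference bit per frame
    int hand = 0;               // next frame the clock hand will examine
    int faults = 0;

    ClockSketch(int frames) {
        pages = new int[frames];
        refBit = new boolean[frames];
        Arrays.fill(pages, -1);
    }

    void access(int page) {
        for (int f = 0; f < pages.length; f++) {
            if (pages[f] == page) {
                refBit[f] = true;                     // hit: set the reference bit
                return;
            }
        }
        faults++;
        // Sweep in clock order: pages with bit 1 get a second chance (bit cleared, page kept).
        while (pages[hand] != -1 && refBit[hand]) {
            refBit[hand] = false;
            hand = (hand + 1) % pages.length;
        }
        pages[hand] = page;                           // first empty frame or first 0 bit: replace here
        refBit[hand] = true;                          // assumption: the new page counts as referenced
        hand = (hand + 1) % pages.length;
    }
}
```

Running the deck's reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 through three frames gives 9 faults under these assumptions. When every bit is 1 the hand sweeps a full circle and the policy degenerates to FIFO.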

Second-Chance (clock) Page-Replacement Algorithm

Counting Algorithms
• Keep a counter of the number of references that have been made to each page.
• LFU Algorithm: replaces the page with the smallest count. Rationale: an active page should have a large count. Problem: heavy use may be only an “initial thing” (transient); the page gets “old” but is never replaced due to its high count. LRU would get rid of the page in this case.
• MFU Algorithm: replaces the page with the largest count, based on the argument that the page with the smallest count was probably just brought in and has yet to be used. Problem: if it never gets referenced again, it hangs around due to the low count.
• These algorithms are not commonly used: implementation is costly, effectiveness is too application-specific, and they do not approximate OPT.

Allocation of Frames - new topic
• Each process needs a minimum number of pages.
• Single-user model: the process is allowed a maximum allocation, which is generally smaller than the process size.
  - The process then demand pages what it needs.
  - If the allocation is too small, efficiency/performance is low.
  - Inverse trade-off between degree of MP and size of allocation ==> implication for thrashing (see later) … multi-user.
• Architecture requirements - p. 345. Example: IBM 370 needs 6 pages to handle the MVC instruction:
  - The instruction is 6 bytes and might span 2 pages.
  - 2 pages to handle from, 2 pages to handle to.
  - Instruction restart is problematic on a page fault - must retain all related pages.
• The minimum number of frames is determined by the architecture, the maximum by memory size.
• Three major allocation schemes - sec 10.5.2 for the first two:
  - Non-dynamic allocation, constant throughout the life of the process:
    - fixed (equal) allocation (uniform frame assignment to all processes)
    - proportional allocation: allocation proportional to the size of the process - a special case of priority allocation (allocation based on priority, which could be size)
  - Dynamic allocation: the allocation may vary during the life of the process. This results from “global” replacement - see below.

Fixed (Equal) and Proportional Allocation Examples
• Note that these examples are still non-dynamic allocation: the allocation remains constant throughout the life of the process.
• Equal allocation - e.g., if 100 frames and 5 processes, give each 20 frames.
• Proportional allocation - allocate according to the size of the process: a_i = (s_i / S) x m, where s_i is the size of process p_i, S is the sum of all s_i, and m is the total number of available frames.
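A worked instance of proportional allocation, with numbers chosen purely for illustration (64 frames shared by two processes of 10 and 127 pages). Rounding down is an assumption of this sketch; a real allocator must also respect the architectural minimum per process.

```java
// Proportional frame allocation: a_i = floor(s_i * m / S).
class ProportionalAllocation {
    static int[] allocate(int[] sizes, int totalFrames) {
        long total = 0;                                   // S = sum of all process sizes
        for (int s : sizes) total += s;
        int[] frames = new int[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            // Multiply before dividing so integer division does not lose the fraction early.
            frames[i] = (int) ((long) sizes[i] * totalFrames / total);
        }
        return frames;
    }
}
```

`allocate(new int[] {10, 127}, 64)` gives 4 and 59 frames. The one frame lost to rounding could go to a free-frame pool. Equal allocation would have given each process 32 frames, far more than the 10-page process can use.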

Global vs. Local Replacement
• Local replacement - each process selects from only its own set of allocated frames.
  - Frame allocation never changes for a process: non-dynamic allocation, which could be fixed or proportional.
  - Paging behavior is independent of other processes, but a process can hoard frames needed by other processes.
• Global replacement - a process selects a replacement frame from the set of all frames for all processes; one process can take frames from another.
  - This results in a form of dynamic allocation.
  - Allocation is variable: it may increase at the expense of other processes, or decrease because a higher-priority process steals some frames.
  - A process cannot control its own fault rate - it depends on the paging behavior of other processes.
  - Question: are frames stolen only from the other processes’ free-frame lists, or can a used frame (loaded with a virtual page) be confiscated?
• Can combine global and local in a priority scheme: first choose from own pool; if this gets depleted, then steal frames from a lower-priority process.

Summary (From Stallings): Allocation or “Resident Size”
• Fixed allocation (non-dynamic allocation)
  - Gives a process a fixed number of pages within which to execute.
  - Can be the same size for all processes or proportional, but does not change throughout the life of the process.
  - When a page fault occurs, one of the pages of that process must be replaced.
• Variable allocation (dynamic allocation)
  - The number of pages allocated to a process varies over the lifetime of the process.

Allocation and Replacement Policy Summary
• Variable allocation (dynamic in time) and global replacement
  - Easiest to implement.
  - Adopted by many operating systems.
  - Operating system keeps a list of free frames (global list).
  - A free frame is added to the resident set of a process when a page fault occurs.
  - If there is no free frame, replace one from another process (a “working” frame), reducing the donor’s working set by 1 page and increasing the recipient’s by 1 page.
• Fixed allocation (in time) and local replacement
  - When a new process is added, allocate a number of page frames based on application type, program request, or other criteria.
  - The number of page frames allocated to a process is fixed throughout the life of the process.
  - The page to be replaced is chosen from among the frames allocated to the process having the fault.
  - Reevaluate the allocation from time to time.
• Fixed allocation (in time) and global replacement
  - Not possible (Why?) - the resident set is fixed and cannot grow.

Thrashing
• In all previous schemes, will a process get enough pages to run efficiently (low fault rate)?
• If a process does not have “enough” pages, the page-fault rate is very high. This leads to:
  - low CPU utilization;
  - the operating system thinks that it needs to increase the degree of multiprogramming to improve CPU utilization, so another process is added to the system!!! - a vicious cycle leading to thrashing.
• Thrashing: a process is busy swapping pages in and out most of the time - very little time is spent on productive work, most is spent paging.

Thrashing Diagram
• Why does paging work? Locality model:
  - A process migrates from one locality to another.
  - Localities may overlap.
• Why does thrashing occur? Size of locality > total allocated memory size.
• A high degree of MP causes the allocation for a process to get too small, so it may lose its “working set” - see below.

Locality In A Memory-Reference Pattern
• Principle of locality - empirical verification:
  - Spatial: references tend to cluster in contiguous address ranges.
  - Temporal: if a reference is made, it is likely to be made again in the near future.

Working-Set Model
• A scheme to avoid thrashing: give all processes what they need; if that is not possible, limit the number of processes - an allocation scheme.
• Δ ≡ working-set window ≡ a fixed number of page references. Example: the last 10,000 instructions or references.
• WSSi (working-set size of process Pi) = total number of pages referenced in the most recent Δ (varies in time)
  - if Δ is too small, it will not encompass the entire locality.
  - if Δ is too large, it will encompass several localities.
  - if Δ = ∞, it will encompass the entire program.
• D = Σ WSSi ≡ total demand for frames = total number of frames that must be allocated to keep all processes “happy”
• if D > m ==> thrashing (m = total number of available frames)
• Policy: if D > m, then suspend one of the processes (move its pages to the disk). This prevents thrashing by balancing the degree of MP against an acceptable fault rate … note “swapping” still has a role in virtual memory systems.
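The definitions above translate directly into code. The sketch computes the exact working set from a recorded reference string, which a real system cannot afford to do on every reference. The names and the sample reference string used in the note below are illustrative assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

// Exact working-set computation over a recorded reference string.
class WorkingSetSketch {
    // Pages referenced in the window of the last 'delta' references ending at index t (inclusive).
    static Set<Integer> workingSet(int[] refs, int t, int delta) {
        Set<Integer> ws = new TreeSet<>();
        for (int i = Math.max(0, t - delta + 1); i <= t; i++) {
            ws.add(refs[i]);
        }
        return ws;
    }

    // D = sum of the working-set sizes; D > m signals that thrashing is likely.
    static boolean overcommitted(int[] workingSetSizes, int m) {
        int demand = 0;
        for (int wss : workingSetSizes) demand += wss;
        return demand > m;
    }
}
```

With Δ = 10 and the sample string 2 6 1 5 7 7 7 7 5 1 6 2 3 4 1 2 3 4 4 4 3 4 3 4 4 4, the working set after the tenth reference is {1, 2, 5, 6, 7} (WSS = 5) and after the last reference it is {3, 4} (WSS = 2). This shows how the working set shrinks once the process settles into a tighter locality.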

Working-set model
• Δ = 10 references

Keeping Track of the Working Set
• Approximate with an interval timer + a reference bit.
• Example: Δ = 10,000
  - Timer interrupts (not a page fault) after every 5000 time units.
  - Maintain a reference bit for each page in memory in the page table.
  - Also keep in memory 2 bits for each page (history bits). At each timer interrupt, copy the ref bit into the next history position (1st position at the 1st interrupt, 2nd position at the 2nd interrupt), in circular-queue fashion, then reset all ref bits to 0.
  - If one of the history bits is 1, then the page is in the working set, i.e., it was referenced in the WS window - keep this page.
• Why is this not completely accurate? How recent the reference was is uncertain, and choosing a “victim” is too random if all pages were referenced.
• Improvement: 10 bits and an interrupt every 1000 time units - allows more refined decision criteria for determining the working set.
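A sketch of the timer-plus-reference-bit approximation. It shifts the history bits rather than writing them in circular-queue fashion, which is an assumption of this example; the resulting working-set membership is the same, and the shifted value also orders pages by recency. All names are hypothetical.

```java
// Reference bit + history bits, sampled at each timer interrupt.
class HistoryBitsSketch {
    final boolean[] refBit;     // set by "hardware" on each reference
    final int[] history;        // most recent interval in the high-order bit
    final int bits;             // number of history bits kept per page

    HistoryBitsSketch(int pages, int bits) {
        this.refBit = new boolean[pages];
        this.history = new int[pages];
        this.bits = bits;
    }

    void reference(int page) {
        refBit[page] = true;
    }

    // Timer interrupt: shift the ref bit into the high-order history position, then clear it.
    void timerInterrupt() {
        for (int p = 0; p < history.length; p++) {
            int incoming = refBit[p] ? (1 << (bits - 1)) : 0;
            history[p] = (history[p] >>> 1) | incoming;
            refBit[p] = false;
        }
    }

    // In the working set if referenced in the current interval or any remembered one.
    boolean inWorkingSet(int page) {
        return refBit[page] || history[page] != 0;
    }
}
```

With 2 history bits, a page referenced once stays in the approximate working set for the current interval plus the next two, and drops out at the third timer interrupt. With more bits and shorter intervals, comparing `history` values as integers gives the finer-grained ordering mentioned in the last bullet.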

Page-Fault Frequency Scheme
• Establish an “acceptable” page-fault rate.
  - If the actual rate is too low, the process loses a frame.
  - If the actual rate is too high, the process gains a frame.
• A direct “feedback” approach to controlling the fault rate - actually a way of maintaining a working set.
• If the fault rate increases and no free frames are available, we may have to suspend a process (swap its pages back to the disk) … note “swapping” still has a role in virtual memory.
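The feedback rule fits in a few lines. The thresholds, the one-frame step, and the floor of one frame are assumptions of this sketch; the slide does not specify them.

```java
// Page-fault-frequency control: adjust a process's allocation by one frame per decision.
class PageFaultFrequencySketch {
    static int adjust(int frames, double faultRate, double lower, double upper) {
        if (faultRate > upper) return frames + 1;                 // too many faults: gain a frame
        if (faultRate < lower && frames > 1) return frames - 1;   // very few faults: give a frame back
        return frames;                                            // within the acceptable band
    }
}
```

A caller would measure the fault rate over some interval and apply `adjust` periodically. If the rule asks for a frame and none is free, the system falls back to suspending a process, as noted in the last bullet.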

Other Considerations
• Prepaging
  - Has some aspects of swapping: bring in all, or many, of the swapped-out pages rather than using pure demand paging - more efficient, less arm movement on disk.
  - Example: if a process has been “swapped out” due to I/O wait or a lack of free frames, “remember” its working set and bring in the entire working set when it is swapped in again.
• Page size selection - a “tuning” situation - sect. 10.8.2
  - Internal fragmentation - favors small
  - Table size - favors large
  - I/O overhead - favors large: fewer seeks, pages contiguous on disk
  - Locality - favors small: little “dead wood” in a page, bring in just what you need
• Provide multiple page sizes. This allows applications that require larger page sizes to use them without an increase in fragmentation.

Other Considerations (Cont.)
• TLB reach - the amount of memory accessible from the TLB.
• TLB Reach = (TLB Size) x (Page Size)
• Ideally, the working set of each process is stored in the TLB (completely referenced from the TLB). Otherwise there is a high degree of address resolution through the page table, which degrades the effective access time.
• To put it simply: beef up the TLB and increase performance, but it may be costly - done in hardware.
• Can also increase TLB reach by increasing the page size, which also cuts down on PT size. But this has a dark side: you will drag along more dead wood (not part of the working set) in each page.
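A quick calculation shows how strongly page size affects TLB reach. The 64-entry TLB and the two page sizes are example values, not figures from the slides.

```java
// TLB reach = number of TLB entries x page size.
class TlbReach {
    static long reachBytes(int tlbEntries, long pageSizeBytes) {
        return tlbEntries * pageSizeBytes;
    }
}
```

With 64 entries, 4 KB pages give a reach of 262,144 bytes (256 KB), while 4 MB pages give 268,435,456 bytes (256 MB). A working set of a few megabytes fits within reach only in the second case, at the cost of more dead wood per page.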

Other Considerations (Cont.)
• Inverted page table (see chapter 9 on how it works)
  - An inverted PT does not have information about the location of a missing page (its location on the disk), which is needed in the event of a page fault. A regular PT would have this information - how come there is a problem with the inverted PT? … good test question!
  - For a non-virtual-memory system, where an entire process is always swapped, this page information is not needed.
  - One solution is an external page table on the disk that is accessed only during a page fault; it would contain the logical address information … this complicates the picture, because when it is paged in, it could trigger more page faults!

Other Considerations (Cont.)
• Although VM is supposed to be transparent to the user, taking care in defining data structures or programming structures can improve locality; ideally this can be made a compiler task.
  - Smart compilers can organize object data/code efficiently, being aware of paging.
• Program structure
  - int[][] A = new int[1024][1024];
  - Each row is stored in one page.
  - Program 1 (up to 1024 x 1024 page faults if fewer than 1024 frames are allocated - each access moves from row to row):
      for (int j = 0; j < A.length; j++)
          for (int i = 0; i < A.length; i++)
              A[i][j] = 0;
  - Program 2 (1024 page faults - each access moves from column to column within a row):
      for (int i = 0; i < A.length; i++)
          for (int j = 0; j < A.length; j++)
              A[i][j] = 0;
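The two fault counts can be reproduced with a small simulation instead of a real 1024 x 1024 array. The sketch assumes one page per row, LRU replacement, and fewer frames than rows; the class name and the scaled-down sizes in the note below are illustrative.

```java
import java.util.LinkedHashSet;

// Counts page faults for the two loop orders, with one page per row and LRU replacement.
class TraversalFaults {
    static long faults(int n, int frames, boolean rowMajor) {
        LinkedHashSet<Integer> resident = new LinkedHashSet<>();   // iteration order: least to most recent
        long faults = 0;
        for (int outer = 0; outer < n; outer++) {
            for (int inner = 0; inner < n; inner++) {
                int row = rowMajor ? outer : inner;                 // the page touched is the row index
                if (!resident.remove(Integer.valueOf(row))) {       // not resident: page fault
                    faults++;
                    if (resident.size() == frames) {
                        Integer lru = resident.iterator().next();   // least recently used page
                        resident.remove(lru);
                    }
                }
                resident.add(row);                                  // re-insert as most recently used
            }
        }
        return faults;
    }
}
```

For a 64 x 64 array and 8 frames, the row-by-row order (Program 2) takes 64 faults, one per row, while the column-by-column order (Program 1) takes 4096 faults, one per access. The same ratio scales up to 1024 versus 1024 x 1024.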
