
CS444/CS544 Operating Systems

This lecture outlines the concepts of virtual memory and demand paging in operating systems, with a focus on memory management and file systems. The topics covered include virtual memory, demand paging, page tables, paging schemes, and reducing the size of page tables using two-level page tables.


Presentation Transcript


  1. CS444/CS544 Operating Systems Virtual Memory 4/04/2007 Prof. Searleman jets@clarkson.edu

  2. Outline • Virtual Memory • Demand Paging (Segmentation) NOTE: • Friday – class at 2:00 in SC354 (no lab) • Read: Chapters 9 & 10 • HW#10, due Wednesday, 4/11 • Exam#3: Tuesday, 4/17, 7:00 pm, SC160 • topics focus on Memory Management & File Systems • SGG: Chapters 7, 8, 9, 10, 11, 12

  3. Recap: Paging • Each entry in a page table maps virtual page numbers (VPNs) to physical page frame numbers (PFNs) • Virtual addresses have 2 parts: VPN and offset • Physical addresses have 2 parts: PFN and offset • Offset stays the same if virtual and physical pages are the same size • VPN is the index into the page table; the page table entry gives the PFN • Are VPN and PFN the same size?

  4. [Figure] Address translation: a virtual address (virtual page # + offset) is looked up in the page table to produce a physical address (page frame # + offset); physical memory is an array of page frames 0 through N.

  5. Example • Assume a 32 bit address space and 4K page size • 32 bit address space => virtual addresses have 32 bits and the full address space is 4 GB • 4K page means the offset is 12 bits (2^12 = 4K) • 32 - 12 = 20, so the VPN is 20 bits • How many bits in the PFN? Often 20 bits as well, but it wouldn’t have to be (just enough to cover physical memory) • Suppose virtual address 00000000000000011000000000000111, or 0x18007 • Offset is 0x7, VPN is 0x18 • Suppose the page table says VPN 0x18 translates to PFN 0x148, or 101001000 • So the physical address is 00000000000101001000000000000111, or 0x148007
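
As a sanity check on the arithmetic above, here is a minimal C sketch of the split and translation; the page table is just a toy array and the single mapping in it (VPN 0x18 -> PFN 0x148) is taken from the slide's example:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE   4096u            /* 4K pages   */
    #define OFFSET_BITS 12u              /* log2(4096) */
    #define VPN_BITS    20u              /* 32 - 12    */

    int main(void) {
        static uint32_t page_table[1u << VPN_BITS];   /* one PFN per VPN (toy table) */
        page_table[0x18] = 0x148;                     /* the mapping from the slide  */

        uint32_t vaddr  = 0x18007;                    /* virtual address from slide  */
        uint32_t vpn    = vaddr >> OFFSET_BITS;       /* 0x18  */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);    /* 0x7   */
        uint32_t pfn    = page_table[vpn];            /* 0x148 */
        uint32_t paddr  = (pfn << OFFSET_BITS) | offset;

        printf("vaddr 0x%x -> vpn 0x%x, offset 0x%x -> paddr 0x%x\n",
               vaddr, vpn, offset, paddr);            /* paddr 0x148007 */
        return 0;
    }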

  6. Page Table Entries Revisited • PTE fields (diagram): M | R | V | page frame number | prot • Entry can and does contain more than just a page frame number • (M)odify bit – whether or not the page is dirty • (R)eference bit – whether or not the page has been referenced (read or written) • (V)alid bit – whether or not the page table entry contains a valid translation • (prot)ection bits – which operations are valid on this page (Read/Write/Execute) • Page frame number
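
The exact bit positions of these fields vary by architecture; the sketch below is one illustrative C layout (the field positions are assumptions, not any real hardware's format):

    #include <stdio.h>
    #include <stdint.h>

    /* One possible 32-bit PTE layout (illustrative only). */
    #define PTE_VALID     (1u << 0)   /* (V)alid: translation is usable          */
    #define PTE_REF       (1u << 1)   /* (R)eference: page was read or written   */
    #define PTE_MODIFY    (1u << 2)   /* (M)odify: page is dirty                 */
    #define PTE_READ      (1u << 3)   /* protection bits                         */
    #define PTE_WRITE     (1u << 4)
    #define PTE_EXEC      (1u << 5)
    #define PTE_PFN_SHIFT 12u         /* page frame number in the upper bits     */

    typedef uint32_t pte_t;

    static inline uint32_t pte_pfn(pte_t pte)  { return pte >> PTE_PFN_SHIFT; }
    static inline int pte_is_valid(pte_t pte)  { return (pte & PTE_VALID) != 0; }
    static inline pte_t pte_make(uint32_t pfn, uint32_t flags)
    {
        return (pfn << PTE_PFN_SHIFT) | flags;
    }

    int main(void) {
        pte_t pte = pte_make(0x148, PTE_VALID | PTE_READ | PTE_WRITE);
        printf("valid=%d pfn=0x%x\n", pte_is_valid(pte), pte_pfn(pte));
        return 0;
    }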

  7. Processes’ View of Paging • Processes view memory as a contiguous address space from bytes 0 through N • OS may reserve some of this address space for its own use (map the OS into every process’s address space at a certain range, or declare some addresses invalid) • In reality, virtual pages are scattered across physical memory frames (and possibly paged out to disk) • Mapping is invisible to the program and beyond its control • Programs cannot reference memory outside their own virtual address space, because virtual address X maps to different physical addresses for different processes!

  8. Advantages of Paging • Avoid external fragmentation • Any physical page can be used for any virtual page • OS maintains list of free physical frames • Minimize internal fragmentation (pages are much smaller than partitions) • Easy to send pages to disk • Don’t need to send a huge region at once • Use valid bit to detect reference to paged out regions • Can have non-contiguous regions of the address space resident in memory

  9. Disadvantage of Paging • Memory to hold page tables can be large • One PTE per virtual page • 32 bit address space with 4KB pages and 4 bytes/PTE = 4 MB per page table per process!!! • 25 processes = 100 MB of page tables!!!!! (a quick check of this arithmetic is sketched below) • Can we reduce this size? • Memory reference overhead • Each memory reference now means 1 memory access for the page table entry (to do the translation) and then 1 memory access for the actual data • Caching translations? • Still some internal fragmentation • Process may not be using memory in exact multiples of the page size • Pages must be big enough to amortize disk latency
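
A quick check of the page-table-size arithmetic, assuming the 2^20 pages and 4-byte PTEs from the slide:

    #include <stdio.h>

    int main(void) {
        unsigned long long ptes     = 1ull << 20;      /* 2^20 virtual pages (20-bit VPN) */
        unsigned long long pte_size = 4;               /* bytes per PTE                   */
        unsigned long long per_proc = ptes * pte_size; /* bytes of page table per process */
        unsigned long long total    = 25 * per_proc;   /* 25 processes                    */
        printf("%llu MB per process, %llu MB for 25 processes\n",
               per_proc >> 20, total >> 20);           /* 4 MB and 100 MB                 */
        return 0;
    }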

  10. Reduce the Size of the Page Tables? • Play same trick we did with address space – why have a PTE for virtual pages never touched? • Add a level of indirection  • Two level page tables

  11. Two-level Page Table • Add a level of indirection • Virtual addresses now have 3 parts: master page #, secondary page #, and offset • Make the master page table fit in one page • A 4K page holds 1024 four-byte PTEs • So there are up to 1024 secondary page tables: 10 bits for the master index, still 12 for the offset, so 10 are left for the secondary index • An invalid master PTE means a whole chunk of the address space is not there (see the sketch below)
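
A minimal sketch of a two-level lookup with the 10/10/12 split described above; the table representation and names (master_table, translate) are invented for illustration:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define OFFSET_BITS 12u
    #define IDX_BITS    10u                       /* 10-bit master and secondary indices */
    #define IDX_MASK    ((1u << IDX_BITS) - 1u)

    typedef uint32_t pte_t;                       /* valid bit in bit 0, PFN from bit 12 */

    /* Walk master table -> secondary table -> page frame.  Returns 0 on success,
     * -1 if either level is missing/invalid (i.e. a page fault would be taken). */
    int translate(pte_t **master_table, uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t master_idx    = vaddr >> (OFFSET_BITS + IDX_BITS);
        uint32_t secondary_idx = (vaddr >> OFFSET_BITS) & IDX_MASK;
        uint32_t offset        = vaddr & ((1u << OFFSET_BITS) - 1u);

        pte_t *secondary = master_table[master_idx];  /* master entry: pointer to 2nd level  */
        if (secondary == NULL)
            return -1;                                /* whole chunk of address space absent */

        pte_t pte = secondary[secondary_idx];
        if (!(pte & 1u))
            return -1;                                /* valid bit clear -> page fault       */

        *paddr = ((pte >> OFFSET_BITS) << OFFSET_BITS) | offset;
        return 0;
    }

    int main(void) {
        static pte_t  secondary[1u << IDX_BITS];
        static pte_t *master[1u << IDX_BITS];
        master[0] = secondary;
        secondary[0x18] = (0x148u << OFFSET_BITS) | 1u;   /* map VPN 0x18 -> PFN 0x148 */

        uint32_t paddr;
        if (translate(master, 0x18007u, &paddr) == 0)
            printf("0x18007 -> 0x%x\n", paddr);           /* prints 0x148007 */
        return 0;
    }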

  12. [Figure] Two-level translation: the master page # indexes the master page table to find a secondary page table; the secondary page # indexes that table to find the page frame #; the offset is appended to the page frame # to form the physical address.

  13. Page the page tables • In addition to allowing a master PTE to say “invalid”, it could also say “this secondary page table is on disk” • The master page table for each process must stay in memory • Or maybe add another level of indirection? • A table mapping each process’s master page table to a DRAM location or a disk LBA

  14. Too much of a good thing? • Each level of indirection adds to the access cost • The original page table scheme doubled the cost of a memory access (one for the page table entry and one for the real memory location) • Two-level page tables triple the cost • Solve the problem with our other favorite CS technique… caching!

  15. TLB • Add a hardware cache inside the CPU to store recent virtual-page-to-page-table-entry translations • Fast! One machine cycle for a hit • OS doesn’t even have to get involved on a TLB hit • TLB = translation lookaside buffer

  16. TLB • Usually a fully associative cache • Cache tags are virtual page numbers • FAST! All entries are searched/compared in parallel • SMALL! Usually only 16-48 entries (64-192KB) • In hardware, SMALL often equals FAST • Cache values are PTEs • TLB is managed by the memory management unit or MMU • With PTE + offset, MMU can directly calculate the physical address
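
A toy software model of the lookup just described: a linear scan stands in for the hardware's parallel tag compare, and the entry count and structure are illustrative assumptions:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define TLB_ENTRIES 48                 /* small, like real TLBs (16-48 entries) */

    struct tlb_entry {
        int      valid;
        uint32_t vpn;                      /* tag: virtual page number */
        uint32_t pte;                      /* cached page table entry  */
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Software stand-in for a fully associative lookup: hardware compares every
     * tag in parallel, here we just scan.  Returns 1 on a hit and fills *pte. */
    int tlb_lookup(uint32_t vpn, uint32_t *pte)
    {
        for (size_t i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *pte = tlb[i].pte;
                return 1;                  /* hit: MMU can form the physical address */
            }
        }
        return 0;                          /* miss: walk the page tables (HW or OS)  */
    }

    int main(void) {
        tlb[0] = (struct tlb_entry){ .valid = 1, .vpn = 0x18, .pte = 0x148 };
        uint32_t pte;
        printf("vpn 0x18: %s\n", tlb_lookup(0x18, &pte) ? "hit" : "miss");
        printf("vpn 0x19: %s\n", tlb_lookup(0x19, &pte) ? "hit" : "miss");
        return 0;
    }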

  17. How effective are TLBs? • Small: only 16-48 entries • Maps only 64-192 KB of address space • If a process is actively using more address space than that, it will get TLB misses • Amazingly, >99% of address translations can be hits!! • What does that mean? • Processes have a very high degree of locality in their access patterns • When a 4K page is mapped because of one access, that single TLB entry covers the rest of the page, which is likely to be accessed next (if the whole page is touched, only 1 in 1024 four-byte accesses misses)

  18. TLB miss • On a TLB miss, what happens? • Hardware-loaded TLBs • Hardware knows where the page tables are in memory (stored in a register, say) • Tables must be in a HW-defined format so that the hardware can parse them • x86 works this way • Software-loaded TLBs • On a TLB miss, generate an OS fault; the OS must find and load the correct PTE and then restart the access • OS can define the page table format, as long as it loads entries into the TLB in the format the HW wants • Either way, have to choose one of the current entries to kick out – which one? • TLB replacement policy is usually a simple approximation of LRU

  19. Other OS responsibility? • Even with a HW-loaded TLB • What else must the OS do? • Hint: context switch

  20. Context Switch • Contents of the TLB reflect a mapping from virtual to physical that applies to only one process • On a context switch, must flush the current TLB entries left by the last process • Could restore entries for the new process (preload) or just set them all to invalid and take faults on the first few accesses • This is a big reason context switches are expensive!! • Recall: a kernel-level thread switch is more expensive than a user-level switch… now you know even more why!

  21. Segmentation • Similar technique to paging except partition address space into variable sized segments rather than into fixed sized pages • Recall FS blocks vs extents? • Variable size = more external fragmentation • Segments usually correspond to logical units • Code segment, Heap segment, Stack segment etc • Virtual Address = <segment #, offset> • HW Support? Often multiple base/limit register pairs, one per segment • Stored in segment table • Segment # used as index into the table
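
A minimal sketch of segment-based translation using a base/limit pair per segment; the struct, function name, and the example segment values are illustrative, and real hardware would also check the protection bits:

    #include <stdio.h>
    #include <stdint.h>

    struct segment {
        uint32_t base;     /* starting physical address of the segment */
        uint32_t limit;    /* segment length in bytes                  */
    };

    /* Translate <segment #, offset> using a base/limit pair per segment.
     * Returns 0 and fills *paddr, or -1 on a limit violation (fault). */
    int seg_translate(const struct segment *seg_table, uint32_t seg_num,
                      uint32_t offset, uint32_t *paddr)
    {
        const struct segment *seg = &seg_table[seg_num];
        if (offset >= seg->limit)
            return -1;                     /* outside the segment: protection fault */
        *paddr = seg->base + offset;
        return 0;
    }

    int main(void) {
        struct segment segs[3] = {         /* made-up code/heap/stack segments */
            { .base = 0x10000, .limit = 0x4000 },
            { .base = 0x50000, .limit = 0x8000 },
            { .base = 0x90000, .limit = 0x2000 },
        };
        uint32_t paddr;
        if (seg_translate(segs, 1, 0x100, &paddr) == 0)
            printf("<1, 0x100> -> 0x%x\n", paddr);            /* 0x50100 */
        if (seg_translate(segs, 2, 0x3000, &paddr) != 0)
            printf("<2, 0x3000> -> fault (past the limit)\n");
        return 0;
    }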

  22. Paging Segments? • Use segments to manage logical units and then divide each segment into fixed sized pages • No external fragmentation • Segments are pageable so don’t need whole segment in memory at a time • x86 architecture does this

  23. Linux on x86 • 1 kernel code segment and 1 kernel data segment • 1 user code segment and 1 user data segment • Belongs to process currently running • N task segments (stores registers on context switch) • All segments paged with three level page tables

  24. Virtual Memory Problem #1 • Problem: every virtual address must be mapped to a physical address, which means that memory must be accessed twice (once to read the page table and once to read the data) • Solution: add a cache of PTEs for recently referenced pages; this is called a TLB • Observation: this works because programs exhibit both temporal and spatial locality

  25. Example: 100 nanosecond memory cycle time
        No paging: access time for 1 word of data = 100 ns
        Page table in memory (no TLB): effective access time (EAT) = 100 ns (to read PTE) + 100 ns (to read data) = 200 ns
        Page table in memory + TLB; 20 ns to search the TLB
          PTE in TLB => 20 ns + 100 ns = 120 ns to read data
          PTE not in TLB => 20 ns + 2*100 ns = 220 ns
        80% hit rate: EAT = 0.8*(120) + 0.2*(220) = 140 ns !!
        90% hit rate: EAT = 0.9*(120) + 0.1*(220) = 130 ns
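
A small C sketch that reproduces the EAT numbers above (the 20 ns TLB time and 100 ns memory cycle time are the slide's parameters):

    #include <stdio.h>

    /* Effective access time with a TLB in front of an in-memory page table:
     * hit  -> TLB search + one memory access for the data
     * miss -> TLB search + one access for the PTE + one for the data */
    double eat_ns(double hit_rate, double tlb_ns, double mem_ns)
    {
        double hit  = tlb_ns + mem_ns;
        double miss = tlb_ns + 2.0 * mem_ns;
        return hit_rate * hit + (1.0 - hit_rate) * miss;
    }

    int main(void) {
        printf("80%% hits: %.0f ns\n", eat_ns(0.80, 20, 100));   /* 140 ns */
        printf("90%% hits: %.0f ns\n", eat_ns(0.90, 20, 100));   /* 130 ns */
        return 0;
    }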

  26. Virtual Memory Problem #2 • Problem: page tables are too big – 32 bit address, 4KB page => ~1 million PTEs => 4 MB of page table for one process • Solutions: • multilevel page tables – page the page table • inverted page tables – have one entry per frame of memory, with info about which page of which process is in that frame (see the sketch below)
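
A sketch of an inverted page table lookup, assuming one entry per physical frame keyed by (process id, virtual page number); the frame count is a made-up example, and the linear scan is for clarity only, since real inverted page tables hash on (pid, vpn):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define NUM_FRAMES (1u << 18)          /* e.g. 1 GB of physical memory / 4 KB frames */

    /* One entry per physical frame: which page of which process lives there. */
    struct ipt_entry {
        int      valid;
        uint32_t pid;
        uint32_t vpn;
    };

    static struct ipt_entry ipt[NUM_FRAMES];

    /* Find the frame holding (pid, vpn).  A linear scan is shown for clarity;
     * real inverted page tables hash on (pid, vpn) to avoid scanning. */
    long ipt_lookup(uint32_t pid, uint32_t vpn)
    {
        for (size_t f = 0; f < NUM_FRAMES; f++) {
            if (ipt[f].valid && ipt[f].pid == pid && ipt[f].vpn == vpn)
                return (long)f;            /* frame number = physical page number */
        }
        return -1;                         /* not resident: page fault            */
    }

    int main(void) {
        ipt[42] = (struct ipt_entry){ .valid = 1, .pid = 7, .vpn = 0x18 };
        printf("(pid 7, vpn 0x18) -> frame %ld\n", ipt_lookup(7, 0x18));   /* 42 */
        printf("(pid 7, vpn 0x19) -> frame %ld\n", ipt_lookup(7, 0x19));   /* -1 */
        return 0;
    }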

  27. Virtual Memory Problem #3 • Problem: there’s not enough physical memory (multiprogramming works best if there are enough processes on the ready list & the job mix is good) • Solution: only keep some of the pages that a process needs in memory at any one time; to reduce the # of page faults, identify the current “working set” for a process • Observation: locality to the rescue again

  28. How much memory do processes need? (Theoretical Answer) • The “working set” of a process is the set of virtual memory pages being actively used by the process • Define a working set over an interval: WS_P(w) = { pages P accessed in the last w accesses } • If w = total number of accesses P makes, then WS_P(w) = every virtual memory page touched by P • Small working set = the accesses of the process have a high degree of locality
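
A toy calculation of the working set size from a reference trace; the trace, the window sizes, and the page-count bound are made-up illustration values:

    #include <stdio.h>
    #include <stdint.h>

    #define MAX_PAGES 64                   /* toy virtual address space: pages 0..63 */

    /* Count the distinct pages referenced in the last w entries of a trace:
     * that set is the working set WS_P(w). */
    int working_set_size(const uint32_t *trace, int len, int w)
    {
        int seen[MAX_PAGES] = {0};
        int distinct = 0;
        int start = (len > w) ? len - w : 0;
        for (int i = start; i < len; i++) {
            if (!seen[trace[i]]) {
                seen[trace[i]] = 1;
                distinct++;
            }
        }
        return distinct;
    }

    int main(void) {
        /* A made-up reference string with a lot of locality. */
        uint32_t trace[] = {1, 2, 1, 2, 1, 3, 2, 1, 2, 3, 7, 7, 8, 7, 8, 7};
        int len = (int)(sizeof trace / sizeof trace[0]);
        printf("|WS(4)|  = %d\n", working_set_size(trace, len, 4));    /* recent pages only  */
        printf("|WS(16)| = %d\n", working_set_size(trace, len, 16));   /* every page touched */
        return 0;
    }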

  29. Changes in Working Set • Working set changes over the life of the process • Ex. At first all the initialization code is in the working set of a process, but after some time it won’t be any longer • Intuitively, you need to keep the working set of a process in memory or the OS will constantly be bringing pages to and from disk • Normally when we ask how much memory a given program needs to run, the answer is either its average or maximum working set (depending on how conservative you want the estimate to be)

  30. Demand Paging • When a process first starts up • It has a brand new page table with all PTE valid bits set to false, because no pages are yet mapped to physical memory • As the process fetches instructions and accesses data, there will be “page faults” for each page touched • Only pages that are needed or “demanded” by the process will be brought in from disk • Eventually it may bring in so many pages that some must be chosen for eviction • Once a page is evicted, a later access will once again demand-page it in from disk

  31. Demand Paging • When working set changes (like at the beginning of a process), you will get disk I/O – really no way around that! • BUT if most memory accesses result in disk I/O the process will run *painfully* slow • Virtual memory may be invisible from a functional standpoint but certainly not from a performance one • There is a performance cliff and if you step off of it you are going to know • Remember building systems with cliffs is not good

  32. Given a virtual address U = p || w
        search the TLB for key p
        if found, f = frame number in the PTE in the TLB
        else
            do a memory fetch to get the PTE from memory
            if the page is in memory (valid bit == 1)
                update the TLB
                f = frame number in the PTE
            else
                service the page fault
        generate physical address M = f || w

  33. Servicing a page fault
        trap to the OS; save the state of the running process
        find the desired page on disk
        if there is a free frame, use it
        else perform page replacement
        schedule a disk operation for the page; block the process
        when the interrupt occurs (the page is now in memory), update the page table
          for the process waiting for that page & restart the instruction
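
A self-contained C sketch of demand paging along the lines of slides 30-33: a tiny frame pool, faults on invalid PTEs, and a trivial FIFO victim choice standing in for a real replacement policy (all sizes and the reference string are made up):

    #include <stdio.h>
    #include <string.h>

    #define NUM_FRAMES 4                    /* tiny physical memory for illustration */
    #define NUM_PAGES  16                   /* tiny virtual address space            */

    static int frame_of[NUM_PAGES];         /* PFN per virtual page                  */
    static int valid[NUM_PAGES];            /* valid bit per virtual page            */
    static int page_in_frame[NUM_FRAMES];   /* which page occupies each frame, or -1 */
    static int next_victim;                 /* trivial FIFO replacement pointer      */

    /* Service a fault on page vpn: find a frame (evicting one FIFO-style if none
     * is free), "read" the page from disk, and mark the PTE valid. */
    static void handle_page_fault(int vpn)
    {
        int frame = -1;
        for (int f = 0; f < NUM_FRAMES; f++)
            if (page_in_frame[f] == -1) { frame = f; break; }    /* free frame?      */

        if (frame == -1) {                                       /* page replacement */
            frame = next_victim;
            next_victim = (next_victim + 1) % NUM_FRAMES;
            printf("  evict page %d from frame %d\n", page_in_frame[frame], frame);
            valid[page_in_frame[frame]] = 0;
        }

        printf("  (disk read of page %d into frame %d)\n", vpn, frame);
        page_in_frame[frame] = vpn;
        frame_of[vpn] = frame;
        valid[vpn] = 1;
    }

    static void touch_page(int vpn)
    {
        if (!valid[vpn]) {                                       /* demand paging    */
            printf("page fault on page %d\n", vpn);
            handle_page_fault(vpn);
        }
        printf("access page %d in frame %d\n", vpn, frame_of[vpn]);
    }

    int main(void)
    {
        memset(page_in_frame, -1, sizeof page_in_frame);
        int refs[] = {0, 1, 2, 0, 3, 4, 0, 1};                   /* made-up trace    */
        for (int i = 0; i < 8; i++)
            touch_page(refs[i]);
        return 0;
    }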

  34. Example: 100 nanosecond memory cycle time
        20 ns to search the TLB, 90% hit rate
        2% page fault rate
        time to swap a page in/out = 8 milliseconds = 8000000 ns
        (assume all page frames are occupied and the victim page is dirty => it must be swapped out before the new one is swapped in)
        EAT = 0.9*(120 ns)                                     // 90% of the time, TLB hit
            + 0.08*(220 ns)                                    // 8%, TLB miss, page in memory
            + 0.02*(20 ns + 100 ns + 2*(8000000 ns) + 100 ns)  // 2%, page fault
            = 320130 ns!
        Is a 2% page fault rate realistic?

  35. Prepaging? • Anticipate a page fault before it happens and prefetch the data • Overlap fetch with computation • Can be hard to predict and if the prediction is wrong, you can evict something useful • Programmers can give hints • vm_advise

  36. Thrashing • Thrashing – spending all your time moving pages to and from disk and little time actually making progress • System is overcommitted • People get like this 

  37. Avoiding Paging • Given the cost of paging, we want to make it as infrequent as we can • Function of: • Degree of locality in the application (size of the working set over time) • Amount of physical memory • Page replacement policy • The OS can only control the replacement policy

  38. Goals of Replacement Policy • Performance • Best to evict a page that will never be accessed again if possible • If not possible, evict page that won’t be used for the longest time • How can we best predict this? • Fairness • When OS divides up the available memory among processes, what is a fair way to do that? • Same amount to everyone? Well some processes may not need that amount for their working set while others are paging to disk constantly with that amount of memory • Give each process its working set? • As long as enough memory for each process to have its working set resident then everyone is happy • If not how do we resolve the conflict?
